QGIS 1: Spatial analysis for beginners
NICAR24, Baltimore
Wesley Stephenson
BBC News
Callum Thomson
BBC News
@2lsnop

Links


Getting started


QGIS originally stood for Quantum GIS (Quantum Geographic Information System). Today the project is simply called QGIS.

It is a powerful and complex piece of mapping software, but journalists can get excellent results from a couple of relatively simple functions. It’s useful for data exploration as well as for producing data maps for publication. And it’s free and open source.

These are the full follow-along instructions from the NICAR session with a worked example.

If you are in the NICAR session, please use the PCs provided as the files are large and are already downloaded. If you are following afterwards, you will find the data and shapefiles here - please download them before starting.

By the end of this tutorial you should have made a map of urban heat islands in the city of Baltimore, complete with markers for significant cities.

It should look something like this:

Opening QGIS


There are lots of versions, as it is frequently updated. You can choose between the latest release and the long-term release (LTR), which is more stable.

The version we are using here is QGIS 3.28 LTR. It’s not the most recent version, but it is stable.

TIP Be aware that QGIS projects are not always backwards-compatible, so when you update you may have to rebuild old projects.

Finding your way around

QGIS should look something like this. It may not be exactly the same, depending on the edition and whether the panels and toolbars have been moved around, but it should have these functions. The version for Windows looks slightly different.


You need two basic ingredients for most QGIS maps:
1) a shapefile, which is the map on which you will display your data
2) a data file, usually a CSV, which contains the information you want to display
(There are other options - for example, you can bring in a highly detailed aerial photograph.)

Adding a shapefile
When people talk about a shapefile, or you go looking for one, what you actually need is a collection of files that all belong together. They are often distributed as zip files.

If you have not already done so, go to the folder for this workshop and find the sub-folder called tl_2023_24_tract and download it. Inside are all the boundary files for the tracts used in the 2020 US Census for the whole state of Maryland.

There are six files for Maryland and the extensions tell you what type they are. They include:

  • .shp: the geometry file, which stores the shapes themselves. It's a "vector" file
  • .dbf: a database file with all the text-based information, such as the name of a tract or its ID code
  • .shx: an index file that records where each shape sits in the .shp file, which helps tie the geometry to its information
  • .prj: contains the projection information
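If you are curious what the .shp file actually contains, the sketch below decodes its fixed 100-byte header using only Python's standard library. The byte offsets follow the published ESRI Shapefile Technical Description. The function name read_shp_header is our own, and this is only an illustration, not something QGIS requires you to do.

```python
import struct

# Shape type codes from the ESRI Shapefile Technical Description (common ones only).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # Bytes 0-3: file code, big-endian; always 9994 in a real .shp file.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("this does not look like a .shp file")
    # Bytes 24-27: total file length, big-endian, counted in 16-bit words.
    (length_words,) = struct.unpack(">i", header[24:28])
    # Bytes 28-35: format version (1000) and shape type, both little-endian.
    version, shape_type = struct.unpack("<ii", header[28:36])
    # Bytes 36-67: bounding box Xmin, Ymin, Xmax, Ymax as little-endian doubles.
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"other ({shape_type})"),
        "file_bytes": length_words * 2,
        "bbox": (xmin, ymin, xmax, ymax),
    }
```

To try it, open tl_2023_24_tract.shp in binary mode ("rb") and pass it the first 100 bytes. For tract boundaries the shape type should come back as Polygon.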

Important: You need to keep these files in a single folder so QGIS can reference them, and they must all share the same prefix. In this example, our shapefile and all its component files have the name tl_2023_24_tract, but most shapefiles will not have filenames that are this straightforward.
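Before loading a shapefile, you can quickly confirm that none of its companion files are missing. This is a minimal Python sketch. check_shapefile is a hypothetical helper, and it treats .prj as expected even though it is technically optional, because without it QGIS cannot tell which projection the shapes use.

```python
from pathlib import Path

# .shp, .shx and .dbf are essential; .prj is technically optional,
# but without it QGIS cannot tell which projection the shapes use.
COMPANIONS = (".shp", ".shx", ".dbf", ".prj")


def check_shapefile(shp_path: str) -> list[str]:
    """List the companion files that are missing from the .shp file's folder."""
    shp = Path(shp_path)
    missing = []
    for ext in COMPANIONS:
        # Same folder, same prefix, different extension.
        companion = shp.with_suffix(ext)
        if not companion.exists():
            missing.append(companion.name)
    return missing
```

An empty list means everything QGIS needs is sitting next to the .shp file with a matching prefix.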

Go back to QGIS and click Layer > Add Layer > Add Vector Layer from the menus along the top of the window.

It is usually the top option and the icon looks like a V.

Browse to the folder we were just looking at and select the shapefile itself, tl_2023_24_tract.shp


Hit Open, then Add. In newer versions of QGIS, a window may pop up asking you to transform between two different spatial reference systems. Click “Cancel” as we will set up the projection in a later step.

You should now see a map forming and a layer in your layers panel.
TIP You can also add a shapefile by dragging and dropping it from its folder, or by browsing to it in the Browser panel on the left-hand side.

***** Now save your project! *****

TIP QGIS works in projects, which are temporary arrangements of the files you have told it to look at: a project stores links to your files, not copies of them. This means that if you move or rename a file, it will no longer appear in your QGIS project. For this reason it is a good idea to keep all the files for a map in one folder. Your map is not a permanent object until you export it in some form.

Finding your way around the shapefile

You should now have a map of all of the census tracts in Maryland.

You can easily zoom in, either with the mouse wheel when the touch zoom button is selected (PC only) or with the other zoom buttons at the top. 

Note: Don’t worry if your buttons are not in the exact same placement as the ones we show here. Different versions of QGIS tend to move things around slightly. Just get familiar with the icons themselves and where they live in the version you’re using.


  • Pan: grab the map and move it around
  • Zoom: use it in the normal way, or draw a rectangle around the view you wish to zoom to
  • Full extent: resets the map to the original view, which is very useful if you get lost in the map
  • Previous & next: step you backwards and forwards through your previous views

Notice that as you zoom in and out, the lines stay crisp. That's because a vector file stores shapes as coordinates that are redrawn mathematically at every zoom level, rather than as a fixed image made of pixels.

So in theory you could magnify this to any size and it would stay sharp.

Projection

Because the shapes are stored as coordinates, you can also project the map in different ways - more about projections here.

In QGIS, projections are set as a CRS (Coordinate Reference System).

If you need to look up what projection / CRS might be most suitable, try this website.

Set the projection of your project by clicking Project > Properties > CRS. For the state of Maryland, we’ll choose EPSG:26918 (NAD83 / UTM zone 18N).
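UTM divides the world into 60 zones, each six degrees of longitude wide. The NAD83 versions for northern-hemisphere zones 1 to 23 use EPSG codes 26901 to 26923. The Python sketch below shows the arithmetic behind choosing EPSG:26918 for Baltimore (longitude about -76.6). The function names are our own, and the sketch ignores the few special-case zones in northern Europe.

```python
import math


def utm_zone(longitude: float) -> int:
    """Standard UTM zone (1-60) for a longitude in degrees east (west is negative)."""
    # Zones are 6 degrees wide, numbered eastwards starting at 180 degrees west.
    # min() folds longitude 180 exactly into zone 60.
    return min(int(math.floor((longitude + 180) / 6)) + 1, 60)


def nad83_utm_epsg(longitude: float) -> int:
    """EPSG code for 'NAD83 / UTM zone nN', which exists for zones 1-23 only."""
    zone = utm_zone(longitude)
    if not 1 <= zone <= 23:
        raise ValueError(f"no NAD83 UTM code for zone {zone}")
    return 26900 + zone
```

For longitude -76.61 this gives zone 18 and EPSG:26918.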
The projection of the map appears in the bottom right of the screen:
Looking around our shapefile

A shapefile can contain its own information - let’s see what we have got.

Identify features

This button is Identify Features, which reveals the information attached to an area or a point.

If you click on one of the census tracts, you will see the Identify Results panel open. On some versions of QGIS this is a tab alongside the Layers panel; on others it is a separate floating panel.

It tells you all the information about that particular census tract - its name (NAMELSAD) and its GEOID code, which is something we are going to need shortly.

The FP fields hold FIPS (Federal Information Processing Standards) codes. Each US state (STATEFP), and each county or county equivalent within it (COUNTYFP), has a unique identifier. Maryland has the code 24 and, for example, Montgomery County has the code 031. State and county codes are published on the Census Bureau website.
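A census tract GEOID is the state, county and tract codes joined together: two digits for the state, three for the county and six for the tract. The Python sketch below splits one apart. split_tract_geoid is a hypothetical helper, and the GEOID in the usage note is made up for illustration. Because these are codes rather than numbers, keep them as text: a state such as Alabama (01) would lose its leading zero if the code were treated as a number.

```python
def split_tract_geoid(geoid: str) -> dict:
    """Split an 11-digit census tract GEOID into its state, county and tract parts."""
    geoid = geoid.strip()
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return {
        "state": geoid[:2],    # STATEFP, e.g. 24 for Maryland
        "county": geoid[2:5],  # COUNTYFP, e.g. 031 for Montgomery County
        "tract": geoid[5:],    # TRACTCE, the six-digit tract code
    }
```

For example, split_tract_geoid("24031123456") returns state "24", county "031" and tract "123456".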

So we have some identifiers, and then some geographic information: the area of land and water in each census tract (ALAND and AWATER, in square metres), the latitude and longitude of an internal point (INTPTLAT and INTPTLON), and so on.

There is a way to see all of this information in one go: the Attribute Table.
You can open it by right-clicking on your layer or from the toolbar:

It shows you all the info in the map but as a table:
In the attribute table, click the second-to-last icon, Dock Attribute Table, to make the table appear under the map.

This is very useful, as you can highlight tracts in the table and find them on the map.

Styling

We’re going to add some data to this map before we style it up, but you can change the colour of the map without doing this if you want to. 

Double-click on the layer name in your Layers panel to open the Layer Properties window (or right-click and select Properties).

Select the second item, Symbology, which looks like a paintbrush.


If you click on the dropdown next to Color, then Pick Color, you can choose a colour from the screen using the dropper-shaped selector. If you click Choose Color, you can type in a hex code or HTML identifier: a hash followed by a six-character code which identifies a unique colour, e.g. #1380A1.
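The six characters of a hex code are three pairs of hexadecimal digits, which give the red, green and blue values, each from 0 to 255. This short Python sketch decodes one. hex_to_rgb is our own helper, not part of QGIS.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert a colour like '#1380A1' into its red, green and blue values (0-255)."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected # followed by six hex digits")
    # Each pair of hex digits is one channel: RR GG BB.
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue
```

For example, hex_to_rgb("#1380A1") gives (19, 128, 161).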

There are several ways to change the colour:
  • Simple: the dropdown next to Color gives you recent and standard colours
  • Full: clicking on the colour itself gives you a full range of options
  • Simple fill: click on Simple fill in the symbol list and you can also change the colour there

TIP When you use QGIS regularly, you can set your own palettes and save them as style files which can be shared with colleagues or reused by you, e.g. a house mapping style or political parties' colours.

Choose a colour you like, then hit Apply and OK.

Save your project.

Adding data

There are two main ways of adding data - adding point data and joining area data. We will be doing both but starting with the join.

Your data needs to be a delimited text file - most often this will be a CSV file.

It will also need to have a column that matches a column in the map - we are going to use the GEOID code for that.

In your data:
  • Make sure your first row has column headers.
  • Keep column headers short - and some versions of QGIS need single words without spaces (use underscore).
  • Try to make sure that your data is as clean and consistent as possible: any text in a numbers column can make QGIS treat the whole column as text.
  • Keep symbols out of the data - e.g. don't have 100% in a cell, just use 100. 
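Before loading a CSV, it can save time to check it for the problems listed above. This is a minimal Python sketch using the standard csv module. check_csv is a hypothetical helper, and you tell it which columns should be numeric. It flags headers containing spaces and any cell in a numeric column that does not parse as a number, including blanks, values like 100% and numbers with thousands separators such as 1,000.

```python
import csv
import io


def check_csv(text: str, numeric_columns: list[str]) -> list[str]:
    """Return a list of problems that could trip QGIS up when it loads this CSV."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # Header names with spaces can cause trouble in some QGIS versions.
    for name in reader.fieldnames or []:
        if " " in name:
            problems.append(f"header {name!r} contains a space; use an underscore")
    # Line 1 is the header, so the data starts on line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in numeric_columns:
            value = (row.get(col) or "").strip()
            try:
                float(value)  # fails on blanks, '100%', '1,000', 'n/a' and similar
            except ValueError:
                problems.append(f"line {line_no}: {col} = {value!r} is not a number")
    return problems
```

An empty list means the file passed these checks. It does not guarantee that QGIS will guess every field type correctly.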

How to add data

From the menus, go to Layer > Add Layer > Add Delimited Text Layer

Browse to find UHI_by_census_tract.csv
  • It’s a CSV, so keep the CSV file format option selected
  • Set the number of header lines to discard to 0, and tick First record has field names
  • Tick Detect field types so QGIS treats your numbers as numbers rather than text

The file has no coordinates or shapes in it, so there is nothing to tell QGIS how to map it straight away. Under Geometry definition, choose No geometry.
Let’s see what we have got by looking at the attribute table.

You will see that it contains an urban heat island (UHI) index for each census tract within cities across the US. These figures estimate how much hotter (in °F and °C) these areas are because of the characteristics of the built environment.

What we’re looking for is a unique identifier that is common to both the shapefile and the data file: the GEOID code!

But if you look in the data, you’ll find some values for places on the outskirts of Washington DC, and they’ll show up on the map if we don’t filter them out.

So we need to filter our data so it only contains values for Baltimore.

Right-click on the layer and select Filter… to open the Query Builder.
You want to build an expression to filter the data. Double-clicking an item puts it in the expression box, so double-click city, then click either Sample or All to remind yourself what values this field contains.
Now that we’re satisfied that it’s the right field, click =, then double-click Baltimore in the list.

Your query should read:
"city" = 'Baltimore'

Click OK to apply. 


You will see that the layer in your Layers panel now has a filter symbol, and if you open the attribute table again you will see only tracts in the Baltimore city area.
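You can reproduce the same filter outside QGIS, which is handy for checking how many rows to expect. This Python sketch assumes that the CSV has a column named city, as in the query above, and that the file is UTF-8 encoded. filter_csv is our own helper. Like the QGIS filter, it leaves the file itself untouched, and the comparison is exact, so 'baltimore' in lower case would not match.

```python
import csv


def filter_csv(path: str, field: str, value: str) -> list[dict]:
    """Return the rows of a CSV whose `field` exactly equals `value`."""
    with open(path, newline="", encoding="utf-8") as f:
        # Same idea as the QGIS expression "city" = 'Baltimore'.
        return [row for row in csv.DictReader(f) if row[field] == value]
```

For example, len(filter_csv("UHI_by_census_tract.csv", "city", "Baltimore")) should match the number of rows left in the filtered attribute table.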

Joining data

Let’s get this added to the map.

Double click on your map layer, tl_2023_24_tract, in the layers panel. This time choose the triangle with the dot to open the joins box, and click the + symbol to add a new join.

Join layer will probably already be set to UHI_by_census_tract, as it is the only other dataset you have loaded.

Then you need to specify which columns match each other for a join.

In our data the column is called census tract number, so choose it from the ‘Join field’ dropdown. In the map, this matches the field called GEOID, so choose that from the ‘Target field’ dropdown. Hit OK, then Apply.

It looks as though nothing has happened, but if you use the Identify features button on the map again, you can see that the data has been joined.
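Under the hood, a join like this is a lookup. For each tract in the map, QGIS finds the row in the data with the same code and attaches its columns. Every tract is kept, and tracts with no matching row get empty values. The Python sketch below illustrates the idea. left_join and its parameters are hypothetical, as is the census_tract_number column name in the example. It assumes tract GEOIDs, so it pads codes to 11 digits to guard against lost leading zeros. The prefix argument mimics the way QGIS names joined fields after the join layer, as in UHI_by_census_tract_urban heat island effect (temperature in degrees F).

```python
def left_join(tracts: list[dict], data: list[dict], target_field: str,
              join_field: str, prefix: str) -> list[dict]:
    """Attach matching data columns to every tract; unmatched tracts get None."""

    def key(code) -> str:
        # Treat codes as text and restore leading zeros (tract GEOIDs have 11 digits).
        return str(code).strip().zfill(11)

    lookup = {}
    for row in data:
        lookup.setdefault(key(row[join_field]), row)  # keep the first row per code

    data_fields = [name for name in (data[0] if data else {}) if name != join_field]
    joined = []
    for tract in tracts:
        match = lookup.get(key(tract[target_field]))
        extra = {prefix + name: (match[name] if match is not None else None)
                 for name in data_fields}
        joined.append({**tract, **extra})
    return joined
```

If many tracts end up with empty values after a real join, the codes probably don't match exactly. Text versus numbers, or missing leading zeros, are the usual suspects.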

Making a choropleth


We are finally ready to show this heat islands data on the map.

Go back to Symbology in the properties menu for the map.
And this time, instead of Single symbol at the top, we want Graduated - that’s because we have a sliding scale of continuous data. If we had categories like high, low and medium, we would choose Categorised.

Choose the value you want to display - in our case we want UHI_by_census_tract_urban heat island effect (temperature in degrees F).

And choose a mode of dividing it into chunks - I recommend you start off with Equal Count with four classes, which will split it into four with the same number of tracts in each bucket. You can adjust this once we get going. 

Hit Classify, then Apply and you should see your map change colour.
You can change the colour by clicking on the colour ramp bar itself, then change Color 2 to the shade you want for your highest numbers.

News organisations usually have a colour palette for heat maps like this. Here’s an excerpt from the BBC’s palette:

You can specify the individual colours by double clicking on the coloured square, then selecting Color > Select Color and specifying the Hex code you want for each one.

Finally let’s fix those breaks into something a little more human-looking. 

It can help to see how your data is distributed. Go to the Histogram tab and Load Values - this shows you where the current breaks are and the range of your data. 

Now go back to the Classes tab and you can directly edit the break points. For speed, we’re going to choose Pretty Breaks, which creates recognisable class boundaries in whole numbers.

You can also play around with the other methods of classification:
  • Quantile (Equal count) - makes sure that there are the same number of tracts in each bucket. If you add more classes, you’ll get more detail. This one is good for quickly making the map look interesting, but check that your break points aren’t odd-looking numbers, as the map can look deliberately manipulated
  • Pretty Breaks - creates recognisable class boundaries in whole numbers
  • Equal Interval - creates classes which each cover the same range. For example, if a county population dataset ranged from 1,562 to 1,077,402, five classes would each span one fifth of the difference = 215,168. It doesn’t show much of interest for skewed data like that, but can be useful for other datasets.
  • Natural Breaks (Jenks) - groups the data into clusters, keeping the values within each class as similar as possible and the differences between classes as large as possible
  • Standard Deviation - calculates the mean of the data and creates classes based on the standard deviation from the mean.
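The arithmetic behind two of these methods is simple enough to sketch. The Python below computes equal interval and quantile (equal count) break points. The function names are our own, and QGIS's own quantile calculation may place boundaries slightly differently.

```python
def equal_interval_breaks(values: list[float], classes: int) -> list[float]:
    """Boundaries that split the data's full range into equal-width classes."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / classes
    return [lo + step * i for i in range(classes + 1)]


def quantile_breaks(values: list[float], classes: int) -> list[float]:
    """Boundaries that put roughly the same number of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    # Lower boundary of each class: the value i/classes of the way through the sorted list.
    lows = [ordered[(n * i) // classes] for i in range(classes)]
    return lows + [ordered[-1]]
```

For the county population example in the list above, equal_interval_breaks([1562, 1077402], 5) gives boundaries at 1,562, 216,730, 431,898, 647,066, 862,234 and 1,077,402.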

Adding point data

To help readers orientate themselves, we can add some prominent places to the map. These come as a list of ‘points’ - pairs of latitude-longitude co-ordinates with associated data such as place names.

There are two ways of adding point data: it can be in a shapefile or in a CSV. Both work the same way, as point data relies on latitude and longitude.

As before, shapefiles are added as vector layers and CSV data is added as a delimited text layer.
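When you load points from a CSV, watch the order of the coordinates: longitude is the X value and latitude is the Y value. The Python sketch below builds a point in WKT (well-known text), a geometry format that the Add Delimited Text Layer dialog can also read. to_wkt_point is a hypothetical helper, and the coordinates in the usage note are approximate ones for Baltimore.

```python
def to_wkt_point(latitude: float, longitude: float) -> str:
    """Well-known text for a point; WKT puts X (longitude) before Y (latitude)."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude {latitude} is outside -90..90")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude {longitude} is outside -180..180")
    return f"POINT ({longitude} {latitude})"
```

For example, to_wkt_point(39.29, -76.61) returns "POINT (-76.61 39.29)".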

Adding major cities

Add baltimore_places.shp as a new vector layer.

This will create points for each of the major cities in the map area. Initially they may appear as a random colour like bright yellow:

You can adjust the colour and opacity of the marker in the Symbology section of the baltimore_places layer.

For more detailed settings, choose Simple Marker from the Symbol Settings menu.

Make the size of each marker 4mm with a fill colour of #ffffff (white) and a stroke width of 1mm in the colour #222222 (near-black).

Finally for this layer, we will add labels from the Labels tab. Choose Single Labels from the dropdown menu and select the field containing the place names: NAME.
Again, news organisations will have their own house style for text labels - we’ll choose Helvetica 18pt in black (#222222) with a white 1mm buffer.

Save your map as an image

To create an image file for use online, on broadcast or in print, choose Project > Import/Export > Export Map to Image…

Zoom the map so that it fills the window. To set the size of the image, click Draw on Canvas and use the cross symbol to draw a box around the map. Alter the resolution depending on the intended use, then click Save. Choose a location and filename for your map.
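The pixel size of the exported image depends on the physical size you need and the resolution in dots per inch (dpi): pixels = inches × dpi, where one inch is 25.4mm. Here is a small Python sketch of that calculation. pixel_size is our own helper.

```python
def pixel_size(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions needed to print at width_mm x height_mm at the given dpi."""
    mm_per_inch = 25.4
    # pixels = size in inches x dots per inch, rounded to whole pixels
    return (round(width_mm / mm_per_inch * dpi), round(height_mm / mm_per_inch * dpi))
```

For example, a 254mm × 127mm print at 300 dpi needs an image of 3000 × 1500 pixels.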

Useful links