QGIS 1 2025: Spatial analysis for beginners
NICAR25, Minneapolis
John Walton and Daniel Wainwright, BBC News

Links


  • Download QGIS here. Please use the Long Term Release (LTR) version. This tutorial works with LTR 3.34 or 3.40.

1. Getting started 


QGIS was originally called Quantum GIS, where GIS stands for Geographic Information System.

It is a powerful and complex piece of mapping software but journalists can get excellent results from a couple of relatively simple functions. It’s useful for data exploration as well as for producing data maps for publication. And it’s open source.

These are the full follow-along instructions from the NICAR session with a worked example.

If you are in the NICAR session, please use the PCs provided as the files are large and are already downloaded. If you are following afterwards, you will find the data and shapefiles here - please download them before starting. The NICAR PCs are using QGIS Long Term Release version 3.34, but the newer LTR 3.40 should also work.

By the end of this tutorial you should have made a map of the percentage of vacant housing in each area, according to the 2020 census.

During the presidential election, the housing market and the cost of living were big issues, so data showing which areas had the most homes sitting empty can lead to some really interesting stories. What we’re working with today is the higher-level data that doesn’t tell us WHY these homes were vacant, although the Census does have that, with numbers on those for sale or rent, those used occasionally and so on.

The map we will end up with should look something like this:


2. Finding your way around 


QGIS should look something like this on a Mac. The version for Windows looks slightly different but everything is in pretty much the same place. Along the top, from the hand to the refresh log, are the navigation tools. You can add a new project in the top left. And down in the bottom left is where we will be putting our layers. We’ll explain more about these navigation tools shortly.


You need two basic ingredients for most QGIS maps:
1) a shapefile, which is the map on which you will display your data
2) a data file, usually a CSV, which contains the information you want to display
(There are other options - for example, you can bring in highly detailed satellite imagery.)

3. Adding a shapefile 


If you have not already done so, go to the folder for this workshop and find the sub-folder called tl_2023_27_tract and open it. You may need to unzip it first in the folder.

These file names aren’t the most intuitive, but they’re exactly as presented by the original source.

Inside are all the boundary files for the tracts used in the 2020 US Census for the whole state of Minnesota.

When people talk about a shapefile, or you go looking for one, you actually need a collection of files which all correspond to each other. They are often found in zip files.



The extensions tell you what type of files they are. They include:

  • .shp the geometry drawing file. It's a "vector" file 
  • .dbf a database file with all the text based information such as the name of a county or the ID code
  • .shx an index file which ties the two together
  • .prj contains the projection information

Important: You need to keep these files in a single folder so QGIS can reference them, and they must all keep the same prefix as each other. In this example, our shapefile and all its component files have the name tl_2023_27_tract, but most shapefiles will not have filenames that are even this straightforward.
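If a shapefile refuses to load, it can help to check that the companion files are all present with the same prefix. The sketch below is one way to do that check in Python; the function name is made up for this example, and treating .prj as required is a choice made here rather than a QGIS rule.

```python
from pathlib import Path

# The four extensions listed above. Treating .prj as required is an
# assumption made for this sketch.
REQUIRED_EXTENSIONS = (".shp", ".dbf", ".shx", ".prj")


def missing_shapefile_parts(folder, prefix):
    """Return the required extensions that have no matching file in `folder`."""
    folder = Path(folder)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not (folder / f"{prefix}{ext}").exists()
    ]
```

For the workshop data you might call `missing_shapefile_parts("tl_2023_27_tract", "tl_2023_27_tract")`, assuming the unzipped folder sits in your working directory. An empty list means all four files were found.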

Go back to QGIS and click Layer > Add Layer > Add Vector Layer from the menus along the top of the page. 

It is usually the top option and the icon looks like a V.

In the Vector Dataset box, click the three dots and browse to the folder we were just looking at and select the shapefile itself, tl_2023_27_tract.shp. Hit Open, then Add, then Close.

In some versions of QGIS, a window may pop up asking you to transform between two different spatial reference systems. Click “Cancel” as we will set up the projection in a later step.

You should now see a map forming and a layer in your layers panel.
TIP You can also add a shapefile by dragging and dropping it from the folder, OR by browsing to it in the file browser on the left-hand side.

***** Now save your project! *****

TIP QGIS works in projects which are temporary arrangements of the files which you have told it to look at. This means that if you move or rename a file then it will no longer appear in your QGIS project. For this reason it is a good idea to keep your files for a map all in one folder. Your map is not a permanent object until you export it in some form.

4. Finding your way around the map 


You should now have a map of all of the census tracts in Minnesota.

You can easily zoom in, either with the mouse wheel when the touch zoom button is selected (PC only) or with the other zoom buttons at the top. 

Note: Don’t worry if your buttons are not in the exact same places as the ones we show here. Different versions of QGIS tend to move things around slightly. Just get familiar with the icons themselves and where they live in the version you’re using.

Pan

to grab and move around


Zoom 

use in normal way or draw rectangle of the view you wish to zoom to



Full extent 
this is very useful if you get lost in the map. It resets to the original view



Previous & next 
 step you back/forwards through previous views
 
 

Notice as you zoom in and out the lines stay crisp. That's because a vector file is generated mathematically, rather than being an image. 

So in theory you could magnify this to any size and it would stay true. 

5. Projection 


This map of Minnesota might look a bit squat because we haven’t projected it properly yet.



That means we aren’t showing how it’s meant to look on the globe. You sometimes see this if a map has a flat, straight line between the US and Canada.
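A bit of arithmetic shows why the unprojected map looks squat. When longitude and latitude are drawn as if they were equal-sized units, east-west distances are stretched, because a degree of longitude covers less ground the further you are from the equator. The sketch below treats the Earth as a sphere and uses 46 degrees north as a rough mid-latitude for Minnesota; both are simplifying assumptions for illustration.

```python
import math

# Assumption: 46 degrees north is a rough mid-latitude for Minnesota.
MID_LATITUDE_DEG = 46.0


def longitude_shrink_factor(latitude_deg):
    """Ground length of one degree of longitude relative to one degree of
    latitude, on a spherical Earth."""
    return math.cos(math.radians(latitude_deg))


factor = longitude_shrink_factor(MID_LATITUDE_DEG)
print(f"One degree of longitude is about {factor:.2f} of a degree of latitude")
```

At that latitude the factor is about 0.69, so drawing the two as equal makes the state look roughly 1.4 times too wide.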

You can project the map in different ways - more about projections here.  

This is also known as a CRS - Coordinate Reference System.

If you need to look up what projection / CRS might be most suitable, try this website.

Set the Projection of your project by clicking Project > Properties > CRS. For the state of Minnesota, we’ll choose EPSG:26915 which is the one that best suits the state for our purposes.

Search for 26915 in the filter, click the arrow next to Projected, and then the one by Universal Transverse Mercator. Then click on EPSG:26915 and click Apply.
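EPSG:26915 is NAD83 / UTM zone 15N. If you are mapping another state, the sketch below shows one way to work out which UTM zone a longitude falls in. The function names are invented for this example, -94 is used as an approximate central longitude for Minnesota, and the 26900 + zone pattern is only relied on here for the zones covering the contiguous US (10 to 19). Treat it as a starting point and confirm the code in the QGIS CRS search.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number (1-60) that contains a longitude."""
    return int(math.floor((longitude_deg + 180) / 6)) % 60 + 1


def nad83_utm_north_epsg(zone):
    """EPSG code for NAD83 / UTM zone <zone>N.

    Assumption: only zones 10 to 19 (the contiguous US) are handled here.
    """
    if not 10 <= zone <= 19:
        raise ValueError("this sketch only covers UTM zones 10 to 19")
    return 26900 + zone


# Assumption: -94 is a rough central longitude for Minnesota.
zone = utm_zone(-94.0)
print(zone, nad83_utm_north_epsg(zone))
```

UTM is not the only sensible choice for a state map, which is why it is worth looking up alternatives for your own area.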



You may notice the shape of Minnesota is a little elongated now, and looks more like what we’d see on a map.

The projection of the map appears in the bottom right of the screen:



6. Looking around our shapefile 


A shapefile can contain its own information - let’s see what we have got.

Identify features

This button is identify features, which reveals the information attached to an area or a point.
 
If you click on one of the census tracts you will see the Identify results panel open.

It tells you all the information about that particular census tract.
For example, STATEFP is the code for the state. Minnesota’s is 27. 

But to bind our map to the data we want to show on it, we need a common column, something that’s present in both our datasets. 

The column we’re going to want is GEOIDFQ. That’s a unique ID that only appears once per census tract area.

There is a way to see this information all in one go and that is by looking at the Attribute Table.
You can get to this by right clicking on your layer or from the toolbar:

It shows you all the information in the map but as a table.

In the attribute table, if you click on the last icon, Dock Attribute Table you get the table to appear under the map.




Save your project.


7. Adding data 


We’re now going to bring in the data that we want to show on our map. We have data on housing occupancy - as mentioned earlier.  

Your data needs to be a delimited text file - most often this will be a CSV file.

It will also need to have a column with data that matches an equivalent column in the map data - we are going to use the GEO_ID code for that. The GEO_ID in the data file matches the GEOIDFQ column in the map data, so we can join them together to make one dataset.

How to add data

In the toolbar go to Layer > Add layer > Add Delimited Text Layer

Browse to find DECENNIALDHC2020.H1-Data.csv

  • It’s a CSV so keep that file format button checked
  • Keep number of header lines to discard as 0 and keep First record has field names checked.
  • Check/tick detect field types so it will treat your numbers as numbers rather than text

There is nothing inherent in the file which tells you how to map it straight away so choose No geometry.

Let’s see what we have got by looking at the attribute table for our DECENNIALDHC2020.H1-Data layer.

It contains housing occupancy data 🏡 for each census tract within counties in Minnesota

Our map is going to show the percentage of vacant homes in each census tract, but the data doesn’t have that yet.

Our map is of census tract areas. Within the data, we’ve got total figures for whole counties as well. 

The number of housing 🏡 units (or homes) is column H1_001N 

The number that are occupied 🐈‍⬛ is H1_002N

The number of vacant housing units 🫥 is column H1_003N


Some general tips on bringing in your data:

  • Make sure your first row has column headers.
  • Keep column headers short - and some versions of QGIS need single words without spaces (use underscore).
  • Try to make sure that your data is as clean and consistent as possible. Any text in a numbers column can mean QGIS thinks the whole column is text.  
  • Keep symbols out of the data - e.g. don't have 100% in a cell, just use 100. 
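To see why a stray symbol matters, here is a simplified illustration of type detection in Python. It is not how QGIS detects field types internally; it only shows the general idea that one non-numeric value is enough to make a whole column text.

```python
def detect_field_type(values):
    """Classify a column as 'integer', 'decimal' or 'text' from its string
    values. Empty cells are ignored."""
    def all_parse(cast):
        for value in values:
            if value.strip() == "":
                continue
            try:
                cast(value)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "text"


print(detect_field_type(["100", "95", "87"]))   # every value is a whole number
print(detect_field_type(["100%", "95", "87"]))  # the % sign makes the column text
```

Running a check like this on a column before loading the CSV can save some head-scratching later, when a field refuses to appear in the Graduated symbology options.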

8. Joining data 

Let’s get this added to the map - so we can start to tell a story with the data.

Double click on your map layer, tl_2023_27_tract, in the layers panel. This time choose the triangle with the dot to open the joins box, and click the + symbol to add a new join.

Join layer will probably already be selected as DECENNIALDHC2020.H1-Data, as it is the only dataset you have loaded.

Then you need to specify which columns match each other for a join.

In our data the column is called GEO_ID. In the map, this matches the field called GEOIDFQ.

So we select GEO_ID as our Join field and then we choose GEOIDFQ from the ‘Target field’ dropdown.
Hit OK, then when the box disappears, click Apply in your Layer Properties.

It looks as though nothing has happened but if you use the Identify features button again on the map you can see the data has joined. Look at the bottom of the table and the data we want is there. We’ve got some numbers of homes from our CSV.
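If it helps to picture what the join has done, the sketch below reproduces the idea in plain Python: every map row is kept, and the data columns are attached wherever the two ID fields match. The rows and ID values are made up for illustration and are not taken from the workshop files.

```python
def join_attributes(map_rows, data_rows, target_field="GEOIDFQ", join_field="GEO_ID"):
    """Keep every map row and add the data columns where the keys match.

    Map rows with no match get None in the data columns.
    """
    lookup = {row[join_field]: row for row in data_rows}
    data_columns = [c for c in data_rows[0] if c != join_field] if data_rows else []
    joined = []
    for feature in map_rows:
        match = lookup.get(feature[target_field], {})
        extra = {column: match.get(column) for column in data_columns}
        joined.append({**feature, **extra})
    return joined


# Made-up rows: "ID_A" and "ID_B" are placeholders, not real tract codes.
tracts = [{"GEOIDFQ": "ID_A", "NAME": "1"}, {"GEOIDFQ": "ID_B", "NAME": "2"}]
housing = [{"GEO_ID": "ID_A", "H1_001N": 1000, "H1_003N": 50}]

joined = join_attributes(tracts, housing)
print(joined)
```

The second made-up tract has no matching row in the data, so its housing columns come back empty. If you see many empty values after a real join, check that the two ID columns really hold the same codes in the same format.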

9. Making a choropleth 


We are ready to show this housing vacancy data on the map.

We’ll do one just showing the number of homes, H1_001N.

Right click on your map layer and click Properties

Go to Symbology.
Change Single symbol at the top to Graduated - that’s because we have a sliding scale of continuous data. If we had categories like high, low and medium we would choose Categorised.

We can see our fields in the drop down menu for Value, so let’s choose our H1_001N.

Next we want to click Classify.

This will populate the box with some ‘breaks’ - colours based on a graduated scale. The darker the red, the more homes in that particular census tract. Click Apply and your map should change.



And look a bit like this:


Save your project again now.


10. Field calculator 


So far, so good, but also not very meaningful.

All we are really showing is the places with the most homes - which will just show population. What we really want to know is which places have the most vacant ones as a proportion or percentage.

We can use QGIS to help us do this.

Go to the attribute table of our original DECENNIALDHC2020.H1-Data layer.

Then click on the little abacus icon to open the field calculator.

We need to give our new column a field name. I’m calling it vacant_percent.

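We can leave the field type as integer because it’ll be a number, but an integer field only holds whole numbers. If you want to keep the decimal places, choose the decimal number type instead.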

In the box, click fields and values and double click on H1_003N, which is our raw number of vacant homes. 

That puts it in the expression box. We now click the / symbol to divide and then double click H1_001N, our total number of homes.

Finally, we click the * and type 100, to multiply this by 100 and click OK

And here we have a percentage in our data.
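The expression we just built is vacant homes divided by total homes, multiplied by 100. Here is the same calculation as a short Python sketch. The function name and the example numbers are made up, and returning None for a tract with no homes is a choice made for this sketch.

```python
def vacant_percent(vacant_units, total_units):
    """Vacant homes as a percentage of all homes.

    Returns None when there are no homes, to avoid dividing by zero.
    """
    if not total_units:
        return None
    return vacant_units / total_units * 100


# Made-up example: 83 vacant homes out of 1,250.
print(round(vacant_percent(83, 1250), 2))
print(vacant_percent(0, 0))
```

Some tracts, such as lakes or parkland, may have no housing units at all, so it is worth deciding how you want those to appear on the map.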


Now let’s change what we’re showing on the map.

Let’s double click on our map, the tl_2023_27_tract layer, and go back to our Symbology menu.

If we click the dropdown by Value, we can select our new column.

Click classify again and apply. Be patient as it may take a few seconds to do each step.

And now we have a map with breaks based on the new percentage data.

We have a better measure, but we’re not really showing it how we’d want to. Any area with a vacancy rate over 11% is the deepest red.




11. Breaking it up 


We need to pick something sensible and meaningful to divide our data into chunks. We currently have it split by Equal Count with five classes, which puts the same number of tracts in each of the five buckets.

To change this, go back into your Symbology menu and change the Mode under the Classes box.

Let’s fix those breaks into something a little more human-looking. 

It can help to see how your data is distributed. Go to the Histogram tab and Load Values - this shows you where the current breaks are and the range of your data. 






Now go back to the Classes tab, where you can directly edit the break points. Alternatively, change the mode. For example, we could choose Pretty Breaks, which creates recognisable class boundaries in whole numbers.

You can also play around with the other methods of classification:
  • Quantile (Equal Count) - makes sure that there are the same number of tracts in each bucket. If you add more classes you’ll get more detail. This one is good for quickly making the map look interesting, but check that your breakpoints aren’t odd, as it can look purposely manipulated.
  • Pretty Breaks - creates recognisable class boundaries in whole numbers.
  • Equal Interval - creates classes which have the same range as each other.
  • Natural Breaks (Jenks) - tries to group the data into clusters, keeping values within a class similar and the differences between classes as large as possible.
  • Standard Deviation - calculates the mean of the data and creates classes based on standard deviations from the mean.
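To get a feel for how much the choice of method matters, the sketch below computes Equal Interval and Equal Count breaks for a small set of made-up vacancy percentages with one extreme value. It uses Python's statistics module and is only an approximation of the idea; QGIS may calculate its quantile breaks slightly differently.

```python
import statistics


def equal_interval_breaks(values, classes):
    """Upper bounds of `classes` bins that each span the same range."""
    low, high = min(values), max(values)
    width = (high - low) / classes
    return [low + width * i for i in range(1, classes + 1)]


def equal_count_breaks(values, classes):
    """Upper bounds of bins holding roughly the same number of values."""
    cuts = statistics.quantiles(values, n=classes, method="inclusive")
    return cuts + [max(values)]


# Made-up vacancy percentages with one outlier at 40.
rates = [1, 2, 2, 3, 3, 4, 5, 6, 8, 40]

print(equal_interval_breaks(rates, 5))
print(equal_count_breaks(rates, 5))
```

With Equal Interval, the single outlier stretches the scale so that nine of the ten made-up values land in the lowest class. Equal Count spreads them across the classes instead. Neither is wrong, but they would produce very different-looking maps.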

Here’s a version with Natural Breaks. It’s important you choose the one that works best for your data.





12. Changing the colours 



News organisations often have a house style when it comes to the colours for maps and charts. Here’s a snippet of the BBC’s:


These are often based on accessibility. 

If you want to change your colours, double click on your map layer and bring up Symbology.

Now click on the colour ramp bar itself, then change Color 2 to the shade you want for your highest numbers.

You can select this by clicking the wheel with the triangle in it. Either click on the colour you want or enter its unique code in the HTML notation, which is a # followed by six characters (the digits 0-9 and the letters a-f), such as #ffffff for white.
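The HTML code is just three pairs of hexadecimal characters, one pair each for red, green and blue. If your house style guide lists colours as RGB values instead, this short sketch shows how the two relate. The function name is made up for this example.

```python
def hex_to_rgb(code):
    """Convert '#rrggbb' to a (red, green, blue) tuple of 0-255 integers."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal characters after the #")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#ffffff"))  # white: (255, 255, 255)
print(hex_to_rgb("#000000"))  # black: (0, 0, 0)
```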



You can also specify individual colours by double clicking on the coloured square within your classes box.

When you’re done, click Apply and OK.

13. Adding point data 

To help readers orientate themselves, we can add some prominent places on to the map. This comes in a list of ‘points’ - pairs of latitude-longitude co-ordinates with associated data like place names.

There are two ways of adding point data - it can be in a shapefile or in a csv.

Adding major cities

Add a new vector layer called USA_Major_Cities.shp

You can just drag it in from your folder.

If a box pops up saying you need to select a transformation for USA_Major_Cities, just click OK.

We have points for each of the major cities in the map area. Initially they may appear as a random colour like bright yellow or orange. You might get something else.

We’ll end up with way more cities than we want as it’s for the whole of the USA and it has a lot of the more densely populated places around Minneapolis and St Paul.

Right click on the USA_Major_Cities layer and choose Filter.

If we click on ST for State and then click All, we’ll get a list of all the state codes.

If we double click ST for State, it adds it to our expression box. We can then click the equals sign and, from our values box, double click on MN for Minnesota from our list of values.

If we then click the AND button, double click Population and type > 90000 we can also specify we want just the largest cities.
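Note the difference in the quotation marks here. Field names, such as ST, go in double quotation marks, while text values, such as MN, go in single quotation marks. If you use double quotation marks around a value you select or type in yourself, in this case the MN, you’ll get an error message.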

"ST" = 'MN'  AND  "POPULATION"  > 90000

This has now filtered our cities data to places in Minnesota with populations of more than 90,000.
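The filter keeps only the rows where both conditions are true. Here is the same logic as a Python sketch, using made-up cities and populations rather than values from the USA_Major_Cities file.

```python
# Made-up rows for illustration only.
cities = [
    {"NAME": "Example City A", "ST": "MN", "POPULATION": 120000},
    {"NAME": "Example City B", "ST": "MN", "POPULATION": 45000},
    {"NAME": "Example City C", "ST": "WI", "POPULATION": 250000},
]


def filter_cities(rows, state="MN", minimum=90000):
    """Keep rows in `state` whose population is greater than `minimum`."""
    return [
        row for row in rows
        if row["ST"] == state and row["POPULATION"] > minimum
    ]


for city in filter_cities(cities):
    print(city["NAME"])
```

Only the first made-up city passes both tests: the second is too small and the third is in the wrong state. Changing the AND to an OR in the QGIS expression would keep all three.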

But it’s not very clear on the map.

You can adjust the colour and opacity of the marker in the Symbology section of the USA_Major_Cities layer.

For more detailed settings, choose Simple Marker from the Symbol Settings menu.

Make the size of each marker 4mm with a colour of #ffffff (white) and a stroke width of 1mm in the default black. Stroke width may appear as “Hairline” initially.

Finally for this layer we will add the labels using this symbol. Choose Single Labels from the dropdown menu and select the field with place names in: NAME.
Again, news organisations will have their own house style for text labels - we’ll choose Helvetica 18pt in black with a white 1mm buffer.

To select the buffer, click buffer and check draw text buffer.

14. Save your map as an image 

To create an image file for use online, on broadcast or in print, zoom the map so that it fills the window. 

Right clicking on the layer and choosing Zoom to Layer(s) works as well.

Choose Project > Import/Export > Export Map to Image…
To set the size of the image, click Draw on Canvas and use the cross symbol to draw a box around the map. Alter the resolution depending on the intended use, then click Save. Choose a location and filename for your map.





We won’t get into this here, but you can also add titles, subtitles and legends to your graphic if you create a print layout, which you can select in the Project menu.


Useful links


  • If you want to know more about the data we’ve been using, you can find it on the Census website here.