QGIS 2: Analyzing geographic data

Wesley Stephenson & Callum Thomson BBC News

This tutorial: https://bit.ly/QGIS2-nicar2024

The aim


Build on your existing knowledge of QGIS and learn how to explore, manipulate and analyze geographic datasets to gain new insights.

This session is good for: 

Those who attended the QGIS I workshop or already know the basics of visualising geographic data in QGIS.

These are the full follow-along instructions from the NICAR session with a worked example.
Notes from the QGIS 1 workshop can be found here: https://bit.ly/QGIS1-nicar2024

Setup

We’ll be continuing on the theme of urban heat islands used in QGIS 1. We’re going to explore what factors might influence the size of the temperature differential. By the end of this session we should have tested at least one hypothesis and made the map above.

If you’re not on a NICAR laptop, you will find the data and shapefiles here - please download them before starting.

This includes a QGIS project that’s already been set up with our starting data in. Open QGIS and open the project (called QGIS_2_start_point.qgz)

There are 4 data sets pre loaded
Two visible layers:
  • baltimore_places - these are the locations and labels for places in and around Baltimore.
  • Baltimore_with_UHI - This is the choropleth map of the urban heat island effect we made in session 1
Three hidden layers (we’ll come to these later)
  • gis_osm_water_a_free_1 - this is a shapefile of the water in Baltimore
  • BMD_MULC - this has land use data in it
  • BMD_MULC_clipped - this is a clipped version of the above 

1. Removing the water

What is missing from the map above? That’s right - the harbour.
The census tracts shapefile we have been using doesn’t show the water in Baltimore and of course the harbour is one of the big features of the city. On the map we drew before, a large section of the dark pink area that represents a larger heat island effect is actually water. Let’s remove this. 

Map Layers

First switch off the labels layer baltimore_places using the check box in the Layers window. This just makes our basemap easier to see. 

Then we need to bring in a new layer which shows where the water is. We have already preloaded this into QGIS for you but if you did want to download it for yourself when you get to good wifi - you can find it here. It comes from OSM and you can download the latest data from major cities across the world from this site. You could also get water shapefiles from other sources, government websites , universities etc. Just remember to source you data in your final piece. 

Click on the check box next to the  gis_osm_water_a_free_1 layer in the Layers window. 
You should see something that looks like this:

Select the water we want

We can now see the harbour but also all other water across the city - ponds/reservoirs/wetlands. We want to just show the harbour. So let’s select just the polygons that represent the harbour. 

In the image above I have highlighted the select tools. Click the one furthest to the left, a yellow square with a mouse pointer on top of it. 

Then select the sections of water you want to show on our map. Press command or ctrl when you are clicking to select more than one area. 

Cut it out

We can now use this selection and cut this shape out of our original map. 

Navigate to Vector>Geoprocessing Tools>Difference

In the input layer we need the layer we want to cut out - so Baltimore_with_UHI
In the overlay layer we need the cookie cutter we have just selected from the water shapefile - select gis_osm_water_a_free_1 and then check the box undeneath that say Selected features only.

Then press the Run button (bottom right)

That will give you a new layer called Difference. Turn off all the layers except this one and you should see something that looks like this 

Which is great - all the census tracts in Baltimore and now you can see the harbour. But where have all the urban heat island breaks gone. The data is still there we just need to add the styling back in. 

Style it up (again)


Right click on Baltimore_with_UHI and navigate to Styles>Copy Style and click All style categories
Then right click on the Difference layer in the right hand Layers list and navigate to Styles>Paste Style and click All style categories
The difference layer should now look more like our original map. 
Lets also rename the layer. Right click on it in the Layers list on the left and go to rename layer and call it Baltimore_with_UHI_inc_harbour

2. Landuse

I think that one thing that might make the urban heat island affect less intense is green space. I want to test this hypothesis. 

Where do we get the data?

We could use Open Street Map data - this includes things like parks, grass, meadows etc but the data for Baltimore does seem to have a few gaps. 

Thankfully the EPA has a great dataset of landuse for several major cities across the US. 

You can download this data from here but to save you the hassle it is already in the project as BMD_MULC


Explore the data

Select this layer and deselect the Baltimore_with_UHI_inc_harbour layer.


Raster files

This is what is called a raster layer and the file that creates it is a geotiff file. This means rather than a having vectors which draw lines between geographical points this is an image file that is made up of pixels. If you zoom right in you can see the pixels. The file has geographic information in it so it is located in the right place on the map and each colour pixel represents a different type of landuse. 

Let’s style it

Its quite hard to see in black and white so lets style it. 

Double-click on BMD_MULC  in the layer window and then click on Symbology  on the left hand side.
In the Render type dropdown select Singleband pseudocolor
We could choose our own colours for this but let’s import them so it is the same as the EPA map. 
Click on the folder icon under the box. Open the file named BMD_MULC_tiff.txt and then press OK. Your map should look like this. 😍

Clipping the raster

If you select the Baltimore_with_UHI_inc_harbour  layer you will see that the our raster layer is much bigger. So we can clip this down so its the same size as our original choropleth. This process takes a while as the raster is a big file so we have done this for you. So deselect the BMD_MULC layer and select the BMD_MULC_clipped layer. Now you can see that they fit in the same geography. 

If you want to do this yourself you need to go to the Raster in the menu bar and navigate to Extraction>Clip Raster by Mask Layer… Your input layer should be the raster layer and the Mask layer should be the shape you want to cut out so in our case Baltimore_with_UHI_inc_harbour and then just click Run

3. Analysis of land use

So now we can see the landuse it is obvious and probably not surprising that most of the centre of the city is red - showing built up areas and the outskirts are more green showing trees and grassy areas. But what we want to look at is the percentage of each of the tracts that are green spaces. 


Counting the points

We can do this using the processing tools. 

In the top menu go to Processing>Toolbox and then in the menu that appears on the right click into Raster analysis and click on Zonal histogram. You will get a box like this

Let’s try it. In the box shown above click into the Vector layer containing zones dropdown and select Baltimore_with_UHI_inc_harbour
Then press Run

What this tool does is take each polygon in the  Baltimore_with_UHI_inc_harbour layer and creates a histogram of each of the points in raster file. So it counts all the pixels designated as forest, all those designated as impervious surfaces etc and adds them up so you can see how many of each category are in each polygon. 

And this is what we get. It has created a new layer called Output zones It doesn’t look much at the moment but that is because it hasn’t been styled.
But lets open the attribute table. Right click on the layer and go to Open attribute table in the menu. Its getting quite wide now so scroll to the right and you should see that you have some new columns called HISTO_ 

These are the number of pixels in each of the polygons assigned to each value. 

Work out the percentage

By itself this isn’t very interesting because the polygons are all different sizes and have different numbers of pixels in them so we need to work out the percentage of polygons that are green space. 

At the moment we just have numbers for each pixel value but if we look in the metadata provided with the geotiff that you download from the EPA you can see a file called BMD_MULC.xml In here you can see that these numbers map to different types of landuse like this:

Landuse number
Landuse type
10
Water
20
Impervious Surface
30
Soil and Barren
40
Trees
70
Grass and Herbaceous
80
Agriculture
91
Woody Wetlands
92
Emergent Wetlands

Using this data we can work out what percentage of each polygon is greenspace. 

Let’s work this out:

Click on the Output zones layer in the Layers list on the left hand side. Then in the top bar click the abacus icon.
 You should get a window like this:
We are going to write an expression that works out the percentage of green space in each polygon. 
  • Enter a name for your new field in the box highlighted above.
  • Change the output field type to Decimal number (real)
  • Click on the arrow next to Fields and Values (highlighted in green).
  • Double click on the first of the columns you want to include. I’d suggest HISTO_40. this will then appear in the Expression box on the left. 
  • Add a plus sign and then double click the next column you want to include I’d suggest HISTO_70
  • Keep going until you have all the columns you want to include. 
  • Put brackets around the whole expression
  • add a divides sign
  • Use the same method as above to add in all of the columns as our divisor
  • Multiply the whole thing by 100 to give us a percentage.
  • It should look somehing like this:
 ("HISTO_40" + "HISTO_70"  +   "HISTO_91" + "HISTO_92")/ ( "HISTO_10" + "HISTO_20" + "HISTO_30" + "HISTO_40" + "HISTO_70" + "HISTO_80" + "HISTO_91" + "HISTO_92" + "HISTO_NODATA" )*100
 
 Then press OK
 

Visualise it

We can visualise this data by making this into a choropleth as we did before in session 1 by using our new column. Or we could export the new data as a csv and we could perform some regression analysis on it to see how well greenspace correlates with the Urban Heat Island data. 


Look at the correlation

We could visualise this using the R plugin for Qgis. Its a bit beyond this session to teach you how to do this in detail but you can write R scripts that can be executed from within Qgis to analyse the data you have produced. So for example I wrote a script to plot the urban heat island temperature differentials against the percentage of green space in each census tract. 
This is the result and you can see that there is a negaitive correlation between the percentage of greenspace and the temperature differential, and that this accelerates once an area gets below 25% green space.

 



Note on CRS

The Coordinate Reference System (CRS) of a spatial data set such as a shape file is information about how the earth’s surface has been flattened so that it can be represented as a flat map. How this is done can affect how a map looks, and also the results of calculations carried out on the spatial data. More information is available here.
Full details of how CRS can affect spatial data processing is beyond the scope of this course, but bear in mind that different CRS are suitable for different purposes, and one CRS may not be suitable for all your needs.
For example, this QGIS project has been set up with a CRS (ESRI:26918) suitable for visualising Baltimore and the surrounding area. If you were visualising the whole US or a different part of the country you may want to use a different CRS.