Building a scatterplot map in QGIS and TileMill


Scatterplot maps are a great way of showing geographic density and diversity, conveying more visual information than your typical color-coded choropleth.

I like how every point represents something concrete, giving an intimate feel close up and a thoughtful aggregate take when zoomed out. (Here is one of my favorites, showing population shifts in Washington, DC.)

Last week, I made a scatterplot mapping Pittsburgh’s immigrant population, using American Community Survey data and slicing it by geographic sub-region. And it wasn’t too hard!

Requirements

  • QGIS, an open-source GIS manager similar to ArcGIS (but free!)
  • TileMill
  • A Mapbox account (unless you’re hosting the map tiles yourself, in which case you probably know how to do this already).
  • (Optional) Leaflet.js and some coding know-how.

Getting the data

  1. Let’s start by downloading the ACS counts of foreign-born persons, the key dataset we need to power this project. Head over to American FactFinder.
  2. I usually start by selecting my geography. In this case, it was “Census Tract - 140,” with all census tracts within Allegheny County, Pennsylvania selected. Add that to your selections.
  3. Narrow down that big list of results by typing “B05006” into the “Topic or table name” field of the “Refine your search” box. Then click the “2012 ACS 5-year estimates” version (might as well be current!).
  4. Once on the data screen (it should show a long list of foreign countries, with a column for each Census tract), click “Download.” I prefer to download my data with descriptive data element names (who has time to figure out header codes?) and with annotations in a single file. Don’t bother downloading it as a .xls file; CSV is cleaner and is read just as easily by Excel.
  5. Pop open the downloaded zip file, open the spreadsheet and make sure you’ve got what you need. I took the extra step of filtering the file down to only geographic sub-regions (like Northern Europe, Middle Africa, etc.), deleting country-specific or continent-wide rows.
  6. OK! Time to get the shapes we’ll need to build this map.
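
If you’d rather not filter the spreadsheet by hand, the cleanup in step 5 can be scripted. Here is a minimal sketch, assuming a layout with one row per place of birth and one column per tract. The column name “Place of birth,” the sub-region list and the sample numbers are all made up for illustration, so match them to whatever your download actually contains.

```python
import csv
import io

# Hypothetical sub-region labels to keep; match these to the labels in your own file.
SUBREGIONS = {"Northern Europe", "Middle Africa"}


def filter_subregions(csv_text, label_field="Place of birth", wanted=SUBREGIONS):
    """Return only the rows whose label is one of the wanted sub-regions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[label_field].strip() in wanted]


# Made-up sample data: continent, sub-region and country rows mixed together.
SAMPLE = """Place of birth,Tract 101,Tract 102
Europe,40,25
Northern Europe,12,5
United Kingdom,7,3
Middle Africa,2,9
"""

kept = filter_subregions(SAMPLE)
for row in kept:
    print(row["Place of birth"], row["Tract 101"], row["Tract 102"])
```

The continent-wide row (“Europe”) and the country row (“United Kingdom”) are dropped, leaving only the sub-regions.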

Download shapefiles

  1. Bless the Census’s heart. Not only does it provide numerical data for us, it also ships out geographic shapes through the TIGER Products system. Let’s grab Allegheny County’s 2012 Census tract outlines, starting at this page.
  2. Open up QGIS Desktop. Click “Add Vector Layer” from the left-hand toolbar.
  3. Click “Browse” and select the .shp file from the extracted shapefile folder you downloaded from the Census. (Mine is called “tl_2013_42_tract.shp.”) Click “Open” to add the shapefile.



  4. Bonus step: If you don’t want your randomly scattered dots to show people floating in a river, download a countywide water features shapefile from the TIGER/Line site and subtract it from your county tract shapefile. That will eliminate bodies of water (lakes, rivers, creeks, etc.) from your census tracts. (Beyond the scope of this post, I’m afraid.)
  5. Now to add the data. Click “Add Delimited Text Layer” from the left-hand toolbar. Select the population estimates CSV you cleaned up. Make sure to click “No geometry” in the “Geometry definition” box.
  6. Right-click the tract layer in your Layer bin and click “Properties.” From here, click “Joins.” 
  7. Click the green “+” button at the bottom of the page. You’ll then have to choose the field that links the two datasets together. In this case, it’s TractID in the CSV file and TRACTCE in the shapefile.
  8. OK! You’ve downloaded a shapefile, formatted it and associated data with each feature.
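
A join only works if the two key columns hold identical values, and tract codes are easy to mangle: a spreadsheet will happily turn “010300” into 10300. Here is a small sketch for building a clean six-digit key, assuming your CSV carries the full 11-digit tract GEOID (2-digit state, 3-digit county, 6-digit tract). The function names and the example ID are illustrative, not something QGIS or the Census provides.

```python
def tract_code(geoid):
    """Return the 6-digit tract code from an 11-digit census tract GEOID."""
    geoid = str(geoid).strip()
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    # Characters 0-1 are the state, 2-4 the county, 5-10 the tract.
    return geoid[5:]


def pad_tract_code(value):
    """Restore leading zeros that a spreadsheet stripped from a tract code."""
    return str(value).strip().zfill(6)


# Illustrative ID: state 42 (Pennsylvania), county 003 (Allegheny), tract 010300.
print(tract_code("42003010300"))
print(pad_tract_code(10300))
```

Both calls produce the same string, which is what the TRACTCE side of the join expects. Keep in mind that a tract code is only unique within one county, so a key like this is safe only when both files cover a single county.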

Scatter the dots

  1. First, a bit of weirdness. There’s a bug in QGIS that deactivates the scattered-dot function unless you pet it softly and hand-feed it Fancy Feast. So! 
  2. Right-click the tract layer in the layer bin and click “Save As.” Save the shapefile as a GeoJSON file. Then, re-import the new file as a new vector layer. Delete the old layer. Trust me, that was completely necessary.
  3. Open up the “Random points” box through Vector > Research Tools. Select the population sector you want under “Use value from input field.” Make sure you select a destination for the shapefile that has a meaningful name, like “immigrants-australia.” Hit OK and let QGIS add a new layer of dots to your workspace.


    image

  4. Rinse and repeat, selecting a new region under “Use value from input field” and saving each collection of dots to a new shapefile.
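
If you’re curious what a random-points tool is doing under the hood, here is one common approach, sketched in plain Python: pick random spots inside a shape’s bounding box and keep only the ones that land inside the shape. This is an illustration of the idea, not QGIS’s actual implementation, and it ignores holes, multi-part shapes and map projections. The L-shaped “tract” and the count of 25 are made up.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def scatter_points(polygon, count, rng=None):
    """Return `count` random points inside `polygon` using rejection sampling."""
    rng = rng or random.Random()
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    points = []
    # Note: a polygon with zero area would make this loop forever.
    while len(points) < count:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Made-up L-shaped "tract" with 25 residents to scatter.
tract = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
dots = scatter_points(tract, 25, rng=random.Random(42))
print(len(dots), "dots placed")
```

Run it once per tract, with the count taken from that tract’s population column, and you get one dot per person.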

Make map tiles

  1. Open TileMill and start a new project. Uncheck “Include world layer and styles”; they’ll just get in your way.
  2. Let’s make a basic style for all the population dots. In the style.mss pane, make a class that looks something like the following: 

    .immigrants {
      marker-width: 6;
      marker-fill: #f45;
      marker-line-opacity: 0;
      marker-allow-overlap: true;
    }

  3. From the Layers screen, click “Add layer.” In the following screen, select one of your random-dot shapefiles as the datasource. Make sure to give the layer the class you designated on your stylesheet; again, I used “immigrants.”
  4. Click “Save” (not “Save & Style”!) Voila! 

    image

  5. Repeat steps 3 and 4 for all the random-dot shapefiles you generated earlier.
  6. OK, now you should have a whole mess of dots on your screen. Time to style them! Each layer (one for each shapefile you added) should have a unique ID, something like “#westernafrica.” On your stylesheet, make a style for each ID that changes the marker-fill property. For example: #canada { marker-fill: #7EA7D8; } That will give you something like this:

    image

  7. Click Export > Upload. You may have to sign in to your Mapbox account.
  8. Give your project a nice name and description, and set the center and bounds of the visualization. You’ll also want to modify your zoom levels so you’re not exporting billions of blank tiles; I selected 11 as my lowest zoom and 16 as my highest.
  9. Click upload!
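
To see why the zoom range in step 8 matters, it helps to count tiles. The sketch below uses the standard Web Mercator tile arithmetic to estimate how many tiles cover a bounding box across a range of zoom levels. The bounding box is my rough guess at Allegheny County’s extent, not a figure from TileMill, and the count includes every tile in the box whether or not it has dots on it.

```python
import math


def tile_xy(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def count_tiles(west, south, east, north, min_zoom, max_zoom):
    """Count the tiles covering a bounding box over an inclusive zoom range."""
    total = 0
    for zoom in range(min_zoom, max_zoom + 1):
        x_min, y_min = tile_xy(west, north, zoom)  # tile y grows southward
        x_max, y_max = tile_xy(east, south, zoom)
        total += (x_max - x_min + 1) * (y_max - y_min + 1)
    return total


# Assumed, approximate bounds (west, south, east, north) for Allegheny County.
BOUNDS = (-80.36, 40.19, -79.69, 40.67)

print("zooms 11-16:", count_tiles(*BOUNDS, 11, 16))
print("zooms 11-18:", count_tiles(*BOUNDS, 11, 18))
```

Each extra zoom level roughly quadruples the number of tiles, which is why trimming the top of the range saves so much upload time.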

Final stop: Mapbox

  1. Log in to your online Mapbox account and hop over to “Data.” If your tiles have finished uploading, they should appear there. 
  2. Click your uploaded layer and hit “Create project.” That’ll take you to what looks like a blank screen, but never fear.
  3. Click the magnifying glass and enter the geography you’re mapping. For me, it’s “Pittsburgh.” That’ll re-center the map over your dots.
  4. So the dots are there, but where are the streets? Click “Style” and hit “Discard styles” under the “Color” tab. Boom! Streets reappear.
  5. Save your project (you’ll have to re-enter your password) and head back to your main profile page. It should show up there (albeit as “Untitled project” … better fix that!). 
  6. You’re done! Click the “Share project” button to grab a permalink you can send everywhere.

Other things to do

  • Right now, your map is just a mess of dots. I found it helpful to generate a separate Mapbox layer for every cluster of interest: say, all European regions, or all Asian regions. That way …
  • … I could stack all these layers onto a Leaflet map and make them toggle-able! This is the route I ended up taking, instead of relying on Mapbox’s basic map. Leaflet has a helper function that makes adding Mapbox layers a snap … but you have to know some code.
  • You don’t have to use QGIS to generate your point data. Englewood is a great option for code-inclined folks.
  • Try other datasets! Maps of poverty/affluence would seem simple enough — but I’m actually thinking of mapping by profession, showing where all the mechanics, barbers and CEOs live.