The State Of The OpenStreetMap Road Network In The US


Looks can be deceiving – we all know that. Did you know it also applies to maps? To OpenStreetMap? Let me give you an example.

Head over to osm.org and zoom in on an area outside the major metros in the United States. What you’re likely to see is an OK-looking map. It may not be the most beautiful map you’ve ever seen, but the basics are there: place names, roads, railroads, lakes, rivers and streams, maybe some land use. Pretty good for a crowdsourced map!

What you’re actually looking at, most likely, is a bunch of data that was imported – not crowdsourced – from a variety of sources ranging from the National Hydrography Dataset to TIGER. This data is at best a few years old and, in the case of TIGER, a topological mess that sometimes bears very little resemblance to the actual ground truth.

TIGER alignment example

The horrible alignment of TIGER ways, shown on top of an aerial imagery base layer. Click on the image for an animation of how this particular case was fixed in OSM. Image from the OSM Wiki.

For most users of OpenStreetMap (not the contributors), the only thing they will ever see is the rendered map. Even those who are going to use the raw data will usually look at the rendered map on osm.org first to get a sense of the quality. The only thing the rendered map really tells you about the data quality, however, is that there is good national coverage for the road network, hydrography and a handful of other feature classes.

To get a better idea of the data quality that underlies the rendered map, we have to look at the data itself. I have done this before in some detail for selected metropolitan areas, but not yet on a national level. This post marks the beginning of that endeavour.

I deliberately kept the first iteration of the analysis simple, focusing on the quality of the road network and using the TIGER import as a baseline. I did opt for a fine geographical granularity, choosing counties (and equivalents) as the geographical unit. I designed the following analysis metrics; a sketch of how they can be computed follows the list:

  • Number of users involved in editing OSM ways – this metric tells us something about the amount of peer validation. If more people are involved in the local road network, there is a better chance that contributors are checking each other’s work. Note that this metric covers all linear features found, not only actual roads.
  • Average version increase over the TIGER-imported roads – this metric provides insight into the amount of work done on improving TIGER roads. A value close to zero means that very little TIGER improvement was done in the study area, which means the alignment and topology problems are most likely still there.
  • Percentage of TIGER roads – this says something about contributor activity entering new roads (and paths). A lower value means more new roads were added after the TIGER import, a sign that more committed mappers have been active in the area – entering new roads arguably requires more effort and knowledge than editing existing TIGER roads. A lower value here does not necessarily mean that the TIGER-imported road network has been supplemented with things like bike and footpaths – it can also be caused by mappers replacing TIGER roads with new features, for example as part of a remapping effort. That will typically not be a significant proportion, though.
  • Percentage of untouched TIGER roads – together with the average version increase, this metric shows the effort that has gone into improving the TIGER import. A high percentage means lots of untouched, original TIGER roads, which is almost always a bad thing.
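
To make these concrete, here is a minimal sketch of how the four metrics can be computed, assuming each way has been reduced to a record with its last editor’s uid, its version, and a flag for whether it carries the tiger:* import tags. The record layout is hypothetical – the actual analysis ran as an osmjs script against the per-county extracts – and note that in a snapshot planet only each way’s last editor is visible:

    def road_metrics(ways):
        # ways: iterable of dicts like {'uid': 42, 'version': 3, 'tiger': True}
        users = set()
        total = tiger = untouched = version_increase = 0
        for way in ways:
            total += 1
            users.add(way['uid'])
            if way['tiger']:
                tiger += 1
                version_increase += way['version'] - 1  # version 1 = the original import
                if way['version'] == 1:
                    untouched += 1
        return {
            'users': len(users),
            'avg_version_increase': float(version_increase) / tiger if tiger else 0.0,
            'pct_tiger': 100.0 * tiger / total if total else 0.0,
            'pct_untouched_tiger': 100.0 * untouched / tiger if tiger else 0.0,
        }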

Analysis Results

Below are map visualizations of the analysis results for these four metrics, on both the US state and county levels. I used the state and county (and equivalent) borders from the TIGER 2010 dataset to define the study areas. These files contain 52 state features and 3221 county (and equivalent) features. Hawaii is not on the map, but the analysis was run on all 52 areas (the 50 states plus DC and Puerto Rico – although the planet file I used did not contain Puerto Rico data, so technically there are valid results for 51 study areas on the state level).

I will let the maps mostly speak for themselves. Below the result visualizations, I will discuss ideas for further work building on this, as well as some technical background.

Map showing the number of contributors to ways, by state

Map showing the average version increase over TIGER imported ways, by state

Map showing the percentage of TIGER ways, by state

Map showing the percentage of untouched TIGER ways, by state

Map showing the number of users involved in ways, by county

Map showing the average version increase over TIGER imported ways, by county

Map showing the percentage of TIGER ways, by county

Map showing the percentage of untouched TIGER ways, by county

Further work

This initial stats run for the US motivates me to do more with the technical framework I built for it. With that in place, other metrics are relatively straightforward to add to the mix. I would love to hear your ideas; here are some of my own.

Breakdown by road type – It would be interesting to break the analysis down by way type: highways / interstates, primary roads, other roads. The latter category accounts for the majority of the road features, but does not necessarily see the most intensive maintenance by mappers. A breakdown of the analysis would shed some light on this.
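
As a sketch, such a breakdown could key off each way’s highway tag. The bucketing below is my assumption about the three classes, not something the current analysis implements:

    # Hypothetical mapping of OSM highway values onto the three classes.
    ROAD_CLASSES = {
        'motorway': 'highway/interstate',
        'trunk': 'highway/interstate',
        'primary': 'primary',
    }

    def road_class(tags):
        # Anything that is not motorway, trunk or primary counts as 'other'.
        return ROAD_CLASSES.get(tags.get('highway'), 'other')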

Full history – For this analysis, I used a snapshot planet file from February 2, 2012. A snapshot planet does not contain any historical information about the features – only the current version of each feature is represented. In the next iteration of this analysis, I would like to use the full history planets that have been available for a while now. Full history enables me to see how many users have been involved in creating and maintaining ways over time, and how many of them have been active in the last month or year. It also offers an opportunity to identify periods in time when the local community was particularly active.

Relate users to population / land area – The absolute number of users who contributed to OSM in an area is only mildly instructive. It’d be more interesting if that number were related to the population of that area, or to the land area. Or a combination. We might just find out how many mappers it takes to ‘cover’ an area (i.e. get and keep the other metrics above certain thresholds).

Routing specific metrics – One of the most promising applications of OSM data, and one of the most interesting commercially, is routing. Analyzing the quality of the road network is an essential part of assessing the ‘cost’ of using OpenStreetMap in lieu of other road network data that costs real money. A shallow analysis like the one I have done here is not going to cut it for that purpose, though. We will need to know about topological consistency and about correct and complete mapping of turn restrictions, grade separations, lanes, traffic lights, and other salient features. There is only so much of that we can do without resorting to comparative analysis, but we can at least devise quantitative metrics for some of these.

Technical Background

  • I used the State and County (and equivalent) borders from the TIGER 2010 dataset to determine the study areas.
  • I used osm-history-splitter (by Peter Körner) to do the actual splitting. For this, I needed to convert the TIGER shapefiles to OSM POLY files, for which I used ogr2poly, written by Josh Doe.
  • I used Jochen Topf’s osmium, more specifically osmjs, for the data processing. The script I ran on all the study areas lives on GitHub.
  • I collated all the results using some Python and bash hacking, and used the PostgreSQL COPY command to import them into a PostgreSQL table (see the sketch after this list).
  • Using a PostgreSQL view, I combined the analysis result data with the geometry tables (which I had previously imported into PostGIS using shp2pgsql).
  • I exported the views as shapefiles using ogr2ogr, which also offers the option of simplifying the geometries in one step (useful because the non-generalized counties shapefile is 230MB and takes a long time to load in a GIS).
  • I created the visualizations in Quantum GIS, using its excellent styling tool. I mostly used a quantile classification (equal-count bins) for the classes, which I tweaked to get prettier class breaks.
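
For illustration, the collation and COPY import could look like the sketch below. It assumes the per-study-area results were concatenated into a single results.csv; the database, table and column names are made up:

    # Load the collated analysis results into PostgreSQL using COPY.
    import psycopg2

    conn = psycopg2.connect('dbname=osm_stats')
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS county_stats (
            geoid text,
            users integer,
            avg_version_increase real,
            pct_tiger real,
            pct_untouched_tiger real)""")
    with open('results.csv') as f:
        cur.copy_expert('COPY county_stats FROM STDIN WITH CSV', f)
    conn.commit()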

I’m planning to do an informal session on this process (focusing on the osmjs / osmium bit) at the upcoming OpenStreetMap hack weekend in DC. I hope to see you there!

Tutorial: Creating buffered country POLYs for OpenStreetMap data processing


OpenStreetMap represents a lot of data. If you want to import the entire planet into a PostGIS database using osmosis, you need at least 300GB of hard disk space and, depending on how much you spent on fast processors and (more importantly) memory, a lot of patience. Chances are that you are interested in only a tiny part of the world, either to generate a map or to do some data analysis. There are several ways to get bite-sized chunks of the planet – take a look at the various planet mirrors or the cool new Extract-o-tron tool – but sometimes you may want something custom.

For the data temperature analysis I did for State of the Map, I wanted city-sized extracts using a small buffer around the city border. If you want to do something similar – or are just interested in how to do basic geoprocessing on a vector file – this tutorial may be of interest to you. Instead of city borders, which I created myself from the excellent Zillow neighborhood boundary dataset, I will show you how to create a suitably generalized OSM POLY file (the de facto standard for describing polygon extracts, used by various OSM tools) that is appropriate for extracting a country from the OSM planet with a nice buffer around it.
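
For reference, a POLY file is plain text: a name on the first line, one or more rings of longitude/latitude pairs (each ring opened with a label and closed with END), and a final END closing the file. A minimal, made-up example of a box around the contiguous US:

    us_buffer
    1
        -125.0    24.0
        -66.9     24.0
        -66.9     49.8
        -125.0    49.8
    END
    END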

Let’s get to work.

Preparing Quantum GIS

We will need to add a plugin that lets us export any polygon from the QGIS desktop as an OSM POLY file. You can get that OSM POLY export plugin for Quantum GIS here.

Unzip the downloaded file and copy the resulting folder into the Python plugins folder. On Windows, if you used the OSGeo installer, that might be

C:\OSGeo4W\apps\qgis\python\plugins

See here for hints on where it may be on your system.

The plugin should now appear in the Quantum GIS plugin manager (Plugins > Manage plugins…). If it is not enabled yet, enable it now and exit the plugin manager.

Getting Country Borders

Easy. Download world borders from http://thematicmapping.org/downloads/world_borders.php

Unzip the downloaded file and open it in QGIS:

Geoprocessing step 1: Query

Open the Layer Query dialog by either right-clicking on the layer name or selecting Query… from the Layer menu with the TM_WORLD_BORDERS-0.3 layer selected (active).

Type "ISO2" = 'US' in the SQL where clause field and run the query by clicking OK. Note the single quotes around the string literal; the double quotes mark the field name.

Geoprocessing step 2: Buffering

The next step is to create a new polygon representing a buffer around an existing polygon. Because we already queried for the polygon(s) we want to buffer, there’s no need to select anything in the map view. Just make sure the TM_WORLD_BORDERS-0.3 layer is active and select Vector > Geoprocessing Tools > Buffer(s):

Make sure the input vector layer is TM_WORLD_BORDERS-0.3. Only the features matching the query will be processed, so we are operating on a single country and not the entire world.

For Buffer distance, type 1. This is in map units. Because our source borders file is in EPSG:4326, this corresponds to 1 degree, which is about 69 miles (along the longitudinal axis that figure is only valid at the equator and decreases towards the poles). This is a nice size buffer for a country. You may want something larger or smaller depending on the size of the country and what you want to accomplish, so play around with the figure and compare results. Of course, if your map projection is not EPSG:4326, your map units may not be degrees and you should probably enter much bigger values.
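
If you want to check what a buffer in degrees means on the ground, a quick back-of-the-envelope calculation helps; the 69.17-mile figure for one degree is approximate:

    import math

    def lon_degree_miles(lat_deg):
        # Length of one degree of longitude at the given latitude, in miles.
        return 69.17 * math.cos(math.radians(lat_deg))

    print(lon_degree_miles(0))   # about 69 miles at the equator
    print(lon_degree_miles(45))  # about 49 miles at mid-latitudes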

Select a path and filename for the output shapefile. Do not select ‘Dissolve buffer results’. The rest can be left at the default values. Push OK to run the buffer calculation. This can take a little while and the progress bar won’t move. Then you see:

Click Yes. Now we have a buffer polygon based on the US national border:

Geoprocessing step 3: Generalizing

We’re almost done, but the buffer we generated contains a lot of points, which will make the process of cutting a planet file slow. So we’re going to simplify the polygon somewhat, using another QGIS built-in function.

Select Vector > Geometry tools > Simplify geometries:

Make sure your buffer layer is selected as the input. Set 0.1 (again, in map units) as the simplify tolerance. This defines how much the input features will be simplified: the higher the number, the more simplification.

Select a destination for the simplified buffer to be saved. Also select Add result to canvas. Click OK:

This dialog may not seem very promising, but it has worked. I have also sometimes gotten an error message after this process completes; you can safely ignore it.

Geoprocessing step 4: Resolving multipolygons

Now, if your simplified country border consists of multiple polygons (as is the case with the US), we have a slight problem. The POLY export plugin does not support multipolygons, so we need to break the multipolygon into single polygons (Vector > Geometry tools > Multipart to singleparts). Even then, we will need to do some manual work if we want OSM POLY files for all the polygons. This is because the plugin relies on unique string attribute values to create the different POLY files, and we do not have those: the polygons we are using are all split from the same multipolygon and share its attributes. So we need to either create a new attribute field and manually enter unique string values in it, or select and export the parts one by one, renaming the files before they get overwritten.

Finale: Export as POLY

I am going to be lazy here and assume I will only need the contiguous US, so I select the corresponding polygon. After that I invoke the plugin by selecting Plugins > Export OSM Poly > Export to OSM Poly(s):

The plugin will show a list of all the fields that have string values. Select ISO2 and click Yes. Next you will need to select a destination folder for your exported POLY files. Pick or create one and push OK.

This is it! Your POLY files are finished and ready to be used in Osmosis, osmchange and other tools that use the format for data processing.
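
As a final sketch, here is what cutting a country extract with Osmosis and the new POLY file can look like; osmosis is assumed to be installed and on the PATH, and the file names are placeholders:

    # Cut a US extract from a planet file using the POLY file we just created.
    import subprocess

    subprocess.check_call([
        'osmosis',
        '--read-xml', 'file=planet-latest.osm.bz2',
        '--bounding-polygon', 'file=US.poly',
        '--write-xml', 'file=us-extract.osm'])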

By the way: you can’t load POLY files into JOSM directly, but there is a Perl script to convert POLY files to OSM files, which I used to visualize the result.