We just wrapped up a weekend of OpenStreetMap hacking at the wonderful LinuxHotel. In this post, I am going to share a visualization idea that I discussed in the car on the way here and that seemed like enough fun to spend a day hacking on: using contour lines to visualize quality metrics for OpenStreetMap. I’ve been wanting to implement something like this for a while now, even though some similar efforts already exist, notably the OSM Inspector and OSMatrix.
OSM Inspector and OSMatrix
Jochen Topf’s OSM Inspector is a great tool for visualizing potential sources of error in the OpenStreetMap database. Although the Inspector is extensible and quite powerful, it focuses on individual data elements rather than providing the bird’s eye view on quality that I want to provide.
The OSMatrix tool is a recent effort by the great folks at the Geography department of the University of Heidelberg. OSMatrix provides an overlay of hexagonal cells visualizing a range of metrics for the data in each cell. It looks great and tells the kind of story I want to tell, but the tool only takes into account the current planet data, whereas I want to be able to tap into the wealth of information contained in the full history. Incidentally, the wish for better ways of managing full history data was one of the motivations for having this hack weekend in the first place.
The visualizations I want to provide should tell the story behind OpenStreetMap data from a local perspective. This goes beyond a plain time lapse visualization of the growth of the map from its inception, because OpenStreetMap is not just a database – more than anything, it’s people. Any visualization attempting to tell the story of OpenStreetMap should be about people as well as data.
Hacking my way to version contour lines
All these big plans notwithstanding, I am taking a very pragmatic approach here. My short-term goal is just to get something out there, collect feedback and generate ideas on how to take it from there, both on a technical level and on what to show. More than anything, I wanted something presentable at the end of a short weekend.
I decided to go with a small planet extract for the Amsterdam region, and attempt a contour lines visualization of the average version number. Here’s the approach I took in broad strokes:
- Download NL planet
- Extract Amsterdam bounding box
- Import into PostgreSQL
- Create a grid based on version numbers
- Create contour lines from grid
- Convert lines to polygons
- Create a WMS-T layer
- Put it all together in an OpenLayers application
Let’s see how it worked out. First a screen shot of the final result to give you an idea of where we’re headed:
The Dutch OpenStreetMap servers provide a daily extract for BeNeLux (Belgium, the Netherlands, Luxembourg), available from here. The PBF format is strongly preferable to the legacy XML format: it’s smaller and processing is much, much faster, and it’s supported by the major OpenStreetMap data processing tools.
Once downloaded, it takes only a single Osmosis command to cut an extract from a bounding box and load that into a PostgreSQL database:
osmosis --rb file="planet-benelux-latest.osm.pbf" --bb left=4.71 bottom=52.27 right=5.08 top=52.47 --sort --wp database="amsterdam" user="mvexel"
I am assuming here that you are familiar with creating a PostgreSQL database and preparing it for Osmosis usage. This page on the OpenStreetMap wiki provides more background if needed.
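To make the bounding-box cut in the Osmosis command above a bit more concrete, here is a minimal Python sketch of the test applied to each node. The `Node` record and the `clip` helper are hypothetical names made up for illustration; Osmosis itself also deals with ways and relations, which this sketch ignores.

```python
from typing import Iterable, NamedTuple


class Node(NamedTuple):
    """Hypothetical minimal node record: id, longitude, latitude, version."""
    id: int
    lon: float
    lat: float
    version: int


# Bounding box values copied from the Osmosis command above.
LEFT, BOTTOM, RIGHT, TOP = 4.71, 52.27, 5.08, 52.47


def in_bbox(node: Node) -> bool:
    """Return True if the node lies inside (or on the edge of) the box."""
    return LEFT <= node.lon <= RIGHT and BOTTOM <= node.lat <= TOP


def clip(nodes: Iterable[Node]) -> list[Node]:
    """Keep only the nodes that fall inside the bounding box."""
    return [n for n in nodes if in_bbox(n)]


sample = [
    Node(1, 4.90, 52.37, 3),  # central Amsterdam, inside the box
    Node(2, 4.48, 51.92, 1),  # Rotterdam, outside the box
]
kept = clip(sample)
print([n.id for n in kept])
```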
Next is creating a grid based on attribute data. There’s probably a plethora of ways to go about this, but I for one had never done it before. I guess I am a neogeographer after all. After looking into ways to do it in PostgreSQL, I stumbled on gdal_grid, a command line utility that is part of the GDAL suite. It exposes the grid creation functions of GDAL for command line use and seemed to do what I needed. There is one snag though: it takes point data as an input source and as such can only handle the node data from OpenStreetMap. This is inherent in the process: grids are an aggregation of point data. For now, I am not going to care too much, but I will get back to this towards the end of this post.
gdal_grid offers several methods to calculate the cell values based on the individual point values. One is based on inverse square weighting, one on moving averages, and the simplest one just takes the value of the nearest point to the grid reference point (called the grid node). All methods operate on an ellipse centered at the grid node, taking into account all points within the ellipse for the calculation of the cell value. Some more background on the various methods and their math can be found here.
I experimented some with the first two methods, discarding the nearest neighbor method offhand as too coarse. The inverse square method is computationally much more intensive than the moving averages method, resulting in much longer processing times. Turning gdal_grid with inverse square weighting loose on my Amsterdam extract, which contains just over 1.1 million nodes, did not complete within the hour I was prepared to wait for it to do so. The moving averages method completed in about 15 minutes. Don’t ask for machine specs because I have no idea. Bug the sysadmins at #osm-nl for that ;P
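As a rough illustration of how the three methods differ, here is a small pure-Python sketch that computes the value of a single grid cell. This is not GDAL's code: the function names, the counter-clockwise rotation convention, and the nodata placeholder of 0.0 are assumptions made for the example.

```python
import math


def in_ellipse(dx, dy, radius1, radius2, angle_deg):
    """Test whether the offset (dx, dy) from the grid node lies in the ellipse.

    radius1 and radius2 are the semi-axes; angle_deg rotates the ellipse
    counter-clockwise (an assumed convention for this sketch).
    """
    a = math.radians(angle_deg)
    # Rotate the offset into the ellipse's own axes.
    x = dx * math.cos(a) + dy * math.sin(a)
    y = -dx * math.sin(a) + dy * math.cos(a)
    return (x / radius1) ** 2 + (y / radius2) ** 2 <= 1.0


def cell_value(node_x, node_y, points, radius1, radius2,
               angle_deg=0.0, method="average", nodata=0.0):
    """Compute one cell value from (x, y, value) points around a grid node."""
    hits = []  # (distance to grid node, value) for points inside the ellipse
    for px, py, value in points:
        dx, dy = px - node_x, py - node_y
        if in_ellipse(dx, dy, radius1, radius2, angle_deg):
            hits.append((math.hypot(dx, dy), value))
    if not hits:
        return nodata
    if method == "average":  # moving average: plain mean of the hits
        return sum(v for _, v in hits) / len(hits)
    if method == "nearest":  # value of the closest point
        return min(hits, key=lambda h: h[0])[1]
    if method == "invdist":  # inverse square distance weighting
        for d, v in hits:
            if d == 0.0:  # a point exactly on the node decides the value
                return v
        weights = [(1.0 / d ** 2, v) for d, v in hits]
        return sum(w * v for w, v in weights) / sum(w for w, _ in weights)
    raise ValueError(f"unknown method: {method}")


# Three made-up nodes with version numbers 1, 4 and 9; the last is far away.
pts = [(0.001, 0.0, 1), (0.002, 0.0, 4), (0.5, 0.5, 9)]
for m in ("average", "nearest", "invdist"):
    print(m, cell_value(0.0, 0.0, pts, 0.01, 0.008, angle_deg=30, method=m))
```

The inverse square variant needs a distance and a division for every point inside the ellipse, while the moving average only counts and sums the values, which is one plausible reason for the difference in running time.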
I did a little experimenting with the ellipse size as well. As expected, bigger ellipse sizes make for a much smoother result with all extremes averaged away. It looks pretty, but does not tell me what I want to know. And bear with me: the end result is pretty in a way, too.
gdal_grid can output to any GDAL supported format, although I’m not sure all of them would make sense. I had it create a GeoTIFF using the following command:
gdal_grid -zfield "version" -l nodes PG:dbname=amsterdam -a average:radius1=0.01:radius2=0.008:angle=30 version_ways_001.tiff
The grid could be used for visualization directly, but I believe it does not tell a good story. A square grid to me suggests too much of a technical abstraction. I want contour lines because they emphasize the dynamics of the people that create OpenStreetMap. Of course this is a subjective observation, but that’s what visualization is about – telling a story involves conveying a feeling. I want people from inside the project as well as outside to get a feeling for the map data.
GDAL also incorporates functions to generate contour lines from a grid. Luckily, these functions are wrapped in a command line tool, gdal_contour, so I can abstain from coding :). gdal_contour takes a small number of parameters, of which the nodata and interval ones are particularly relevant for how the visualization turns out. I ended up choosing an interval of 0.2, which I found generates the best signal-to-noise balance in the image.
gdal_contour -a elevation version_ways_001.tiff version_ways.shp -i 0.2
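To show what the interval does, here is a small Python sketch. It assumes contour levels are multiples of the interval counted from a base of 0, and that a crossing along a grid cell edge is found by linear interpolation. Both function names are made up for the example, and this is only an illustration of the idea, not gdal_contour's actual implementation.

```python
import math


def contour_levels(vmin, vmax, interval, base=0.0):
    """List the levels base + k * interval that fall within [vmin, vmax]."""
    first = math.ceil((vmin - base) / interval)
    last = math.floor((vmax - base) / interval)
    # Round to suppress floating point noise such as 1.2000000000000002.
    return [round(base + k * interval, 10) for k in range(first, last + 1)]


def edge_crossing(p0, v0, p1, v1, level):
    """Linearly interpolate where `level` crosses the edge from p0 to p1.

    Returns None unless the two end values lie strictly on opposite
    sides of the level.
    """
    if (v0 - level) * (v1 - level) >= 0:
        return None
    t = (level - v0) / (v1 - v0)
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))


# A grid with average versions between 1.05 and 1.85, interval 0.2.
levels = contour_levels(1.05, 1.85, 0.2)
print(levels)

# Where does the 1.5 line cross an edge whose end values are 1.0 and 2.0?
print(edge_crossing((0.0, 0.0), 1.0, (1.0, 0.0), 2.0, 1.5))
```

A smaller interval yields more levels between the same minimum and maximum, and therefore more lines on the map, which is where the signal-to-noise trade-off comes from.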
It seemed unfortunate at first that the gdal_contour tool outputs linestrings and not closed polygons. Most linestrings are closed anyway and it would be nice to be able to create color areas. I made an attempt to convert the linestrings to polygons, first in a Python script using OGR functions (failed due to my lack of understanding of the OGR Python bindings), and later also in PostGIS (that worked), but ended up using the linestrings after all, for reasons that will become apparent soon.
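For what it's worth, the core of such a conversion can be sketched in plain Python without OGR or PostGIS. This is not the script or the query used for this post; it only shows the check that decides whether a linestring can become a polygon ring, with lines given as lists of (x, y) tuples and all function names invented for the example.

```python
def is_closed(line, tolerance=0.0):
    """A ring needs at least four points, with the first equal to the last."""
    if len(line) < 4:
        return False
    (x0, y0), (x1, y1) = line[0], line[-1]
    return abs(x0 - x1) <= tolerance and abs(y0 - y1) <= tolerance


def ring_area(ring):
    """Signed area by the shoelace formula (positive when counter-clockwise)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def to_polygons(lines):
    """Split lines into usable polygon rings and leftover open linestrings."""
    polygons, leftovers = [], []
    for line in lines:
        if is_closed(line) and ring_area(line) != 0.0:
            polygons.append(list(line))
        else:
            leftovers.append(list(line))
    return polygons, leftovers


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]  # closed contour
open_line = [(0, 0), (1, 1), (2, 0)]               # runs off the grid edge
polygons, leftovers = to_polygons([square, open_line])
print(len(polygons), len(leftovers), ring_area(square))
```

Contour lines that run off the edge of the grid stay open, so they end up in the leftovers and would need extra handling before they could be filled.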
Next time I promise I will do a Mapnik stylesheet, but for now I am resorting to the AtlasStyler – GeoServer combo that I know well. AtlasStyler is a visual style editor that takes a PostGIS table or some other vector data source, and provides a nice GUI for classification, symbology and labeling. The created style can be exported as an SLD, which can be used in GeoServer.
AtlasStyler does not let you select the geometry column you want to visualize if your table has more than one. I did not notice this at first, and started an attempt to style the linestring geometries using quantile classification and a red-green color range. This came so close to what I wanted to achieve that I decided not to bother sorting out that multiple geometry columns issue.
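For readers unfamiliar with quantile classification, here is a minimal Python sketch of the idea: pick class breaks so that every class holds roughly the same number of lines, then give each class a color on a red-to-green ramp. AtlasStyler's exact break rule and colors are not known to me, so the function names and the simple index-based breaks below are assumptions.

```python
from bisect import bisect_right


def quantile_breaks(values, classes):
    """Return classes - 1 break values that split the sorted values into
    groups of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // classes] for i in range(1, classes)]


def classify(value, breaks):
    """Return the zero-based class index of a value."""
    return bisect_right(breaks, value)


def red_green_ramp(classes):
    """Interpolate hex colors from red (lowest class) to green (highest)."""
    colors = []
    for i in range(classes):
        t = i / (classes - 1) if classes > 1 else 0.0
        red, green = round(255 * (1 - t)), round(255 * t)
        colors.append(f"#{red:02x}{green:02x}00")
    return colors


# Made-up contour levels standing in for the elevation attribute.
elevations = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8]
breaks = quantile_breaks(elevations, 5)
ramp = red_green_ramp(5)
for e in elevations:
    print(e, ramp[classify(e, breaks)])
```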
I imported the SLD saved in AtlasStyler as a new style in GeoServer, and applied it to a newly created version lines layer (creating a layer from a PostGIS table in GeoServer is really easy, refer to the documentation for more background).
Because I chose thicker lines, the lower zoom levels pretty much look like filled polygons, while on higher zoom levels you would still be able to see the map.
Next Steps and More Ideas
This is a hacky prototype and I chose to not make it publicly available just yet, firstly because it only covers a very small area and secondly because I will have very little time in the next week(s) to respond to comments and improve it.
The first thing I’d want to do is extend the coverage to at least the whole of NL. Also, discussing the version layer with the other OpenStreetMap Hack Weekend people gave me some ideas for further development. Let me summarize those below to conclude this post.
My initial idea was to aggregate node, way and relation versions into one visualization. Due to technical limitations I ended up visualizing only node versions – but that may be a better way anyway. Way versions tell a different story than node versions. Ways change version much less often, thus the average versions of ways are bound to be lower. Also, when a way does get a version increase it’s more likely to be a significant contribution to the map. It would thus make sense to add way versions as a separate contour lines layer instead of finding a way to incorporate them into the node versions layer.
Also, the aesthetics of the visualization could be improved in a number of ways. First, filtering out the contour lines for version < 1 would deemphasize the holes in the data, for example large bodies of water or unused or unmapped land. As it is, the sharp red edges are distracting. Also, I’d love to smooth out the contour lines, but couldn’t seem to find a way in gdal_contour to do that. Can’t be hard though? It would also be nice to have labels on some contour lines, for example the whole integer lines. I know Mapnik can do that, not so sure about GeoServer.
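None of these improvements has been built yet, but all three can be sketched in a few lines of Python. The sketch assumes each contour is a dict with the `elevation` attribute written by the gdal_contour command and a list of (x, y) points. For smoothing it swaps in Chaikin's corner cutting, which is one common choice rather than anything GDAL offers, and the function names are made up.

```python
def keep_mapped(contours, minimum=1.0):
    """Drop contour lines below `minimum` (the holes in the data)."""
    return [c for c in contours if c["elevation"] >= minimum]


def wants_label(elevation, tolerance=1e-6):
    """True for whole-integer contour levels such as 1.0, 2.0, 3.0."""
    return abs(elevation - round(elevation)) < tolerance


def chaikin(line, iterations=2):
    """Smooth an open polyline by corner cutting, keeping both endpoints.

    Closed rings would need wrap-around handling, which is left out here.
    """
    for _ in range(iterations):
        if len(line) < 3:
            break
        smoothed = [line[0]]
        for (x0, y0), (x1, y1) in zip(line, line[1:]):
            # Replace each segment by the points at 1/4 and 3/4 of its length.
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        smoothed.append(line[-1])
        line = smoothed
    return line


corner = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
contours = [{"elevation": e, "points": corner} for e in (0.2, 1.0, 1.4, 2.0)]

kept = keep_mapped(contours)
labeled = [c["elevation"] for c in kept if wants_label(c["elevation"])]
print([c["elevation"] for c in kept], labeled)
print(chaikin(corner, iterations=1))
```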
The first results also inspired some ideas for more visualizations. An interesting and easy one would be to visualize the time since the last edit. This would clearly show areas that have been abandoned by mappers. A somewhat more elaborate visualization would be the average number of edits in the lifespan of the feature. This would paint a clearer picture of the overall activity in a region. For that however, we need access to the full history, which is not available in the default planet. Full history is also a lot more data and on top of that – I may have mentioned this before – the current tools and data models cannot store full history in an easily accessible way.
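Both metrics are easy to state in code once the edit timestamps of a feature are available. The following Python sketch assumes a hypothetical list of edit timestamps per feature, which is exactly the part that needs full history, and it reads "average number of edits in the lifespan" as edits per year since the first version. That reading is my assumption.

```python
from datetime import datetime, timezone


def days_since_last_edit(timestamps, now):
    """Whole days between the most recent edit and `now`."""
    return (now - max(timestamps)).days


def edits_per_year(timestamps, now):
    """Number of edits divided by the feature's lifespan in years."""
    lifespan_days = (now - min(timestamps)).total_seconds() / 86400
    if lifespan_days <= 0:
        return 0.0
    return len(timestamps) / (lifespan_days / 365.25)


def utc(year, month, day):
    """Small helper to build timezone-aware UTC timestamps."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up history of one node: created in 2008, edited in 2009 and 2011.
history = [utc(2008, 1, 1), utc(2009, 1, 1), utc(2011, 1, 1)]
now = utc(2012, 1, 1)

print(days_since_last_edit(history, now))
print(edits_per_year(history, now))
```

Either number could then be written to a point attribute and fed through the same grid and contour steps as the version number.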
That brings us full circle to the reason why we organized this Hack Weekend to begin with – to think about storing and retrieving OpenStreetMap history in a way that makes queries like ‘who all have contributed to this feature / this area?’ or ‘what did the map for this area look like two years ago?’ possible. A lot of ideas were discussed to make this possible, both on the storage / database side of things and on the toolchain. I am glad that we got a group of people together who are all engaged with this topic on different levels, such as Jochen Topf bringing in his experience with osmium / osmjs, Peter Körner with his ongoing work on adapting the PBF format to allow history and creating a tool for making full history extracts, and Stefan de Konink with his strong background in database performance. I would like to say a heartfelt thanks to all who were there for making it a productive and fun weekend, to LinuxHotel for providing the perfect setting, and last but not least to our sponsor, OpenThesaurus, for helping to make this weekend possible!