A Look At Stale OpenStreetMap Data

Lazy people go straight here and here. But you’re not that person, are you?

The Wikimania conference is around the corner, and it’s close to home this year – in Washington, DC. DC already has a lot of resident geo geeks and mappers. With all the open, collaborative, knowledge-minded people in town, there is huge momentum for an OpenStreetMap Mapping Party, and I am excited to help run it! The party is taking place on Sunday, July 15 – the unconference day of Wikimania 2012. (There will also be lots of other open mapping things going on, so do check the program. The entry for the mapping party is kind of sparse, but hey – it’s a mapping party. What more is there to say?)

The question of where to direct the eager mappers quickly arose. In the beginning, that would have been an easy one as the map was without form and void. Nowadays, with the level of maturity the map and the OpenStreetMap community have reached, it can be a lot harder. DC, with all its past mapping parties, well-curated data imports and active mapping community, looks to be handsomely mapped. Picking a good destination for a mapping party requires a look under the hood.

A good indicator for areas that may need some mapping love is data staleness, defined loosely as the amount of time that has passed since someone last touched the data. A neighborhood with lots of stale data may have had one or more active mappers in the past, but they may have moved away or on to other things. While staleness is not a measure of completeness, it can point us at weak areas and neighborhoods in that way.
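
To make that definition concrete, here is a minimal sketch of how staleness could be computed for a single element. It assumes the usual OpenStreetMap timestamp format (UTC, like `2012-04-16T00:00:00Z`); the function name `staleness_days` and the reference date are made up for illustration and are not taken from the project code.

```python
from datetime import datetime, timezone


def staleness_days(last_edit: str, now: datetime) -> int:
    """Return the number of whole days between the last edit and `now`."""
    # Assumption: timestamps are UTC strings such as "2012-04-16T00:00:00Z".
    edited = datetime.strptime(last_edit, "%Y-%m-%dT%H:%M:%SZ")
    edited = edited.replace(tzinfo=timezone.utc)
    return (now - edited).days


# Hypothetical reference date: the day of the mapping party.
reference = datetime(2012, 7, 15, tzinfo=timezone.utc)
print(staleness_days("2012-04-16T00:00:00Z", reference))  # 90
```

Passing the reference date in explicitly, rather than calling the clock inside the function, keeps the result reproducible when the analysis is rerun later.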

I did a staleness analysis for a selection of DC nodes and ways. I kept only the nodes that have tags associated with them, and the ways that are not building outlines. (DC has seen a significant import of building outlines, which would mess up my analysis and the visualization.) And because today was procrastination day, I went the extra mile and made the visualization into a web map and put the thing on GitHub. I documented the (pretty straightforward) process step by step on the project wiki, for those who want to roll their own maps, and those interested in doing something useful with OpenStreetMap data other than just making a standard map.
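
The selection described above (tagged nodes, plus ways that are not buildings) could look roughly like the sketch below. It is not the process documented on the project wiki. It simply reads a small, made-up OSM XML snippet with Python's standard library, and the names `SAMPLE` and `select_elements` are hypothetical. It also assumes that "building outline" means a way carrying a `building` key.

```python
import xml.etree.ElementTree as ET

# Made-up sample data in OSM XML form, for illustration only.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="38.889" lon="-77.009" timestamp="2009-03-01T10:00:00Z"/>
  <node id="2" lat="38.890" lon="-77.010" timestamp="2012-06-20T10:00:00Z">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="10" timestamp="2010-05-05T10:00:00Z">
    <nd ref="1"/>
    <tag k="building" v="yes"/>
  </way>
  <way id="11" timestamp="2012-07-01T10:00:00Z">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="footway"/>
  </way>
</osm>"""


def select_elements(osm_xml):
    """Keep tagged nodes and non-building ways as (type, id, timestamp)."""
    root = ET.fromstring(osm_xml)
    kept = []
    for element in root:
        keys = {tag.get("k") for tag in element.findall("tag")}
        if element.tag == "node" and keys:
            # Untagged nodes are usually just geometry for ways, so skip them.
            kept.append(("node", element.get("id"), element.get("timestamp")))
        elif element.tag == "way" and "building" not in keys:
            # Assumption: any way with a "building" key is a building outline.
            kept.append(("way", element.get("id"), element.get("timestamp")))
    return kept


for item in select_elements(SAMPLE):
    print(item)
```

For a real city extract, a streaming parser or a dedicated OSM tool would be a better fit than loading the whole file into memory.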

Below are two screenshots – one for DC and another for Amsterdam, another city for which I did the analysis. (A brief explanation of what you see is below the images.) It takes all of 15 minutes from downloading the data to publishing the HTML page, so I could easily do more. But procrastination day is over. Buy me a beer or an Aperol spritz in DC and I’ll see what I can do.

About these screenshots: The top one shows the Mall and surroundings in DC, where we see that the area around the Capitol has not been touched much in the last few years, hence the dark purple color of a lot of the linear features there. The area around the White House on the other hand has received some mapping love lately, with quite a few ways bright green, meaning they have been touched in the last 90 days.

Similar differences show up in the Amsterdam screenshot below the DC one. The Vondelpark area was updated very recently, while the (arguably much nicer) Rembrandtpark is pale purple, meaning the last updates were between 1 and 2 years ago.
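
For those curious how the colors relate to age, the mapping could be sketched as below. Only two classes come straight from the descriptions above: bright green for the last 90 days, and pale purple for 1 to 2 years. The cut-off for dark purple (older than 2 years) and the unnamed class between 90 days and 1 year are assumptions made for this sketch, and `staleness_color` is a hypothetical name.

```python
def staleness_color(age_days: int) -> str:
    """Map an age in days to a color class for the map."""
    if age_days <= 90:
        return "bright green"  # touched in the last 90 days
    if age_days <= 365:
        # Assumption: the color of this class is not described in the post.
        return "unspecified"
    if age_days <= 2 * 365:
        return "pale purple"  # last update between 1 and 2 years ago
    # Assumption: everything older than 2 years is drawn dark purple.
    return "dark purple"


for age in (30, 200, 500, 1200):
    print(age, staleness_color(age))
```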

Note that the individual tagged nodes are not visible in these screenshots. They would clutter up the visualization too much at this scale. In the interactive maps, you can zoom in to see those.

As always, I love to talk about this with you, so share your thoughts, ideas for improvements, and any ol’ comment.


5 thoughts on “A Look At Stale OpenStreetMap Data”

  1. A comment on the Amsterdam map since it was me who updated the Vondelpark. The Vondelpark data simply was wrong (all footpaths that allow bikes were marked as cycle paths) – if they had been added correctly in the first place, then the map would look just as ‘stale’.
    What you describe as ‘staleness’ might also simply be correct data.

    1. That does not really make a difference. What I try to measure here is whether an area or neighborhood is actually being actively maintained. If you reverse the argument it makes more sense: if everything is perfect & complete, there is not much need for anyone to go in and edit the data anymore, so the data grows stale from perfection. I don’t think we’re quite there yet in most places, for different reasons: mappers keep adding more detail, both geometries and metadata, and more importantly, reality keeps changing, especially in urban areas. Lastly, the OpenStreetMap information model is a moving target to an extent: new insights in how to map the truth onto pairs of keys and values may lead to mappers revisiting an area and changing metadata, like you just did. Be careful though about representing other mappers’ efforts as plain wrong – they may reflect a different, older interpretation of the model. Be sure to discuss large scale ‘corrections’ with the local community.
