Although the OpenStreetMap data license change from CC-BY-SA to ODbL is not formally done, the redaction bot has done its thing. It was activated in mid-July after rigorous testing and months of development to go over the entire planet, square degree by square degree, and delete any tainted node, way and relation or revert it to a untainted state. (An object is tainted, simply put, when it was at some point edited by a contributor who did not agree to the license change and the new contributor terms.)
OpenStreetMap Redaction bot progress map screenshot taken on July 17th as the bot was just progressing to Switzerland and re-running London.
Although less than 1% of all data was removed in the redaction process, some areas are much more strongly affected than others, depending on the amount and previous activitiy of local ‘decliners’ as well as pre-emptive remapping of tainted features. This was made possible by tools that identified potentially tainted objects, such as Simon Poole’s Cleanmap and a JOSM plugin. (These tools have since been discontinued.)
Looking at the situation in the US, what we’re left with after the license change redaction is a map that has issues. Well, let’s say it has more issues now than it had before the redaction. Just looking at the street network, which is the one single most important dimension of OpenStreetMap if you ask me, we don’t have to look very hard to find missing street segments, entire missing streets, and messy geometry. We had messy geometry before, because of the inherent messiness of TIGER/Line data off of which the US road network is based, but the redaction process left us with some really, ehm, interesting spaghetti in places.
Messy way geometries as a result of the redaction, near Los Angeles, CA.
Missing way segment as a result of the redaction, north of Honolulu, HI
And then there’s the invisible results of the redaction: lost attributes such as speed limits, lane count, direction of flow – but I would suggest we fix the road geometries and topological integrity of the main road network first, so the OpenStreetMap data is routable and the routes make sense. In order to do that effectively, the community needs tools. Fortunately, we have some available already. There’s Toby Murray’s map showing suspicious way segments last touched by the redaction bot. Kai Krueger provided a routing grid page that calculates route times and distances between major US cities, and compares those to reference data to expose potential routing issues. And there’s also the venerable OSM Inspector tool that provides a layer showing objects affected by the redaction process.
I also took a stab at a small set of tools that I hope will be helpful in identifying the remaining issues in the road network for the US: a Redacted / Deleted Ways map, and the Remap-A-Tron.
Redacted / Deleted Ways Map
Not unlike the redaction layer in OSM Inspector, this map duo exposes features affected by the redaction process, with a few notable differences. The main difference is that it has a clear focus on the US road network data. The Redacted Ways Map differentiates between Motorway, Trunk, Primary, Secondary and lower class roads, in an attempt to make it easier for remappers to seek out and prioritize the more important roads.
The Redacted Ways Map showing different road classes in appropriate coloring for easy prioritization of remapping efforts
The Deleted Ways map complements the Redacted Ways map. It only shows the way segments that were completely obliterated by the redaction bot. The focus on the road network means that only ways with a highway tag are visualized here.
The Deleted Ways map differentiates between ways that are likely to already have been remapped, and those that still need to be remapped.
If you look at this map, you will notice that there are two different classes of deleted ways, displaued in red and green respectively. The difference is in remapping status. The red ways are probably not remapped yet, while the green ways are likely to have been remapped already. I make this distinction by periodically comparing the deleted way geometries with newly mapped way geometries in the database. Using the Hausdorff distance algorithm as the main component, I devised an algorithm that reasonably accurately predicts whether a way has already been remapped.
I came up with another way to leverage the knowledge of remappedness of the OpenStreetMap street network. Building on an idea I already had semi-developed a while ago, I built a web application that serves up one deleted way geometry overlaid onto a current OpenStreetMap basemap. Using a simple menu or handy keyboard shortcuts, you can flag the segment as already remapped, skip it for whatever reason, or load that area into JOSM or Potlatch right away to remap it.
The Remap-A-Tron serves up one non-remapped important way segment at a time for remappers to work on.
The data backend is refreshed every (US) night, so there should not be too much of a lag. Currently, the app serves up only non-remapped important segments: motorways down to tertiary class roads. There are about 1900 segments in this category as of the time of writing. If a handful of people spend an evening with the Remap-A-Tron, this should be down to zero in a matter of days. Once we complete the repairs on the important roads, I can tweak the app to include all other highways (~15K segments not remapped) and ultimately all ways (~52K segments not remapped).
I built this tool with versatility in mind. After the street remapping is done, it could easily be repurposed to identify other issues in OSM that are identifiable on the Mapnik map: self-intersecting ways, road connectivity issues, missing bridge tags, maybe even missing traffic lights? I would love to hear your ideas.
If this turns out to be useful and used, I will try and increase the coverage to the entire world. For that to happen, the Remap-A-Tron will need to find a home that is not the server next to my desk, though..