For the impatient: do you want to get to work solving highway trouble in OpenStreetMap right away? Download the Trouble File here!
Making pretty and useful maps with freely available OpenStreetMap data has never been easier or more fun. The website Switch2OSM is an excellent starting point, and with great tools like MapBox’s TileMill at your disposal, experimenting with digital cartography is almost effortless. Design bureau Stamen shows us some beautiful examples of digital cartography based on OpenStreetMap data. Google’s decision to start charging for use of its Maps API gives some a compelling push down this road, and the likes of Foursquare and Apple are leading the way.
With all eyes on OpenStreetMap as a source of pretty maps, you might almost forget that freely available OpenStreetMap data is useful for far more than that. One of the more compelling uses of OpenStreetMap data is routing and navigation, and things have been moving there too. Skobbler has made a tangible dent in the turn-by-turn navigation market for mobile devices in some countries. Using freely available OpenStreetMap data, it offers functionality similar to TomTom’s at a much, much lower price point. MapQuest and CloudMade offer routing APIs based on OpenStreetMap. The new open source routing projects OSRM and MoNav show promise, with very fast route calculation and a full feature set, and both are built from the ground up to work with OpenStreetMap data.
Routing places very different, much stricter requirements on the source data than map rendering does. For a pretty map, it does not matter much if roads in the source data do not always connect, or if they lack usage or turn restriction information. For routing, this makes all the difference. Topological errors and missing usage restriction metadata lead to incorrect routes. The router will direct you to turn left onto a one-way street, or to get off the highway for no apparent reason where there is no exit. That may seem funny when you read about it in a British tabloid, but it’s annoying when you’re on a road trip, and totally unacceptable if you depend on routing software for your business. So unless the data is pretty much flawless, major providers of routing and navigation products will not switch to OpenStreetMap the way some have so eagerly done for their base maps.
It turns out the data is not flawless. A study done at the University of Heidelberg examined Germany, the country with by far the most prolific OpenStreetMap community. Even there, the data is not yet on par with commercial road network data when compared on key characteristics for routing, although the study predicts that it will be within a few months.
Turning to the US, the situation is bound to be much worse. The community is much smaller and spread pretty thin geographically (in some regions it is almost nonexistent), and the TIGER import is a very challenging starting point. Given that, routing based on OpenStreetMap data in the US is nowhere near perfect. Sure, an early effort led by the aforementioned CloudMade identified and weeded out the most obvious routing-related problems with the TIGER data shortly after the import, but many challenges remain.
In an effort to make OpenStreetMap data more useful for routing in the US, I started to identify some of those challenges. Routing is most severely affected by problems with the primary road network, so I decided to start from there. Using some modest PostGIS magic, I isolated a set of Highway Trouble Points. The Trouble breaks down into four main classes:
Bridge Trouble
This is the case where a road crossing over or under a highway is not tagged as a bridge and, even worse, shares a vertex with the highway, as illustrated below. This tricks routing software into thinking there is a turn opportunity where there is none. That is bad enough if there actually is an exit there, as in the example, but it gets really disastrous when there is not.
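To make the problem concrete, here is a minimal Python sketch of roughly how routing software derives turn opportunities from OSM topology. It is not taken from any real router. The idea is that any node shared by two ways is treated as a place where you can switch from one way to the other. The way names, node ids and tags are made up for illustration. The fixed version is also simplified: in real data the bridge section is usually split off as its own way with bridge=yes and layer=1.

```python
from collections import defaultdict


def turn_opportunities(ways):
    """Map each node shared by two or more ways to the (sorted) ways using it.

    A naive router treats every such node as a place where it may switch
    from one way to another, i.e. a turn opportunity.
    """
    ways_at_node = defaultdict(set)
    for way_id, way in ways.items():
        for node_id in way["nodes"]:
            ways_at_node[node_id].add(way_id)
    return {n: sorted(w) for n, w in ways_at_node.items() if len(w) > 1}


# Hypothetical data: a motorway and a local road passing over it.
# Broken: the overpass shares node 2 with the motorway.
broken = {
    "motorway": {"nodes": [1, 2, 3], "tags": {"highway": "motorway"}},
    "overpass": {"nodes": [10, 2, 11], "tags": {"highway": "residential"}},
}
# Fixed: the overpass is a bridge with its own node 20, no shared vertex.
fixed = {
    "motorway": {"nodes": [1, 2, 3], "tags": {"highway": "motorway"}},
    "overpass": {
        "nodes": [10, 20, 11],
        "tags": {"highway": "residential", "bridge": "yes", "layer": "1"},
    },
}

print(turn_opportunities(broken))  # {2: ['motorway', 'overpass']}
print(turn_opportunities(fixed))   # {}
```

Sorting the way ids keeps the output deterministic, because Python sets of strings have no stable order between runs.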
Imaginary Exit Trouble
Sometimes a local road or track is connected to a highway, which can trick routing software into taking a shortcut. Repairing these is simple: unglue the shared node and, using the aerial imagery, move the end of the local road to where it actually ends.
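Finding candidates for this class boils down to looking for local roads that share a node with a motorway. The Python sketch below does that over a toy in-memory set of ways. The LOCAL_CLASSES set and the sample data are assumptions for illustration, not part of the original query.

```python
# Hypothetical choice of "local" classes that should never attach
# directly to a motorway; adjust to local tagging practice.
LOCAL_CLASSES = {"residential", "unclassified", "track"}


def imaginary_exits(ways):
    """Return sorted (way_id, node_id) pairs where a local road touches a motorway."""
    motorway_nodes = set()
    for way in ways.values():
        if way["tags"].get("highway") == "motorway":
            motorway_nodes.update(way["nodes"])

    hits = []
    for way_id, way in ways.items():
        if way["tags"].get("highway") in LOCAL_CLASSES:
            hits.extend((way_id, n) for n in way["nodes"] if n in motorway_nodes)
    return sorted(hits)


ways = {
    "motorway": {"nodes": [1, 2, 3, 4], "tags": {"highway": "motorway"}},
    # A real exit ramp: not a local road, so not flagged.
    "ramp": {"nodes": [2, 60], "tags": {"highway": "motorway_link"}},
    # A track glued onto the motorway: an imaginary exit.
    "farm track": {"nodes": [50, 51, 3], "tags": {"highway": "track"}},
    # A residential street nowhere near the motorway.
    "Main St": {"nodes": [70, 71], "tags": {"highway": "residential"}},
}

print(imaginary_exits(ways))  # [('farm track', 3)]
```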
Service Road Trouble
The separate roadways of a highway are sometimes connected to allow emergency vehicles to make a U-turn. Regular traffic is not allowed to use these connector service ways, but the TIGER import usually tagged them as public access roads, which again can trick routing software into taking a shortcut. I repair these by tagging them as highway=service and access=official, access=no, emergency=yes.
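Why the retagging helps: a car routing profile only uses ways whose tags allow general traffic. The rule below is a hypothetical, heavily simplified stand-in for such a profile. Real routing engines such as OSRM define much more detailed ones. The sketch shows that the connector as imported is drivable, while the retagged one is open only to emergency vehicles. The imported tagging of the connector is itself an assumption.

```python
# Hypothetical, heavily simplified car profile. Real routing profiles
# consider many more tags (motor_vehicle, vehicle, barriers, ...).
DRIVABLE = {
    "motorway", "motorway_link", "trunk", "primary", "secondary",
    "tertiary", "unclassified", "residential", "service",
}
BLOCKED_ACCESS = {"no", "private", "official"}


def car_can_use(tags):
    """True if an ordinary car may use a way with these tags."""
    if tags.get("highway") not in DRIVABLE:
        return False
    return tags.get("access", "yes") not in BLOCKED_ACCESS


def emergency_can_use(tags):
    """Emergency vehicles: an explicit emergency=yes overrides access."""
    if tags.get("emergency") == "yes":
        return tags.get("highway") in DRIVABLE
    return car_can_use(tags)


tiger_connector = {"highway": "residential"}  # assumed import tagging
fixed_connector = {"highway": "service", "access": "no", "emergency": "yes"}

print(car_can_use(tiger_connector))        # True
print(car_can_use(fixed_connector))        # False
print(emergency_can_use(fixed_connector))  # True
```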
Rest Area Trouble
This is of secondary importance, as rest areas are usually not connected to the road network except through their on- and off-ramps. Finding these Trouble points was an unexpected by-product of the query I ran on the data. What we have here are rest areas that are not tagged as such. Instead, each exists as a group of ‘residential’ roads connected directly to the highway, without a motorway_link. While we’re at it, we can clean these up nicely:
- tag the on- and off-ramps as motorway_link;
- tag the other road features as highway=service;
- add oneway=yes where necessary;
- tag a node as highway=rest_area.

It’s usually obvious from the aerial image whether toilets=yes applies, too.
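Once you have decided from the aerial imagery what role each way plays, the cleanup amounts to a small tag transformation. Here is a Python sketch under that assumption. The role names, helper functions and sample tags are hypothetical.

```python
# Hypothetical mapping from the role a way plays in the rest area
# (decided by the mapper from aerial imagery) to the tags it should get.
ROLE_TAGS = {
    "ramp": {"highway": "motorway_link", "oneway": "yes"},
    "service": {"highway": "service"},
    "service_oneway": {"highway": "service", "oneway": "yes"},
}


def retag_rest_area(way_tags, roles):
    """Return updated tag dicts for the ways of a mis-tagged rest area.

    way_tags: {way_id: current tags}; roles: {way_id: key of ROLE_TAGS}.
    Tags not touched by the role (name, tiger:* etc.) are preserved.
    """
    updated = {}
    for way_id, role in roles.items():
        tags = dict(way_tags[way_id])  # copy, don't mutate the input
        tags.update(ROLE_TAGS[role])
        updated[way_id] = tags
    return updated


def rest_area_node_tags(has_toilets):
    """Tags for the node marking the rest area itself."""
    tags = {"highway": "rest_area"}
    if has_toilets:
        tags["toilets"] = "yes"
    return tags


before = {
    101: {"highway": "residential", "tiger:county": "Example, VT"},
    102: {"highway": "residential"},
    103: {"highway": "residential"},
}
after = retag_rest_area(before, {101: "ramp", 102: "service_oneway", 103: "ramp"})
print(after[101])  # {'highway': 'motorway_link', 'tiger:county': 'Example, VT', 'oneway': 'yes'}
print(rest_area_node_tags(has_toilets=True))  # {'highway': 'rest_area', 'toilets': 'yes'}
```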
I have done test runs of the query on OSM data for Vermont and Missouri. The query is performed on a PostGIS database with the osmosis snapshot schema, optionally with the linestring extension, and goes like this:
DROP TABLE IF EXISTS candidates;
CREATE TABLE candidates AS
WITH agg_intersections AS (
    WITH intersection_nodes_wayrefs AS (
        WITH intersection_nodes AS (
            -- nodes referenced exactly twice in way_nodes
            SELECT a.id AS node_id, b.way_id, a.geom
            FROM nodes a, way_nodes b
            WHERE a.id = b.node_id
              AND a.id IN (
                  SELECT DISTINCT node_id
                  FROM way_nodes
                  GROUP BY node_id
                  HAVING COUNT(1) = 2
              )
        )
        -- attach highway and ref tags of the ways through those nodes
        SELECT DISTINCT a.node_id AS node_id,
               b.id AS way_id,
               b.tags->'highway' AS osm_highway,
               a.geom AS geom,
               b.tags->'ref' AS osm_ref
        FROM intersection_nodes a, ways b
        WHERE a.way_id = b.id
    )
    SELECT node_id,
           array_agg(way_id) AS way_ids,
           array_agg(osm_highway) AS osm_highways,
           array_agg(osm_ref) AS osm_refs
    FROM intersection_nodes_wayrefs
    GROUP BY node_id
)
SELECT a.*,
       b.geom AS node_geom
       -- COMMENT NEXT LINE OUT IF YOU DON'T HAVE
       -- OR WANT WAY GEOMETRIES
       , c.linestring AS way_geom
FROM agg_intersections a, nodes b, ways c
WHERE (
        'motorway' = ANY(osm_highways)
        AND NOT (
            'motorway_link' = ANY(osm_highways)
            OR 'service' = ANY(osm_highways)
            OR 'motorway' = ALL(osm_highways)
            OR 'construction' = ANY(osm_highways)
        )
      )
  AND a.node_id = b.id
  AND c.id = ANY(a.way_ids);
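For readers without a PostGIS database at hand, the following Python sketch roughly mirrors the query's filtering logic on an in-memory copy of the way_nodes rows and way tags. It works in three steps:
1. Keep nodes referenced exactly twice.
2. Collect the highway tags of the ways through each of those nodes.
3. Keep only motorway nodes that are not link, service, construction or motorway-to-motorway junctions.

The sample network is invented. Note that node 5, where a motorway turns into a trunk road, shows up as a candidate. That is exactly the kind of false positive discussed below.

```python
from collections import Counter, defaultdict


def candidate_nodes(way_nodes, way_tags):
    """Roughly mirror the candidates query on in-memory data.

    way_nodes: list of (way_id, node_id) rows, like the way_nodes table.
    way_tags:  {way_id: tag dict}, like ways.tags.
    Returns the sorted ids of candidate Trouble nodes.
    """
    # HAVING COUNT(1) = 2: nodes that appear in exactly two way_nodes rows.
    ref_counts = Counter(node_id for _, node_id in way_nodes)
    shared = {n for n, count in ref_counts.items() if count == 2}

    # Group the (distinct) ways passing through each shared node.
    ways_at = defaultdict(set)
    for way_id, node_id in way_nodes:
        if node_id in shared:
            ways_at[node_id].add(way_id)

    candidates = []
    for node_id, way_ids in ways_at.items():
        # Ways without a highway tag are ignored in this sketch.
        highways = [way_tags.get(w, {}).get("highway") for w in way_ids]
        highways = [h for h in highways if h is not None]
        if "motorway" not in highways:
            continue
        if any(h in ("motorway_link", "service", "construction") for h in highways):
            continue
        if all(h == "motorway" for h in highways):
            continue
        candidates.append(node_id)
    return sorted(candidates)


# Invented sample network: way id -> (node ids, highway tag).
WAYS = {
    "A": ([1, 2, 3, 4], "motorway"),
    "B": ([4, 5], "motorway"),          # continues A
    "R": ([10, 2, 11], "residential"),  # crosses A, shares node 2
    "L": ([3, 30], "motorway_link"),    # proper exit ramp at node 3
    "T": ([5, 6], "trunk"),             # B turns into a trunk at node 5
}
rows = [(w, n) for w, (nodes, _) in WAYS.items() for n in nodes]
tags = {w: {"highway": hw} for w, (_, hw) in WAYS.items()}

print(candidate_nodes(rows, tags))  # [2, 5]
```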
The query took about a minute to run for Vermont and about 5 minutes for Missouri. For Vermont it yielded 77 points, and for Missouri 193 points. You can download these files here. Note, though, that I have since done much of the cleanup work in these states myself, as part of working out how to improve the query. The query still yields some false positives, notably points where a highway=motorway turns into a highway=trunk or highway=primary (see below).
UPDATE: The following query filters out these false positives. It uses the PostGIS functions ST_StartPoint and ST_EndPoint to determine whether two line features merely ‘meet’ end to end:
DROP TABLE IF EXISTS candidates_noendpoints;
CREATE TABLE candidates_noendpoints AS
SELECT DISTINCT c.node_id, c.node_geom
FROM ways a, ways b, candidates c
WHERE ST_Intersects(c.node_geom, a.linestring)
  AND ST_Intersects(c.node_geom, b.linestring)
  -- drop the pair if the node is a start or end point of both ways
  AND NOT (
        ST_Intersects(c.node_geom,
                      ST_Union(ST_StartPoint(a.linestring), ST_EndPoint(a.linestring)))
    AND ST_Intersects(c.node_geom,
                      ST_Union(ST_StartPoint(b.linestring), ST_EndPoint(b.linestring)))
  );
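Here is the same idea in Python terms, again as a hedged sketch that works on node ids rather than geometries. The query's cross join over ways a and ways b also pairs a way with itself. As a result, a candidate survives whenever it is an interior vertex of at least one way passing through it, and it is dropped only when every such way starts or ends there. The sample ways are made up.

```python
def drop_endpoint_meetings(candidates, ways):
    """Keep candidates that are an interior vertex of at least one way.

    candidates: iterable of node ids; ways: {way_id: list of node ids}.
    A node where every way through it starts or ends (e.g. a motorway
    turning into a trunk road) is dropped as a false positive.
    """
    kept = []
    for node_id in candidates:
        through = [nodes for nodes in ways.values() if node_id in nodes]
        if any(node_id not in (nodes[0], nodes[-1]) for nodes in through):
            kept.append(node_id)
    return kept


ways = {
    "A": [1, 2, 3, 4],   # motorway
    "B": [4, 5],         # motorway, ends at 5
    "R": [10, 2, 11],    # road crossing A at node 2
    "T": [5, 6],         # trunk starting at 5
}
print(drop_endpoint_meetings([2, 5], ways))  # [2]
```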
This query requires the availability of line geometries for the ways, obviously.
UPDATE 2: As written, the query made the PostgreSQL server croak because it ran out of memory, so I had to redesign it to rely much less on in-memory tables. I will provide the updated query to anyone interested. I’m leaving the original SQL up there because it was meant to convey the approach, and it still does. The whole US trouble file is available as an OSM XML file from here.
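The post does not describe how the OSM XML trouble file was produced. One possible way is sketched here with Python’s standard xml.etree.ElementTree. Each trouble point is written as a new node with a negative id, which is the convention editors such as JOSM use for objects not yet uploaded. Each node also gets a fixme tag describing the problem. The function name, generator string, tag choice and coordinates are all assumptions.

```python
import xml.etree.ElementTree as ET


def trouble_points_to_osm_xml(points):
    """Serialize (lat, lon, note) tuples as new OSM nodes in OSM XML 0.6.

    New objects get negative ids, as editors like JOSM expect for data
    that has not been uploaded; each node carries a fixme tag.
    """
    root = ET.Element("osm", version="0.6", generator="highway-trouble-sketch")
    for i, (lat, lon, note) in enumerate(points, start=1):
        node = ET.SubElement(
            root, "node", id=str(-i), lat=f"{lat:.7f}", lon=f"{lon:.7f}"
        )
        ET.SubElement(node, "tag", k="fixme", v=note)
    return ET.tostring(root, encoding="unicode")


# Made-up coordinates, for illustration only.
xml_text = trouble_points_to_osm_xml([
    (44.2601, -72.5754, "road shares a node with motorway; bridge?"),
    (38.5767, -92.1735, "service connector open to public traffic"),
])
print(xml_text)
```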
I plan to make the Highway Trouble files available on a regular basis for all 50 states if there is interest. And as always, I’m very interested to hear your opinion. Is there any Trouble I am missing? Are there ways to improve the query? Let me know.