In the coming months, I will be working on how to measure the quality of geospatial information, and visualizing the results of quality analysis. The actual indicators for quality are still to be defined, but will be along the lines of
- spatial density – how many features of a certain type does dataset A have, and how many does dataset B have?
- temporal quality – what is the age of the data? How much time has passed since survey, publishing?
- crowd quality – what I call the ‘5th dimension of spatial data quality’. This one is more complex (a separate post will follow).
‘Crowd Quality’ has many dimensions. It is about peer review strength: how many surveyors have ‘touched’ a feature? How many surveyors are responsible for area X? It has several consistency components as well. One is internal attribute consistency: to what extent does the data conform to a set of core attributes? Another is spatial and temporal quality consistency: considering a larger region, does the data show consistent measurements for the spatial and temporal quality indicators described above?
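To make these indicators a bit more concrete, here is a minimal sketch of how three of them might be computed. The `Feature` record and its field names are hypothetical, and using the last edit date as a stand-in for data age is my own simplifying assumption; the actual indicators are still to be defined.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Feature:
    # Hypothetical record; the field names are assumptions for illustration.
    feature_type: str      # e.g. "pub" or "residential_road"
    last_edited: date      # date of the most recent edit (proxy for data age)
    editors: frozenset     # user names of everyone who edited the feature


def spatial_density(features):
    """Count features per type, so two datasets can be compared type by type."""
    return Counter(f.feature_type for f in features)


def mean_age_days(features, today):
    """Average number of days since each feature was last edited."""
    ages = [(today - f.last_edited).days for f in features]
    return sum(ages) / len(ages) if ages else 0.0


def peer_review_strength(features):
    """Average number of distinct surveyors who have 'touched' a feature."""
    counts = [len(f.editors) for f in features]
    return sum(counts) / len(counts) if counts else 0.0


# Made-up sample data, purely for illustration.
dataset_a = [
    Feature("pub", date(2010, 1, 1), frozenset({"alice", "bob"})),
    Feature("pub", date(2010, 1, 11), frozenset({"alice"})),
    Feature("residential_road", date(2010, 1, 21), frozenset({"carol", "bob", "dave"})),
]
today = date(2010, 1, 31)

density_a = spatial_density(dataset_a)
age_a = mean_age_days(dataset_a, today)
strength_a = peer_review_strength(dataset_a)
```

Running the same functions over dataset B would give numbers that can be compared side by side with those for dataset A.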
Quality analysis is an important issue for Volunteered Geographic Information projects like OpenStreetMap, because their data is under constant, close scrutiny: it’s open, so it’s easily accessible, and it’s very easy to take cheap shots at extensive voids in the map. That same openness gives professional users strong reservations about the quality of the data. There is almost no barrier to entry into the OpenStreetMap community: provide a username and an email address and you’re good to go – and free to delete all the data for Amsterdam, for example.
In a community of 200,000, map vandalism of such magnitude will be swiftly detected and reverted, and as such should not even be the biggest concern of potential users of VGI data. Smaller acts of map vandalism, however, might go undetected for a longer period of time, if they are detected at all. Moreover, with OpenStreetMap picking up momentum as it currently is, many new aspiring surveyors are joining every day. Even when they all subscribe and start adding data with the best intentions, ‘newbies’ are bound to get it wrong at first, inadvertently adding a stretch of freeway in their residential neighborhood, or unintentionally moving features around when all they want to do is add their local pub. Even if the community tends to react to map errors – inadvertent or not – swiftly and pro-actively, the concerns potential users have about the quality of the data are legitimate. VGI is anarchy, and where there is anarchy, there are no guarantees.
The need for quality analysis also arises from within the VGI communities themselves. As a VGI project matures, contributors are likely to shift their attention to details. This can certainly be said for OpenStreetMap, where some regions are nearing or reaching completion of the basic geospatial features. A quick glance at the current map will no longer be enough to decide how and where to direct your surveying and mapping effort. Data and quality analysis tools are needed to aid the contributors in their efforts. These can be really simple tabular comparisons; in many German cities, for example, OpenStreetMap contributors have acquired complete and up-to-date street name lists from the local council, which they compare to the named streets that exist in the OpenStreetMap database. This effort (Essen, Germany here) yields a simple list of missing street names, which can then be targeted for mapping efforts.
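Such a tabular comparison boils down to a set difference. The sketch below is my own illustration, not the code the German contributors use; the function names and the sample street names are made up, and the normalization step is an assumption about what counts as "the same" name.

```python
def normalize(name):
    """Lower-case and collapse whitespace so trivial differences do not count as misses."""
    return " ".join(name.lower().split())


def missing_street_names(council_names, osm_names):
    """Return council street names that have no match among the OSM names."""
    mapped = {normalize(n) for n in osm_names}
    return sorted(n for n in set(council_names) if normalize(n) not in mapped)


# Made-up sample lists, purely for illustration.
council = ["Hauptstraße", "Bahnhofstraße", "Am Markt"]
osm = ["hauptstraße", "Am  Markt"]

missing = missing_street_names(council, osm)
```

A real comparison would probably need smarter matching (abbreviations such as "Str." for "Straße", for instance), but the principle stays the same.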
More complex and versatile data quality analysis tools are being developed as well. Let me give a few examples to conclude this article and give some idea of how the results of my quality analysis research could be visualized.
Not an automated data analysis tool, this web site allows for simple map bug reporting. It was designed to provide a no-barrier way to report errors on the map: you do not even need to be registered as an OpenStreetMap user to use it. It provides some indication of data quality. It can be used by OpenStreetMap contributors to fix reported errors quickly; the web site provides a link to the web-based OpenStreetMap editor, Potlatch, with every reported error automatically.
Visual comparison: Map Compare and FLOSM
An often asked question about the data quality of OpenStreetMap is: how does OpenStreetMap compare to TeleAtlas or NAVTEQ, the two major commercial vendors of street data? While comparing the spatial quality is in itself not a complicated task, you need to have access to both data sets in order to actually do it. TeleAtlas and NAVTEQ data is expensive, so not many are in a position to actually do this comparison. In the course of my research, I will certainly perform a number of these analyses, as I am in the fortunate position to have easy access to commercial spatial data.
A simple but effective way to visually compare two spatial data sets is to overlay them in GIS software, or in a web mapping application. Making such overlay web applications available is generally discouraged in VGI communities, as it is thought to encourage ‘tracing’ data from proprietary sources. Tracing violates the licenses for almost all commercial spatial data, and could thus mean legal trouble for VGI projects.
Nevertheless, some visual comparison tools do exist. Map Compare presents a side-by-side view of OpenStreetMap and Google Maps, allowing for easy and intuitive exploratory comparison of the two. FLOSM takes it a step further with a full-on overlay of TeleAtlas data on top of OpenStreetMap data.
Automated analysis: KeepRight and OSM Inspector
The tools we’ve seen so far do not provide analysis intelligence themselves; they simply display the factual data and leave it to the user to draw conclusions. Another category of quality assurance tools takes the idea a step further and performs different spatial data quality analyses and displays the results in a map view.
German geo-IT company Geofabrik, also responsible for the Map Compare tool mentioned earlier, publishes the widely used OSM Inspector tool, which can be used to perform a range of data quality analyses on OpenStreetMap data. It can effectively visualize topology issues and common tagging errors. Input for the tool’s functionality and for extending its range of visualizations comes from the community. A recent addition requested by the Dutch community is a visualization that shows the Dutch street data that has not been ‘touched’ since it was imported in 2007, when AND donated their street data for the Netherlands to OpenStreetMap, effectively completing the road network for the Netherlands in OpenStreetMap. This particular visualization helps Dutch OpenStreetMap contributors to establish which features have not yet been checked since they were imported. A similar tool was put in place when TIGER data from the US Census Bureau was imported into OpenStreetMap in 2008.
KeepRight takes a similar approach to OSM Inspector, analysing OpenStreetMap data for common errors and inconsistencies and displaying them in a web map application.
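I don’t know how OSM Inspector implements the ‘untouched since import’ view, but the idea can be sketched with the `version` and `user` attributes that every OpenStreetMap element carries. The dictionary layout, the function name and the account name `import_account` below are hypothetical.

```python
def untouched_since_import(elements, import_users):
    """Select elements still at version 1 whose only editor was an import account."""
    return [
        e for e in elements
        if e["version"] == 1 and e["user"] in import_users
    ]


# Made-up sample elements; "import_account" is a placeholder user name.
ways = [
    {"id": 1, "version": 1, "user": "import_account"},
    {"id": 2, "version": 3, "user": "alice"},
    {"id": 3, "version": 1, "user": "bob"},
]

untouched = untouched_since_import(ways, {"import_account"})
untouched_ids = [e["id"] for e in untouched]
```

Way 2 has been edited since (version 3), and way 3 was created by a regular contributor, so only way 1 is flagged as still needing a check.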
While these tools are extremely useful for OpenStreetMap contributors looking to improve the data and correct mistakes, they are not particularly useful for visualizing quantitative data quality research outcomes, as those outcomes will be aggregated, generalized data.
For many of the ‘Crowd Quality’ indicators, I am probably going to take a grid approach: establishing quantifiable indicators for Crowd Quality and calculating them for each cell in the grid. What that grid will look like is actually also a matter of debate – it would depend on the quality indicator measured, and on the characteristics of the real world situation referenced by that grid cell.
To get an idea of what a grid visualization pertaining to quality could look like, it’s interesting to look at the visualization for the Aerial Imagery Tracing project running in the German OpenStreetMap community. A set of high resolution aerial photos was made available to OpenStreetMap, and integrated into map editing software for purposes of tracing features. Some tools were developed to assist in completing this effort; amongst those, a grid overlay visualizing the progress for each grid cell. No automated analysis is performed; rather, contributors are asked to scrutinize the grid cells themselves and rate completeness on several indicators. Although the pilot project was completed some time ago, the visualization is still online.
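As a rough sketch of what a grid approach could look like, the code below bins edits into square cells in plain longitude/latitude degrees and counts the distinct surveyors per cell. The fixed cell size, the `(lon, lat, user)` input format and the function names are all assumptions on my part; as said, the real grid may well end up looking different.

```python
import math
from collections import defaultdict


def cell_index(lon, lat, cell_size_deg):
    """Map a coordinate to the (column, row) of the square cell that contains it."""
    return (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))


def surveyors_per_cell(edits, cell_size_deg):
    """Count distinct surveyors per grid cell from (lon, lat, user) tuples."""
    users = defaultdict(set)
    for lon, lat, user in edits:
        users[cell_index(lon, lat, cell_size_deg)].add(user)
    return {cell: len(names) for cell, names in users.items()}


# Made-up sample edits, purely for illustration.
edits = [
    (4.89, 52.37, "alice"),
    (4.91, 52.36, "bob"),
    (4.89, 52.37, "alice"),   # repeat edit by the same surveyor
    (5.12, 52.09, "carol"),
]

counts = surveyors_per_cell(edits, cell_size_deg=0.5)
```

Each resulting value could then be used to colour its grid cell on a map. Cells without any edits simply do not appear in the result.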
[Edit] This blog post goes into the technicalities of setting up a grid in PostGIS.