Priceless?
Volunteered Geographic Information
Free, Priceless Or Somewhere In Between?
This is the title that has been popping into my head since last summer. I am writing it down because it encompasses in a very general sense the themes that I want to cover in my dissertation, and thus serves me well in trying to guide me while I try to elaborate on them.
I have actually already written some paragraphs elaborating on the themes and ideas that follow, but I want to force myself to touch upon them concisely here.
Volunteered Geographic Information (VGI) is a concept that has not been around for a very long time. Geographic Information has, however: it is what maps are made out of, and what your car navigation device relies on to guide you. Traditionally, Geographic Information is collected, processed and used by professionals, but this no longer holds true: Geographic Information has undergone a process of democratization, both in the usage dimension and in the collection and processing dimension. People are now used to dealing with Geographic Information in different contexts, and have started to pool resources to collectively build repositories of Geographic Information, to facilitate the democratization of the entire ecosystem of Geographic Information.
OpenStreetMap is the most prominent of these efforts, and one in which I have been actively involved since early 2007. Since its inception in 2004, it has grown into a worldwide collaborative effort involving more than 100,000 contributors. In some regions, the maps available from OpenStreetMap are so rich and complete that they are used instead of commercially available map data.
I realize that I need to come up with some examples here, and some numbers that give an indication of how OpenStreetMap has grown, but I am on a train, blissfully disconnected from the internet, so you will just have to bear with me for now. But believe me, it’s getting big fast – at a rate that makes me worried about the validity of any quantitative research results that I might present in the context of this dissertation. But this will have to be dealt with in some future note.
Let us assume for now that OpenStreetMap – there are other VGI efforts around, and they will need to be touched upon as well – is indeed starting to occupy a significant share in the commercial market for Geographic Information. That means the OpenStreetMap data represent a commodity and as such, economic value. As OpenStreetMap data is available at no cost, this value is not quantified in the marketplace, however. This poses intriguing questions:
What is this freely available OpenStreetMap data actually worth?
How do you even begin to measure the value of something that is not subject to the usual economic market mechanisms?
When dealing with value, I believe I cannot omit the concept of quality, especially in this context. Any VGI effort relies on volunteers collecting data in their spare time. While some regions have very active communities, getting together to discuss progress and plan improvements to the map, checking and correcting each other’s contributions, other regions rely on single, isolated individuals contributing to the map – or worse: no one contributing at all. The resulting picture is one of spotty coverage: very densely mapped regions exist side by side with sparsely covered regions. More questions arise!
Is it possible to define the quality of volunteered geographic information in any satisfactory way?
How?
More generally: how do quality and value relate when dealing with geographic information?
I think I cannot proceed from here without looking at real world situations. Economic value is defined in the marketplace where supply and demand meet, and thus cannot be studied without some understanding of how and where this demand arises.
There clearly is a demand for VGI, but where does it originate?
Why would people want to use information that comes with no guarantees of completeness or even factual correctness, and that does not have a consistent quality?
I will need to get to the bottom of this. Apparently it is ‘good enough’ for some! If I’m not careful I will be entering into the domain of psychology. I think I need to stop soon, or I will have covered all domains of modern science and will have defined ample questions to last me three dissertations. But let me just finish this train of thought, and by then I will have arrived in Berlin – one of the best covered cities in OpenStreetMap, by the way; you can even get a detailed map of the zoo!
What drives the decision on the demand side to use volunteered geographic information instead of commercial offerings that do come with a quality label?
I can think of a number of reasons. Firstly, there is a growing number of application domains that do not require extensive, nationwide coverage. Location-based services, a growing domain, are often only relevant in metropolitan areas; consider for example pedestrian and bicycle routing, social networking applications, tourist guide services or restaurant / bar recommendation applications. Even many applications in professional domains operate only within a designated metropolitan area: local police, fire brigades and other public safety professionals operate only within their metro area.
Interestingly, supply and demand sync up really nicely here: in areas where there is likely to be a great demand for high quality – whatever that may mean – geographic information, there is also likely to be a large number of contributors to volunteered geographic information repositories. (This reminds me of my master’s thesis that dealt with the quality of public transportation in rural areas. There was a similar process at play: because of the limited and geographically thinly spread demand, the costs of maintaining a reasonable quality of service had become so high that cuts in service quality had become unavoidable, lowering the demand even further. Both Dutch and German regional governments were struggling to counter this downward spiral, and I did a comparative study on the results of those efforts.)
Secondly, there are very few restrictions and limitations on how and where you can use the data. Commercial data usage licenses are more often than not restricted to a certain type of application or device, or to a limited number of users or devices, and the data can only be used as-is. OpenStreetMap data can be used in almost every context imaginable, and you are free to modify and adapt the data to suit your needs.
Lastly, of course, because it’s free.
I have mixed feelings about this post. It feels unfocused, but I guess that is to be expected. More importantly, I don’t feel comfortable in the domain of economics. Sure, I did my two years of high school accounting and economics, but it did not quite take. It does not particularly interest me, but I feel I need to deal with it anyway. Intuitively, I am drawn to the question of defining and measuring quality. I want to think about how to do that, write tools to analyze OSM data – that part I am really passionate about. It seems like a good moment to talk to Henk and maybe some other people I know who could help and advise me at this juncture.
So this is it!
So this is it. This is going to be my dissertation diary. I’m not going to make any commitments as to how often I will write in it; I just read that I should be spending at least 15 minutes every day on my dissertation. Every day for the next four, five, six years! Intriguing at least.
I’m at the very beginning of the process, and my thoughts are really unfocused at this point. In this first entry, I will not go into the theme itself, there will be ample opportunity for that. I would just, for a moment, like to ponder over the implications. At least four years of my life will be dedicated, at least to some degree, to researching and writing about this theme that has yet to unfold.
As I am writing this, I feel that I want to write, I like to explore my thoughts by putting them in writing, although writing in English makes it even harder for my fingers to keep up with my ever-wandering mind.
The first question that springs to mind as I embark on this diary is: should I publish it? Not the dissertation I mean, but these notes? It seems, on the one hand, pointless and vain. Who would want to read about the nitty-gritty details of my struggle towards acquiring a doctorate? Not many, probably, but there might be a reason or two to do it anyway.
Publishing my thoughts might help me overcome a feeling of awkwardness that I frequently have about this project: who am I to think I can do original, creative research? These isolated thoughts, rough outlines of a theme that I might want to pursue, seem so superficial and gratuitous! If I would just go ahead and publish my thoughts and ideas and processes – that would seem to provide some validity to them. An irrational thought maybe, but it works for me.
Publishing these notes may also invoke some sense of urgency. I know I have a tendency to keep thoughts and ideas to myself for too long, thinking they need to mature before they are ready to be shared with the world. This is an inhibition that will seriously slow me down and that I must learn to set aside. It has already happened and I have not even begun to formalize a proposal!
More than a year ago now, Henk Scholten invited me to come to the Vrije Universiteit to discuss the possibilities of him supervising my dissertation. We had a really nice and productive discussion and I felt both flattered and motivated, and told him I would write some ideas I had down for him to ingest. We would have a follow-up meeting soon.
I explored the idea for a while, discussed implications with a couple of colleagues and friends, thought about interesting themes. I think I even wrote some things down, but I did not feel any of them were good or mature enough to even put forward to Henk.
Although the thought of doing a dissertation was on my mind now and then over the months that followed, I found myself glad to be distracted by other things to occupy my mind and time. And so time passed, and here we are. I feel that I want to do this more strongly now, for reasons I will explain in a future post. So I am going to write. And explore. It will be beautiful. I can be that naive.
Municipal Boundaries from OpenStreetMap
OpenStreetMap is the free world map that anyone can contribute to. The geodata is freely available under a Creative Commons license. OpenStreetMap (OSM) has long since stopped containing only streets and has grown into a versatile repository of freely available geodata. It is just not that easy yet to pull out exactly what you need.
OpenStreetMap’s standard export format is its own XML format. It can be processed with a variety of open source tools, which are available via the OSM wiki at http://wiki.openstreetmap.org or the Subversion repository at http://svn.openstreetmap.org/.
This article illustrates how to extract the current Dutch municipal boundaries from the live OSM database and import them into a PostGIS database.
The database
The starting point when looking for specific information in the OSM database is the Map Features wiki page: http://wiki.openstreetmap.org/wiki/Map_Features. This page gives an overview of all the ‘tags’ in use for objects in the database. Municipal boundaries fall under the Administrative Boundaries: http://wiki.openstreetmap.org/wiki/Key:boundary. Although the table on that page listing the levels per country is not entirely specific about this – it speaks of ‘boundaries for cities like Amsterdam but also smaller like Volendam and Lutjebroek’ – the municipal boundaries fall under admin_level=8. On the same page we read that the standard way of putting administrative boundaries into OSM is by using ‘relations’. (OpenStreetMap has only three kinds of objects: nodes (points), ways (lines) and relations (relationships between groups of the other two types).)
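To make the data model a bit more concrete, here is a minimal Python sketch that picks the relations tagged admin_level=8 out of a piece of OSM XML. The inline sample data, its ids and names, and the helper name municipal_relations are all made up for illustration; a real export is far larger and its relations reference actual ways.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the style of OSM XML.
# All ids, coordinates and names are hypothetical.
SAMPLE_OSM = """<osm version="0.5">
  <node id="1" lat="52.37" lon="4.89"/>
  <way id="10">
    <nd ref="1"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role=""/>
    <tag k="boundary" v="administrative"/>
    <tag k="admin_level" v="8"/>
    <tag k="name" v="Example Municipality"/>
  </relation>
  <relation id="101">
    <member type="way" ref="10" role=""/>
    <tag k="boundary" v="administrative"/>
    <tag k="admin_level" v="4"/>
    <tag k="name" v="Example Province"/>
  </relation>
</osm>
"""


def municipal_relations(osm_xml, admin_level="8"):
    """Return (id, name) pairs for relations with the given admin_level tag."""
    root = ET.fromstring(osm_xml)
    matches = []
    for relation in root.findall("relation"):
        # Collect the key/value tags of this relation into a dict.
        tags = {tag.get("k"): tag.get("v") for tag in relation.findall("tag")}
        if tags.get("admin_level") == admin_level:
            matches.append((relation.get("id"), tags.get("name")))
    return matches


if __name__ == "__main__":
    for rel_id, name in municipal_relations(SAMPLE_OSM):
        print(rel_id, name)
```

The tags are read as plain key/value strings, which is why the admin level is compared as the string "8" rather than as a number.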
Extraction
We now know that we want all ‘relations’ tagged ‘admin_level=8’. There are several ways to make such an extract from the live database. One is to download a current dump of the desired area (these are available via http://downloads.cloudmade.com/ ) and then make a selection from it with the command-line tool ‘osmosis’ (http://wiki.openstreetmap.org/wiki/Osmosis ). Another way is to use the OSM Extended API (OSMXAPI, pronounced OSM-Zappy, see http://wiki.openstreetmap.org/wiki/Osmxapi ). The following URL then returns the municipal boundaries in OSM XML format: www.informationfreeway.org/api/0.5/relation[admin_level=8][bbox=3.35376,50.57484,7.22095,53.51513].
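The structure of that request is easy to miss in a long URL, so here is a small Python sketch that assembles it from its parts. The function name xapi_relation_url is hypothetical, the http:// scheme is my assumption (the URL above is given without one), and the bounding box is read as west, south, east, north. The code only builds the string; it does not fetch anything, and whether the service still answers is not checked here.

```python
# Base of the XAPI endpoint as quoted in the text; the "http://" scheme
# is an assumption, the text gives the host without one.
XAPI_BASE = "http://www.informationfreeway.org/api/0.5"


def xapi_relation_url(key, value, bbox, base=XAPI_BASE):
    """Build an XAPI-style URL selecting relations by tag within a bounding box.

    bbox is (west, south, east, north) in decimal degrees.
    """
    west, south, east, north = bbox
    return (
        f"{base}/relation[{key}={value}]"
        f"[bbox={west},{south},{east},{north}]"
    )


if __name__ == "__main__":
    # Bounding box used in the text (roughly the Netherlands).
    netherlands = (3.35376, 50.57484, 7.22095, 53.51513)
    print(xapi_relation_url("admin_level", 8, netherlands))
```

When calling such a URL from a shell, the square brackets usually need quoting so the shell does not try to expand them.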
Import
You can import the resulting OSM XML file into a PostGIS database using osm2pgsql: http://wiki.openstreetmap.org/wiki/Osm2pgsql.
Assuming you already have a spatial database named ‘postgis’, it goes like this:
> osm2pgsql -H tm-sr -U postgres -W -d postgis gemeentegrenzen_081118.osm
osm2pgsql SVN version 0.55-20081118 $Rev: 10464 $
Password:
Using projection SRS 900913 (Spherical Mercator)
Setting up table: planet_osm_point
Setting up table: planet_osm_line
Setting up table: planet_osm_polygon
Setting up table: planet_osm_roads
Mid: Ram, scale=100
Reading in file: gemeentegrenzen_081118.osm
Processing: Node(110k) Way(2k) Relation(0k)
Node stats: total(110573), max(312315964)
Way stats: total(2579), max(28446793)
Relation stats: total(690), max(51805)
Writing way(0k)
As you can see, osm2pgsql creates four tables (if they already exist they are emptied by default, so be careful!).
We will not worry about spatial indexes for now and just look at the result:

Postscript
Ready-made shapefiles per country are also available on the Cloudmade site. That package also contains an administrative shapefile, but it is not correct:
So this somewhat longer route is still the preferred one!
Incidentally, the Dutch OSM contributors (yours truly among them) are also working on adding other official and unofficial subdivisions to the database. Think of COROP regions, districts and neighbourhoods, EGG areas, police regions, postcode areas and built-up area boundaries.
Note To Self: The One And Only RD Projection String
EPSG:28992, or the Dutch double stereographic RD (RijksDriehoekstelsel) projection, is quite often incompletely or just plain badly defined.
My version of MapServer for Windows (2.2.6, from September last year) states
+proj=stere +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs no_defs <>
Which yields the following result when a native 28992 dataset is projected onto a Microsoft Virtual Earth map (EPSG:900913, or EPSG:3785 as it is now called):

Note that the buildings layer on top of the VE aerial photos is shifted to the north, by about 100 metres.
Spatialreference.org has a slightly different take on EPSG:28992:
+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs
which yields an almost identical result:

These projection strings are both incomplete, because they do not take into account the datum shift that is used in the RD projection and can be approximated using the ‘towgs84’ parameter in PROJ4.
The one and only right PROJ4 projection string is
+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +towgs84=565.2369,50.0087,465.658,-0.406857330322398,0.350732676542563,-1.8703473836068,4.0812 +no_defs <>
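Since the missing datum shift is easy to overlook, here is a small Python sketch that splits a PROJ4 string into its parameters and checks whether a towgs84 entry is present. It is plain string handling and does not call PROJ itself; the function names parse_proj4 and has_datum_shift are my own, and the three definitions are copied from this note.

```python
# The three definitions quoted in this note.
MAPSERVER_DEF = (
    "+proj=stere +lat_0=52.15616055555555 +lon_0=5.38763888888889 "
    "+k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel +units=m "
    "+no_defs no_defs <>"
)
SPATIALREF_DEF = (
    "+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 "
    "+k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs"
)
COMPLETE_DEF = (
    "+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 "
    "+k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel +units=m "
    "+towgs84=565.2369,50.0087,465.658,-0.406857330322398,"
    "0.350732676542563,-1.8703473836068,4.0812 +no_defs <>"
)


def parse_proj4(definition):
    """Split a PROJ4 string into a dict of parameter name -> value."""
    params = {}
    for token in definition.split():
        if not token.startswith("+"):
            continue  # skip stray tokens such as "no_defs" or "<>"
        key, _, value = token[1:].partition("=")
        # Flags without a value (e.g. +no_defs) are stored as True.
        params[key] = value if value else True
    return params


def has_datum_shift(definition):
    """Return True when the definition carries a towgs84 parameter."""
    return "towgs84" in parse_proj4(definition)


if __name__ == "__main__":
    for name, definition in [
        ("MapServer", MAPSERVER_DEF),
        ("spatialreference.org", SPATIALREF_DEF),
        ("complete", COMPLETE_DEF),
    ]:
        print(name, has_datum_shift(definition))
```

This only tells you whether the parameter is there, not whether its seven values are right, so it is a sanity check rather than a validation of the projection.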

Links
- Explanation of the towgs84 parameter on this page
- Some discussion about the RD datum shift on the PROJ.4 mailing list
- A non-technical discourse on datum shift and coordinate systems in Dutch.
- The Dutch national survey has a website on the RD coordinate system.
- There is also a very Web 0.5 site on the RD system and NAP (Normaal Amsterdams Peil, the Dutch standard sea water level which can be observed in the Amsterdam City Hall)
The 5 minute guide to setting up GeoServer and GeoWebCache on Windows
I came across yet another tile caching implementation, GeoWebCache, through this article on the Google Open Source blog. It integrates nicely with the GeoServer OGC server, which should make it very easy to set up on a Windows box. So let’s try that.
The End Of Flickr?
Well, certainly not today, and certainly not soon, but the introduction of georeferenced photos on Google Maps this week will certainly rock the online photo communities’ boat. Sure, there are tons of websites overlaying flickr photos on top of a web map, and most are richer than what Google Maps currently offers.
Take for example loc.alize.us, a flickr/Google Maps mashup that has been around for a while. It offers tag filtering, user filtering, and a very nice and clean interface. To top it off, it offers a bookmarklet that integrates georeferencing into flickr.com very nicely. I still use it, although Yahoo Maps, the mapper of choice for Flickr’s mapping needs, of course, has had adequate coverage of the Netherlands for some time now.
But still... It’s not directly ON Google Maps, which is – at least in Western Europe at this time – the ubiquitous web map. The general public will rarely discover any layer of the geographic web beyond Google Maps and Google Earth. ‘So, if I want my photos to show up on the web, I need to be on Panoramio.’ – Panoramio being the photo sharing community that has been showing off on Google Earth for as long as I can remember, and from now on Google Maps as well. Panoramio was acquired by Google in May 2007.
No, I don’t expect a mass flux of flickr users towards Panoramio. The latter will see a good number of new members though, and if Google remains as picky about which photos to display within Maps – I’m still confused as to where this leaves Picasa; I guess the user base is not large enough – Panoramio might become a force to be reckoned with in the online photo community universe.
OpenStreetMap Mapping Party ‘Saendelft’
The Dutch like to live in new, modern homes with a garden front and back. This leaves the country with many a suburban jungle like the one depicted below. This also means steady jobs for surveyors with the commercial mapping companies – and many a free weekend spent mapping for a Dutch OpenStreetMap contributor.
Virtual Earth Custom Tile Layers in 3D mode — not anymore.
The Microsoft Virtual Earth API lets you add your own tile layer to your VE Map. My colleague StevenO wrote about preparing a suitable TileCache setup. This used to work in both 2D and 3D map modes. Recently, Microsoft introduced the latest version of the API, 6.1, along with a major data upgrade and a new version of the 3D control. A step forward in many respects, but the tile layers will not show up in 3D mode anymore. Let’s investigate. UPDATED 080508 11pm, see below
Importing the GML 3.2.1 namespace into .NET
There comes a time for every geo-ict professional to have his first encounter with GML. Most of the time, this is not a pretty sight. Until now, I have managed to steer clear of GML when it comes to actually incorporating it into my own software. But today, that day dawned.
Benchmarking TileCache, part 1
I have been doing some benchmarking in the wake of my TileCache installation ‘endeavor’ of last week (part 1 – part 2). In a series of – well, probably two – articles, I will try to provide some insight into the performance of the TileCache – Python – Apache ensemble.
