Why VGI needs a new name

In this day and age, no-one can afford to organize a conference that has anything, even remotely, to do with geographic information and not have a track, a workshop or at the very least a few talks on VGI. The scientific community is on it. The social web people are on it. Everyone agrees: VGI is an important emerging topic, and we are on the case.

But on which case, exactly? The acronym VGI – Volunteered Geographic Information – attempts to be a blanket description, yet it manages to exclude so much. It is so imprecise and incomplete that I am surprised – no, concerned – that we’re not seeing more movement away from the term VGI and towards a more descriptive nomenclature for this expanding domain. I did have a chance to rant about it for a few days with some like-minded geo people at GIScience 2010 – which, by the way, had a VGI workshop and some talks on the topic – but I now feel a strong urge to take it further and systematically scratch an itch about VGI that I have been feeling for a long time. We – both scientists and people personally or professionally involved in the field – need a descriptive breakdown of this domain, which for now remains unnamed.

Let’s try to identify the issues with the term VGI by taking it one letter at a time.

V is for Volunteered

Volunteering is the act of consciously offering something of your own free will. In the context of geographic information, this covers traditional (sic) community mapping platforms like OpenStreetMap. In this fast-growing open geodata platform, dubbed ‘The Wiki World Map’, a significant portion of the information is contributed by its users. They have all registered on the website, chosen a username and password, and are volunteering their local knowledge through the editing tools made available to them. All contributions are conscious acts of transferring information from the individual to the collective information platform, and can thus rightly be considered volunteered geographic information.

If only the world were so simple. Take Waze, the social navigation and traffic platform. When you download the Waze app onto your smartphone and run it, it behaves pretty much like your garden-variety turn-by-turn navigation system. You can actively report traffic conditions that will be shared with other users, but Waze will also automatically collect information while you drive around with the app running. Waze derives live traffic information, as well as map corrections and additions, from your position updates, which are sent back to Waze continuously and without user intervention. While you, as a Waze user, might make a conscious decision to take part in a social platform when you download and start using the app, the location updates you continuously send to improve the Waze platform can hardly be called volunteered. At the very least, the term starts to cause confusion.
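To make this kind of passive collection a little more concrete, here is a minimal sketch of how a platform could turn background position updates into speed estimates and a crude congestion flag. It is not Waze’s actual method, which this post does not describe; it assumes each update carries a timestamp in seconds and a latitude/longitude in degrees, and the names `Fix`, `segment_speeds_kmh` and `looks_congested`, as well as the 50 % free-flow threshold, are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Fix:
    """One background position update: time in seconds, latitude/longitude in degrees."""
    t: float
    lat: float
    lon: float


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))  # clamp guards rounding


def segment_speeds_kmh(fixes: list[Fix]) -> list[float]:
    """Average speed in km/h between each pair of consecutive fixes."""
    speeds = []
    for a, b in zip(fixes, fixes[1:]):
        dt = b.t - a.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order updates
        speeds.append(haversine_m(a, b) / dt * 3.6)  # m/s -> km/h
    return speeds


def looks_congested(fixes: list[Fix], free_flow_kmh: float, ratio: float = 0.5) -> bool:
    """Hypothetical rule: congested if the median speed is below ratio * free-flow speed."""
    speeds = segment_speeds_kmh(fixes)
    if not speeds:
        return False
    return statistics.median(speeds) < ratio * free_flow_kmh
```

The point of the sketch is that none of this requires the driver to do anything: the traffic information falls out of updates the app sends on its own.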

This confusion is bound to be even greater when you consider a more closed variety of the same ‘social traffic’ idea: TomTom MapShare. MapShare does much the same thing, continuously collecting positional updates and feeding them back to the MapShare platform. Rather than using a live feedback loop, however, MapShare uses the location updates – two trillion of them so far – to flag potential map errors. This helps TomTom update their maps faster and more reliably, and then offer those paid map updates to the very same users who shared their location updates for free. This puts the question of whether or not users volunteer geographic information into yet another perspective. Volunteering implies some degree of mutual benefit, or at least a contribution to a societal or communal benefit. When that benefit can only be enjoyed at an additional cost, is volunteering still the appropriate term?
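One way to picture how location updates can flag potential map errors is the sketch below. It is not TomTom’s method; it assumes GPS points already projected to local planar coordinates in metres and a map given as straight road segments. Points that lie further than some offset from every known road are binned into grid cells, and cells where enough such points pile up are flagged as candidate missing roads. The function names, the 25 m offset, the 100 m cell size and the support threshold of three points are all hypothetical.

```python
import math
from collections import Counter

Point = tuple[float, float]    # (x, y) in metres in a local planar projection (assumed)
Segment = tuple[Point, Point]  # one straight road segment from the existing map


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest distance in metres from point p to the road segment seg."""
    (x1, y1), (x2, y2) = seg
    px, py = p
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - x1, py - y1)  # degenerate segment: a single point
    # Project p onto the segment's line and clamp the projection to the segment.
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def flag_missing_roads(points: list[Point], roads: list[Segment],
                       max_offset_m: float = 25.0, cell_m: float = 100.0,
                       min_support: int = 3) -> set[tuple[int, int]]:
    """Grid cells containing at least min_support points that lie off every known road."""
    off_road = Counter()
    for p in points:
        nearest = min((point_segment_distance(p, r) for r in roads), default=math.inf)
        if nearest > max_offset_m:
            cell = (math.floor(p[0] / cell_m), math.floor(p[1] / cell_m))
            off_road[cell] += 1
    return {cell for cell, count in off_road.items() if count >= min_support}
```

Requiring several off-road points per cell is a simple way of letting the crowd outvote a single noisy GPS fix.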

G is for Geographic

The previous examples, OpenStreetMap, Waze and TomTom MapShare, all deal with contributions that are unambiguously geographic. The contributions users make to those platforms, volunteered or not, explicitly contain a coordinate or two (or more). In the larger domain of social information, however, there are many instances of citizen-contributed information that was most likely not consciously intended to be interpreted geographically – but still is. The pretty geo-visualization to your right is a fine example. It is a visualization of tweets like “Just landed in SFO”. When someone sends out a tweet like that, the spatial information is implicit rather than explicit. Still, the massive volume of information shared through this social platform, combined with a certain degree of predictability of the human mind, can be leveraged to visualize worldwide air travel patterns with relatively little effort.
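Here is a rough sketch of how tweets like “Just landed in SFO” could be turned into air travel flows. The post does not say how the visualization was actually made; this assumes each tweet arrives as a (user, timestamp, text) tuple, only counts a landing when the three-letter code is in a known list of airport codes, and pairs each user’s consecutive landings into a (from, to) route. The regular expression, the `routes` function and the airport list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern: the phrase is matched case-insensitively, but the code itself must be
# three upper-case letters, so "Just landed in the rain" does not yield "the".
LANDED = re.compile(r"(?i:\bjust\s+landed\s+(?:in|at))\s+([A-Z]{3})\b")


def routes(tweets, known_codes):
    """Count (previous airport, new airport) pairs from each user's consecutive landing tweets.

    tweets: iterable of (user, timestamp, text) tuples.
    """
    last_seen = {}      # user -> airport code of that user's previous landing
    flows = Counter()
    for user, _ts, text in sorted(tweets, key=lambda tw: tw[1]):  # process in time order
        match = LANDED.search(text)
        if match is None or match.group(1) not in known_codes:
            continue
        code = match.group(1)
        previous = last_seen.get(user)
        if previous is not None and previous != code:
            flows[(previous, code)] += 1
        last_seen[user] = code
    return flows
```

Nobody tweeting “Just landed” meant to contribute to a map of air traffic; the spatial meaning only appears once many such messages are aggregated.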

Even more implicitly geographic information is contained in generic unstructured text, such as the millions of updates and stories shared through blogging platforms like Blogger, WordPress and Tumblr. Extracting the geography of this information may be more of a challenge, but it is hardly impossible. Yahoo’s Placemaker service even offers this ‘geoparsing’ through a simple API that anyone can use for free. There are many language-specific challenges, and doing geoparsing well is far from straightforward, but the salient point is that a lot of information shared through social platforms is not explicitly geographic, yet can still be interpreted as such, and is thus part of the still-nameless domain that some used to call VGI.
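At its core, geoparsing means spotting place names in free text and resolving them to coordinates. The toy sketch below covers only the first and easiest step, a longest-match lookup against a tiny hand-made gazetteer. It does not use or imitate the Placemaker API; the gazetteer entries (with approximate coordinates) and the `geoparse` function are illustrative assumptions.

```python
import re

# Hypothetical toy gazetteer: lower-cased place name -> approximate (lat, lon).
GAZETTEER: dict[str, tuple[float, float]] = {
    "amsterdam": (52.37, 4.90),
    "new york": (40.71, -74.01),
    "york": (53.96, -1.08),
    "san francisco": (37.77, -122.42),
}
# Longest gazetteer entry, in tokens, so we know how far ahead to look.
MAX_NAME_TOKENS = max(len(name.split()) for name in GAZETTEER)


def geoparse(text: str) -> list[tuple[str, tuple[float, float]]]:
    """Return (place name, (lat, lon)) for every gazetteer match, preferring longer names."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer: letters only
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, so "new york" wins over "york".
        for n in range(min(MAX_NAME_TOKENS, len(tokens) - i), 0, -1):
            name = " ".join(tokens[i:i + n])
            if name in GAZETTEER:
                found.append((name, GAZETTEER[name]))
                i += n
                break
        else:
            i += 1  # no place name starts at this token
    return found
```

Even this toy shows one of the harder problems: “York” and “New York” are different places, and a real geoparser also has to decide which of several places sharing a name a text actually refers to.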

I is for Information

Really? Do I want to go here? Do I want to question whether the Things contributed by citizens to social information systems constitute Information? Well – no and yes. Considering this topic from a top-down perspective, as we are doing now, the whole of Things shared through social platforms undoubtedly constitutes information. From an individual citizen’s perspective, things are, again, not so clear-cut. When I actively contribute to OpenStreetMap, I am well aware that I am contributing to a body of information that others can use for their own purposes. When I share an update through Twitter or Facebook, this is less straightforward. For all intents and purposes, Twitter is a platform of transient messages, and when I post an update I perceive it as such, not consciously thinking about the Library of Congress archiving every single tweet for future reference. My guess is that most Twitter users don’t even know to what extent their tweets are archived and their information content reused.

Things become even less clear when we consider passively and/or unconsciously shared updates. Think of driving over an induction loop in the road, being recorded by one of millions of traffic and security cameras, or any credit card payment you make – heck, with every Google search you do, you’re not only getting something, you’re giving something, too. That something is information, and valuable information at that. And a lot of those passively, unconsciously offered infinitesimal nuggets of semi-private or anonymous data have, or can be linked to, a location.

Intent and Spatiality

There’s a lot of geodata hidden in the white noise of data collected in the Internet of Things – and it can be (re-)used for many different purposes: research, spatial planning, policy making, art. To assess which data holds the right information content for each of those purposes, we have to consider the intent – or lack thereof – with which it was collected. VGI, as a blanket description of this rich domain, does not allow us to do that. We need a descriptive context for spatially interpretable crowd data that exposes the diversity of this domain. I propose a definition along two sliding scales: Intent and Spatiality.

So what does this way of classifying spatially interpretable crowd data give us?

  1. It rids the general public of the misconception that all data in this domain is volunteered, explicitly geographic, or even explicitly information.
  2. It provides the GIScience community with a definition context for the domain, of which it is in dire need.
  3. It helps potential and actual professional users of crowd data assess which data holds the right information content for their purposes.
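To make the two sliding scales of Intent and Spatiality a little more tangible, here is a sketch that places a few of the examples from this post on them. Both scales are assumed to run from 0 to 1; the class name `CrowdDataSource`, the 0.5 cutoff and every score below are hypothetical placements for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrowdDataSource:
    """A kind of crowd data placed on two sliding scales, both assumed to run from 0 to 1."""
    name: str
    intent: float      # 0 = passively/unconsciously shared, 1 = consciously volunteered
    spatiality: float  # 0 = only implicitly geographic, 1 = explicit coordinates

    def __post_init__(self) -> None:
        for score in (self.intent, self.spatiality):
            if not 0.0 <= score <= 1.0:
                raise ValueError(f"{self.name}: scores must lie in [0, 1]")

    def label(self, cutoff: float = 0.5) -> str:
        """Coarse quadrant label; the cutoff is an arbitrary illustration."""
        intent = "volunteered" if self.intent >= cutoff else "passive"
        space = "explicitly geographic" if self.spatiality >= cutoff else "implicitly geographic"
        return f"{intent}, {space}"


# Hypothetical placements of examples from this post; the numbers are not measurements.
EXAMPLES = [
    CrowdDataSource("OpenStreetMap edit", intent=0.95, spatiality=1.0),
    CrowdDataSource("Waze background position update", intent=0.2, spatiality=1.0),
    CrowdDataSource("'Just landed in SFO' tweet", intent=0.6, spatiality=0.3),
    CrowdDataSource("Credit card payment", intent=0.05, spatiality=0.4),
]

for source in sorted(EXAMPLES, key=lambda s: s.intent, reverse=True):
    print(f"{source.name}: {source.label()}")
```

The quadrant labels are only a convenience; the argument is that each source keeps its position on both continuous scales, rather than being lumped under a single label like VGI.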

So I implore you: no more mention of VGI. Please. I mean, Spatially Interpretable Crowd Data is not exactly hot, but at least it does a better job of covering more bases. It is a rich domain, and new applications for it emerge every day, as the user-generated information current of the social web continues to gain momentum and grows into a torrent in which it becomes exceedingly hard to tell which is which.

Justin is my friend.