Maps of Standards


Listening to a report about the Trans-Siberian Railroad on the radio this morning, I found my mind drifting to rail gauges, and to maps of standards in general. Here are two that have been of personal interest to me for very different reasons: the first because I am a rail enthusiast, the second because I used to travel quite a bit within Europe where, in spite of all the European Unionism, there are a lot of AC plug standards.

Some variegated questions for the audience:

  • Is there any correlation between these two standards distributions?
  • Should this type of information be mapped in OpenStreetMap?
  • What’s your favorite map of standards?

Map showing rail gauges in use around the world. Source: Wikipedia


Map showing AC power plug standards in use around the world. Source: Wikipedia

OpenStreetMap Data Temperature – How It Was Done (And Do It Yourself!)


In my talk at State Of The Map 2011, I introduced a few new concepts relating to the local quality of OpenStreetMap data and its contributor community:

  • The Community Score Card, a concise way to summarize the activity of a local OpenStreetMap community
  • The Data Temperature, which attempts to capture the ‘warmth’ of a local community in one easily interpretable figure
  • The Temperature Map, which captures two of the most relevant metrics for way features, the age of the last update and the number of versions, in an abstracted image of the study area.

To generate the metrics and the geodata needed, I used Jochen Topf’s osmium framework. Because I am not a skilled C++ developer, I employed the osmjs JavaScript interface to the framework to parse and analyze the city-sized OpenStreetMap files. After some delay, I have published the osmjs script together with some supporting files. Feel free to use it to do your own analyses, and please improve and build on it and share back. I’d love to see this script evolve into a more general-purpose quality analysis tool.

If you missed my session at SOTM11, here’s its page on the OpenStreetMap wiki, with links to all relevant resources.

Tutorial: Creating buffered country POLYs for OpenStreetMap data processing


OpenStreetMap represents a lot of data. If you want to import the entire planet into a PostGIS database using osmosis, you need at least 300GB of hard disk space and, depending on how much you spent on fast processors and (more importantly) memory, a lot of patience. Chances are that you are interested in only a tiny part of the world, either to generate a map or to do some data analysis. There are several ways to get bite-sized chunks of the planet – take a look at the various planet mirrors or the cool new Extract-o-tron tool – but sometimes you may want something custom.

For the data temperature analysis I did for State of the Map, I wanted city-sized extracts using a small buffer around the city border. If you want to do something similar – or are just interested in how to do basic geoprocessing on a vector file – this tutorial may be of interest to you. Instead of city borders, which I created myself from the excellent Zillow neighborhood boundary dataset, I will show you how to create a suitably generalized OSM POLY file (the de facto standard for describing polygon extracts, used by various OSM tools) that is appropriate for extracting a country from the OSM planet with a nice buffer around it.
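For reference, a POLY file is just plain text: a name on the first line, then one or more rings of longitude/latitude pairs, each ring terminated by END, and a final END closing the file. A heavily simplified, made-up example roughly covering the contiguous US might look like this (the coordinates are purely for illustration):

usa_buffer
1
   -125.0   24.0
   -66.0    24.0
   -66.0    50.0
   -125.0   50.0
   -125.0   24.0
END
END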

Let’s get to work.

Preparing Quantum GIS

We will need to add a plugin that lets us export any polygon from the QGIS desktop as an OSM POLY file. We can get that OSM POLY export plugin for Quantum GIS here.

Unzip the downloaded file and copy the resulting folder into the Python plugins folder. On Windows, if you used the OSGeo4W installer, that might be

C:\OSGeo4W\apps\qgis\python\plugins

See here for hints on where it may be on your system.
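On Linux and Mac OS X, the per-user plugin folder is usually ~/.qgis/python/plugins – at least for the QGIS versions current at the time of writing.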

The plugin should now appear in the Quantum GIS plugin manager (Plugins > Manage plugins…).
If it is not enabled yet, enable it now and exit the plugin manager.

Getting Country Borders

Easy. Download world borders from http://thematicmapping.org/downloads/world_borders.php

Unzip the downloaded file and open it in QGIS.

Geoprocessing step 1: Query

Open the Query dialog by right-clicking on the layer name and choosing Query…, or by selecting Query… from the Layer menu while the TM_WORLD_BORDERS-0.3 layer is selected (active).

Type "ISO2" = 'US' in the SQL where clause field and run the query by clicking OK. (In QGIS, double quotes denote a field name and single quotes a string value.)

Geoprocessing step 2: Buffering

The next step is to create a new polygon representing a buffer around an existing polygon. Because we already queried for the polygon(s) we want to buffer, there’s no need to select anything in the map view. Just make sure the TM_WORLD_BORDERS-0.3 layer is active and select Vector > Geoprocessing Tools > Buffer(s):

Make sure the input vector layer is TM_WORLD_BORDERS-0.3. Only the features matching the query will be processed, so we’re operating on a single country and not the entire world.

For Buffer distance, type 1. This is in map units. Because our source borders file is in EPSG:4326, this corresponds to 1 degree, which is roughly 69 miles north to south; east to west, a degree spans 69 miles only at the equator and shrinks towards the poles. This is a nice buffer size for a country; you may want something larger or smaller depending on the size of the country and what you want to accomplish, so play around with the figure and compare results. Of course, if your map projection is not EPSG:4326, your map units may not be degrees and you should probably be entering much bigger values.
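To put a number on that east-west shrinkage: at 45° latitude one degree of longitude spans roughly cos(45°) × 69 ≈ 49 miles, and at 60° only about cos(60°) × 69 ≈ 35 miles.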

Select a path and filename for the output shapefile. Do not select ‘Dissolve buffer results’. The rest can be left at the default values. Push OK to run the buffer calculation. This can take a little while and the progress bar won’t move. When it finishes, QGIS asks whether you want to add the new layer to the map.

Click Yes. Now we have a buffer polygon based on the US national border.

Geoprocessing step 3: Generalizing

We’re almost done, but the buffer we generated contains a lot of points, which will make the process of cutting a planet file slow. So we’re going to simplify the polygon some. This is also a QGIS built-in function.

Select Vector > Geometry tools > Simplify geometries:

Make sure your buffer layer is selected as the input. Set 0.1 (again, in map units) as the Simplify tolerance. This defines how much the input features will be simplified: the higher the number, the more simplification.

Select a destination for the simplified buffer to be saved. Also select Add result to canvas. Click OK.

The resulting dialog may not look very promising, but the simplification has worked. I have also sometimes gotten an error message after this process completes; ignore it if you get one.

Geoprocessing step 4: Resolving multipolygons

Now, if your simplified country border consists of multiple polygons (as is the case with the US), we have a slight problem. The POLY export plugin does not support multipolygons, so we need to break the multipolygon into single polygons first (QGIS has a Multipart to singleparts tool under Vector > Geometry tools for this). And even then, we will need to do some manual work if we want OSM .poly files for all the polygons. This is because the plugin relies on unique string attribute values to create the different POLY files, and we do not have those, because the polygons we are using are all split from the same multipolygon. So we need to either create a new attribute field and manually enter unique string values in it, or select and export the parts to POLY files one by one and rename the files before they get overwritten.

Finale: Export as POLY

I am going to be lazy here and assume I will only need the contiguous US, so I select the corresponding polygon. After that I invoke the plugin by selecting Plugins > Export OSM Poly > Export to OSM Poly(s):

The plugin will show a list of all the fields that have string values. Select ISO2 and click Yes. Next you will need to select a destination folder for your exported POLY files. Pick or create one and push OK.

This is it! Your POLY files are finished and ready to be used in Osmosis, osmchange and other tools that use the POLY format for data processing.
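As an illustration, extracting the buffered country from a planet file with Osmosis could look something like this (the file names here are just placeholders – adjust them to whatever the plugin produced and whichever planet dump you have lying around):

osmosis --read-pbf file=planet-latest.osm.pbf --bounding-polygon file=US.poly --write-pbf file=us-buffered.osm.pbf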

By the way: you can’t load POLY files into JOSM directly, but there’s a Perl script to convert POLY files to OSM files that I used in order to visualize the result.

Taking the Temperature of local OpenStreetMap Communities


In my recent talk at the State Of The Map 2011 conference, I introduced the idea of Data Temperature for local OpenStreetMap data. In this blog post, I want to follow up on that, because my talk was only twenty minutes, covered a broader theme and thus did not allow me to elaborate on it much.

Let me first show a slide from the talk to give you an idea – or a reminder, if you were in the audience – of what we are talking about. (The complete talk is available as a streaming video, and the slides are available as well. I linked to both in my previous blog post).

Community activity visualized

Let’s break this image down before I explain how I arrived at the temperature of 77 degrees for San Francisco. The blob of red, orange, yellow, green and gray is an abstracted map of the city, displaying only the linear OpenStreetMap features for that city. The features are styled so that two salient characteristics of community activity become the defining visual elements: number of versions and date of last update. The number of versions, indicating how many updates a feature has received since its creation, is visualized using line thickness, a thicker line style indicating a feature that has seen more community updates. The time passed since a feature last saw an update from the community is visualized using a gray-red-green color gradient, where gray represents features that have not seen an update in two years or more, while green represents linear features that were ‘touched’ in the last month.

The result is an unusually abstracted view of a city – but one that I hope helps to convey a sense of local community activity. In my talk I argued that communities sharing local knowledge are what sets OpenStreetMap apart from other geodata resources. It is what sets Warm Geography apart from Cold Geography.

Data Temperature

For my talk, I wanted to take the Warm vs. Cold analogy one step further and devise a simple method to calculate the Data Temperature for a city or region. To do this, I decided I needed a little more information than just the version count and the time passed since the last update. Thinking about the US situation, I figured that the TIGER import could provide me with some salient community statistics; the TIGER data is a reference point from which we can measure how much effort the local communities – or individual mappers – have put into fixing the imported data, enriching it and bringing it up to date.

From the current planet file, which I used because of resource constraints, you can only derive a limited understanding of the historical development of the OpenStreetMap data. For example, it is not possible to see how many unique users have been involved in contributing to the data for an area, only which users are represented in the current versions of the data. For individual features, it is not possible to determine their age. Features that have been deleted are not accessible. So all in all, by operating on the current planet file, we have a pretty limited field of view. When resources allow, I want to work more with the full history planet that has been available for a while now, and for which a tool set is starting to emerge thanks to the great work of Peter Körner, Jochen Topf and others.

These limitations being what they are, we can still derive useful metrics from the planet data. I devised a ‘Community Score Card’ for local OpenStreetMap data that incorporates the following metrics:

  • The percentage of users responsible for 95% of the current data. This metric tells us a lot about the skew in contributions, a phenomenon that has received considerable attention in the OSS domain [1] and is apparent in OpenStreetMap as well. The less skewed the contribution graph is, the healthier I consider the local community. Less skew means that there are more people putting in significant effort mapping their neighborhood. For the cities I looked at, this figure ranged from 5% to 26%. I have to add that this metric loses much of its expressiveness when the absolute number of contributors is low, something I did not take into account in this initial iteration.
  • The percentage of untouched TIGER roads. This metric provides some insight into how involved the local mappers are overall – TIGER data needs cleanup, so untouched TIGER data is always a sign of neglect. Also, it gives an idea of how well the local mappers community covers the area geographically. For the cities I looked into for the talk, this figure ranged from 4% in Salt Lake City (yay!) to a whopping 80% in Louisville, Kentucky.
  • The average version increase over TIGER. This simple metric overlaps somewhat with the previous one, but also provides additional insight into the amount of effort that has gone into local improvements of the imported TIGER road network.
  • The percentage of features that have been edited in the last three months and in the last year. These are the only temporal metrics that are part of the Community Score Card. For a more in-depth temporal and historical analysis of OpenStreetMap data, we need to look at the full history planet file, which for this first iteration I did not do. Even so, these two metrics provide an idea of the current activity of the local community. They do not tell us anything about the historical factors that might explain that activity or lack thereof, however. For example, the local community may have been really active up to a year ago, leaving the map fairly complete, which might explain a diminished activity since. For our purpose though, these simple metrics do a pretty good job quantifying community activity.

I applied a simple weighting to these metrics to arrive at the figure for the data temperature, and there’s really not much wisdom that went into it. My goal was to arrive at a temperature that would be conducive to conveying a sense of community activity, and that would show a good range for the cities I analyzed for the talk. In a next iteration, I will attempt to arrive at a somewhat more scientifically sound approach.

The weighting factors are as follows:

  • Percentage of users responsible for 95% of the current data: 30
  • Percentage untouched TIGER roads: -30
  • Average version increase over TIGER roads: 5
  • Percentage features edited in the last 3 months: 50
  • Percentage features edited in the last year: 40

I rounded the results to the nearest integer and added them to a base temperature of 32 degrees (freezing point of water on the Fahrenheit scale) to arrive at the final figure for the Data Temperature.
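Written out as a formula – this is just my shorthand for the weights listed above, reading the percentages as fractions between 0 and 1 – the calculation looks like this:

Temperature = 32 + round(30 × users95 − 30 × untouchedTIGER + 5 × versionIncrease + 50 × edited3months + 40 × edited1year)

So a purely hypothetical city where 16% of the users account for 95% of the data, 40% of TIGER roads are untouched, the average version increase is 1.2, and 8% / 20% of features were edited in the last three months / year would come out at 32 + round(4.8 − 12 + 6 + 4 + 8) = 32 + 11 = 43 degrees.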

Visualization Is Hard

Looking at the Community Score Cards for the various cities I analyzed for the talk, and comparing them to the abstract maps representing the way versions and time since last edit, you will notice that the maps seem to paint a different picture than the Score Cards. Take a look at the San Francisco map and Score Card above, and compare that to the Baltimore one below.

 We see that while Baltimore’s data is much ‘cooler’ at 59 degrees than San Francisco’s at 77, the Baltimore map looks quite promising for community activity. I can give a few explanations for this. (We are really getting into the visualization aspect of this, but I believe that is a very important dimension of conveying a concept as fluid as Data Temperature.) Firstly, the color defines this map visualization in a more fundamental way than the line thickness. The ‘green-ness’ of the Baltimore map leads us to believe that all is well, even though it is just one element of the Community Score Card. Moreover, not all elements of the Score Card are even represented in the visualization: untouched TIGER roads are pushed to the background by the thicker lines representing roads that did receive community attention. Lastly, scale plays a role in obfuscating differences. To fit the different cities in the same slide template, I had to vary the scale of the maps considerably. Because the line thickness is defined in absolute values and not dependent on map scale, the result can be somewhat deceiving.

Conclusion

I believe that this first attempt at a Data Temperature for local OpenStreetMap data, and its accompanying map visualizations, served its purpose well. The talk was well received and inspired interesting discussions. It set the stage for a broader discussion I want to have within the OpenStreetMap community about leveraging sentiments of recognition and achievement within local communities in order to help those communities grow and thrive.

There is a whole range of improvements to this initial iteration of the Data Temperature concept that I want to pursue, though. Most importantly, I want to use the full history of contributions instead of the current planet file. This will allow me to incorporate historical development of the map data as a whole and of individual contributor profiles. Also, I want to improve the Score Card with more characteristics, looking into the quality of the contributions as well as the quantity. Lastly, I want to improve the map visualizations to more accurately represent the Data Temperature.

I will post more Data Temperature visualizations and Score Cards when I think of an efficient way to do so. I collected data for some 150 US cities based on OpenStreetMap data from August 2011. If you would like a particular city posted first, let me know. Also, if you would like to know more about the tools and methods involved in preparing and processing data, I am willing to do a blog post expanding on those topics a bit.

[1] J. Lerner and J. Tirole, “Some Simple Economics of Open Source,” Journal of Industrial Economics 50, no. 2 (2002): 197–234.

Insert Coin To Play – my talk at State Of The Map 2011


UPDATE: I have re-uploaded the slides to Slideshare with the correct Community Score Cards — see the comments.

Phew! State Of The Map is over and it was so very good. In spite of the growth of OpenStreetMap and the conference, the grassroots spirit is still very much there, and that is what OpenStreetMap is about if you ask me.

Below are the slides from my talk. The talk was also recorded by Toby Murray – thanks, Toby, for all your effort recording and live-streaming several talks from your mobile phone! I am writing some more background about the data temperature concept that I introduced, and I hope to publish that here before FOSS4G starts. I will be recycling the talk there, with a different title and some small tweaks — so if you have already seen it, you may want to plan your personal FOSS4G itinerary around me 😉

OpenStreetMap and Warm vs. Cold Geography


Denver, The Mile High City in Colorado, USA, will be the stage for two of the most prominent conferences in the domain of open geospatial software and data, FOSS4G and SOTM. I will be speaking at both conferences. The community is gearing up for these important and fun events, preparing their talks and discussing the hot topics of the conferences.

For me, the direction in which OpenStreetMap will be heading over the next few years is one of those main topics. I will approach it from the angle of contributor motivation. Specifically, I will address the challenge of the extremely high churn rate that OpenStreetMap is coping with — less than one tenth of everyone who ever created an OpenStreetMap account goes on to become an active contributor. I will investigate which tactics from the gaming domain OpenStreetMap could use — tactics that have made Foursquare and StackOverflow so successful.

OpenStreetMap needs those flesh and blood contributors, because it is ‘Warm Geography’ at its core: real people mapping what is important to them — as opposed to the ‘Cold Geography’ of the thematic geodata churned out by the national mapping agencies and commercial street data providers; data that is governed by volumes of specifications and elaborate QA rules.

Don’t get me wrong — I am not denouncing these authoritative geodata sources. They have their mandate, and an increasing amount of authoritative, high quality geodata is now freely (beer and speech) available to the public — and I like to think that OpenStreetMap’s success played some part in this.

However, OpenStreetMap occupies its very own special niche in the domain of geodata with its Warm Geography — real people contributing their local knowledge of the world around them.

Will OpenStreetMap retain that unique position?

OpenStreetMap has grown very rapidly over the last three years, both in number of contributors and in data volume. For some countries and regions, OpenStreetMap is considered more complete and / or of higher quality than any other available data source. However much or little truth there may be in such a statement, the fact of the matter is that the relation between OpenStreetMap and authoritative or commercial geodata sources is being reconsidered.

On the one hand, VGI (Volunteered Geographic Information, the widely accepted misnomer that covers a wide range of geospatial information resources that consist of contributions by non-professionals) techniques are introduced into the data collection processes of traditional producers. The degree of voluntarism varies — on one end of the spectrum, we see PND giant TomTom using data collected from their users to trigger better targeted road data updates; on the other end, the VGI experiments conducted by the USGS to help improve the National Map. In either case, we see the intrusion of crowdsourcing techniques into the traditionally closed domains of authoritative and commercial geospatial information.

On the other hand, we see authoritative and other Cold data being integrated into OpenStreetMap, by way of imports on various scales — some local, others covering entire nations.

I have a strong ambivalence towards data imports into OpenStreetMap. I have seen how they can spark and nurture the OpenStreetMap community in the regions affected — but I also envisage how detrimental they could be to OpenStreetMap as a whole. This touches on the very nature of OpenStreetMap, and I reiterate: OpenStreetMap is Warm Geography — real people contributing their local knowledge of the world around them. Will OpenStreetMap claim that particular space in the geospatial information landscape? Does it even want to? Or will it manifest itself as an open data repository in which Cold and Warm Geography are mixed together, resulting in something Lukewarm?

Warm and Cold Geography in OpenStreetMap side by side. Green lines, prevalent in Des Moines, mark imported TIGER data that has been reviewed by human OpenStreetMap contributors. Red data, overwhelmingly present in Omaha, a two-hour drive to the west, marks dead TIGER data that has been sitting in OSM, untouched, since 2007.

One could argue that the large-scale imports OpenStreetMap has already seen have probably helped spark the local efforts to improve the data and the community. This may be true — although it has never really been researched, and you can easily make a counterargument with Germany as your case — but I believe we need to be looking at the relation between imports / authoritative data and the community, and OpenStreetMap should do so before any individual import is done or even considered. Let me give two examples as food for thought to wrap this up.

The TIGER line data import in the US is one I deem useful and productive. It’s a low-quality data source in terms of geometry, but pretty good (as far as I can tell) in terms of metadata. Also, it represents features that are easily recognizable in the field and map to our real-world experience with ease: it’s the streets that we drive, walk, bike, address our mail to, and input in our satnav devices. Importing that data provided initial OpenStreetMap content with two key properties that are essential for an import to ‘work’ in the sense that it serves the interests of the data provider, the OpenStreetMap community and OpenStreetMap as an entity.

First, it provides initial map content that serves to make OpenStreetMap less scary for both aspiring volunteer mappers and potential professional users. An empty map is not attractive for casual mappers — a category that will become more important in the future, even if right now the majority of mapping is done by a small number of mappers in most regions (read my observations on churn rate in OpenStreetMap in a related post).

Second, it provides content that constitutes a low barrier for improving on it. The coarse, quirky geometry of the TIGER line segments can be easily fixed, especially now that nationwide high resolution imagery from Bing is available as a backdrop for the OpenStreetMap editors. As a novice to OpenStreetMap, you have an easy way in: just start by fixing some streets in Potlatch2; it couldn’t be easier. Seeing your contributions appear on the main map almost instantly provides instant gratification and helps motivate the novice to continue contributing. This process, I believe, would be further aided if OpenStreetMap had more elements from gaming — competition, scoring, achievements, awards — but that will be the topic of my talk at SOTM ;).

In contrast, let me address my concerns with the land use import done recently in the Netherlands — see the current border of that import here to do a before/after comparison. This is authoritative data sourced from the national mapping agency via a loophole (the actual source dataset, Top10Vector, is not open data, but a derived product was deemed to have a license compatible with OpenStreetMap’s). Its addition to the OpenStreetMap data body surely makes for a map that is very appealing visually, but it does not meet both of the requirements for it to become warm, living data in OpenStreetMap. Land use data, in contrast to street data, is much more abstract and also much more difficult to survey, especially if you’re an OpenStreetMap novice. On top of that, it’s easy to break things. I can easily see how an OpenStreetMap contributor, even one who is not a novice, would be daunted by an editor view that looked like this (see inset).

The OpenStreetMap web-based editor Potlatch centered on an area with almost exclusively imported data north of Amsterdam, The Netherlands.

Mind you, this is almost 100% imported data.

So to conclude: I am not against mixing and matching authoritative data and OpenStreetMap, but I firmly believe the distinction between mixing and matching should be heeded more carefully than it has been in some cases. Don’t mix when it’s not going to be mutually beneficial, keeping in mind the requirements I laid out in the cases I described just now. This is necessary for OpenStreetMap to retain its unique position in the geodata domain as a Warm Resource, and that is the only way it will survive in the longer run.

I have been invited to discuss the topic of OpenStreetMap versus Authoritative Data in a panel organized by Eric Wolf of the US Geological Survey at the upcoming State Of The Map conference in Denver. If you’re interested in this topic and would like a broader view on it, I invite you to attend. It takes place on Sunday, Sep. 11th around noon. See you in Denver!

UTA GTFS transit data into SQLite


I have landed in Salt Lake City. My wife and I are transitioning our way into my new life here slowly by slowly. As we start to get settled down here in the midst of the hottest part of the summer, I start picking up my geo habits. They prove really hard to root out. As a mass transit aficionado, my first professional inquiry is into the local public transportation system, provided by the Utah Transit Authority. I wrote up some initial experiences and thoughts on this topic and published them on my personal blog, for reasons unknown even to myself.

Walking home from the awesome Marriott Library where I spent most of the day working, I was thinking about an important dimension I found lacking in the Salt Lake public transportation offering: rider information. The trip planner leaves a lot to be desired, there is almost no information to be found at the bus stops, and there seems to be no real location-aware smartphone application (there is a mobile web site). I did find a simple iPhone application that gives you the static schedules for the TRAX and FrontRunner trains, but nothing to get excited about. What I want is an app that shows me the next trains and buses – with departure times, routes and destinations – from a pre-selected stop, or from any stop within a certain number of feet from my location. This is not hard to do if you have the trip and stop data.

This is where the GTFS feed that UTA provides comes in! In this post, I will explore the first baby steps from the raw data to an iOS app: getting the data into a suitable database format for the iOS app to consume. I seem to remember – though I have the feeling my iOS coding skills are fading fast… – that SQLite is the weapon of choice for structured data storage and retrieval on iOS. So here goes. I guess SQLite will be a workable solution for Android and Windows Phone as well; I just don’t know. So mobile coders, get to work or I’ll have to do it myself.

Download the data here: http://204.126.146.53/googletransit/google_transit.zip (also available via http://www.gtfs-data-exchange.com/agency/utah-transit-authority/)

Unzip the file. You will get a directory google_transit with the following files:

calendar_dates.txt
calendar.txt
stops.txt
shapes.txt
routes.txt
trips.txt
agency.txt
stop_times.txt

To enable a smooth and successful import into an SQLite database, we need to remove all double quote characters from each of these files. Use sed or tr or whatever suits your needs – I used BBEdit 🙂 – as long as they’re all gone.
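If you have sed handy, a one-liner like this should do the trick (GNU sed shown; on OS X you would use sed -i '' 's/"//g' *.txt instead):

sed -i 's/"//g' *.txt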

Next, we create the SQLite database.

Create a text file with the following content:

CREATE TABLE agency (agency_phone TEXT, agency_url TEXT, agency_id INTEGER, agency_name TEXT,  agency_timezone TEXT, agency_lang TEXT);
CREATE TABLE stops (stop_lat REAL, stop_code INTEGER, stop_lon REAL, stop_id INTEGER, stop_url TEXT, parent_station INTEGER , stop_desc TEXT , stop_name TEXT, location_type INTEGER, zone_id INTEGER);
CREATE TABLE routes (route_long_name TEXT,route_type INTEGER,route_text_color TEXT,route_color TEXT,agency_id INTEGER,route_id INTEGER,route_url TEXT,route_desc TEXT, route_short_name TEXT);
CREATE TABLE trips (block_id INTEGER,route_id INTEGER,direction_id INTEGER,trip_headsign TEXT,shape_id INTEGER,service_id INTEGER,trip_id INTEGER);
CREATE TABLE stop_times (trip_id INTEGER, arrival_time TEXT, departure_time TEXT, stop_id INTEGER, stop_sequence INTEGER, stop_headsign TEXT, pickup_type INTEGER, drop_off_type INTEGER, shape_dist_traveled REAL);
CREATE TABLE calendar (service_id INTEGER,start_date TEXT,end_date TEXT, monday INTEGER,tuesday INTEGER,wednesday INTEGER,thursday INTEGER,friday INTEGER,saturday INTEGER,sunday INTEGER);
CREATE TABLE calendar_dates (service_id INTEGER, date TEXT, exception_type INTEGER);
CREATE TABLE shapes (shape_id INTEGER, shape_pt_lat REAL, shape_pt_lon REAL, shape_pt_sequence INTEGER, shape_dist_traveled REAL);
.separator ,
.import calendar.txt calendar
.import agency.txt agency
.import calendar_dates.txt calendar_dates
.import routes.txt routes
.import shapes.txt shapes
.import stop_times.txt stop_times
.import stops.txt stops
.import trips.txt trips

Important: The order of the fields in the GTFS dump tends to change. Make sure the order of the fields in the CREATE statements match the order in the GTFS files. These CREATE statements work for the utah-transit-authority_20111208_0426 file.
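A quick way to verify the order is to print the header line of each file and compare it against the corresponding CREATE statement, for example:

mvexel> head -1 stops.txt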

Save this file as create_fill_gtfs_tables.sql in the directory where the unzipped GTFS text files reside.

Next, we’re going to create the SQLite3 database and load the data into the tables.

mvexel> sqlite3 utagtfs.db < create_fill_gtfs_tables.sql

Test the result:

mvexel> sqlite3 utagtfs.db
sqlite> .mode column
sqlite> .headers on
sqlite> SELECT * FROM stops WHERE stop_name LIKE '%3rd%';
stop_id     stop_code   stop_name        stop_desc         stop_lat    stop_lon     zone_id     stop_url    location_type  parent_station
----------  ----------  ---------------  ----------------  ----------  -----------  ----------  ----------  -------------  --------------
4170        118073      3RD AVE @ 203 E  E 3RD AVE & A ST  40.772882   -111.883441                          0
12651       118023      3RD AVE @ 402 E  E 3RD AVE & E ST  40.772807   -111.877229                          0
12652       118024      3RD AVE @ 488 E  E 3RD AVE & G ST  40.772816   -111.874724                          0
12653       118025      3RD AVE @ 580 E  E 3RD AVE & I ST  40.772818   -111.87192                           0
12654       118026      3RD AVE @ 686 E  E 3RD AVE & K ST  40.772818   -111.868838                          0
12655       118179      3RD AVE @ 788 E  E 3RD AVE & M ST  40.772802   -111.865804                          0
12656       118028      3RD AVE @ 848 E  E 3RD AVE & N ST  40.772793   -111.864359                          0
12657       118029      3RD AVE @ 948 E  E 3RD AVE & P ST  40.772748   -111.861256                          0
12658       118030      3RD AVE @ 1048   E 3RD AVE & R ST  40.77278    -111.858267                          0
12659       118031      3RD AVE @ 1136   E 3RD AVE & T ST  40.772784   -111.855372                          0
12660       118180      3RD AVE @ 1232   E 3RD AVE & VIRG  40.772791   -111.852561                          0
13455       118042      3RD AVE @ 1153   E 3RD AVE & T ST  40.772912   -111.854728                          0
13456       118043      3RD AVE @ 1039   E 3RD AVE & R ST  40.772916   -111.858361                          0
13457       118044      3RD AVE @ 953 E  E 3RD AVE & P ST  40.772917   -111.860875                          0
13458       118045      3RD AVE @ 855 E  E 3RD AVE & N ST  40.772891   -111.863713                          0
13459       118046      3RD AVE @ 805 E  E 3RD AVE & M ST  40.772915   -111.86534                           0
13460       118047      3RD AVE @ 705 E  E 3RD AVE & K ST  40.773093   -111.868189                          0
13461       118048      3RD AVE @ 611 E  E 3RD AVE & I ST  40.773017   -111.871066                          0
13463       118050      3RD AVE @ 453 E  453 E 3RD AVE &   40.773017   -111.875754                          0
15231       118074      3RD AVE @ 119 E  E 3RD AVE         40.772885   -111.886642                          0
20435       118039      3RD AVE @ 505 E  E 3RD AVE & G ST  40.772906   -111.874281                          0

Nice! Now figure out some neat queries to do the ‘What’s the next bus downtown I can take from within 1000ft of my current location?’ dance and make that WhatTheUTA app rock! I’ll buy whoever comes up with a working app or mobile web site first a 3.2% beer.