UTA GTFS transit data into SQLite


I have landed in Salt Lake City. My wife and I are easing into our new life here little by little. As we get settled in the midst of the hottest part of the summer, I am picking my geo habits back up. They prove really hard to root out. As a mass transit aficionado, my first professional inquiry is into the local public transportation system, provided by the Utah Transit Authority. I wrote up some initial experiences and thoughts on this topic and published them on my personal blog, for reasons unknown even to myself.

Walking home from the awesome Marriott Library where I spent most of the day working, I was thinking about an important dimension I found lacking in the Salt Lake public transportation offering: rider information. The trip planner leaves a lot to be desired, there is almost no information to be found at the bus stops, and there seems to be no real location-aware smartphone application (there is a mobile web site). I did find a simple iPhone application that gives you the static schedules for the TRAX and FrontRunner trains, nothing to get excited about. What I want is an app that shows me the next trains and buses – with departure times, routes and destinations – from a pre-selected stop, or from any stop within a certain number of feet of my location. This is not hard to do if you have the trip and stop data.

This is where the GTFS feed that UTA provides comes in! In this post, I will explore the first baby steps from the raw data to an iOS app: getting the data into a suitable database format for the iOS app to consume. I seem to remember (my iOS coding skills are fading fast, I fear) that SQLite is the weapon of choice for structured data storage and retrieval on iOS. So here goes. I guess SQLite will be a workable solution for Android and Windows Phone as well; I just don't know. So mobile coders, get to work or I'll have to do it myself.

Download the data directly from UTA at http://204.126.146.53/googletransit/google_transit.zip, or from the GTFS Data Exchange at http://www.gtfs-data-exchange.com/agency/utah-transit-authority/.

Unzip the file. You will get a directory google_transit containing the following files:

calendar_dates.txt
calendar.txt
stops.txt
shapes.txt
routes.txt
trips.txt
agency.txt
stop_times.txt

To enable a smooth and successful import into an SQLite database, we need to remove all double quote characters from each of these files. Use sed or tr or whatever suits your needs (I used BBEdit 🙂), as long as they're all gone.
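For example, a quick shell loop with tr does the trick (run it inside the unzipped google_transit directory):

```shell
# Strip every double quote from each GTFS file, in place.
for f in *.txt; do
  [ -e "$f" ] || continue            # no .txt files here: nothing to do
  tr -d '"' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```

An equivalent one-liner with sed -i works too, though the in-place flag differs between GNU and BSD sed, which is why I show the portable tr variant.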

Next, we create the SQLite database.

Create a text file with following content:

CREATE TABLE agency (agency_phone TEXT, agency_url TEXT, agency_id INTEGER, agency_name TEXT,  agency_timezone TEXT, agency_lang TEXT);
CREATE TABLE stops (stop_lat REAL, stop_code INTEGER, stop_lon REAL, stop_id INTEGER, stop_url TEXT, parent_station INTEGER , stop_desc TEXT , stop_name TEXT, location_type INTEGER, zone_id INTEGER);
CREATE TABLE routes (route_long_name TEXT,route_type INTEGER,route_text_color TEXT,route_color TEXT,agency_id INTEGER,route_id INTEGER,route_url TEXT,route_desc TEXT, route_short_name TEXT);
CREATE TABLE trips (block_id INTEGER,route_id INTEGER,direction_id INTEGER,trip_headsign TEXT,shape_id INTEGER,service_id INTEGER,trip_id INTEGER);
CREATE TABLE stop_times (trip_id INTEGER, arrival_time TEXT, departure_time TEXT, stop_id INTEGER, stop_sequence INTEGER, stop_headsign TEXT, pickup_type INTEGER, drop_off_type INTEGER, shape_dist_traveled REAL);
CREATE TABLE calendar (service_id INTEGER,start_date TEXT,end_date TEXT, monday INTEGER,tuesday INTEGER,wednesday INTEGER,thursday INTEGER,friday INTEGER,saturday INTEGER,sunday INTEGER);
CREATE TABLE calendar_dates (service_id INTEGER, date TEXT, exception_type INTEGER);
CREATE TABLE shapes (shape_id INTEGER, shape_pt_lat REAL, shape_pt_lon REAL, shape_pt_sequence INTEGER, shape_dist_traveled REAL);
.separator ,
.import calendar.txt calendar
.import agency.txt agency
.import calendar_dates.txt calendar_dates
.import routes.txt routes
.import shapes.txt shapes
.import stop_times.txt stop_times
.import stops.txt stops
.import trips.txt trips

Important: The order of the fields in the GTFS dump tends to change. Make sure the order of the fields in the CREATE statements match the order in the GTFS files. These CREATE statements work for the utah-transit-authority_20111208_0426 file.
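A quick way to check is to print the header row of each file and compare it against the column order in the CREATE statements:

```shell
# Print the header (first) line of every GTFS file so the column
# order can be checked against the CREATE TABLE statements above.
for f in *.txt; do
  [ -e "$f" ] || continue
  printf '%s: ' "$f"
  head -1 "$f"
done
```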

Save this file as create_fill_gtfs_tables.sql in the directory where the unzipped GTFS text files reside.

Next, we’re going to create the SQLite3 database and load the data into the tables.

mvexel> sqlite3 utagtfs.db < create_fill_gtfs_tables.sql

Test the result:

mvexel> sqlite3 utagtfs.db
sqlite> .mode column
sqlite> .headers on
sqlite> SELECT * FROM stops WHERE stop_name LIKE '%3rd%';
stop_id     stop_code   stop_name        stop_desc         stop_lat    stop_lon     zone_id     stop_url    location_type  parent_station
----------  ----------  ---------------  ----------------  ----------  -----------  ----------  ----------  -------------  --------------
4170        118073      3RD AVE @ 203 E  E 3RD AVE & A ST  40.772882   -111.883441                          0
12651       118023      3RD AVE @ 402 E  E 3RD AVE & E ST  40.772807   -111.877229                          0
12652       118024      3RD AVE @ 488 E  E 3RD AVE & G ST  40.772816   -111.874724                          0
12653       118025      3RD AVE @ 580 E  E 3RD AVE & I ST  40.772818   -111.87192                           0
12654       118026      3RD AVE @ 686 E  E 3RD AVE & K ST  40.772818   -111.868838                          0
12655       118179      3RD AVE @ 788 E  E 3RD AVE & M ST  40.772802   -111.865804                          0
12656       118028      3RD AVE @ 848 E  E 3RD AVE & N ST  40.772793   -111.864359                          0
12657       118029      3RD AVE @ 948 E  E 3RD AVE & P ST  40.772748   -111.861256                          0
12658       118030      3RD AVE @ 1048   E 3RD AVE & R ST  40.77278    -111.858267                          0
12659       118031      3RD AVE @ 1136   E 3RD AVE & T ST  40.772784   -111.855372                          0
12660       118180      3RD AVE @ 1232   E 3RD AVE & VIRG  40.772791   -111.852561                          0
13455       118042      3RD AVE @ 1153   E 3RD AVE & T ST  40.772912   -111.854728                          0
13456       118043      3RD AVE @ 1039   E 3RD AVE & R ST  40.772916   -111.858361                          0
13457       118044      3RD AVE @ 953 E  E 3RD AVE & P ST  40.772917   -111.860875                          0
13458       118045      3RD AVE @ 855 E  E 3RD AVE & N ST  40.772891   -111.863713                          0
13459       118046      3RD AVE @ 805 E  E 3RD AVE & M ST  40.772915   -111.86534                           0
13460       118047      3RD AVE @ 705 E  E 3RD AVE & K ST  40.773093   -111.868189                          0
13461       118048      3RD AVE @ 611 E  E 3RD AVE & I ST  40.773017   -111.871066                          0
13463       118050      3RD AVE @ 453 E  453 E 3RD AVE &   40.773017   -111.875754                          0
15231       118074      3RD AVE @ 119 E  E 3RD AVE         40.772885   -111.886642                          0
20435       118039      3RD AVE @ 505 E  E 3RD AVE & G ST  40.772906   -111.874281                          0

Nice! Now figure out some neat queries to do the ‘What’s the next bus downtown I can take from within 1000ft of my current location?’ dance and make that WhatTheUTA app rock! I’ll buy whoever comes up with a working app or mobile web site first a 3.2% beer.
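As a starting point, here is a rough sketch of the "stops near me" part: a simple bounding-box query, using the rule of thumb that 1000 ft is about 0.0028° of latitude and, at Salt Lake City's latitude, roughly 0.0036° of longitude. The coordinates, the degree offsets, and the fallback demo table are all placeholders for illustration; a real app would plug in the device's location and probably compute proper great-circle distances.

```shell
DB=utagtfs.db
# Standalone demo: if the real database isn't present, create a tiny
# stops table (same schema as above) with one nearby and one faraway stop.
if [ ! -f "$DB" ]; then
  sqlite3 "$DB" "CREATE TABLE stops (stop_lat REAL, stop_code INTEGER,
    stop_lon REAL, stop_id INTEGER, stop_url TEXT, parent_station INTEGER,
    stop_desc TEXT, stop_name TEXT, location_type INTEGER, zone_id INTEGER);
    INSERT INTO stops (stop_id, stop_name, stop_lat, stop_lon) VALUES
      (4170, '3RD AVE @ 203 E', 40.772882, -111.883441),
      (9999, 'NOWHERE NEAR', 41.5, -112.5);"
fi

# Hypothetical current location (downtown-ish Salt Lake City)
MY_LAT=40.7725
MY_LON=-111.8840

# ~1000 ft bounding box: +/- 0.0028 deg latitude, +/- 0.0036 deg longitude
sqlite3 "$DB" "SELECT stop_id, stop_name, stop_lat, stop_lon
  FROM stops
  WHERE stop_lat BETWEEN $MY_LAT - 0.0028 AND $MY_LAT + 0.0028
    AND stop_lon BETWEEN $MY_LON - 0.0036 AND $MY_LON + 0.0036;"
```

From there, joining stop_times and trips on stop_id and trip_id, and filtering departure_time against the current time, gets you the "next departures" list.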

5 thoughts on “UTA GTFS transit data into SQLite”

  1. Good stuff, thanks! One thing to note is that not all transit agencies use integers for stop_id (and some other fields where they could use integers). New York MTA, for example, has three-character varchars for stop_id.

  2. Is it possible to import a 130 MB file (stop_times.txt) into SQLite using the above procedure? Please explain.

  3. Yeah, I am also looking into the same thing… is it efficient to store a roughly 150 MB file in SQLite? I tried it with MySQL and my table grew to nearly 1 GB, so I am looking for some other database that is efficient as well as handy.
