Tuesday, August 12, 2014

Visualizing Bike Trip Data

In an earlier post I discussed the Bay Area Bike Share program, which is coming up on its one-year anniversary.  BABS has released several detailed data sets that cover the first half year of operations, and held a challenge for "...anyone with a bit of curiosity to present the data in visually compelling ways."



You can view the winners here.

Four separate data sets were included in this competition, and these data can still be downloaded at the link above.  These files include:

1) approximately 17 million records of "rebalancing data" (these show availability of bikes and docks at stations)
2) 69 station records (station latitude, longitude, etc.)
3) approximately 144,000 trip records
4) 920 records of daily weather by city

To me the most interesting of these four is the third, the trip data, which records individual trips, including the starting and ending stations, the duration in seconds, and the rider's zip code.  I can foresee a variety of ways of using these data in teaching and research.  In fact, last year I used bike share trip data on the final exam I gave in my Economics Statistics class.  Questions 6 through 13 use some data (which ostensibly correspond to my bike trips between campus and the San Jose train station) to test understanding of a difference-in-means hypothesis test.  In my class, I also emphasize the connection between linear regression and difference-in-means tests, as I hope this brings the "big picture" of statistical hypothesis tests into focus for students.  I plan to keep using this approach to teaching statistics, and now that BABS data are available, I can use them to develop corresponding empirical exercises.
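
The link between regression and a difference-in-means test can be illustrated with a short Python sketch.  The durations below are made-up numbers for illustration only, not BABS records or my exam data, and the function names are hypothetical.  The sketch assumes the equal-variance (pooled) version of the two-sample t-test, which is the version that matches ordinary least squares on a 0/1 dummy variable exactly.

```python
from statistics import mean

# Hypothetical trip durations in seconds (invented for illustration, not BABS data)
to_campus = [612, 580, 655, 601, 590, 630]
to_station = [540, 575, 560, 525, 590, 555]


def pooled_t(a, b):
    """Equal-variance two-sample t statistic for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (ma - mb) / se


def ols_slope_t(x, y):
    """Slope and its t statistic from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, slope / se


# Stack both groups and regress duration on a dummy (1 = trip to campus)
durations = to_campus + to_station
dummy = [1] * len(to_campus) + [0] * len(to_station)

diff = mean(to_campus) - mean(to_station)
t_diff = pooled_t(to_campus, to_station)
slope, t_reg = ols_slope_t(dummy, durations)

print(f"difference in means: {diff:.3f}   regression slope: {slope:.3f}")
print(f"pooled t statistic:  {t_diff:.3f}   slope t statistic: {t_reg:.3f}")
```

The slope on the dummy equals the difference in group means, and the two t statistics coincide, so the regression and the hypothesis test are two views of the same calculation.  A test that allows unequal variances (Welch's test) would generally give a different standard error than this simple regression.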