Project 3: Right Here, Right Now

CS 424 Project

Project Data

Data Overview

Project 3 is a data visualization application built on real-time feeds of Chicago 311 service-request data (potholes, abandoned vehicles, etc.), CTA Bus Tracker data, and Divvy bike data. This page describes the APIs used to retrieve the data, the data refresh rates, and the optimization and design decisions that let the application handle the approximately 30,000 data points it loads and refreshes.

APIs and External Services

  • Socrata Open Data API (SODA) provides programmatic access to the Chicago Data Portal, which hosts the majority of the datasets included in our application. Each call can retrieve up to 5,000 rows.
  • Bus Tracker API lists all of the CTA routes in the city. For each route we look up its buses and display them on the map (for a selected shape or region). No more than roughly 7 requests can be made to the Bus Tracker per hour.
  • dbpediaLayer, a library and API for Wikipedia, retrieves Wikipedia descriptions of interesting locations in Chicago.
  • yahooapis is an interface to the Yahoo Query Language (YQL) that lets you query data from other APIs (such as the SODA API and the Divvy bike data). Its biggest benefit is that it acts as a JSON proxy, so a source can still be queried from the browser even if it does not support cross-domain requests (that is, JSONP).
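The 5,000-row cap means that larger datasets have to be fetched in pages. The following is a minimal sketch of how such paging could be planned, assuming the standard SODA `$limit` and `$offset` query parameters. The dataset URL is a placeholder, not one of the resources our application actually uses, and the function names are illustrative.

```python
from urllib.parse import urlencode

SODA_MAX_ROWS = 5000  # per-call row cap described above


def page_plan(total_rows, page_size=SODA_MAX_ROWS):
    """Return (limit, offset) pairs that together cover total_rows."""
    plan = []
    offset = 0
    while offset < total_rows:
        limit = min(page_size, total_rows - offset)
        plan.append((limit, offset))
        offset += limit
    return plan


def page_urls(base_url, total_rows):
    """Build one request URL per page for a SODA-style endpoint."""
    urls = []
    for limit, offset in page_plan(total_rows):
        # safe="$" keeps the leading "$" of the SODA parameters readable.
        query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
        urls.append(f"{base_url}?{query}")
    return urls


# Placeholder URL (hypothetical resource name).
EXAMPLE_URL = "https://data.cityofchicago.org/resource/EXAMPLE.json"

# 6,000 rows split into one call of 5,000 and one call of 1,000.
for url in page_urls(EXAMPLE_URL, 6000):
    print(url)
```

With this plan, a 6,000-row dataset takes 2 calls and a 25,000-row dataset takes 5 calls.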

    Datasets

    The following provides a description of each of the datasets used in the application.
    | Dataset             | Description                                                                                   | API     | Refresh Rate (Seconds) |
    |---------------------|-----------------------------------------------------------------------------------------------|---------|------------------------|
    | Potholes            | Small holes in the road that need repair.                                                     | SODA    | 120                    |
    | Abandoned Vehicles  | A car is considered abandoned if it is irreparable, deserted, or hazardous.                   | SODA    | 130                    |
    | Street Lights       | An outage of at least 1 street light on a street.                                             | SODA    | 140                    |
    | Crime               | Reported crimes (with the exception of murder) occurring since 2001.                          | SODA    | 100                    |
    | Divvy Bike Stations | The current status (in service vs. out of service, available docks, etc.) for each station.   | JSON    | 80                     |
    | Food Inspection     | Inspections of food establishments in Chicago since 2010.                                     | SODA    | 150                    |
    | CTA Bus Tracker     | CTA buses currently moving around Chicago.                                                    | CTA API | 60                     |
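Because each dataset has its own refresh interval, something has to decide which datasets are due at any given moment. The sketch below shows one way to do that with a simple polling check. The intervals come from the table above, while the function name and the `last_refresh` bookkeeping are assumptions for illustration and not necessarily how our application schedules its refreshes.

```python
# Refresh intervals in seconds, taken from the dataset table.
REFRESH_SECONDS = {
    "Potholes": 120,
    "Abandoned Vehicles": 130,
    "Street Lights": 140,
    "Crime": 100,
    "Divvy Bike Stations": 80,
    "Food Inspection": 150,
    "CTA Bus Tracker": 60,
}


def due_datasets(elapsed_seconds, last_refresh):
    """Return the datasets whose interval has passed since their last refresh.

    elapsed_seconds: seconds since the application started.
    last_refresh: dict mapping dataset name to the elapsed time of its
                  last refresh (datasets not in the dict count as time 0).
    """
    return sorted(
        name
        for name, interval in REFRESH_SECONDS.items()
        if elapsed_seconds - last_refresh.get(name, 0) >= interval
    )


# After 60 seconds only the bus data is due.
print(due_datasets(60, {}))
```

Staggering the intervals (60, 80, 100, 120, ...) also spreads the requests out, so the larger datasets rarely reload at the same instant.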

    Data Design

    We now discuss the design details of each of our datasets.
  • Potholes required 6,000 rows (that is, 2 calls: one for 5,000 rows and a subsequent call for 1,000 more) to obtain 1 full month of data. Potholes with a status of 'Completed' or 'Completed - Dup' are excluded, since the user should only care about potholes that are still open. The right-hand control toolbar in our application filters this dataset by (1) less than 1 week old versus (2) up to 1 month old. The refresh rate is slow (120 seconds) because this data will not change much in that time. We detect a new pothole by checking whether its service request number already exists in the application.
  • Abandoned Vehicles required 3,000 rows (that is, 1 call for 3,000 rows) to obtain 1 full month of data. Abandoned vehicles with a status of 'Completed' or 'Completed - Dup' are excluded, since the user should only care about abandoned vehicle requests that are still open. The right-hand control toolbar in our application filters this dataset by (1) less than 1 week old versus (2) up to 1 month old. The refresh rate is slow (130 seconds) because this data will not change much in that time. We detect a new abandoned vehicle by checking whether its service request number already exists in the application.
  • Street Lights required two datasets (a 1-2 street lights out dataset and a 3 or more street lights out dataset). We extracted 3,000 rows for the 1-2 street lights out dataset and 10,000 rows (across two calls of 5,000 rows each) for the 3 or more street lights out dataset, which guaranteed 1 full month of data. Street light repairs with a status of 'Completed' or 'Completed - Dup' are excluded, since the user should only care about street light requests that are still open. The right-hand control toolbar in our application filters this dataset by (1) less than 1 week old versus (2) up to 1 month old. The refresh rate is slow (140 seconds) because this data will not change much in that time. We detect a new street light repair request by checking whether its service request number already exists in the application.
  • Crime required 25,000 rows (across 5 calls of 5,000 rows each) to guarantee 1 full month of data. The crime data is by far the largest of our datasets and also the most important to the user, so we chose a reasonably fast refresh rate of 100 seconds. The right-hand control toolbar in our application filters this dataset by (1) less than 2 weeks old versus (2) up to 1 month old. We detect a new crime incident by checking whether its identification (id) number already exists in the application.
  • Divvy Bike Stations requires just one call to obtain information on all 300 Divvy stations across the city. On each refresh we check whether a new station has been added, or whether the number of available bikes or the number of available docks at a station has changed. A relatively fast refresh rate (80 seconds) reflects how often Divvy station data changes.
  • Food Inspection required 2,000 rows in 1 call to guarantee 1 full month of data. To determine whether a new inspection has been added, we check for an inspection id that did not exist at the last refresh. A slow refresh rate of 150 seconds reflects how infrequently we expect this dataset to update.
  • CTA Buses change position (longitude and latitude) every minute, so our refresh rate is 60 seconds. Obtaining all of the CTA buses across the city takes approximately 16 calls of 8 routes each. The routes are publicly published on the CTA Bus Tracker site, and we use that list to extract all of the buses.
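The per-dataset logic above boils down to three small operations: dropping closed requests outside the selected age window, detecting records whose id has not been seen before, and splitting the bus routes into groups of 8 per call. The following is a minimal sketch of each, under stated assumptions: the field names (`status`, `created`, `id`) are hypothetical, rows are plain dictionaries, and none of this is the application's actual code.

```python
from datetime import datetime, timedelta

# Statuses excluded from the 311 datasets, as described above.
CLOSED_STATUSES = {"Completed", "Completed - Dup"}


def open_requests(rows, now, max_age_days):
    """Keep rows that are still open and no older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        row
        for row in rows
        if row["status"] not in CLOSED_STATUSES and row["created"] >= cutoff
    ]


def new_records(known_ids, rows, id_field):
    """Return rows whose id is not yet known, and remember those ids.

    known_ids is a set that is updated in place, so calling this again
    with the same rows returns an empty list.
    """
    fresh = []
    for row in rows:
        if row[id_field] not in known_ids:
            known_ids.add(row[id_field])
            fresh.append(row)
    return fresh


def batches(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Usage: group a placeholder list of route ids into calls of 8 routes each.
placeholder_routes = [f"route-{n}" for n in range(1, 21)]
for group in batches(placeholder_routes, 8):
    print(",".join(group))
```

Keeping the known ids in a set makes each lookup a constant-time check, which matters when a refresh returns thousands of rows.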

    Data Implementation

    We implemented our datasets using the yahooapis wrapper for each HTTP request. We developed custom marker icons, each with its own color, to uniquely identify each dataset. Finally, we added a popup on click that displays information about each data point. The three screenshots below are example popups for (1) potholes, (2) food inspection, and (3) crime.
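To illustrate what wrapping a request in yahooapis can look like, here is a rough sketch that builds a YQL proxy URL for a JSON source. It assumes the public YQL endpoint and the `select * from json where url="..."` query form as they were commonly used. The target URL and callback name are placeholders, and the helper name is hypothetical, so treat this as an approximation rather than our exact request code.

```python
from urllib.parse import urlencode

# Assumed public YQL endpoint (not confirmed by this page).
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def yql_json_url(target_url, callback=None):
    """Build a YQL URL that fetches target_url and returns it as JSON.

    If callback is given, a JSONP callback parameter is appended so the
    response can be consumed from a browser script tag.
    """
    params = {
        "q": f'select * from json where url="{target_url}"',
        "format": "json",
    }
    if callback:
        params["callback"] = callback
    return f"{YQL_ENDPOINT}?{urlencode(params)}"


# Placeholder feed URL and callback name, for illustration only.
print(yql_json_url("https://example.com/stations.json", callback="onStations"))
```

The key point is that the browser only ever talks to the proxy, which fetches the real source on its behalf.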