New York City’s Citi Bike program publishes open data. All files are zipped and can be found on the site.
The objective is to use the dataset to explore interesting insights into NYC Citi Bike users’ usage patterns.
This document aims to make the data retrieval reproducible.
The data retrieval pipeline is:
- Scrape the page that lists all the zip files
- Get the URLs of all the zip files
- Download the zip files and unzip them
- Construct a data frame
- Because of the large volume of data, store the data in SQLite and work with only a subset (a chunk) at a time
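
The steps above can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not necessarily the packages or language used in the rest of this document. The function names (`find_zip_urls`, `download`, `load_zip_into_sqlite`) and the table name `trips` are hypothetical, and the sketch assumes that the listing page mentions each zip file name somewhere in its text and that each archive contains CSV files with a header row.

```python
import csv
import io
import re
import sqlite3
import zipfile
from urllib.parse import urljoin
from urllib.request import urlopen


def find_zip_urls(listing_text, base_url):
    """Return absolute URLs for every .zip file name found in the listing text."""
    names = re.findall(r"[\w\-./%]+\.zip", listing_text)
    # dict.fromkeys drops duplicates while keeping the original order
    return [urljoin(base_url, name) for name in dict.fromkeys(names)]


def download(url):
    """Fetch a URL and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def load_zip_into_sqlite(zip_bytes, conn, table="trips"):
    """Unzip in memory and append the rows of every CSV member to a SQLite table."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            # Skip non-CSV members and macOS metadata entries (an assumption
            # about what the archives may contain)
            if member.startswith("__MACOSX") or not member.lower().endswith(".csv"):
                continue
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                header = next(reader)
                cols = ", ".join(f'"{c}"' for c in header)
                marks = ", ".join("?" for _ in header)
                # Columns are created without types, so values are stored as text
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
                conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()
```

With a real listing URL substituted in, the pieces could be chained as `for url in find_zip_urls(download(listing_url).decode(), listing_url): load_zip_into_sqlite(download(url), conn)`, after which a query such as `SELECT ... FROM trips WHERE ... LIMIT ...` pulls only the subset needed into memory. Loading one archive at a time keeps memory use bounded by the size of a single file rather than the whole dataset.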
The required packages: