The previous two posts covered extracting and exploring the trip data and the weather data. In this post, I import two more datasets: daily new membership subscriptions and public holidays.
With the new membership subscription data, I can compute the number of “active” users on each day. This variable could help explain the number of trips, especially for customer users (24-hour and 7-day pass holders). Public holidays, in turn, could explain some unusual patterns on days that would otherwise be regular weekdays.
The libraries required are:
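library(plyr)    # rbind.fill
library(dplyr)   # tbl_df, %>%, lag; loaded after plyr so dplyr's versions take precedence
library(ggplot2) # qplot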
Getting new membership subscription data
The membership data is aggregated at the daily level and can be obtained from the page, in the section “Citi Bike Daily Ridership and Membership Data”. Click the data wrapper link and download the CSV file. (I did not write an R script for this step, so all downloads are manual.)
Then we can read all the CSV files and combine them:
## GET csv file names
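A minimal sketch of this step, assuming all downloaded CSV files were saved into a single folder (the folder path and the helper name `read_membership_csvs` are hypothetical): read every file and stack them with `rbind.fill`, which pads columns missing from some files with `NA`.

```r
library(plyr)  # rbind.fill

# Hypothetical helper: read every CSV in `path` and stack them.
# rbind.fill() keeps the union of all columns and fills gaps with NA,
# which is useful in case the files do not all share the same columns.
read_membership_csvs <- function(path = "data/membership") {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  dfs <- lapply(files, read.csv, stringsAsFactors = FALSE)
  rbind.fill(dfs)
}
```

Usage would then be along the lines of `dt_aggr <- read_membership_csvs("path/to/downloads")`.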
There are two problems in the downloaded dataset:
- the Oct-Dec 2013 data only contains the combined total of 1-day and 7-day passes, not separate counts;
- the Oct-Dec 2014 data has some formatting problems, which require manual correction.
Since we only use the 2014 data, we can ignore the first problem.
Some preprocessing is required.
## merge columns
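One plausible reading of “merge columns”, stated here as an assumption: when the same field is spelled differently in different files, `rbind.fill` produces two half-empty columns that must be combined into one. A minimal base R sketch, with a hypothetical helper `merge_columns` and made-up column names:

```r
# Hypothetical helper: fill the NAs in column `keep` with the values from
# column `drop`, then remove `drop`.
# Note: ifelse() strips attributes such as the Date class, so this is
# intended for numeric or character columns.
merge_columns <- function(df, keep, drop) {
  df[[keep]] <- ifelse(is.na(df[[keep]]), df[[drop]], df[[keep]])
  df[[drop]] <- NULL
  df
}

# Example: two spellings of the same count column (made-up values)
df <- data.frame(pass_24h = c(120, NA), pass_1day = c(NA, 95))
merge_columns(df, keep = "pass_24h", drop = "pass_1day")
```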
Here are some descriptive plots:
qplot(dt_aggr$date_parsed, dt_aggr$subscriber_total, geom="line", main="Cumulative Subscriber")
qplot(dt_aggr$date_parsed,dt_aggr$new_subscriber_per_day, geom="line", main="New annual membership")
qplot(dt_aggr$date_parsed, dt_aggr$new_customer_per_day_sum, geom="line", main="New 24H/7D pass")
qplot(dt_aggr$date_parsed, dt_aggr$new_customer_per_day_pass_1day, geom="line", main="New 24H pass")
qplot(dt_aggr$date_parsed, dt_aggr$new_customer_per_day_pass_7days, geom="line", main="New 7D pass")
First, we transform the data to calculate the number of active memberships and passes.
Active annual subscribers:
dt_aggr$subscriber_exit <- lag(dt_aggr$new_subscriber_per_day, 365)[1:nrow(dt_aggr)]
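The idea behind `subscriber_exit` is that a membership bought on day t leaves the active pool on day t + 365, so the active count equals cumulative sign-ups minus cumulative exits. A base R sketch of that calculation, assuming every annual membership lasts exactly 365 days, the rows are consecutive days with no gaps, and memberships bought before the data start are ignored (`active_members` is a hypothetical name):

```r
# Hypothetical helper: number of active memberships on each day.
# new_per_day: new memberships sold per day (consecutive days, no gaps)
# duration:    membership length in days (365 for the annual plan)
active_members <- function(new_per_day, duration) {
  n <- length(new_per_day)
  # exits[t] = memberships sold `duration` days earlier; this is the same
  # shift as dplyr::lag(new_per_day, duration, default = 0)
  exits <- if (duration >= n) {
    rep(0, n)
  } else {
    c(rep(0, duration), new_per_day[seq_len(n - duration)])
  }
  cumsum(new_per_day) - cumsum(exits)
}

active_members(c(10, 0, 5, 0, 0), duration = 2)
# [1] 10 10  5  5  0
```

Using a default of 0 instead of `NA` for the first `duration` days keeps `cumsum` from turning the whole series into `NA`.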
Active customers are only calculated for 2014, since only the 2014 data separates 1-day and 7-day passes:
dt_aggr$customer_pass_1day_exit <- lag(dt_aggr$new_customer_per_day_pass_1day, 1)
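For the 7-day pass the exit lag would presumably be 7 days instead of 1. Equivalently, the number of active passes on a day is the trailing 7-day sum of new passes. A sketch of that alternative formulation with `stats::filter` (a swapped-in technique, not the lag-based code above; `active_passes` is a hypothetical name). The `stats::` prefix matters because dplyr masks `filter`:

```r
# Hypothetical helper: active passes on each day = passes sold in the
# last `window` days (today included). Days before the data start are
# padded with zeros so the first values are partial sums, not NA.
active_passes <- function(new_per_day, window) {
  padded <- c(rep(0, window - 1), new_per_day)
  rolled <- stats::filter(padded, rep(1, window),
                          method = "convolution", sides = 1)
  # drop the leading NAs produced by the padding
  tail(as.numeric(rolled), length(new_per_day))
}

active_passes(c(1, 2, 3, 4), window = 2)
# [1] 1 3 5 7
```

With `window = 1` the function returns the daily sales unchanged, which matches a 1-day exit lag.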
Prepare the data to save:
dt_aggr <- tbl_df(dt_aggr) %>%
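As a generic sketch only, the prepared table can be reduced to the columns needed later and stored with `saveRDS`, which keeps column types such as `Date` intact (unlike a CSV round trip). The helper name, the column names and the values below are hypothetical:

```r
# Hypothetical helper: keep the listed columns, sort by date and save.
save_daily_table <- function(dt, cols, path) {
  out <- dt[order(dt$date_parsed), cols, drop = FALSE]
  saveRDS(out, path)
  invisible(out)
}

# Example with made-up values
dt <- data.frame(
  date_parsed = as.Date(c("2014-01-02", "2014-01-01")),
  subscriber_active = c(95, 90),
  scratch = c(NA, NA)
)
path <- tempfile(fileext = ".rds")
save_daily_table(dt, c("date_parsed", "subscriber_active"), path)
str(readRDS(path))
```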
Getting public holiday data
The public holiday data is obtained from this site. We will stick to R and do some web scraping. The package required is
## Base information
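Once the holiday dates have been scraped, one way to use them is to label each day as a holiday, a weekend day or a regular weekday, so that abnormal weekdays can be separated from ordinary ones. A minimal sketch, assuming the holidays are available as a `Date` vector (`day_type` is a hypothetical name):

```r
# Hypothetical helper: classify each date as "holiday", "weekend" or "weekday".
# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday and, unlike
# weekdays(), does not depend on the locale.
day_type <- function(dates, holidays) {
  wday <- as.POSIXlt(dates)$wday
  ifelse(dates %in% holidays, "holiday",
         ifelse(wday %in% c(0, 6), "weekend", "weekday"))
}

dates <- as.Date(c("2014-07-04", "2014-07-05", "2014-07-07"))
day_type(dates, holidays = as.Date("2014-07-04"))
# [1] "holiday" "weekend" "weekday"
```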