ABSTRACT
Research efforts examining bicycle-sharing systems (BSS) have employed a wide range of sample sizes, depending on the temporal or spatial aggregation. This paper proposes a systematic evaluation of the impact of sample size on model estimates, inference measures and predictive performance using data from New York City's CitiBike. We evaluate two major dimensions of BSS data: (1) system usage: the impact of contributing factors on hourly arrival and departure rates at the station level, and (2) user destination choice: the impact of factors on users' choice of destination station. The model estimation exercises for system demand and destination choice are conducted on several samples of data. The performance of these sample models, in terms of parameters, inference statistics and predictions, is compared against that of a base sample. The results can help analysts decide on the sample size needed to examine BSS usage accurately.
Acknowledgements
The authors are grateful for the insightful feedback from four anonymous reviewers on an earlier version of the manuscript.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1 We have six models and six tables of results. Adding results for daily users would require repeating the entire documented effort for that group. Moreover, daily users typically account for a small share of BSS usage; for example, in New York City, only about 10% of trips were made by daily customers in 2014. To be sure, the proposed modeling framework and the systematic evaluation of the impact of the sampling procedure can also be applied to trips made only by daily users.