ABSTRACT
A common limitation in mode choice modeling is data imbalance. Mode choice models, such as logit models, may produce biased estimates for alternatives with smaller shares and consequently yield high prediction errors for them. Because accurate prediction of less commonly used modes is important in some applications, such as predicting transit mode share in many auto-oriented American cities, it is essential to improve the predictive capability of logit models for those modes. This study therefore applies an imbalanced learning technique and evaluates the predictive capability and interpretability of logit models on both balanced and imbalanced datasets, using a case study of the City of Nashville, Tennessee. The results show that the proposed method improves the prediction accuracy for the less commonly used modes by 18% and the mean absolute percentage error by 2%, while keeping the models interpretable. Finally, we provide high-level guidelines for mode choice modeling with imbalanced data.
Acknowledgments
This research was supported by the Tennessee Department of Transportation (TDOT) as part of project RES2020-15, titled “Improvement of Park-And-Ride Facilities and Services in Metropolitan Areas of Tennessee”.
Disclosure statement
No potential conflict of interest was reported by the author(s).