
Criteria for selecting model updating methods for better temporal transferability

Pages 1310-1332 | Received 31 Mar 2019, Accepted 15 Mar 2020, Published online: 03 Apr 2020
 

Abstract

When an older dataset has a large number of observations and a more recent dataset has only a small number, discrete choice modellers must decide whether to use both datasets through model updating (transfer scaling, joint context estimation, Bayesian updating, or combined transfer estimation) or to use only the more recent dataset. This study investigates the case in which the data collection time points and the number of observations from each time point differ. Bootstrapping was applied to commuting mode choice models estimated on datasets from Nagoya, Japan. The following criteria are proposed: (1) when the more recent time point has a large number of observations, use only the more recent data; (2) when the more recent time point has a smaller number of observations, use transfer scaling or joint context estimation, choosing between them based on the differences in context between the two time points and on the sample size from the older time point.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers 25380564, 16K03931, and 19H01538. The author acknowledges the use of data provided by the Chubu Regional Bureau, Japan’s Ministry of Land, Infrastructure, Transport and Tourism, and the NUTREND (Nagoya University TRansport and ENvironment Dynamics) Research Group. This paper is based on presentations at the 95th Annual Meeting of the Transportation Research Board in Washington, D.C., U.S.A., in January 2016 and the 52nd Infrastructure Planning Conference of the Japan Society of Civil Engineers, Akita, Japan, in November 2015.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Other updating methods that use aggregate data are not of interest in the present study. Interested readers may refer to Ortúzar and Willumsen (2011, Section 9.5).

2 Options 1 and 3 are not compared. Comparing options 1 and 2 is more valuable than comparing options 1 and 3, since Sanko (2017) showed that option 2 produced better forecasts than option 3. Both Sanko (2015, 2018) and the present study compare options 1 and 2. For option 1, Sanko (2015, 2018) considered transfer scaling, while the present study considers transfer scaling, joint context estimation, Bayesian updating, and combined transfer estimation.
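Of the updating methods listed in this note, Bayesian updating has the most compact closed form as it is commonly written in the transferability literature: the older and more recent parameter vectors are combined with weights given by their inverse covariance (precision) matrices. The sketch below is a minimal illustration of that textbook formulation, assuming both estimates share the same model specification and have independent sampling errors; the function name and all numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def bayesian_update(beta_old, cov_old, beta_new, cov_new):
    """Combine two parameter estimates by inverse-covariance (precision) weighting.

    Assumes both vectors come from the same model specification and that
    their sampling errors are independent.
    """
    prec_old = np.linalg.inv(cov_old)
    prec_new = np.linalg.inv(cov_new)
    cov_upd = np.linalg.inv(prec_old + prec_new)  # covariance of the updated estimate
    beta_upd = cov_upd @ (prec_old @ beta_old + prec_new @ beta_new)
    return beta_upd, cov_upd

# Hypothetical two-parameter example: a precise older estimate (large sample)
# and a noisier recent estimate (small sample).
beta_old = np.array([-0.05, 1.2])
cov_old = np.diag([0.0001, 0.04])
beta_new = np.array([-0.03, 0.8])
cov_new = np.diag([0.0004, 0.16])

beta_upd, cov_upd = bayesian_update(beta_old, cov_old, beta_new, cov_new)
print(beta_upd)
```

Each updated parameter lies between the two inputs and closer to the more precise one, which is why this method leans heavily on the older data whenever the older sample is much larger.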

3 Four levels of hierarchy that affect transferability are discussed in Sikder (2013). The top three levels correspond to (1), (2), and (3) in the main text. Examples at these three levels are (1) utility maximisation vs. lexicographic decision rules, and trip-based vs. tour-based models; (2) logit vs. nested logit; and (3) the choice of explanatory variables, linear vs. non-linear formulation of explanatory variables, and consideration of heterogeneity. The present study's interest lies in level (4), the model parameter estimates. The two dimensions of interest, data collection time points and numbers of observations, affect the model parameter estimates.

4 Several studies have investigated the impact of model specifications on temporal transferability, with most concluding that models with more explanatory variables are more temporally transferable (Badoe and Miller 1995b; Fox et al. 2014; Karasmaa and Pursula 1997; Train 1979). However, models estimated with more explanatory variables sometimes overfit (Badoe and Miller 1995a). The present study does not address the impact of model specifications, and the same model specifications are used throughout the analysis. Different model specifications result in different levels of temporal transferability. However, if a change in specification improved or worsened all of the updated models to the same extent, its impact would cancel out in the comparison. If it improved or worsened them to different degrees, the specification would affect the results of the study. This is a topic for future study.

5 Cleaning the data for estimation purposes produced more than 30,000 usable observations across all of the time points. However, in the author's other studies and in the previous studies reported in Section 2, the informative results came from analyses using up to 10,000 observations. The present study therefore follows the same approach.

6 The fixed number of observations and the varied numbers of observations correspond, respectively, to the larger number of observations from the older time point and the smaller number of observations from the more recent time point in the present study. Their study also includes a case in which the varied numbers of observations exceed the fixed number of observations; this case is not of interest in the present study.

7 After the data were cleaned for estimation purposes, the mode shares among rail, bus, and car in 1971, 1981, 1991, and 2001, respectively, were as follows: rail, 28%, 28%, 26%, and 25%; bus, 21%, 9%, 5%, and 3%; car, 51%, 63%, 68%, and 72%.

8 Log-likelihood values are used for evaluation, since they are used for calculating other measures to assess transferability.
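To make this concrete, the sketch below evaluates a multinomial logit model's log-likelihood on a holdout sample and derives two transferability measures that are commonly built from it: the transfer index, TI = (LL_j(β_i) − LL_j(ref)) / (LL_j(β_j) − LL_j(ref)), and the transfer rho-squared, 1 − LL_j(β_i) / LL_j(ref), where β_i is transferred from context i, β_j is estimated locally in context j, and "ref" is a reference model such as equal shares. These are standard measures offered for illustration, not necessarily the exact set used in the study; the function names and numbers are hypothetical.

```python
import numpy as np

def mnl_log_likelihood(utilities, choices):
    """Log-likelihood of observed choices under a multinomial logit model.

    utilities: (N, J) array of systematic utilities V_nj
    choices:   (N,) integer array holding the index of each chosen alternative
    """
    v = utilities - utilities.max(axis=1, keepdims=True)  # stabilise exp()
    log_probs = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return float(log_probs[np.arange(len(choices)), choices].sum())

def transfer_index(ll_transferred, ll_local, ll_reference):
    """Share of the local model's gain over the reference that the transferred model achieves."""
    return (ll_transferred - ll_reference) / (ll_local - ll_reference)

def transfer_rho_squared(ll_transferred, ll_reference):
    """Goodness of fit of the transferred model relative to the reference model."""
    return 1.0 - ll_transferred / ll_reference

# Hypothetical holdout log-likelihoods (not results from the study)
ll_ref, ll_local, ll_transfer = -1000.0, -750.0, -800.0
print(transfer_index(ll_transfer, ll_local, ll_ref))  # 0.8
print(transfer_rho_squared(ll_transfer, ll_ref))
```

A TI close to 1 means the transferred parameters perform almost as well on the new data as a model estimated on that data itself.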

9 In 1971, 1981, 1991, and 2001, the percentages aged 20 years or older were 94.1%, 96.2%, 97.1%, and 98.8%, respectively, and the percentages aged 65 years or older were 1.5%, 1.6%, 2.1%, and 3.1%, respectively.

10 Anonymous reviewers expressed concern that the models contain too few explanatory variables. One of the reviewers asked whether the author had adopted a set of variables currently or previously used in practice in Japan. In fact, the models were developed by the author and are independent of any models used in practice. The present study does intend to answer research questions with practical implications, but it is grounded in academic interests. While reviewing models developed by others provides much insight into choosing explanatory variables, relying on them too heavily should be avoided, since this would turn the present research into a post-project evaluation. For the purpose of the present study, the author chose the models he considered best. Another reviewer raised a question about sensitivity to the model specification, which is beyond the scope of the present study; see footnotes 3 and 4 for details. There are infinitely many possible model specifications, none of which is perfect, so no sensitivity analysis can persuade all researchers and practitioners. However, the impact of model specifications on temporal transferability receives considerable attention, so this direction is a topic for future study.

11 The author's decision not to include travel cost is empirically justified by Sanko, Morikawa, and Kurauchi (2013). They estimated commuting mode choice models between car and public transportation for the Nagoya metropolitan area and found that the car cost parameter was not statistically significant and that the public transportation cost parameter had the wrong sign and was therefore excluded from the model.

12 Mean absolute error, which computes the absolute difference between predicted and actual shares for each mode and averages these differences, is sometimes used as an accuracy measure. Although the results are not presented here, the mean absolute error produced the same ranking of the models for all combinations of data collection time points.
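The share-based measure in this note can be sketched as follows. The sketch assumes the error is averaged over modes (summing instead would give the same ranking, since the number of modes is fixed) and that predicted shares are obtained by sample enumeration, that is, by averaging individual choice probabilities. The observed shares are the 2001 values from note 7; the predicted shares and function names are hypothetical.

```python
import numpy as np

def share_mae(predicted_shares, observed_shares):
    """Mean absolute difference between predicted and observed mode shares (percentage points)."""
    predicted = np.asarray(predicted_shares, dtype=float)
    observed = np.asarray(observed_shares, dtype=float)
    return float(np.abs(predicted - observed).mean())

def predicted_shares_from_probs(probs):
    """Aggregate predicted shares (%) by averaging each individual's choice probabilities."""
    return 100.0 * np.asarray(probs, dtype=float).mean(axis=0)

observed_2001 = [25.0, 3.0, 72.0]  # rail, bus, car (note 7)
predicted = [27.0, 4.0, 69.0]      # hypothetical forecast
print(share_mae(predicted, observed_2001))  # 2.0
```

Unlike the log-likelihood, this measure only checks aggregate shares, so two models can score the same while assigning very different probabilities to individual commuters.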

13 The two requirements, concerning (a) alternative-specific constants and (b) dummy variables, apply differently to the five updating methods. For sma, both requirements apply to data from y2. For bay and com, both requirements apply to data from y1 and y2 separately. For cst, both requirements apply to data from y1, and (a) applies to data from y2. For jnt, (a) applies to data from y1 and y2 separately, but (b) applies to the pooled data from y1 and y2. Therefore, for the dataset used in the present study, the numbers of remaining repetitions are larger in cst and jnt than in sma, but smaller in bay and com than in sma.

14 Out of 1000 bootstrap repetitions, those in which all five updating methods produced poor estimates are removed from the analysis. Among the remaining repetitions, the updating method that most frequently produced the highest forecasting performance is reported.
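The selection rule in this note can be sketched as below. It assumes that each repetition's forecasting performance is stored as a holdout log-likelihood (the evaluation measure named in note 8), one column per method, with NaN marking a method that produced poor estimates in that repetition. The method labels follow note 13; the array layout, NaN convention, and example numbers are hypothetical.

```python
import numpy as np
from collections import Counter

METHODS = ["sma", "cst", "jnt", "bay", "com"]  # labels as used in note 13

def most_frequent_winner(performance):
    """Return the method that most often forecasts best, plus the win counts.

    performance: (R, 5) array of holdout log-likelihoods, NaN where a method
    produced poor estimates in that repetition.
    """
    perf = np.asarray(performance, dtype=float)
    usable = ~np.isnan(perf).all(axis=1)           # drop repetitions where every method failed
    winners = np.nanargmax(perf[usable], axis=1)   # higher log-likelihood = better forecast
    counts = Counter(METHODS[i] for i in winners)
    return counts.most_common(1)[0][0], counts

# Hypothetical results for four repetitions
perf = np.array([
    [-510.0, -505.0, -507.0, np.nan, -520.0],
    [np.nan, np.nan, np.nan, np.nan, np.nan],      # all methods failed: removed
    [-498.0, -500.0, -495.0, -499.0, np.nan],
    [-502.0, -501.0, -503.0, -504.0, -506.0],
])
best, counts = most_frequent_winner(perf)
print(best, dict(counts))
```

A method that failed in a given repetition simply cannot win that repetition, while the other methods in it are still compared.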

Additional information

Funding

This work was supported by Japan Society for the Promotion of Science (JSPS KAKENHI) [grant numbers 25380564, 16K03931, and 19H01538].
