ABSTRACT
In this paper, we attempt to build a single forecasting model that produces out-of-sample forecasts of tourism demand in 24 European Union countries. The initial dataset comprised 34 relevant annual variables spanning the period from 2010 to 2020 for 40 countries. A data prefiltering process reduced this to a final set of 17 relevant variables for 24 countries. In addition, to investigate the impact of uncertainty on international tourism, we complement the traditional determinants of tourism with variables that measure various forms of uncertainty: the World Pandemic Uncertainty (WPU) Index, the Global CBOE Volatility Index, the Political Globalisation Index, the Economic Globalisation Index, and the Political Stability Index. In the empirical part of our research, we employ six state-of-the-art machine learning algorithms and compare their forecasting accuracy: Support Vector Regression with a linear kernel and with an RBF kernel, Random Forests, Decision Trees, k-Nearest Neighbours (KNN), and Gradient-Boosting Trees. The results show that the Gradient-Boosting Trees algorithm outperforms the other five models, providing the most accurate forecasts with a mean absolute percentage error (MAPE) of 0.10% in the training test and 1.36% in the out-of-sample test.
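The MAPE criterion used to rank the six models can be sketched as follows. This is a minimal illustration assuming the standard definition MAPE = (100/n) Σ |(y_t − ŷ_t) / y_t|; the function names `mape` and `best_model` and the toy numbers are hypothetical and are not data or results from this study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def best_model(actual, forecasts_by_model):
    """Return (model name, MAPE) for the model with the lowest MAPE."""
    scores = {name: mape(actual, f) for name, f in forecasts_by_model.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    # Hypothetical toy values for illustration only, NOT data from the study.
    actual = [100.0, 200.0, 400.0]
    forecasts = {
        "gradient_boosting": [101.0, 198.0, 404.0],  # every error is 1% of actual
        "random_forest": [95.0, 210.0, 380.0],       # every error is 5% of actual
    }
    print(best_model(actual, forecasts))
```

Because MAPE scales each error by the actual value, it lets forecasts for countries with very different tourism volumes be compared on one percentage scale; its drawback is that it is undefined when an actual value is zero, which the sketch rejects explicitly.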
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Variance Inflation Factor (VIF): the VIF measures multicollinearity by quantifying how much the variance of an estimated regression coefficient is inflated when the predictors are correlated. A VIF equal to 1 indicates no multicollinearity among the regressors. A VIF between 1 and 5 indicates that the regressors may be slightly correlated. A VIF of 5–10 suggests significant correlation. A VIF above 10 indicates that the regression coefficients are poorly estimated because of multicollinearity.
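The thresholds in Note 1 rest on the standard definition VIF_j = 1 / (1 − R_j²), where R_j² is the coefficient of determination from regressing predictor j on all the other predictors (with an intercept). The sketch below computes it in plain Python with an ordinary least-squares fit; it illustrates the definition under that assumption and is not necessarily the procedure used in the paper, and the helper names are hypothetical. In practice, statsmodels' `variance_inflation_factor` does the same job, provided a constant column is included in the design matrix.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _r_squared(y, regressors):
    """R^2 of an OLS fit of y on the regressor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + regressors
    xtx = [[sum(ci[k] * cj[k] for k in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[k] * y[k] for k in range(n)) for ci in cols]
    beta = _solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * col[k] for b, col in zip(beta, cols)) for k in range(n)]
    mean_y = sum(y) / n
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    if ss_tot == 0:
        raise ValueError("constant predictor: R^2 is undefined")
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Return VIF_j = 1 / (1 - R_j^2) for each predictor column."""
    result = []
    for j, xj in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        r2 = _r_squared(xj, others)
        result.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return result
```

For example, `vif([[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]])` returns VIFs of 1 for both columns, because the two predictors are uncorrelated; with only two predictors, each VIF reduces to 1 / (1 − r²), where r is their Pearson correlation.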