Abstract
Accurate forecasts and analyses of mortality rates are essential to many economic and financial practices, such as the design of pension schemes. Recent studies have proposed advanced mortality models with age-coherent forecasts, so that long-term predictions do not diverge indefinitely across ages. Despite their effectiveness, individual models are inevitably misspecified in empirical analysis, which reduces their reliability. In this article, we propose a model averaging approach (MAA) that allows for age-specific weights and asymptotically achieves age coherence. Relevant technical details are also provided. The proposed method balances in-sample fitting and out-of-sample forecasting, and uses a uniquely designed smoothness penalty to address potential overfitting and thus avoid abrupt changes in long-term mortality forecasts. Using a large empirical dataset of 10 countries covering ages 0 to 100 and the years 1950 to 2018, we demonstrate the outstanding forecasting performance of MAA. This result holds robustly across various sensitivity analyses and is supported by simulation evidence. A case study further discusses the improved forecasting efficiency of MAA, as well as its usefulness in economic and financial applications such as annuity pricing. MAA is therefore a useful tool for forecasting mortality in other practical settings.
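The abstract describes age-specific weights combined with a smoothness penalty. As a hedged illustration only, the Python sketch below shows one generic way such weights could be estimated: a penalized least-squares fit with a squared first-difference penalty on the weights across adjacent ages and a sum-to-one constraint at each age, solved through its KKT system. The function name `smooth_age_weights`, the parameter `lam`, the specific penalty form, and the omission of non-negativity constraints and of any separate out-of-sample criterion are all assumptions for illustration; this is not the MAA estimator itself.

```python
import numpy as np


def smooth_age_weights(y, forecasts, lam):
    """Toy age-specific averaging weights with a smoothness penalty (hypothetical).

    y         : array (A, T), observed log mortality for A ages and T periods
    forecasts : array (A, T, K), fitted or forecast values from K candidate models
    lam       : penalty on squared first differences of the weights across ages

    Minimises  sum_x ||y_x - F_x w_x||^2 + lam * sum_x ||w_{x+1} - w_x||^2
    subject to sum_k w_{x,k} = 1 at every age x (non-negativity is NOT imposed).
    """
    A, T, K = forecasts.shape
    # Data-fit term: block-diagonal F_x' F_x, one K x K block per age.
    Q = np.zeros((A * K, A * K))
    b = np.zeros(A * K)
    for x in range(A):
        Fx = forecasts[x]
        Q[x * K:(x + 1) * K, x * K:(x + 1) * K] = Fx.T @ Fx
        b[x * K:(x + 1) * K] = Fx.T @ y[x]
    # Smoothness term: first differences across ages, applied model by model.
    D = np.diff(np.eye(A), axis=0)            # shape (A-1, A)
    Q += lam * np.kron(D.T @ D, np.eye(K))
    # Equality constraints: the K weights at each age sum to one.
    C = np.kron(np.eye(A), np.ones((1, K)))   # shape (A, A*K)
    # KKT system of the equality-constrained quadratic program.
    kkt = np.block([[Q, C.T], [C, np.zeros((A, A))]])
    rhs = np.concatenate([b, np.ones(A)])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:A * K].reshape(A, K)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, T, K = 5, 20, 3
    F = rng.normal(size=(A, T, K))
    y = F[:, :, 0].copy()                     # model 0 reproduces the data exactly
    W = smooth_age_weights(y, F, lam=10.0)
    print(np.round(W, 6))                     # each row is approximately [1, 0, 0]
```

In the usage example, a constant weight vector that puts all mass on the perfectly fitting model gives zero residuals and zero penalty, so it is the unique minimizer; larger `lam` values pull the weights at neighbouring ages closer together when no model dominates.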
ACKNOWLEDGMENTS
We thank the Editor (Michael Sherris) and three anonymous referees for providing valuable and insightful comments on earlier drafts. The usual disclaimer applies.
Notes
1 The data presented are the logged mortality rates averaged over 10 selected countries: Australia, Belgium, Canada, Czechia, France (civilian population), Ireland, New Zealand, Portugal, Sweden, and England and Wales (civilian population). These populations were selected to provide diversified geographic coverage.
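Note 1 averages logged mortality rates across countries. A minimal sketch, assuming the rates are held in a hypothetical NumPy array of shape (countries, ages, years), makes explicit that this average equals the log of the cross-country geometric mean rather than the log of the arithmetic mean. The function name `average_log_mortality` and the toy numbers are illustrative only.

```python
import numpy as np


def average_log_mortality(rates):
    """Average log central death rates across countries (hypothetical helper).

    rates : array of shape (n_countries, n_ages, n_years), strictly positive.
    Returns an array of shape (n_ages, n_years).
    """
    rates = np.asarray(rates, dtype=float)
    if np.any(rates <= 0):
        raise ValueError("log mortality requires strictly positive rates")
    return np.log(rates).mean(axis=0)


# Two hypothetical countries, one age, one year (not data from the paper).
toy = np.array([[[0.01]], [[0.04]]])
avg = average_log_mortality(toy)
print(np.exp(avg))  # geometric mean sqrt(0.01 * 0.04) = 0.02, not the arithmetic mean 0.025
```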
2 Overfitting here indicates that the in-sample errors are “abnormally” small, whereas out-of-sample forecast errors are large.
3 Note that the STAR, LC and SVAR models assume that errors follow a Gaussian distribution. It is well known that violating this distributional assumption on the innovations may reduce the credibility of interval estimates. Therefore, we produce prediction intervals (PIs) using bootstrap replicates, which are more robust to violations of the Gaussian assumption.
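Note 3 motivates bootstrap-based prediction intervals. The exact bootstrap scheme is not described in this note, so the sketch below assumes, purely for illustration, a random walk with drift (a common choice for a Lee-Carter period index) and resamples its centred one-step innovations with replacement instead of drawing them from a Gaussian. The function name `rwd_bootstrap_pi` and its parameters are hypothetical, and the sketch ignores estimation uncertainty in the drift.

```python
import numpy as np


def rwd_bootstrap_pi(series, h, n_boot=2000, level=0.8, seed=None):
    """Residual-bootstrap prediction intervals for a random walk with drift.

    series : 1-D array of observations (e.g. a mortality index)
    h      : forecast horizon
    Returns (lower, upper), each an array of length h.
    """
    series = np.asarray(series, dtype=float)
    diffs = np.diff(series)
    drift = diffs.mean()
    resid = diffs - drift                      # centred one-step innovations
    rng = np.random.default_rng(seed)
    # Resample observed innovations with replacement: no normality assumed.
    draws = rng.choice(resid, size=(n_boot, h), replace=True)
    # Simulate future paths starting from the last observation.
    paths = series[-1] + np.cumsum(drift + draws, axis=1)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(paths, alpha, axis=0)
    upper = np.quantile(paths, 1.0 - alpha, axis=0)
    return lower, upper


if __name__ == "__main__":
    obs = np.cumsum(np.random.default_rng(0).normal(0.5, 1.0, 100))
    lo, hi = rwd_bootstrap_pi(obs, h=10, seed=1)
    print(np.column_stack([lo, hi]))           # 80% interval for each horizon
```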
4 Note that, in general, learners in an ensemble should be structurally heterogeneous to achieve optimal diversity. As assumed in Theorem 2, all learners must be individually age coherent for the MAA to produce age-coherent forecasts. Since age coherence is a relatively new concept, such learners are scarce in the existing literature. Thus, the ensemble discussed in this paper draws on a comparatively small model set from which to achieve optimal diversity. Nevertheless, the same principle applies when more diverse age-coherent learners become available and are included in the ensemble. As age-coherent mortality models develop, the degree of diversity achieved via the proposed MAA can be examined in future studies.
5 This means that the central mortality rates are calculated as deaths (the sum of male and female deaths) divided by exposures (the sum of male and female exposures). In other words, we do not distinguish between male and female mortality rates.
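Note 5 pools deaths and exposures across sexes. A minimal sketch of that calculation follows; the function name `pooled_central_rate` and the numbers are hypothetical, chosen only to show that the pooled rate is an exposure-weighted average of the sex-specific rates rather than their simple mean.

```python
def pooled_central_rate(deaths_m, deaths_f, exposure_m, exposure_f):
    """Central death rate for both sexes combined: pooled deaths / pooled exposures.

    Works element-wise for floats or NumPy arrays (e.g. one entry per age).
    """
    return (deaths_m + deaths_f) / (exposure_m + exposure_f)


# Hypothetical single-age example (not data from the paper):
# males: 30 deaths over 1000 person-years -> rate 0.030
# females: 10 deaths over 2000 person-years -> rate 0.005
rate = pooled_central_rate(30.0, 10.0, 1000.0, 2000.0)
print(rate)  # pooled rate, i.e. 40 / 3000
```

Here the pooled rate, 40 / 3000, equals (1000 × 0.030 + 2000 × 0.005) / 3000 and is lower than the unweighted mean of 0.0175, because the lower female rate applies to the larger exposure.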
6 We also considered three additional sensitivity analyses: a starting year of 1970, single-year ages 0–49, and single-year ages 50–100. The results are consistent with our baseline findings and are available upon request.
7 Note that the BMA strategy does not employ a penalty term, unlike our MAA. Consequently, BMA is likely to be less robust than MAA to changes in the data. A detailed comparison is beyond the scope of this article and is left for future research.
8 To evaluate the influence of non-Gaussian errors, we also examine the case of a multivariate t-distribution with the same mean and covariances. Although larger variation, as measured by the standard errors, is apparent, the mean RMSFE is largely identical to that under the Gaussian distribution. Our conclusions in this section therefore hold robustly under this change. Detailed results are available upon request.
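Note 8 replaces Gaussian errors with multivariate t errors that share the same mean and covariances. One way to read this, assumed here, is that the t draws are rescaled so their covariance matches the Gaussian one: a multivariate t with scale matrix S and ν > 2 degrees of freedom has covariance S·ν/(ν − 2), so S is set to Σ(ν − 2)/ν. The helper names `mvt_matching_cov` and `rmsfe` are hypothetical; the sketch uses only NumPy's `Generator` methods.

```python
import numpy as np


def mvt_matching_cov(mean, cov, df, size, seed=None):
    """Draw multivariate-t vectors whose mean and covariance equal `mean`, `cov`.

    X = mean + Z / sqrt(W / df), with Z ~ N(0, S) and W ~ chi-square(df),
    has covariance S * df / (df - 2); S is rescaled so that this equals `cov`.
    """
    if df <= 2:
        raise ValueError("covariance is finite only for df > 2")
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    scale = np.asarray(cov, dtype=float) * (df - 2.0) / df
    z = rng.multivariate_normal(np.zeros(len(mean)), scale, size=size)
    w = rng.chisquare(df, size=size) / df
    return mean + z / np.sqrt(w)[:, None]


def rmsfe(actual, forecast):
    """Root mean squared forecast error."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


if __name__ == "__main__":
    mu = np.array([0.0, 1.0])
    sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
    x = mvt_matching_cov(mu, sigma, df=10, size=200_000, seed=0)
    print(x.mean(axis=0))           # close to mu
    print(np.cov(x, rowvar=False))  # close to sigma, despite heavier tails
```

Because the covariance is matched, the t errors differ from the Gaussian ones mainly in their tails, which is consistent with larger standard errors but a similar mean RMSFE.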
9 Note that the 10 countries examined in this article may exhibit different demographic features, so it is debatable whether any single population is sufficiently representative of the whole group. To avoid this issue, the case study works with the hypothetical “average” population. Nevertheless, it can be shown that the same conclusions apply to all 10 individual populations.
10 Note that if is independent of h (as in the STAR model), this condition is equivalent to