1,327
Views
0
CrossRef citations to date
0
Altmetric
Acceptance – Research Article

Determinants of basic childhood vaccination coverage in European and OECD countries

ORCID Icon, ORCID Icon, ORCID Icon & ORCID Icon
Article: 2123883 | Received 10 Jun 2022, Accepted 08 Sep 2022, Published online: 29 Sep 2022

ABSTRACT

Vaccination coverage varies between countries and over time. Using official databases, we extracted data on 50 national-level immunization, socio-economic, demographic, healthcare, and cultural factors, and the uptake of the third dose of diphtheria toxoid, tetanus toxoid, and pertussis vaccines (DTP3) and the first dose of measles-containing vaccines (MCV1) for 61 countries between 1990 and 2019. The main branch of the analysis included all covariates, while a secondary branch excluded life-expectancy and child mortality. The statistical analysis was completed in three stages: a variable-selection stage via random forests; multilevel multiple imputation for missing data in the reduced dataset; and generalized estimating equations (GEE) over all imputed datasets with pooled results. Less than 20 covariates were retained after variable-selection. Among a relatively small number of statistically significant (p-value <.05) effects in the pooled GEE results of our main branch, under-5 mortality and long-term orientation culture showed negative associations with both uptake outcomes and GDP per capita a positive association. For MCV1, whether a second dose was integrated into routine immunization appeared as the overall strongest negative correlate. In the secondary analytical branch, results were largely consistent, with a few additional statistically significant effects emerging, mainly related to immunization and healthcare system characteristics. These insights improve our understanding of the main factors influencing vaccine uptake, some of which are broadly contextual (e.g., GDP, socio-cultural factors), requiring bespoke vaccine program approaches, in order to maximize childhood vaccine uptake over time.

Introduction

Vaccination has been lauded as one of the greatest achievements of public health.Citation1,Citation2 The World Health Organization (WHO) estimates that 2–3 million deaths are prevented yearly thanks to immunization.Citation3 Benefits of vaccines go beyond prevention of infection in the vaccinated and reduction of pathogens’ circulation in the broader community. Indeed, the economic and societal impact has been shown to be far-reaching: from saving healthcare costs and averting productivity losses, to better cognitive development, educational achievement, and subsequent productivity and wealth,Citation4–7 as well as greater health equity and social integration.Citation8 Despite tremendous successes in recent decades (such as smallpox eradication and regional polio elimination), outbreaks of vaccine-preventable diseases continue to occur even in countries with relatively high vaccination coverage rates,Citation8 at least in part due to the underimmunization of specific subgroups in the population.Citation9,Citation10 In order to maintain health, economic, and societal gains, continued surveillance of vaccine uptake and a better understanding of the drivers behind it are needed.

Much scientific research to date has focused on the topics of vaccination status, immunization timeliness, and vaccine hesitancy. The latter has been identified among the top 10 threats to global health by the WHOCitation11 and, while generally more common in wealthy nations,Citation8 this phenomenon has been observed in both “developed” and “developing” countries.Citation12 Recent (systematic) reviewsCitation13–20 summarize empirical findings with regard to a large number of vaccination determinants, such as immunization program and logistics (e.g., vaccine availability and storage, accessibility, affordability), as well as child, parental (education, knowledge, attitudes), and family (socio-economic status, location) characteristics. Using survey, interview, or administrative data, previous research has investigated determinants and outcomes at the individual-level, with multi-country studies, for the most part, focusing on low- and middle-income countries (LMIC). The challenges in achieving and maintaining high vaccination coverage differ fundamentally between countries of completely different income levels. It is therefore of interest to focus on countries within a limited range of income variation. Indeed, important variations in vaccination coverage are observed among relatively wealthy countries, even within Europe. Despite many historical similarities, European countries tend to fall into different groups of sociocultural contexts Citation21,Citation22 and have made different choices in the establishment of their health systems and implementation of their vaccination programs. Since vaccination is widely considered as one of the most important tools in public health, there is a continued need to examine which country and time-dependent factors may influence childhood vaccination coverage the most. This need seems obvious: we can learn from country differences to improve our understanding of the relative importance of different factors. Examining rich datasets to investigate the influence of a wide array of potential determinants of vaccination coverage rates in European and in other predominantly wealthy countries over time should therefore be of high interest to vaccine program managers and policy-makers.

Materials & methods

We were interested in performing analyses on a diverse and large sample of relatively wealthy countries that belong to Europe or to the Organization for Economic Co-operation and Development (OECD). Appendix A presents a table with the rankings of the 61 countries included in this study according to gross national income (GNI) for 2019, the last year we considered. As per the World Bank’s classification by income levelCitation23 from the 1st of July 2020, all but one country fall either in the “high income” or the “upper-middle income” brackets. This selection was thought to allow us to collate a rich dataset from existing databases for both the outcomes and potential determinants, allowing for a thorough analytical approach.

Outcome variables

We considered the nation-wide coverage of the third dose of diphtheria toxoid, tetanus toxoid, and pertussis vaccines (DTP3) and the first dose of measles-containing vaccines (MCV1) as outcomes, which are traditionally used as proxy performance measures of childhood immunization programs.Citation24–29 Because of data-availability concerns and COVID-19’s negative impact on routine immunization efforts worldwide,Citation29,Citation30 we decided to take into consideration the annual WHO and United Nations International Children’s Emergency Fund (UNICEF) coverage estimates for these two vaccinesCitation31 for the 30-year period leading up to 2019 (i.e., 1990–2019). shows that among 2019 coverage rates for the 61 European or OECD member countries included in our study, all but a few stand above the global average of 85%.Citation26 The interested reader can also find an interactive visualization of the longitudinal national coverage profiles of the countries included in Appendix B or at https://www.simid.be/software/dtp3/.

Figure 1. First dose measles-containing vaccine (MCV1) and third dose diphtheria-tetanus-pertussis (DTP3) vaccination coverage rates for the 61 countries included in the study (only 2019 shown).

Figure 1. First dose measles-containing vaccine (MCV1) and third dose diphtheria-tetanus-pertussis (DTP3) vaccination coverage rates for the 61 countries included in the study (only 2019 shown).

Covariates

A full list of the 50 covariates included in this study can be found in . Ten WHO immunization coverage indicators applicable to one or both of the vaccines of interest were taken into account.Citation32 In addition, we developed a new financial variable in order to measure relative expenditure on immunization activities. For this purpose, we used the WHO indicator “What is the total expenditure (from all sources) on routine immunization?,” whose original values were reported either in USD or in local currency (LCU). If reported in local currency, values were first converted to USD according to the official exchange rate (LCU per USD, period average) of the International Monetary Fund, as reported by the World Bank.Citation33 All amounts were then divided by the respective country’s nominal GDPCitation33 in order to obtain a measure reflecting immunization expenditure relative to GDP. Furthermore, 20 general economic, demographic, population health, political stability, and healthcare system indicators were collected from two World Bank databases,Citation33,Citation34 and data on the average number of completed years of education was retrieved from the United Nations Educational, Scientific and Cultural Organization (UNESCO) Institute for Statistics (UIS).Citation35 The World Values Survey (WVS) was used as a source of information on social, political, cultural, economic, and religious values. WVS is a widely used and cited noncommercial long-term project investigating human beliefs and values across the world. As of May 2022, the WVS has been executed in over 120 societies, in 7 waves. For the purposes of our study, we extracted 12 aggregate indicators from the “WVS Wave 1 to 6 Key Aggregates” dataset.Citation36 Finally, we also considered information from the 6-dimensional Hofstede national culture model.Citation37 Inspired by an IBM International personnel survey in the 1960s, this model has been continuously developed and currently includes six indices, based on data from over 100 countries worldwide. The six dimensions of national culture and their definitions,Citation38 as conceived by Hofstede and colleagues, can also be found in .

Table 1. Initial list of covariates.

Statistical analysis

Our initial dataset consisted of two outcome variables and 50 covariates, for 61 countries, over a 30-year time-span (1990–2019). The dataset had a two-level structure, as the six Hofstede cultural indices are considered time-invariant and thus have just one value per country. The relationship between 48 and 49 covariates and DTP3 and MCV1 national vaccination rates, respectively, was investigated separately for each outcome in the main branch of the analysis. In an additional, secondary, branch, we excluded life-expectancy at birth and under-5 mortality, because of a possible bi-directional causal relationship, leaving 46 covariates for DTP3 and 47 for MCV1.

The statistical analysis was completed in three stages, executed in turn for each outcome in each branch (with and without life-expectancy and under-5 mortality). In the first stage, we employed the random forests (RF)Citation39 procedure as a variable-selection technique. Fine-tuning model parameters, we ran the procedure a total of 11 times (refer to Appendix C for full details on the statistical methods employed). In order to determine the most important covariates, we looked at the variable importance indicator and ranked covariates according to the number of times each appeared among the “top 20” variable importance scores. For a reduction of over 60%, down from 48–49 to 18–19 covariates, only those with a frequency of at least 8 out of 11 were retained for further investigation. In the second stage of the analysis, we performed multilevel multiple imputation in order to address the issue of missing data. Missingness after variable selection was 19% in the main branch and 24% in the secondary branch, over the total of 42,090 data points in both reduced dataset (30 time points for 61 countries for 23 variables). Before implementing multiple imputation, the arcsine transformation was applied to the two coverage rates expressed as proportions, and then all variables were standardized to have a mean of 0 and a standard deviation of 1. Using multivariate imputation by chained equations (MICE)Citation40 predictive mean matching (PMM), we obtained 20 imputed datasets, which were then analyzed individually using generalized estimating equations (GEE),Citation41 in the third stage of the analysis. Finally, results from the 20 individual GEE models were pooled together to get an estimate for each covariate’s effect. See for a diagram of the main branch of the analysis (the secondary one is identical, apart from the fact that it contains two covariates less – life-expectancy and under-5 mortality).

Figure 2. Flow chart of analytical steps (example of the main branch).

Figure 2. Flow chart of analytical steps (example of the main branch).

Results

shows the covariates retained in the main branch of the analysis, after completing the first, variable-selection, stage. Covariates are listed according to the data source (i.e., in no particular order) and the number of RF models (out of 11 possible) in which they had importance scores among the “top 20” is shown as well, in case they were retained for a particular outcome. Nineteen covariates were retained for DTP3, 16 of which were also among the 18 covariates retained for MCV1. According to our established criterion, nurses and midwives, political stability and absence of violence/terrorism, and surface area turned out to be important only for DTP3, while births attended by skilled health staff (and, naturally, whether the second dose of the Measles vaccine is integrated into the routine immunization system) were only to be included with MCV1 as an outcome. Leaving life-expectancy and under-5 mortality out in the second, additional, branch of the analysis did not have a significant impact on the selection of variables, as can be seen in . Eighteen covariates were retained for both DTP3 and MCV1. In comparison with the main branch, surface area, political stability and absence of violence/terrorism, number of adverse events following immunization, and whether the country has a standing technical advisory group on immunization were retained for the analysis of MCV1 now, while individualism and power distance from the Hofstede cultural indices were dropped. The only difference for the DTP3 selection in this branch compared to the main one was the inclusion of births attended by skilled health staff now.

Table 2. Covariates retained in the main branch of the analysis based on variable importance scores in the random forests (RF) procedure (if a covariate was retained for a given outcome, its score here shows in how many of the 11 runs of the RF procedure it was among the “top 20”*).

Table 3. Covariates retained in the branch of the analysis without life-expectancy and under-5 mortality based on variable importance scores in the random forests (RF) procedure (if a covariate was retained for a given outcome, its score here shows in how many of the 11 runs of the RF procedure it was among the “top 20”*).

show our final results for the main and additional branches of the analysis, respectively, after pooling the GEE effect estimates over the multiple imputed and separately analyzed datasets. Starting with DTP3 in the main analysis (), we observe the strongest overall and strongest negative statistically significant (at the conventional 0.05-level) association with under-5 mortality (coefficient of −0.413). The second strongest, this time positive, effect is that of power distance (0.374), followed by GDP per capita (0.259) and political stability and absence of violence/terrorism (0.185), both positively related as well. Long-term orientation (−0.165), surface area (−0.138), and whether the country has a multi-year plan for immunization (coded as 0=no/1=yes; coefficient of −0.097) all have smaller negative statistically significant effects. With regard to MCV1, whether the second dose of the vaccine is integrated into the routine immunization system shows a very strong negative association (−0.924), meaning that as the status of the second dose changes from “not integrated,” to “partially integrated,” to “integrated,” MCV1 coverage rates tend to decrease. Furthermore, under-5 mortality shows a negative effect (−0.428), very similar in magnitude to the one we see for DTP3. Lastly, indulgence (−0.323) and long-term orientation (−0.218) both exhibit a negative relation to MCV1, while GDP (−0.319) and urbanization (0.274) show a positive relation.

Table 4. Pooled generalized estimating equations (GEE) results from the main branch of the analysis (statistically significant effects at the 0.05-level are shown in bold).

Table 5. Pooled generalized estimating equations (GEE) results from the branch of the analysis without life-expectancy and under-5 mortality (statistically significant effects at the 0.05-level are shown in bold).

Examining the pooled estimates from the secondary analysis (), we see a larger number of statistically significant effects. The additional effects we found here can be assumed to have been previously “absorbed” by those of life-expectancy and under-5 mortality, which were included as covariates in the main analysis. Overall, the results are consistent with those of the main analysis, especially regarding the effects of the Hofstede cultural indices, GDP per capita, and the indicator pertaining to the second MCV dose. Births attended by skilled health staff now show a statistically significant positive relation with both outcomes (coefficient of 0.190 for DTP3 and 0.156 for MCV1), while, on the contrary, whether the country has a multi-year immunization plan is no longer significant for DTP3 (the effect was significant, but very small before). In addition, the proportion of the population living in urban areas now shows a positive effect on DTP3 (coefficient of 0.223, being barely non-significant in the main analysis before with a p-value of 0.06), while the density of nurses and midwives exhibits a smaller, negative, association (−0.108). With regard to MCV1, the indicator on whether the country has a standing technical advisory group on immunization (not selected for retention in the main analysis) shows a relatively strong negative effect (−0.563), meaning that countries with such an advisory group tend to have lower MCV1 coverage rates. Finally, the current health expenditure and density of physicians also showed a statistically significant negative, albeit much weaker, relation (coefficients of −0.155 and −0.104, respectively).

Discussion

With the present study, we attempted to uncover national-level determinants of vaccination coverage among a large number of factors related to immunization and healthcare system organization, economics, demographics, politics, culture, finances, and societal values in European states and OECD member countries. This study provides a marked expansion of most previous research, as we considered a large array of data for over 60 countries simultaneously, over the span of 30 years. We applied rigorous statistical methods to select variables, correct for missing data, and, finally, estimate longitudinal effects. Our results point to a number of interesting findings.

To begin with, one of the most stable effects observed in our analysis was that of GDP per capita. Often used as a proxy for development and wealth, this is the most widely studied and established positive determinant of general population health.Citation42 It is reasonable to expect nations experiencing economic prosperity to be more successful in the organization and financing of their healthcare systems, and, thus, immunization activities, by extension. However, when GDP per capita is already high, the marginal gains of rising GDP as a contributor to basic healthcare and health correlate are expected to decrease. Yet, we find that even among relatively wealthy countries with already relatively high vaccination coverage, GDP per capita still emerges, on average, as an important determinant for vaccination coverage differences between countries and over time. Political stability and armed conflict would clearly be factors that undermine immunization efforts, as vaccination logistics then become more difficult to maintain and deploy. Indeed, the political stability and absence of violence/terrorism covariate in our analysis showed an important and persistent positive effect on DTP3 vaccination coverage – a finding also supported by other research.Citation43,Citation44

Another pair of determinants with anticipated observed effects in our study is that of surface area and urbanization. A larger territory (as in Canada, Russia, the USA, and Australia) means that remote areas are harder to reach and therefore likely underserved by routine immunization programs, and that healthcare and other policies may be more diverse and fragmented within the country compared to smaller-area countries. The negative effect of surface area in our analysis was statistically significant only for DTP3, perhaps due to the difference in age at vaccination relating to access to vaccination and ease of travel (typically, DTP3 is administered at around 16 weeks, and MCV1 at around 12–15 months of age). In accord, a number of individual-level studies in high-income countries have reported parents stating practical barriers as reasons for not vaccinating their children.Citation18 On the other hand, high levels of urbanization affect vaccination coverage rates in a positive way. The relationship is apparent in countries like Belgium, Israel, Japan, and Luxembourg, for example, where more than 90% of the population live in urban areas and where coverage rates have been very high in recent years (evidence from other, single-country, studies speaks of the same phenomenon).Citation15 Naturally, immunization activities are easier to organize in urban areas, they reach a much larger proportion of the population more efficiently, and provide for greater overall accessibility to vaccination.

A determinant deserving special mention in our discussion is that of under-5 mortality, considered indicative of the performance of healthcare services in general.Citation45 Its negative relationship with vaccination coverage suggests that, on average, the impact of lower vaccination coverage causing higher disease burden dominates upward pressure on vaccination coverage through an expected higher willingness to accept vaccination when the disease burden it would prevent is higher (referring to the so-called prevalence elasticity of vaccine demand).Citation46 That is, causality from lower vaccination coverage to increased mortality under 5 seems to dominate potential reverse causality of higher mortality leading to higher vaccination coverage. The observed effect in our analysis was relatively large, for both vaccines. When under-5 mortality was taken out of the equation in the secondary branch of our analysis, the proportion of births attended by skilled health staff came to the forefront as a statistically significant positive determinant. This indicator can be seen as a proxy to overall maternal and postnatal care organization, indicating that a lower threshold – both in access to information and logistics to facilitate actual access to vaccination – is important in order to improve vaccination coverage.Citation47

With regard to immunization and healthcare system indicators other than births attended by skilled health staff, our analysis yielded more puzzling results, at least at first glance. Current health expenditure, density of physicians, density of nurses and midwives, whether the country had a multi-year plan (MYP) or a standing technical advisory group (NITAG) for/on immunization, or if the second dose of the Measles vaccine was integrated into the routine immunization system, all showed statistically significant negative effects, mostly for MCV1 and mostly in the absence of under-5 mortality (i.e., in the secondary branch of the analysis). One possible explanation for that could be that structures like MYP and NITAG were put in place when and where vaccination coverage rates were low, and MCV2 was relatively more included in routine immunization (both as a booster dose and as a “catch-up” opportunity) in countries where, or in periods when, relatively more children missed MCV1. Furthermore, if we consider factors like health expenditure and medical personnel availability to be indicative of well-organized and functioning healthcare systems with a focus on cure rather than prevention, and we know these are more characteristic of high-income countries, we come to suspect vaccine hesitancy to be the underlying factor behind some of our observed effects. The point that demand-related issues are at the core of low coverage rates in “developed” countries (as opposed to availability issues in “developing” nations) has already been put forward in the literature.Citation18,Citation48 With vaccine-preventable disease burden less visible, the subsequent undervaluation of prevention, the existing safety net of good healthcare, and the easier spread of misinformation (the most well-known and cited example of MMR vaccine scare),Citation49 it should not be surprising that strong hesitancy attitudes have emerged in recent decades. For instance, Larson et al.Citation50 suggested links between higher GDP, total health spending, births attended by skilled health staff, access to healthcare, and mean years of schooling on the one hand, and lower positive vaccine sentiments on the other. If we accept that these individual-level sentiments translate directly into lower national vaccination rates, our findings provide congruent complementary information to these observations.

Lastly, based on our results, we would like to draw attention to the concept of national culture, as a possible important player when it comes to childhood vaccination. The six Hofstede indices of national culture that we considered were consistently identified among the most important covariates during the variable-selection phase of our analysis, across the multiple RF models we ran. Power distance, or a population’s tendency to accept the existing uneven distribution of power, showed a relatively strong statistically significant positive effect in both branches of the analysis for DTP3. These results may be attributed to a general acceptance of and trust in the immunization practices authorities have implemented. At the same time, indulgence, or the tendency to “allow relatively free gratification of basic and natural human drives related to enjoying life and having fun,” showed a negative effect of similar size for MCV1, perhaps alluding to a desire to avoid non-pleasurable activities without immediate gratification, such as vaccination. A more curious observation though was the negative effect of long-term orientation, statistically significant for both vaccines and in both branches of the analysis. Higher long-term orientation stands for easier adaptation to changing conditions, a strong inclination to save and invest, and perseverance in achieving results. That long-term orientation would negatively affect vaccination coverage may again be pointing in the direction of vaccine hesitancy and possible vaccine safety concerns. While we did not account for vaccine hesitancy explicitly, we did include a number of country-level attitudinal covariates relating to trust, perceived fairness and democracy, and information connectedness, for example, which concepts may be assumed linked to hesitancy to some degree. None of those indices survived the variable-selection phase of the analysis, though, while, again, all cultural factors did. To our knowledge, no other study to date has investigated national culture characteristics in the context of vaccination, unlike individual-level attitudes and hesitancy toward vaccines, that have received considerable attention. It is conceivable that different cultures are susceptible to vaccine hesitancy to a different degree, and therefore this may be a research route worth exploring in the future. Some approaches may have different effects in one cultural context, as opposed to another. Another topic of further study may be the interplay of different cultural traits and the communication and implementation strategies adopted by vaccinating authorities.

Before concluding, we need to acknowledge a few statistical and conceptual limitations to our study. Firstly, although we used only widely accepted authoritative international databases for both the determinants and the vaccination coverage data and many variables are measured in the same manner for all countries using international standards, there is always the possibility that some countries’ and/or time period data are less reliable than others. It is impossible for us to determine whether this occurs, and if so, how this would influence our findings since there is no telling which and how many of the variables are potentially impacted and whether this affects some countries more than others. If data would be unreliable due to fraud, one can speculate that this occurs more for country-reported data in countries where there is more corruption, and then it could have the effect that the influence of variables expressing corruption is underestimated, especially if vaccination coverage outcome data are more subject to this type of data unreliability than the other variables we used. Secondly, with regard to the presence of missing data in our dataset, while we would have liked to estimate effects for all of the 50 covariates we initially had, multiple imputation, as the “state-of-the-art” technique to apply, was just technically impossible to implement with that many variables relative to the 61 clusters, or countries, that we had. However, the consistency we saw in the results of the different runs of the RF procedure, regardless of the different model specifications, gave reassurance that, indeed, the most important covariates were identified. Next, we were, unfortunately, unable to provide direct interpretation of the size of the effects we found, due to the necessary data transformations. Conversely, as variables were now on the same scale, we were able to compare effects and speak of stronger or weaker associations in relative terms. Mild to moderately strong correlations were present in the dataset, as can be expected with so many related variables. Dealing with longitudinal data, we did not have the option of applying a well-established dimension-reduction technique eliminating multicollinearity (such as principal components, for example), even if the subsequent decrease in interpretability would have been deemed acceptable. We, therefore, have to emphasize that results from this study should be read with due caution. Additionally, there are inevitable limitations of using national-level vaccination coverage rates by a certain age as an outcome of interest – these are not sensitive to vaccination timelinessCitation51 and ignore possible sub-national and regional coverage differences. Nevertheless, we believe that the macro-level insights obtained in this study constitute a significant contribution toward a greater understanding of the main factors influencing vaccination uptake. Understanding the mechanisms and determinants of vaccine uptake remains important, given the indisputable public health impact of higher vaccination coverage, which also reverts back to some of the covariates that may explain it, like GDP per capita. What is striking in this analysis is the identification of a relatively small subset of determinants that is influential to explain between-country differences in childhood vaccine uptake. This may guide politicians and public health authorities to develop the circumstances in which these determinants can be most effective in achieving high vaccination coverage. Clearly, this task is easier to achieve for the more specific determinants than for the more contextual determinants our analysis identified as influential.

Supplemental material

Supplemental Material

Download Zip (351 KB)

Acknowledgements

We thank Lander Willem for posting the vaccination coverage longitudinal visualization on the simid.be website.

Disclosure statement

Unrelated to the work reported here, the University of Antwerp has received unrestricted grants and compensation for meeting attendance with GSK, Merck, and Pfizer. FV contributed as a full-time employee of the University of Antwerp. In April 2022, after his contributions to this work ended, FV was employed by GSK.

Supplementary material

Supplemental data for this article can be accessed on the publisher’s website at https://doi.org/10.1080/21645515.2022.2123883

Additional information

Funding

The work reported here received funding from the Research Foundation Flanders (https://www.fwo.be/en/), FWO project number G0D5917N, and the European Union’s Horizon 2020 research and innovation programme (Project EpiPose – No. 101003688, 2020). Unrelated to the work reported here, the University of Antwerp has received unrestricted grants and compensation for meeting attendance with GSK, Merck, and Pfizer.

References

Appendix A

Table A1. Ranking of selected countries according to gross national income (GNI) per capita (in current USD) for 2019 (data not available for Andorra, Liechtenstein, Monaco, San Marino). The World Bank’s classification by income level from the 1st of July 2020 shown is based on GNI per capita for 2019*.

Appendix B

Appendix C.

Statistical analysis

  • Data preparation

Based on data availability for different countries and time periods over the selection of data sources, we considered data for 61 European or Organisation for Economic Co-operation and Development (OECD) member countries for the years between 1990 and 2019. The main branch of our analysis included two outcome variables (DTP3 and MCV1 coverage rates) and 50 covariates in total, 48 for DTP3 and 49 for MCV1, as three covariates were only applicable to one vaccination rate or the other. As an additional, secondary, branch of the analysis, we also ran the whole procedure excluding life-expectancy at birth and under-5 mortality, because of a possible strong bi-directional causal relationship with the outcomes (thus having 46 remaining covariates for DTP3 and 47 for MCV1).

With the exception of the six Hofstede cultural indices that are considered time-invariant and thus consist of just one value per country (or region), all other data came in the form of annual entries, which gave our dataset a two-level structure. We considered only national values for the cultural indices and ignored any sub-country divisions. In the case of Germany, the World Values Survey (WVS) reported separate values for East and West Germany for all waves of the survey. In order to obtain a single national value, we used a weighted average based on the year-specific population size.Citation52 For this set of 12 WVS covariates, we disregarded the original wave-based structure of the obtained data and instead considered the year in which data for a particular country was collected, for the purposes of data integration across the different databases we utilized. This, in addition to the fact that some of the countries of interest had only participated in one or two waves of the survey, or never at all, resulted in pronounced data sparsity for the WVS indicators, which was treated as other missing data (see more below).

  • Random forests (RF)

As a first step in both branches of the analysis – with and without life-expectancy and under-5 mortality, we employed the random forests statistical procedure as a variable-selection technique, using SAS. Random forestsCitation39 is a well-known and widely used supervised machine-learning technique primarily used for predictive modeling. It builds an ensemble of decision trees introducing randomness at two different points in the process of developing each individual tree. On the one hand, tree training is done using only a fraction of the original data (in-bag fraction) selected via bootstrap sampling (without replacement, in our case), while the rest of the data is set aside for model validation. On the other hand, the split-variable for each split on each tree is selected from a random sample of the original covariates, thus reducing correlations between the trees. The procedure has great efficiency and is very robust. It is also well suited to handle missing data. In addition to all covariates, we included a country indicator in order to account for the clustering in the data. We ran multiple models for both outcomes in both branches of the analysis, varying three parameters in the model – the number of trees to grow (100 or 200), the number of variables to choose from for splitting (6 values at approximately equal intervals between 2 and the total number of covariates), and the in-bag fraction (0.4, 0.6, or 0.8). In order to determine the most important covariates of DTP3 and MCV1 vaccination coverage rates, we looked at the variable importance indicator calculated based on the loss reduction,Citation53 and took a count of the number of times each covariate appeared among the “top 20” variable importance scores. Top-scoring covariates were then retained for the next phase of the analysis.

  • Multiple imputation (MI)

Initially, the dataset for the main branch of the analysis showed 51% overall missingness, while the dataset for the secondary branch showed 53% missingness. After variable-selection, these numbers were reduced to 19% and 24%, respectively. In order to address this remaining issue, we performed multilevel multiple imputation (in R) with the reduced dataset obtained in the previous analytical step. Multiple imputationCitation54 is the current “state-of-the-art” approach to dealing with missingness, as it accounts for the inherent uncertainty when imputing, which is ignored by single imputation techniques. Under the missing-at-random (MAR) assumption, we implemented the multivariate imputation by chained equations (MICE) algorithm,Citation40 also known as fully conditional specification (FCS). MICE does not assume a multivariate distribution for the data, but instead uses a set of conditional densities. Imputation is done on a variable-by-variable basis, iterating over a conditionally specified imputation model for each incomplete variable. We used the predictive mean matching (PMM) method, which entails first calculating predictions for each entry of the target variable, whether observed or missing in reality. For each missing entry, then, a small number of “candidate donors” among the observed cases are selected, based on proximity between the predictions. At the end, one donor from the group is randomly selected, and its observed value is imputed for the missing entry. Some of the advantages of this method are as follows: it ensures that the imputed value is always within the plausible range as it is based on observed data; it has the ability to handle all types of variables (our dataset contained both continuous and categorical ones); and it is robust to transformations. As our outcome variables were proportional, the data ranged between 0 and 1. Roughly 75% of all vaccination coverage rates, in fact, lie between 0.90 and 0.99, which necessitated applying the variance-stabilizing arcsine transformation. In addition, we standardized all variables to have a mean of 0 and a standard deviation of 1. We implemented the simplest proper imputation model, where for the imputation of a target variable we used only variables that were going to be included together with it in the same substantive model (in the next step). In order to minimize convergence issues, we ran the MICE procedure multiple times (with different seeds) generating a smaller number of imputations at a time, for a total of 20 imputed datasets – a number considered sufficient with 18–19 covariates included in the model.

  • Generalized estimating equations (GEE)

In the final stage of the analysis, we applied (again in SAS) the generalized estimating equations techniqueCitation41 to each imputed dataset obtained in the previous step individually, considering the covariates that were previously selected as most important via the random forests procedure (see above). GEE is a well-known semi-parametric approach to analyzing longitudinal data. It belongs to the marginal models family, which treats the within-subject (in our case, within-country) covariance structure as a nuisance and focuses on the mean, or population-averaged, response. The procedure has the advantages of being relatively simple computationally and avoiding distributional assumptions. It iteratively calculates re-weighted least squares using a pre-specified working correlation matrix for the weights. In the presence of time-dependent covariates, our only valid choice for this working correlation matrix was that of independence among the responses.Citation55 The normal distribution with the identity link function was used for our standardized arcsine-transformed vaccination coverage rates. All inferences were based on the robust, “sandwich estimator,” standard errors. Lastly, for each of the four scenarios separately – DTP3 and MCV1 in the main branch, and DTP3 and MCV1 in the branch without life-expectancy and under-5 mortality, results from the 20 individual GEE models were pooled together according to Rubin’s.Citation54