ABSTRACT
Many empirical studies in tourism research use primary or secondary survey data. Generally, non-response is not random in the target population, so sampling weights are required to adjust for endogenous sampling. However, it remains unclear in which situations survey weights should be applied. This paper proposes two simple tests to detect whether weights are truly needed. The relevance of weighting for correct inference, and how to perform the tests, are illustrated using two case studies that employ large-scale survey microdata from the Flash Eurobarometer on the travel habits of European residents. First, we investigate the correlates of travel frequency, paying particular attention to the role of age. Second, we study cross-country heterogeneity in the factors that motivate European citizens to consume peer-to-peer accommodation services. Our findings have important implications for research practice, and the proposed tests can be easily applied in case studies to detect the need for sampling weights.
Acknowledgements
The author wishes to thank five anonymous referees for their comments and suggestions.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 For simplicity, we focus on the case of secondary data from public statistical offices. In these cases, the researcher takes the survey weights as given rather than having to construct them.
2 OLS is the Best Linear Unbiased Estimator (henceforth BLUE) of the expected value of an outcome. It requires lack of perfect collinearity, no correlation between the covariates and the error term, and homoskedastic errors; normality of the errors is not needed for this result. When the form of heteroskedasticity is unknown, it can be easily handled using White-robust or clustered standard errors. Potential non-normality of the errors is a concern for inference only in small samples. In large samples, the central limit theorem kicks in and justifies that the OLS estimators are well approximated by a multivariate normal distribution (Wooldridge, 2008).
3 The reader is referred to the works by Pfeffermann (1993), Pfeffermann and Sverchkov (1999), Magee et al. (1998), Solon et al. (2015) and Bollen et al. (2016) for further details on this.
4 This could reflect the fact that unobservable factors that determine the probability of being sampled also affect the outcome, potentially resulting in omitted variable bias.
5 The reader is referred to Bollen et al. (2016) for a deeper discussion about alternative methods to determine the need for weighting.
6 To some extent, this can be seen as a descriptive analysis of the conditional-on-covariates relationship between age and travel intensity.
7 Because of the formula in (4), each individual has a distinct marginal effect depending on his/her age and actual trips taken. We plot the mean values of the marginal effects for each potential age.
8 Because we use a logistic regression, we cannot recover the residuals to perform the first of the two proposed tests. That is why in this case we rely only on the second one.
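As an illustration of the point made in note 2 about heteroskedasticity, the following minimal sketch computes OLS coefficients together with classical and White-robust (HC0) standard errors. It is not the code used in the paper: the function name ols_with_robust_se, the simulated age variable and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """Return OLS coefficients, classical SEs and White (HC0) robust SEs.

    Hypothetical helper for illustration; X must already contain a constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical variance, sigma^2 (X'X)^-1, valid under homoskedasticity.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * xtx_inv))

    # White (HC0) sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1.
    meat = X.T @ (X * resid[:, None] ** 2)
    se_robust = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))

    return beta, se_classical, se_robust


# Simulated data (assumption): the error variance grows with age.
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(18, 80, size=n)
X = np.column_stack([np.ones(n), age])
errors = rng.normal(size=n) * (0.05 * age)
y = 1.0 + 0.02 * age + errors

beta, se_classical, se_robust = ols_with_robust_se(X, y)
print("coefficients:", beta)
print("classical SE:", se_classical)
print("robust SE:   ", se_robust)
```

When the errors are heteroskedastic, as in this simulated design, the two sets of standard errors can differ, while the point estimates are the same in both cases.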