ABSTRACT
The importance of controlling for intragroup correlation in clustered samples is widely acknowledged in applied econometrics. However, this issue has remained underexplored in tourism research. In many instances, the observation units are naturally grouped, either geographically or because of the sampling scheme, so the iid assumption on the error term in linear regression is violated. This paper presents two case studies showing how default standard errors overstate the precision of the estimator when the error terms are independent across clusters but correlated within clusters. First, hedonic pricing functions for the Airbnb rental market are revisited using data for almost 225,000 listings in 14 countries. Second, destination choice modelling is reconsidered by exploiting monthly household microdata for Spain covering 115,937 tourism trips between 2015 and 2019. Practical implications for research practice are derived.
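As an illustration of the mechanism described in the abstract, the sketch below simulates hypothetical clustered data (not the Airbnb or Spanish trip data) in which both the regressor and the error share a cluster-level component. It then compares default OLS standard errors, heteroskedasticity-robust (HC0) standard errors and cluster-robust standard errors. The function name `ols_standard_errors` and all simulation parameters are assumptions made for this example. The finite-sample factor G/(G-1) x (N-1)/(N-K) follows the common CR1 convention.

```python
import numpy as np

def ols_standard_errors(y, X, cluster_ids):
    """Hypothetical helper: OLS coefficients with default (iid),
    HC0 and cluster-robust (CR1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Default: assumes iid errors, Var(beta) = s^2 (X'X)^-1
    s2 = resid @ resid / (n - k)
    se_default = np.sqrt(np.diag(s2 * XtX_inv))

    # HC0: the "meat" uses squared residuals only (diagonal error covariance)
    meat_hc = (X * resid[:, None] ** 2).T @ X
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat_hc @ XtX_inv))

    # Cluster-robust: sum the scores within each cluster, then take outer
    # products, so within-cluster residual correlation enters the meat
    clusters = np.unique(cluster_ids)
    G = len(clusters)
    meat_cl = np.zeros((k, k))
    for g in clusters:
        idx = cluster_ids == g
        score_g = X[idx].T @ resid[idx]
        meat_cl += np.outer(score_g, score_g)
    c = G / (G - 1) * (n - 1) / (n - k)  # CR1 finite-sample correction
    se_cluster = np.sqrt(np.diag(c * XtX_inv @ meat_cl @ XtX_inv))
    return beta, se_default, se_hc0, se_cluster

# Simulated data: 100 clusters of 50 observations each (assumed values)
rng = np.random.default_rng(0)
G, m = 100, 50
cluster_ids = np.repeat(np.arange(G), m)
# Regressor and error each share a cluster-level shock
x = rng.normal(size=G)[cluster_ids] + rng.normal(size=G * m)
u = rng.normal(size=G)[cluster_ids] + rng.normal(size=G * m)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(G * m), x])

beta, se_def, se_hc0, se_cl = ols_standard_errors(y, X, cluster_ids)
print("slope:", beta[1])
print("SE default:", se_def[1], "HC0:", se_hc0[1], "cluster:", se_cl[1])
```

Because both x and u have an intracluster correlation of about 0.5 in this setup, the cluster-robust standard error of the slope comes out several times larger than the default one. HC0 does not close the gap, since it still treats observations as uncorrelated.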
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Airbnb is the most analyzed online rental market for peer-to-peer accommodations. However, (i) the web-scraping techniques used to collect listing data do not produce purely random samples, and (ii) the data collected are normally a snapshot of the accommodations available on a specific day. As a result, the datasets used in applied studies can, to some extent, be considered clustered samples.
2 A review of the typical explanatory variables used in Airbnb hedonic pricing functions can be found in Sainaghi et al. (2021).
3 The estimated value of lambda is … and is statistically different from zero. As discussed in Cameron and Trivedi (2009, p. 94), since the point estimate is closer to zero than to one, the Box-Cox regression provides greater support for a log-linear model.
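For reference, the standard Box-Cox transformation of the dependent variable is given below. The sketch assumes that only the left-hand side is transformed, so the model is y^(λ) = x'β + u. It shows why λ near zero points to a log-linear specification and λ near one to a linear one.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[6pt]
  \ln y, & \lambda = 0,
\end{cases}
\qquad
\lambda = 1 \;\Rightarrow\; y^{(1)} = y - 1 \quad \text{(linear model up to the intercept)}.
```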
4 February and March (for the month dummies) and Australia (for the country dummies) are taken here as the excluded categories.
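5 Recall that heteroskedasticity-robust standard errors assume that the variance-covariance matrix of the errors is diagonal, that is, that the errors are uncorrelated across observations. They therefore do not account for correlation within clusters.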
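The contrast can be made explicit by writing the two sandwich estimators in their standard textbook forms, without finite-sample corrections. The notation is introduced here: i indexes observations, g = 1, …, G indexes clusters, X_g and û_g stack the regressors and OLS residuals of cluster g. The heteroskedasticity-robust "meat" uses only squared residuals, which is equivalent to a diagonal error covariance. The cluster-robust "meat" is block-diagonal and allows nonzero covariances within each cluster.

```latex
\widehat{V}_{\mathrm{HC}} = (X'X)^{-1}\Big(\sum_{i=1}^{N} \hat{u}_i^{2}\, x_i x_i'\Big)(X'X)^{-1},
\qquad
\widehat{V}_{\mathrm{CL}} = (X'X)^{-1}\Big(\sum_{g=1}^{G} X_g'\, \hat{u}_g \hat{u}_g'\, X_g\Big)(X'X)^{-1}.
```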
6 In all cases, the fixed effects are estimated as dummy variables. Incidental parameter problems are minimized here because the number of parameters to be estimated for the three dimensions (regions, months and municipality type) is low relative to N.