Research Article

Geographic disaggregation of household surveys. Modified estimators for correcting misclassification bias

Received 11 Feb 2022, Published online: 03 May 2024
 

ABSTRACT

We present a two-stage method to estimate spatial conditional means at a higher spatial resolution than that of the available data. In the first stage, we increase the spatial resolution of the data using classification tools and ancillary data. In the second stage, we estimate spatial conditional means, conditioning on the new spatial resolution. The second-stage estimation is not straightforward because the new, finer spatial areas are subject to misclassification (measurement error). We prove that the least squares (LS) estimators are biased under this framework and propose a consistent and asymptotically normal estimator under non-differential measurement errors. Because the proposed estimator depends on unobservable terms, we also present a feasible version. Unlike most spatial downscaling methods, our proposal is non-model-based and does not require area homogeneity assumptions. We assess the analytical results through Monte Carlo simulations, which show that our proposals work properly and outperform the spatial microsimulation approach. Finally, we conduct an empirical application analysing poverty and unemployment in Gran Rosario, one of the main urban agglomerates of Argentina, where we spatially disaggregate the original database to make inferences at a finer geographic scale.
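To illustrate why conditioning on misclassified areas biases simple group means, the following is a minimal sketch under stated assumptions. It is not the estimator proposed in the article. It assumes three areas, non-differential misclassification (the assigned area depends only on the true area, not on the outcome), and a validation sample in which both the true and the assigned area are observed. All names and numerical values (`simulate`, `reclassification_matrix`, `mu`, `pi`, `M`, the sample sizes) are hypothetical. The correction shown is the generic matrix method: the naive means satisfy naive = Q mu, where Q[j, k] = P(true area = k | assigned area = j), so solving that linear system recovers mu.

```python
import numpy as np


def simulate(n, mu, pi, M, rng):
    """Draw true areas, misclassified areas, and outcomes (hypothetical setup)."""
    K = len(mu)
    a = rng.choice(K, size=n, p=pi)             # true area
    a_star = np.empty(n, dtype=int)             # area assigned by a classifier
    for k in range(K):
        idx = a == k
        # Non-differential error: a_star depends on a only, never on y.
        a_star[idx] = rng.choice(K, size=idx.sum(), p=M[k])
    y = mu[a] + rng.normal(size=n)              # outcome with E[y | a = k] = mu[k]
    return a, a_star, y


def group_means(y, labels, K):
    """Sample mean of y within each label."""
    return np.array([y[labels == k].mean() for k in range(K)])


def reclassification_matrix(a, a_star, K):
    """Estimate Q[j, k] = P(true area = k | assigned area = j)."""
    counts = np.zeros((K, K))
    np.add.at(counts, (a_star, a), 1)
    return counts / counts.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
mu = np.array([1.0, 3.0, 6.0])                  # assumed true conditional means
pi = np.array([0.5, 0.3, 0.2])                  # assumed area shares
M = np.array([[0.80, 0.15, 0.05],               # assumed P(assigned = j | true = k)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])

# Validation sample: true and assigned areas both observed.
a_val, a_star_val, _ = simulate(50_000, mu, pi, M, rng)
# Survey sample: only the assigned area and the outcome are used.
_, a_star_svy, y_svy = simulate(200_000, mu, pi, M, rng)

naive = group_means(y_svy, a_star_svy, 3)       # biased: a mixture of the mu[k]
Q_hat = reclassification_matrix(a_val, a_star_val, 3)
corrected = np.linalg.solve(Q_hat, naive)       # solves Q_hat @ mu_hat = naive

print("true     ", mu)
print("naive    ", naive)
print("corrected", corrected)
```

In this sketch the naive means are pulled toward one another because each assigned area mixes observations from several true areas, while the corrected means should lie close to `mu` up to sampling noise. The sketch relies on Q being invertible and on the validation sample sharing the survey's misclassification pattern.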

ACKNOWLEDGEMENTS

I am very grateful to Mariano Tommasi and all the CEDH members for their comments and useful discussions. I would also like to express my sincere gratitude to the two anonymous referees who reviewed my work. Their contributions were very valuable.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

Notes

1 Spatial conditional means are relevant in themselves, and many poverty measures are functions of conditional means; see Tarozzi & Deaton (2009).

3 See Eicher & Brewer (2001), Goodchild et al. (1993) and Mennis & Hultgren (2006) for empirical applications of areal interpolation to some socioeconomic variables.

4 See O’Donoghue et al. (2014) for an extensive review of applications and methods.

5 There are several SAE methods, including those of Fay III & Herriot (1979), Elbers et al. (2003), Tarozzi & Deaton (2009), Molina & Rao (2010), Molina et al. (2014) and Fernandez-Vazquez et al. (2020).

6 See James et al. (2013) for a deeper insight into classification methods.

7 Although we focus on spatial data, our proposal applies to more general contexts.

8 In Annex A.1 in the online supplemental data we provide a detailed analysis of the bias caused by misclassification in the LS estimators.

9 We have another simulation scheme, with a different artificial population, in which the variability assumption holds. It allows all the formal derivations to be assessed and is available upon request.

10 In empirical applications, the classification rule is selected by the modeller depending on their criteria. Here we apply an arbitrary one because it allows us to manipulate the classification errors. Our proposals do not depend on the type of classifier but on the classification errors.

11 We only plot some values to improve visualization.

12 The National Institute of Statistics and Census from Argentina runs all the official statistical activities carried out throughout the country. https://www.indec.gob.ar/.

13 Dichotomizing variables is a controversial technique (Altman & Royston, 2006). To strictly follow our proposal, we should have the same observational units in both data sets, that is, we should have household-level data in the census. It is not necessary to obtain all census observations, however; a sample is enough.

14 The selection of the predictors is not a trivial issue. See Appendix C for some details.

15 We estimate the confusion matrix by dividing the census into training (70%) and test (30%) sets; the results are quite similar, with accuracy around 97%, so we prefer to use the test set to adjust the classifier.

16 For poverty status, only a few correlation coefficients are around 0.25 and the others are below 0.10. For unemployment, all of them are below 0.08.
