Research Article

Pull your small area estimates up by the bootstraps

Pages 3304-3357 | Received 01 Sep 2020, Accepted 03 May 2021, Published online: 18 May 2021
 

Abstract

This paper presents a methodological update to the World Bank's toolkit for small area estimation. It reviews the computational procedures of the methods currently used by the institution: the traditional ELL approach and the Empirical Best (EB) addition, which was introduced to imitate the original EB procedure of Molina and Rao [Small area estimation of poverty indicators. Canadian J Stat. 2010;38(3):369–385] while accommodating heteroskedasticity and survey weights, but which uses a different bootstrap approach, here referred to as the clustered bootstrap. Simulation experiments provide empirical evidence of the shortcomings of the clustered bootstrap approach, which yields biased and noisier point estimates. The paper then presents an update to the World Bank's EB implementation that adopts the original EB procedures for point and noise estimation, extended to complex designs and heteroskedasticity. Simulation experiments illustrate that the revised methods yield considerably less biased and more efficient estimators than those obtained from the clustered bootstrap approach.


Acknowledgments

The authors acknowledge financial support from the World Bank. We thank Samuel Freije-Rodriguez, Roy van der Weide, Alexandru Cojocaru and David Newhouse for comments on an earlier draft. We also thank Kristen Himelein and Carlos Rodriguez for comments, suggestions and overall guidance. Additionally, we thank Carolina Sánchez for providing support and space to work on this. Finally, we thank the Global Solutions Group on Welfare Measurement and Statistical Capacity, as well as all attendees of the Summer University courses on Small Area Estimation. Any error or omission is the authors' responsibility alone. This work was also supported by the Spanish grants MTM2015-69638-R (MINECO/FEDER, UE) and MTM2015-72907-EXP from Ministerio de Economía y Competitividad.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 The bootstrap procedure is not discussed in Van der Weide's paper.

2 A factor that likely contributed to these methods being well known is the readily available software, namely PovMap (Zhao [Citation36]) for ELL and R's sae package of Molina and Marhuenda [Citation24].

3 Poverty mapping is the common name within the World Bank for SAE methodology, where the obtained estimates are mapped for illustrative purposes.

5 Users should be aware that results from Stata and PovMap differ slightly due to the use of different random number generators.

6 Haslett et al. [Citation19] present the problems in the original GLS implementation of ELL.

7 For a detailed look into the ELL approach, interested readers should refer to the original ELL papers (ELL [Citation10,Citation11]) and Section 3 of Nguyen et al. [Citation27], which presents the current GLS estimator from Van der Weide [Citation35].

8 In a comparison of the simulation methods proposed by ELL [Citation10], Demombynes et al. [Citation7] show that the delta method from ELL and the parametric drawing of the parameters provide similar results. In tests with pseudo surveys, the delta method seems to provide wider standard errors than the parametric approach (Demombynes et al. [Citation7]), suggesting that the parametric estimates are perhaps too optimistic when compared to the delta method.

9 Note that under a model-based approach, the true value of the indicator $\tau_c$ is random. Under this setup, an estimator/predictor $\hat{\tau}_c$ of $\tau_c$ is said to be unbiased when $E(\hat{\tau}_c - \tau_c) = 0$ or, equivalently, when $E(\hat{\tau}_c) = E(\tau_c)$.

10 Note that the total survey sample size n is typically large.

11 Note that here welfares are generated for all households in the census, sampled and non-sampled.

12 Interested readers should refer to Van der Weide [Citation35] and/or Nguyen et al. [Citation27] for an in-depth look at how these are obtained under ELL fitting method and Henderson's method III.

13 In the traditional ELL method of Section 2, $\hat{\sigma}_\eta^2$ is assumed to follow a Gamma distribution, which does not hold in this case.

14 Actually, Van der Weide [Citation35] does not offer a method for the estimation of the standard errors, but the approach described below was the one implemented on PovMap and consequently also in the Stata sae package from Nguyen et al. [Citation27].

15 Note that not all clusters are expected to appear.

16 Note that, in each bootstrap replicate, the census is generated from a different model.

17 These may come from the census or administrative records (ELL [Citation11, p. 356]).

18 The same units are sampled in every simulation and the values of $x_1$ and $x_2$ for all the census units are also kept fixed; this means that the $x_1$ and $x_2$ values for the sample units are always the same across simulations.

19 For a detailed description of the difference between the methods, readers should refer to Nguyen et al. [Citation27].

20 $E(\hat{\tau}_c^{j} - \tau_c)$ for the bias, and $E(\hat{\tau}_c^{j} - \tau_c)^2$ for the MSE.
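As a rough illustration of note 20, the bias and MSE of an estimator for one area can be approximated by averaging the estimation error, and its square, over simulated populations. The sketch below is written in Python, which the paper does not use (its code is in Stata); the function name `bias_and_mse` and the toy numbers are assumptions made purely for illustration.

```python
from statistics import fmean


def bias_and_mse(estimates, true_values):
    """Empirical bias and MSE of an estimator for a single area.

    estimates[m] is the estimate obtained in simulated population m and
    true_values[m] is the true indicator in that same population. Under a
    model-based setup the true value is random, so it is allowed to change
    from one simulation to the next.
    """
    if len(estimates) != len(true_values) or len(estimates) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [est - true for est, true in zip(estimates, true_values)]
    bias = fmean(errors)                    # average of (estimate - truth)
    mse = fmean([e * e for e in errors])    # average of (estimate - truth)^2
    return bias, mse


# Toy usage with two simulated populations (hypothetical numbers).
bias, mse = bias_and_mse([0.25, 0.5], [0.5, 0.25])
print(bias, mse)
```

In the toy call the two errors cancel, so the empirical bias is zero while the MSE is not, which is why both quantities are reported.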

21 For the ELL method from Section 2 and H3 CBEB from Section 4, since the process relies on a single computational algorithm, we take M = 1,000.

22 Results not shown for smaller fractions.

23 The ELL estimates are not really flat if sorted from smallest to largest.

24 Results for the Census EB estimators obtained from the original procedure of MR [Citation26] with REML estimation are not shown since they are aligned to the Census EB estimators from Section 6.

25 Another slight modification made in separate simulations is that the location effect is simulated as $\eta_c \overset{iid}{\sim} N(0, 0.07^2)$.

26 Results available upon request. Under this scenario, the resulting estimate $\hat{\sigma}_\eta$ turned out to be negative in many simulated populations. In those cases, a new population was generated.

27 For other FGT indicators see figures: , and .

28 For other FGT indicators see figures: , and .

29 Note that the purpose in SAE is not obtaining estimators performing well on average across areas since, in that case, the overall sample mean at the population level would be sufficient.

30 Note that for a random variable $v$, the transformation that leads to normality is given by $\Phi^{-1}(F(v))$, where $F(\cdot)$ is the true cumulative distribution function (cdf) of $v$ and $\Phi(\cdot)$ is the standard normal cdf. The problem is that the true cdf $F(\cdot)$ is typically unknown.
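Because the true cdf is unknown, one common workaround is to plug in the empirical cdf. The Python sketch below illustrates that idea only; it is not taken from the paper. The function name `normal_scores` and the choice of dividing ranks by n + 1 (to keep the argument strictly between 0 and 1) are assumptions.

```python
from statistics import NormalDist


def normal_scores(values):
    """Approximate the normalizing transformation of note 30, replacing
    the unknown true cdf F with the empirical cdf of the data.

    Ranks are divided by n + 1 (an assumed convention) so that the
    argument of the inverse normal cdf never reaches 0 or 1. Tied values
    are ranked in order of appearance, for simplicity.
    """
    n = len(values)
    # Indices of the observations sorted from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = std_normal.inv_cdf(rank / (n + 1))
    return scores


# Toy usage: the middle observation maps to the centre of the normal scale.
print(normal_scores([3.0, 1.0, 2.0]))
```

The result depends only on the ranks of the data, so any strictly increasing transformation of the input gives the same scores.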

31 For other FGT indicators see figures: , and 

32 Similar conclusions were obtained for other indicators, such as poverty severity (FGT indicator with $\alpha = 2$), although results are not shown for brevity.
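For readers unfamiliar with the FGT family, the Python sketch below computes the standard Foster-Greer-Thorbecke index for a list of welfare values. It is an unweighted illustration, not the paper's implementation; the function name `fgt`, the strict inequality used to define the poor, and the toy welfare values are assumptions.

```python
def fgt(welfare, poverty_line, alpha):
    """Foster-Greer-Thorbecke index of order alpha.

    Each poor unit (welfare strictly below the poverty line, an assumed
    convention) contributes its relative gap raised to alpha; non-poor
    units contribute zero. alpha = 0 gives the headcount ratio, alpha = 1
    the poverty gap and alpha = 2 the poverty severity.
    """
    if poverty_line <= 0 or len(welfare) == 0:
        raise ValueError("need a positive poverty line and some data")
    total = 0.0
    for y in welfare:
        if y < poverty_line:
            total += ((poverty_line - y) / poverty_line) ** alpha
    return total / len(welfare)


# Toy usage with four hypothetical households and a poverty line of 100.
welfare = [50, 150, 100, 25]
for a in (0, 1, 2):
    print(a, fgt(welfare, 100, a))
```

Higher values of alpha put more weight on the poorest units, which is why the severity index reacts more strongly to very low welfare values than the headcount does.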

33 For other FGT indicators see Figures: , and .

34 See Tables  and  for FGT0 and FGT2, respectively.

35 Note that $\hat{\sigma}_\eta^2$ is not the one estimated from the sample; it comes from a bootstrap sampling of the data.

36 This simulation is executed under the same scenario as that of Subsection 7.2, except that L = 5,000 instead of L = 10,000.

37 Income is defined as money received from work performed during the course of the reference week by individuals of age 12 or older within the household.

38 Roughly 10 households are sampled from each selected PSU; hence, the mentioned median municipality in the sample is represented by just 10 households.

39 For a thorough discussion on this, see Corral et al. [Citation3].

40 An update to Nguyen et al. [Citation27] is in progress, but all Stata codes and commands used in this document are available at https://github.com/pcorralrodas/SAE-Stata-Package

41 The revision is also accompanied by an update to the 2018 Stata 'sae' package by Nguyen et al. [Citation27].

Additional information

Funding

This work was supported by Ministerio de Economía y Competitividad (Spain) [MTM2015-69638-R], [MTM2015-72907-EXP].
