Guest Editorial

Survey sampling and small-area estimation

This issue is devoted to survey sampling methods. It continues a tradition of Mathematical Population Studies, following the special issues guest-edited by Malay Ghosh and Tomasz Żądło (2014) and by Vera Toepoel and Matthias Schonlau (2017).

Wright (2001) presented some major moments in the history of survey sampling. He acknowledged the pioneering work of Pierre Simon de Laplace (1878–1912; Gillispie, 1997), who estimated the population size of France in 1802 from a sample of communes, the French administrative districts. He multiplied the population of the sampled communes by the ratio of the total number of births recorded for the whole country to the number of births recorded in the sample. He used the same method to estimate the population size of France for 1782. John Graunt (1665) had already used a similar calculation to estimate the population size of England in 1662.
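Laplace's procedure is what is now called a ratio estimator: the sampled communes give a number of inhabitants per recorded birth, and this ratio is scaled up by the births recorded for the whole country. A minimal Python sketch of that arithmetic follows; the function name `laplace_ratio_estimate` and the figures are hypothetical and are not Laplace's data.

```python
def laplace_ratio_estimate(sample_population, sample_births, total_births):
    """Ratio estimate of the total population size.

    sample_population: inhabitants counted in the sampled communes
    sample_births: births recorded in those same communes
    total_births: births recorded for the whole country
    """
    if sample_births <= 0:
        raise ValueError("sample_births must be positive")
    # Inhabitants per birth observed in the sample, scaled up by national births.
    return sample_population * total_births / sample_births


# Hypothetical toy figures, for illustration only.
estimate = laplace_ratio_estimate(sample_population=2_000_000,
                                  sample_births=70_000,
                                  total_births=1_000_000)
print(round(estimate))  # 28571429
```

The estimator is accurate only insofar as the birth rate in the sampled communes is close to the national one, which is the auxiliary-information assumption behind any ratio estimator.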

In design-based inference, introduced by Neyman (1934), the values taken by the variable of interest are treated as fixed, and the sampling design is the only source of randomness affecting the estimates. In model-based inference, these values are treated as realizations of random variables. The set of conditions defining the class of their joint distribution is called the “super-population” model (Cassel et al., 1976, p. 80), and inference is made conditionally on the sample, which is either drawn at random or chosen purposively from the population. Accuracy is then measured only over the possible realizations of the variables. In model-assisted inference, the model is used to increase accuracy, but good design-based properties, such as design consistency, remain of primary interest. Model-assisted methods include calibration estimators and pseudo-empirical best linear unbiased predictors. The accuracy of the former is evaluated through randomization; the accuracy of the latter through a model. In the Bayesian framework, the estimator is the conditional expectation of the population or subpopulation parameters under their posterior distribution, and the posterior variance measures the variability of the Bayesian estimator. This Bayesian approach applies to continuous, binary, and count data.
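The design-based view can be made concrete on a toy population: the values y are fixed, and the only randomness comes from which sample the design selects. Assuming, for illustration, simple random sampling without replacement (SRSWOR), every subset of size n is equally likely, so enumerating all subsets gives the exact randomization distribution of the sample mean. The sketch checks that its design expectation equals the population mean and that its design variance matches the textbook formula (1 − n/N)S²/n. The population values and the helper name `srswor_sample_means` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance, variance


def srswor_sample_means(y, n):
    """Sample mean for every possible SRSWOR sample of size n.

    Each of the C(N, n) subsets has the same selection probability,
    so this list is the full design distribution of the sample mean.
    """
    return [mean(s) for s in combinations(y, n)]


# Fixed, non-random population values (hypothetical toy data).
y = [3, 7, 8, 12, 20]
N, n = len(y), 2

means = srswor_sample_means(y, n)
# Design expectation of the sample mean equals the population mean.
print(mean(means), mean(y))
# Design variance equals (1 - n/N) * S^2 / n, where S^2 = variance(y)
# uses the divisor N - 1.
print(pvariance(means), (1 - n / N) * variance(y) / n)
```

Nothing in the code treats y as random: all the variability comes from the choice of subset, which is exactly the design-based position described above.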

Sampling methods are used in surveys of households, businesses, agriculture, and the environment, to name a few. One research direction is inference on subpopulation (or domain) parameters when domain sample sizes are small or even zero. This subject has given rise to small-area estimation. The term “small area” denotes any subpopulation for which a direct estimator, which uses only the information available in the domain and period of interest, cannot reach satisfactory accuracy. Indirect estimators or predictors borrow information from other domains or periods. According to Rao and Molina (2015), the first application of indirect estimation was a survey of radio listening in the United States conducted in 1945. A regression was used to correct the biased direct estimators of the medians of the variables of interest. The case is discussed in Hansen et al. (1953).
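One common way to borrow information from other areas, shown here only as an illustration, is a composite estimator that shrinks each area's direct estimate toward a synthetic (for example, regression-based) prediction, with weight γᵢ = σᵥ² / (σᵥ² + ψᵢ) on the direct estimate, as in the area-level Fay–Herriot model. The sketch assumes the between-area variance σᵥ² and the sampling variances ψᵢ are known and the synthetic predictions are given; in practice they are estimated. The function name and numbers are hypothetical.

```python
def composite_estimates(direct, synthetic, sampling_var, model_var):
    """Shrink direct area estimates toward synthetic predictions.

    direct: direct estimates, one per area
    synthetic: indirect (e.g. regression) predictions for the same areas
    sampling_var: sampling variance psi_i of each direct estimate (assumed known)
    model_var: between-area model variance sigma_v^2 (assumed known)
    """
    if not (len(direct) == len(synthetic) == len(sampling_var)):
        raise ValueError("inputs must have one entry per area")
    out = []
    for d, s, psi in zip(direct, synthetic, sampling_var):
        # Weight on the direct estimate: close to 1 when psi is small.
        gamma = model_var / (model_var + psi)
        out.append(gamma * d + (1 - gamma) * s)
    return out


# Hypothetical areas: precise, noisy, and one where both estimates agree.
est = composite_estimates(direct=[12.0, 30.0, 18.0],
                          synthetic=[15.0, 20.0, 18.0],
                          sampling_var=[1.0, 9.0, 4.0],
                          model_var=3.0)
print(est)
```

The precise first area keeps three quarters of the weight on its direct estimate (12.75), while the noisy second area is pulled most of the way toward the synthetic value (22.5). For an area with no sample, no direct estimate exists and the synthetic prediction is used alone.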

The four contributions included in this special issue intertwine design and estimation. Estimation can be based on super-population models, and it can be Bayesian or frequentist. Design, more than ever, determines the quality of the results, and randomization-based inference remains the reference. Design and estimation are inseparable, and both draw on the available auxiliary information.

Malay Ghosh, Jiyoun Myung, and Paduthol Godan Sankaran (published in the previous issue, 25(3)) estimate the population median with a Bayesian nonparametric technique. The absolute error loss is more complicated to handle than the quadratic loss. The authors consider the flexible class of Dirichlet process priors. They address finite populations characterized by a probability distribution over a finite set of points. They combine Bayesian estimation with asymptotic frequentist properties. The survey sampling and small-area estimation setting leads them to set aside randomization-based inference while keeping design-related properties in the evaluation.
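As a generic illustration, and not the method of Ghosh, Myung, and Sankaran, the sketch below uses the Bayesian bootstrap, the limiting case of a Dirichlet process posterior when the prior concentration parameter tends to zero. Each posterior draw puts flat Dirichlet(1, …, 1) weights on the observed values and takes the weighted median; under absolute error loss, the Bayes estimate is the median of these posterior draws. The sketch treats the observations as draws from an infinite population, so it ignores the finite-population aspect the authors address. The helper names, sample, number of draws, and seed are hypothetical.

```python
import random
from statistics import median


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]  # guard against floating-point rounding


def bayesian_bootstrap_median(sample, draws=2000, seed=0):
    """Posterior draws of the median under flat Dirichlet weights."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        # Gamma(1, 1) variates, once normalised, are Dirichlet(1, ..., 1);
        # weighted_median only uses relative weights, so no normalisation.
        g = [rng.gammavariate(1.0, 1.0) for _ in sample]
        out.append(weighted_median(sample, g))
    return out


sample = [2.1, 3.4, 3.9, 4.4, 5.0, 6.2, 7.8, 9.5, 12.0]  # hypothetical
posterior = bayesian_bootstrap_median(sample)
# Under absolute error loss, the Bayes estimate is the posterior median.
print(median(posterior))
```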

Mauno Keto, Jussi Hakanen, and Erkki Pahkinen use a linear mixed model to maximize the accuracy of small-area estimates at both the area and the population levels. By simulation, they compare the efficiency of the possible sample allocations resulting from either the model-based or the design-based estimators.
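The tension between area-level and population-level accuracy can be seen in a classical compromise, power allocation, in which area sample sizes are proportional to N_h^q. With q = 1 the allocation is proportional, which suits the overall estimate; with q = 0 it is equal across areas, which favours the smallest areas. This is only an illustration of the trade-off, not the allocation derived by Keto, Hakanen, and Pahkinen from their linear mixed model; the area sizes, total sample size, and function name are hypothetical.

```python
def power_allocation(area_sizes, n, q):
    """Split a total sample size n across areas with n_h proportional to N_h**q.

    Returns real-valued sizes; rounding to integers and any minimum
    per-area sample size are left to the user.
    """
    if not 0 <= q <= 1:
        raise ValueError("this sketch assumes 0 <= q <= 1")
    powered = [N_h ** q for N_h in area_sizes]
    total = sum(powered)
    return [n * p / total for p in powered]


sizes = [1000, 4000, 16000, 64000]  # hypothetical area population sizes
for q in (0.0, 0.5, 1.0):
    print(q, [round(x, 1) for x in power_allocation(sizes, 400, q)])
```

Intermediate values of q move sample from the largest area to the smallest ones, trading some precision for the population total against better area-level estimates.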

Tomasz Bąk examines ways of selecting spatially balanced samples, a problem that arises when the target population is structured as a network. This is the case in agriculture, forestry, the environmental sciences, and the social sciences. A network makes it possible to define a distance and a property of adjacency between population units. An efficient design then uses the information on this structure. Bąk builds on Wywiał (1996) to develop the case of so-called “drawn-by-drawn” samples, in which only two successively sampled units can be neighbors of each other.

Barbara Kowalczyk and Dorota Juszczak address rotating panel surveys, which are repeated surveys in which some respondents are replaced at each round. Most large-scale social surveys, including the European Union Statistics on Income and Living Conditions (EU-SILC), are designed that way. Information on the target variables collected in previous rounds can be used throughout the survey, which improves the efficiency of the estimates. The authors consider the situation where the target variables are numerous and the weights are independent of the variable of interest. Their comparison with other estimators is encouraging.

References

  • Cassel, C. M., Särndal, C. E., and Wretman, J. H. (1976). Foundations of Inference in Survey Sampling. New York, NY: Wiley.
  • Ghosh, M. and Żądło, T. (2014). Editorial. Special issue: Survey sampling methods. Mathematical Population Studies, 21(1): 1.
  • Gillispie, C. C. (1997). Pierre Simon de Laplace, 1749-1827, A Life in Exact Science. Princeton, NJ: Princeton University Press.
  • Graunt, J. (1665). Natural and Political Observations Mentioned in a following Index, and Made upon the Bills of Mortality. London, UK: John Martyn and James Allestry.
  • Hansen, M. H., Hurwitz, W. N., and Madow, W. G. (1953). Sample Survey Methods and Theory. New York, NY: Wiley.
  • Laplace, P. S. de (1878–1912). Œuvres, Vol. VII. Paris: Gauthier-Villars, 398–401.
  • Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4): 558–625.
  • Rao, J. N. K. and Molina, I. (2015). Small Area Estimation. (2nd Ed.). Edison, NJ: Wiley.
  • Toepoel, V. and Schonlau, M. (2017). Dealing with nonresponse: Strategies to increase participation and methods for postsurvey adjustments. Mathematical Population Studies, 24(2): 79–83.
  • Wright, T. (2001). Selected moments in the development of probability sampling: Theory & practice. Survey Research Methods Section Newsletter, 13: 1–6.
  • Wywiał, J. (1996). On space sampling. Statistics in Transition, 2(7): 1185–1191.
