2,135
Views
0
CrossRef citations to date
0
Altmetric
Theory and Methods

Hypotheses Testing from Complex Survey Data Using Bootstrap Weights: A Unified Approach

ORCID Icon, & ORCID Icon
Pages 1229-1239 | Received 01 Mar 2019, Accepted 31 Jan 2023, Published online: 03 Apr 2023

ABSTRACT

Standard statistical methods without taking proper account of the complexity of a survey design can lead to erroneous inferences when applied to survey data due to unequal selection probabilities, clustering, and other design features. In particular, the Type I error rates of hypotheses tests using standard methods can be much larger than the nominal significance level. Methods incorporating design features in testing hypotheses have been proposed, including Wald tests and quasi-score tests that involve estimated covariance matrices of parameter estimates. In this article, we present a unified approach to hypothesis testing without requiring estimated covariance matrices or design effects, by constructing bootstrap approximations to quasi-likelihood ratio statistics and quasi-score statistics and establishing its asymptotic validity. The proposed method can be easily implemented without a specific software designed for complex survey sampling. We also consider hypothesis testing for categorical data and present a bootstrap procedure for testing simple goodness of fit and independence in a two-way table. In simulation studies, the Type I error rates of the proposed approach are much closer to their nominal significance level compared with the naive likelihood ratio test and quasi-score test. An application to an educational survey under a logistic regression model is also presented. Supplementary materials for this article are available online.

1 Introduction

Testing statistical hypotheses is one of the fundamental problems in statistics. In the parametric model approach, hypothesis testing can be implemented using Wald test, likelihood ratio test, or score test. In each case, a test statistic is calculated and then compared with the 100(1α)%-quantile of the reference distribution, which is the limiting distribution of the test statistic under the null hypothesis, where α is the nominal significance level. The limiting distribution is often a Chi-squared distribution (Shao Citation2003).

However, statistical inference with survey data involves additional steps incorporating the sampling design features. Korn and Graubard (Citation1999) and Chambers and Skinner (Citation2003) provided comprehensive overviews of the methods for analyzing survey data from complex sampling designs. In hypothesis testing with survey data, the limiting distribution of the test statistic is generally not a Chi-squared distribution. Rather, it can be expressed as a weighted sum of several independent random variables from χ2(1), which is a Chi-square distribution with one degree of freedom, and the weights depend on unknown parameters. To handle such problems, one may consider some corrections to the test statistics to obtain a Chi-square limiting distribution approximately. Such an approach usually involves computing “design effects” associated with test statistics (Lumley and Scott Citation2014).

In this article, we use a different approach to computing the limiting distribution using a bootstrap approximation. Beaumont and Bocci (Citation2009) investigated a general weighted bootstrap method for hypothesis testing under complex sampling designs in the context of Wald tests for linear regression analysis. In this article, we generalize the bootstrap testing idea of Beaumont and Bocci (Citation2009) and present a unified bootstrap weight approach to obtain the limiting distribution of test statistics under complex sampling designs, including stratified multi-stage sampling with clusters selected with replacement.

The proposed method is a bootstrap weight method (Mashreghi, Haziza, and Léger Citation2016) under complex survey sampling. Although Praestgaard and Wellner (Citation1993) and Chatterjee and Bose (Citation2005) rigorously investigate bootstrap weight methods under regularity conditions, both may provide erroneous inferences under complex survey sampling. Praestgaard and Wellner (Citation1993) established central limit theorems for the weighted bootstrap empirical process, but they assumed that the sample is independently and identically distributed (iid). Although it is common to assume that the finite population is iid, elements in a sample may not be due to complex sampling designs; see Pfeffermann (Citation1993) for details. In addition, Praestgaard and Wellner (Citation1993) only focused on mean estimation, but we are interested in hypothesis testing for parameters that solve a certain estimating equation. Chatterjee and Bose (Citation2005) proposed a bootstrap weight method for estimating functions, and their theoretical results are valid even for correlated random variables. However, the bootstrap weight method of Chatterjee and Bose (Citation2005) is not applicable to informative sampling designs, since survey weights are not taken into consideration; see Pfeffermann (Citation1993) for details.

There exist bootstrap weight methods under complex survey sampling, but the proposed bootstrap method differs from them in the following aspects. Rao, Wu, and Yue (Citation1992) proposed a bootstrap weight method to estimate the variance of a function of population total estimators under stratified random sampling, but did not investigate the theoretical properties of their bootstrap method. Chipperfield and Preston (Citation2007) proposed a without replacement scaled bootstrap to achieve the same goal as Rao, Wu, and Yue (Citation1992) under stratified random sampling, but their method is only applicable when the parameter of interest is a smooth function of population totals. Moreover, neither method is applicable to other complex survey sampling. Bertail and Combris (Citation1997) proposed a bootstrap weight method to make inferences for population means under general sampling designs, and the corresponding theoretical properties were investigated. However, their results are not applicable to parameters obtained by solving an estimating equation. Although Beaumont and Patak (Citation2012) developed a general bootstrap weight method for parameter solving an estimating equation in general sampling designs, they focused only on variance estimation without a rigorous discussion about hypothesis testing. Antal and Tillé (Citation2011) proposed a one-one bootstrap method for variance estimation under general sampling designs. Similar to the one-one bootstrap method, Antal and Tillé (Citation2014) developed a doubled half-bootstrap to estimate the variance of the estimator for population means. However, neither Antal and Tillé (Citation2011) nor Antal and Tillé (Citation2014) considered parameters solving an estimating equation. Moreover, existing work did not establish limiting distributions for the corresponding bootstrap estimator, especially for the one solving an estimating equation. Thus, up to our knowledge, there is no theoretical guarantee for existing methods when used for hypothesis testing.

Since the use of bootstrap for hypothesis testing is not fully investigated in the literature, our goal is to fill this important gap under general sampling designs. To do this, we first establish a bootstrap central limit theorem under regularity conditions. The sampling design is allowed to be informative in the sense that the sampling mechanism depends on the response variables being sampled (Pfeffermann Citation1993). Once the bootstrap central limit theorem is established, the proposed bootstrap method can be applied to the quasi-likelihood-ratio test and the quasi-score test. As long as the bootstrap central limit theorem holds for the sampling design, the bootstrap replicates of the test statistics can be used to approximate their distributions under the null hypothesis. Besides, this article also has a practical contribution to survey sampling, since bootstrap weights are commonly provided in agencies, including Statistics Canada.

The article is organized as follows. In Section 2, the basic setup is introduced. The proposed bootstrap method is presented in Section 3. In Section 4, the quasi-likelihood ratio test and the quasi-score test are introduced. Test for categorical survey data is briefly discussed in Section 5. The results of two simulation studies are presented in Section 5. An application to data from an educational survey under a logistic regression model is presented in Section 6. Concluding remarks are made in Section 7.

2 Basic Setup

Suppose that a finite population FN={(zi,xi,yi):i=1,,N} is a random sample of size N from a super-population model (Isaki and Fuller Citation1982; Pfeffermann Citation1993; Sverchkov and Pfeffermann Citation2004; Rubin-Bleuer and Schiopu-Kratina Citation2005), where yi is the response of interest, and xi is the corresponding auxiliary vector of length d, and zi is the vector of design variables such that the first-order inclusion probability for unit i can be obtained by πi=π(zi;ϕN), where ϕN is a design parameter determined by {z1,,zN}. In other literature, the inclusion probability is expressed as πi=π(FN,i); see Sverchkov and Pfeffermann (Citation2004) for details.

We are interested in hypothesis testing for a parameter θ0=argmaxθΘE{l(θ;X,Y)}, where ΘRp is the parameter space, (X,Y) is the random variable associated with elements of FN, the expectation is taken with respect to the super-population model, and l(θ;x,y) is a pre-defined objective function. For example, if the finite population is a random sample generated from a super-population model with a density function f(x,y;θ0) and if θ0 is the parameter of interest, then l(θ;x,y)=logf(x,y;θ). For regression problems, θ0 can be the parameter in the conditional expectation of the response of interest given the auxiliary vectors.

From the finite population FN, a probability sample {(xi,yi):iA} of size n is selected by a probability sampling design with the first-order inclusion probability πi, where A is the set of sample indexes. The design variables zi and the design parameter ϕN are not available at the sample level. From the sample, we can compute a pseudo M-estimator of θ0 by θ̂=argmaxθΘlw(θ),where lw(θ)=N1iAwil(θ;xi,yi) is the survey-weighted objective function and wi=πi1 is the sampling weight for iA. Note that lw(θ) is design-unbiased for lN(θ)=N1i=1Nl(θ;xi,yi) in the sense that E{lw(θ)|FN}=lN(θ), where the conditional expectation conditioning on FN is with respect to the sampling mechanism. To be rigorous, a subscript N should also be used to index the sample, the sampling weights, the estimator and other quantities, but it is suppressed in the sequel for simplicity.

Remark 1.

When discussing a finite population in the preceding paragraph, we implicitly assume that elements of FN are iid. Such an assumption is widely adopted in survey sampling; see Corollary 1.3.2.1 and Section 2.2.1 of Fuller (Citation2009) for details. Besides, under a linear regression model, (1) Y=XTβ0+ϵ(1) with E(ϵ|x)=0, the finite population {(xi,yi):i=1,,N} satisfies the iid assumption; see Särndal, Swensson, and Wretman (Citation1989, sec. 5.1), Breidt and Opsomer (Citation2000), Särndal, Swensson, and Wretman (Citation2003, sec. 6.4), Kim et al. (Citation2006, sec. 5), Toth and Eltinge (Citation2011), Wu and Thompson (Citation2020, sec. 5.1) and Han and Wellner (Citation2021) for more general setups about the super-population model. If we are interested in the regression parameter β0 in (1), an objective function is l(θ;x,y)=(yxTβ)2.

Although we implicitly make an iid assumption for the finite population, elements in the sample A may not be iid due to complex survey designs. That is, the informative sampling no longer preserves the iid structure in the sample; see Pfeffermann (Citation1993) for details. Under certain sampling designs, we can relax the iid assumption, and the super-population model can involve cluster random effects (Rubin-Bleuer and Schiopu-Kratina Citation2005); refer to Section S4 of supplementary material for an extension to stratified multi-stage sampling with cluster random effects. We focus on element sampling mainly in this article.

Since the parameter of interest is the superpopulation parameter, the final sample can be viewed as a realization of the two-phase sampling design, where the finite population is the first-phase sample and the final sample is the second-phase sample. Chen and Rao (Citation2007) and Rubin-Bleuer and Schiopu-Kratina (Citation2005) present rigorous asymptotic setups for two-phase sampling.

Often, the pseudo M-estimator θ̂ can be obtained by solving (2) Ŝw(θ)=θlw(θ)=1NiAwiS(θ;xi,yi)=0,(2) where S(θ;x,y)=l(θ;x,y)/θ is the tangent function of θ. To discuss the asymptotic properties of the pseudo M-estimator, we assume the following regularity conditions for a sequence of finite populations and samples (Isaki and Fuller Citation1982).

C1. The parameter space Θ is an open and convex set containing θ0 as an interior point. For θΘ,l(θ;x,y) is concave and twice-continuously differentiable with respect to θ,E{|l(θ;X,Y)|}<, and E{l(θ;X,Y)} is uniquely maximized at θ0.

C2. For θΘ,V{lw(θ)|FN}0 in probability with respect to the super-population model as n, where V{lw(θ)|FN} is the design variance of lw(θ) conditional on the finite population FN. In addition, there exists a nonnegative design variance estimator V̂{lw(θ)|FN}, such that E[V̂{lw(θ)|FN}|FN]=V{lw(θ)|FN} almost surely.

C3. There exists a compact set BΘ containing θ0 as an interior point, such that supθBnV{Ŝw(θ)|FN}ΣS(θ)0 in probability with respect to the super-population model as n, where V{Ŝw(θ)|FN} is the design variance of Ŝw(θ) conditional on the finite population, M=supxRm,x=1Mx is the operator norm of an m × m matrix M associated with the Euclidean vector norm ·, and ΣS(θ) is a positive-definitive nonstochastic matrix.

C4. A central limit theorem holds for the survey-weighted score function Ŝw(θ0) in (2). Specifically, n1/2Ŝw(θ0)N(0,ΣS(θ0))in distribution as n with respect to the super-population model and the sampling mechanism.

C5. The weak convergence of Îw(θ)=N1iAwiI(θ;xi,yi) is uniform in θB, where I(θ;x,y)=S(θ;x,y)/θT. Specifically, supθBÎw(θ)I(θ)0 in probability with respect to the super-population model and the sampling mechanism as n, where I(θ)=E{I(θ;X,Y)}, and I(θ0) is invertible.

Conditions C1–C2 are sufficient conditions for the weak consistency of θ̂. Conditions C3–C5 are used to establish the asymptotic normality of θ̂. In Conditions C3–C4, we consider the case where V[E{S^w(θ)|FN}] is asymptotically negligible compared with E[V{Ŝw(θ)|FN}], and it is often the case when n/N0 as n. That is, we can safely ignore the variability for generating the finite population and focus on the design variance of Ŝw(θ) given the finite population. This assumption is a building block of the proposed bootstrap method in the next section, and a similar assumption is also proposed by Beaumont and Bocci (Citation2009); see Assumption 5 and its discussion of Beaumont and Bocci (Citation2009) for details.

As mentioned by Han and Wellner (Citation2021), “establishing a finite-dimensional CLT … can be a nontrivial problem for general sampling designs,” and it is beyond our scope to prove Condition C4 under general sampling designs. In fact, there is no guarantee that the central limit theorem holds for general sampling designs. Instead, we simply consider the set of sampling designs such that the central limit theorem holds; see Hájek (Citation1964), Rosén (Citation1972a), Rosén (Citation1972b), Berger (Citation1998), Rubin-Bleuer and Schiopu-Kratina (Citation2005), sections 1.3.3–1.3.4 of Fuller (Citation2009), Chauvet (Citation2015), and Han and Wellner (Citation2021) for details. For simplicity, we use the same notation, “·”, to denote the matrix norm and the vector norm. Detailed comments on these regularity conditions are presented in Section S1 of supplementary material. For the proposed conditions, we implicitly assume that N as n. The following theorem establishes the asymptotic normality of θ̂. Its proof can be found in Section S2 of supplementary material.

Theorem 1.

Under Conditions C1–C5, we have (3) n(θ̂θ0)N(0,Σθ)(3) in distribution as n with respect to the joint distribution of the super-population model and the sampling mechanism, where Σθ=(θ0)1ΣS(θ0){(θ0)1}T.

Now, by the second-order Taylor expansion, we obtain (4) lw(θ0)=lw(θ̂)+ŜwT(θ̂)(θ0θ̂)12(θ̂θ0)TÎw(θ̂)(θ̂θ0)+op(n1)=lw(θ̂)12(θ̂θ0)TÎw(θ̂)(θ̂θ0)+op(n1),(4) where Îw(θ)=N1iAwiI(θ;xi,yi). Define (5) W(θ0)=2n{lw(θ0)lw(θ̂)}.(5)

By (4), we obtain W(θ0)=n(θ̂θ0)TÎw(θ̂)(θ̂θ0)+op(1).

Thus, by Theorem 1, we have (6) W(θ0)G=i=1pciZi2(6) in distribution as n, and the reference distribution is the joint distribution of the super-population model and the sampling mechanism, where c1,,cp are the eigenvalues of D=Σθ(θ0), and Z1,,Zp are p independent random variables from the standard normal distribution. Result (6) was established by Rao and Scott (Citation1984) and Lumley and Scott (Citation2014), and it can be regarded as a version of the Wilks’ theorem for survey sampling. For p = 1, we can use c11W(θ0) as the test statistic with χ2(1) distribution as the limiting distribution under the null hypothesis, where c1=σS2(θ0)/I(θ0), and σS2(θ) and I(θ) are the counterparts of Σs(θ) and I(θ), respectively. Unless the sampling design is simple random sampling and the sampling fraction is negligible, the limiting distribution does not reduce to the standard Chi-squared distribution with p degrees of freedom.

3 Bootstrap Calibration

We propose using a bootstrap weight method to approximate the limiting distribution in (6) and other test statistics under complex survey sampling. Such a bootstrap calibration is very attractive in practice since there is no need to derive the analytic form of the limiting distribution of the test statistic. That is, standard software that takes account of the weights can be used to implement bootstrap hypothesis testing from data files reporting weights and associated sets of bootstrap weights, as done in Statistics Canada and some other agencies. Given the sample A, the proposed bootstrap method is as follows:

  • Step 1 Generate a rescaling vector r* of size n=|A| from a bootstrap distribution with E*(r*)=1n, and the covariance of r* is determined by the sampling design, where |A| is the cardinality of set A, E*(·) is the expectation with respect to the bootstrap distribution conditional on the sample A, and 1n is a vector of 1 with length n. The bootstrap weights are chosen in such a way that E*{Sw*(θ)}=Ŝw(θ) and V*{Sw*(θ)}=V̂{Ŝw(θ)|FN}, where Ŝw*(θ)=iAwi*S(θ;xi,yi),wi*=ri*wi,V*(·) is the bootstrap variance conditional on the sample A, and V̂{Ŝw(θ)|FN} is a nonnegative design variance estimator of Ŝw(θ) conditional on the finite population FN.

  • Step 2 Obtain a bootstrap replicate of θ̂, denoted as θ̂*, by solving Ŝw*(θ)=0.

Different distributions may be used to generate r* under different sampling designs, but the generation of r* only depends on the inclusion probabilities and not on the specific form of the estimating function for parameter estimation. Thus, once the bootstrap weights are generated, they can be used for various inference problems based on the sample A. However, the commonly used linearization technique is based on the specific form of the estimating function, so different linearization is required for different hypothesis testing problems. Thus, compared with the linearization technique, the proposed bootstrap method is more appealing to the practitioners.

We now present additional regularity conditions to validate the proposed bootstrap method.

C6. For θΘ, the bootstrap weights are nonnegative and satisfy E*{Ŝw*(θ)}=Ŝw(θ) and V*{Ŝw*(θ)}=V̂{Ŝw(θ)} conditional on the sample A, where V̂{Ŝw(θ)} satisfies nsupθBV̂{Ŝw(θ)}V{Ŝw(θ)}0in probability with respect to the super-population model and the sampling mechanism as n.

C7. A central limit theorem holds for the bootstrap weighted score function Ŝw*(θ) uniformly over B. Specifically, (7) n1/2{Ŝw*(θ)Ŝw(θ)}|A  N(0,ΣS(θ))(7) in distribution uniformly over B, and the reference distribution is the bootstrap sampling distribution conditional on the sample A.

C8. supθBÎw*(θ)Îw(θ)0 in probability with respect to the bootstrap sampling distribution conditional on the sample A as n, where Îw*(θ)=N1iAwi*I(θ;xi,yi).

Condition C6 guarantees that the scaled bootstrap variance of Ŝw*(θ) converges to ΣS(θ) in Condition C3, and they are also used to show the bootstrap central limit theorem in Condition C7. Condition C8 is needed to establish the asymptotic normality of the bootstrap estimator θ̂*. To check Condition C7, we may use a bootstrap version of the Lyapounov condition. As discussed in Condition C4, we do not claim that Condition C7 holds under general sampling designs, but we can validate both under certain sampling designs. For example, in Section S4 of the supplementary material, we use a Lyapounov-type condition to prove the two central limit theorems under stratified multi-stage sampling. Detailed comments on these regularity conditions are presented in Section S1 of the supplementary material. The high-level conditions are also verifiable under other commonly used sampling designs, including Poisson sampling.

Theorem 2.

Let the assumptions for Theorem 1 hold. Then, under Conditions C6–C8, we have (8) n(θ̂*θ̂)|A  N(0,Σθ)(8) in distribution as n, and the reference distribution on the left side of (8) is the bootstrap sampling distribution conditional on the sample A, where Σθ is defined in Theorem 1.

Theorem 2 is proved under general sampling designs; see Section S3 of the supplementary material for details. Theorem 2 establishes the bootstrap central limit theorem for θ̂*, and the limiting distribution coincides with the one in (3) for θ̂. Thus, the proposed bootstrap method can be used to approximate the sampling distribution of θ̂.

For estimating the mean of a finite population, Chao and Lo (Citation1985) first proved a bootstrap central limit theorem under simple random sampling without replacement, and Bickel and Freedman (Citation1984) proved the bootstrap central limit theorem under stratified random sampling without replacement. We now present stratified multi-stage sampling with clusters selected with replacement as an example to validate the bootstrap central limit theorem in Theorem 2. Stratified multi-stage sampling with clusters selected with replacement is extensively used in socio-economic surveys when the first-stage sampling fractions within strata are negligible, as is often the case with large-scale sample surveys.

Example 1.

Under stratified multi-stage sampling, a popular approach is to assume that nh(2) clusters are selected with replacement from the hth stratum, with the selection probabilities phi satisfying i=1Nhphi=1 for h=1,,H (Rao and Wu Citation1988), where Nh is the number of clusters in the hth stratum. As mentioned in Section 2, the selection probabilities can be random with respect to the super-population model. Within each sampled cluster, denote Ŝw,hi(θ) as a design-unbiased estimator of the cluster mean under sampling at the second and subsequent stages. For simplicity, we still use Ŝw,hi(θ) for the case when the (hi)th cluster is fully observed. Then, the weighted score function is Ŝw(θ)=h=1HWhŜw,h(θ), where Wh=Mh/M,M=h=1HMh,Mh=i=1NhMhi, Mhi is the size of the (hi)th cluster, Ŝw,h(θ)=nh1i=1nhS˜w,hi(θ),S˜w,hi(θ)=da(hi)pa(hi)1Ŝw,a(hi)(θ),da(hi)=Ma(hi)/Mh, and a(hi) is the index of the ith selected cluster in the hth stratum.

The bootstrap replicate of Ŝw(θ) can be obtained in the following way. For the hth stratum, generate nh*=(nh1*,,nhnh*)T by a multinomial distribution with nh1 trials and a success probability vector nh1(1,,1)T of length nh. Then, the bootstrap weighted score function is Ŝw*(θ)=h=1HWhŜw,h*(θ), where Ŝw,h*(θ)=nh1i=1nhrhi*S˜w,hi(θ), rhi*=khnhi*, and kh=nh/(nh1); Rao, Wu, and Yue (Citation1992) proposed a similar rescaling bootstrap method under stratified multi-stage sampling, but they did not investigate the corresponding theoretical properties with respect to hypothesis testing of super-population model parameters. By Lemma S7 in Section S4 of supplementary material, we conclude that E*{Ŝw*(θ)}=Ŝw(θ) and V*{Ŝw*(θ)}=V̂{Ŝw(θ)|FN}, where V̂{Ŝw(θ)|FN}=h=1HWh2{nh(nh1)}1S˜h,vec(θ)(InhP1,nh)S˜h,vecT(θ),P1,nh=1nh(1nhT1nh)11nhT is the projection matrix to the linear space spanned by 1nh, 1nh=(1,,1)T is a vector consisting of nh 1’s, Inh is an nh×nh identity matrix, and S˜h,vec(θ)=(da(h1)pa(h1)1Ŝw,a(h1)(θ),,da(hnh)pa(hnh)1Ŝw,a(hnh)(θ)).

In Section S4 of supplementary material, we show that the bootstrap central limit theorem in Theorem 2 holds with n=h=1Hnh.

Under stratified simple random sampling without replacement (Bickel and Freedman Citation1984), a bootstrap method is proposed in Section S7 of the supplementary material, and it is essentially the same as the one considered by Rao and Wu (Citation1988) when estimating the population mean; see Rao and Wu (Citation1988) and Beaumont and Patak (Citation2012) for details.

To establish a bootstrap version of Wilks’ Theorem in (6), we use W*(θ̂)=2n{lw*(θ̂)lw*(θ̂*)}

as the bootstrap version of W(θ0) in (5). Under the assumptions for Theorem 2, we can show that (9) W*(θ̂)|A  G(9) in distribution as n, and the reference distribution is the bootstrap sampling distribution conditional on the sample A, where G is the limiting distribution (6) of the quasi-likelihood ratio test statistic W(θ0) in (5) under the null hypothesis.

By (9), we conclude that the corresponding test statistic generated by the proposed bootstrap method has the same asymptotic distribution as in (5). Thus, we can use the bootstrap distribution of W*(θ̂) to approximate the sampling distribution of W(θ0).

Remark 2. A

s mentioned in Section 1, most existing bootstrap methods were proposed for variance estimation under survey sampling. There indeed exist some papers discussing bootstrap confidence intervals under survey sampling, but these methods do not apply to general hypothesis testing problems. For example, Rao, Wu, and Yue (Citation1992) proposed a bootstrap-t confidence interval for the parameter of interest, but their bootstrap confidence interval does not apply when the parameter of interest is multi-dimensional. In addition, their bootstrap method is only valid under the stratified simple random sampling. Beaumont and Patak (Citation2012) and Bertail and Combris (Citation1997) discussed bootstrap confidence intervals in their simulation studies, but it is not clear how the corresponding intervals are constructed. Also, neither Bertail and Combris (Citation1997) nor Beaumont and Patak (Citation2012) investigated their bootstrap confidence intervals theoretically. Theorem 2, on the other hand, is a building block to show that the proposed bootstrap method can be used for hypothesis testing under general sampling designs. Although we adopt almost the same technique to prove Theorems 1–2, we are the first to establish that the bootstrap estimator θ̂* has the same asymptotic distribution as the original θ̂. Then, we can show that the corresponding bootstrap statistic W*(θ̂) shares the same asymptotic distribution as the original one W(θ0) under the null hypothesis, so the proposed bootstrap method can be used for hypothesis testing without deriving the corresponding variance estimator as in Lumley and Scott (Citation2014). In addition, the proposed bootstrap method can be widely applied in commonly used sampling designs; see Sections S4–S5 of the supplementary material for details.

Remark 3.

In many practical situations, the design weights wi=πi1 are further modified to incorporate population-level constraints (as in calibration weighting) or to handle unit nonresponse (as in nonresponse weighting adjustment). In this case, the bootstrap design weights can be modified in the same way to obtain the bootstrap final weights. See Bessonneau et al. (Citation2021) for a detailed illustration of constructing bootstrap final weights in household surveys. When the test statistic is computed using the final weights rather than the design weights, the bootstrap final weights can be applied to the same test statistics to obtain the bootstrap distribution of the test statistic.

4 Quasi-Likelihood-Ratio Test

Without loss of generality, we denote θ as the true parameter θ0 in this section. Let Θ0Θ be the parameter space under the null hypothesis, so the null hypothesis can be written as H0:θΘ0. In this section, we consider H0:θ2=θ2(0) for some known vector θ2(0), where θ=(θ1T,θ2T)T. Thus, we have Θ0={θΘ;θ2=θ2(0)}. Let θ̂1(0) be the profile pseudo M-estimator of θ1 under H0:θ2=θ2(0), which is obtained by maximizing lw(θ1,θ2(0)) with respect to θ1.

The quasi-likelihood ratio test statistic for testing H0:θ2=θ2(0) is defined as (10) W(θ2(0))=2n{lw(θ̂(0))lw(θ̂)},(10) where θ̂(0)=(θ̂1(0)T,θ2(0)T)T. Under simple random sampling with replacement, W(θ2(0)) in (10) is asymptotically distributed as χ2(q) with q=pp0 and p0=dim(Θ0), and the reference distribution is the joint distribution of the super-population model and the sampling mechanism, where dim(Θ) is the dimension of Θ. The following lemma, proved by Lumley and Scott (Citation2014), presents the limiting distribution of the quasi-likelihood ratio test statistic in (10) for general sampling designs.

Lemma 1.

Under H0:θ2=θ2(0) and the assumptions for Theorem 2, (11) W(θ2(0))  G1=i=1qciZi2(11) in distribution as n, and the reference distribution is the joint distribution of the super-population model and the sampling mechanism, where c1c2cq>0 are the eigenvalues of P=Σθ,222·1(θ1,θ2(0)), Z1,,Zq are q independent random variables from the standard normal distribution, Σθ,2 is the asymptotic variance of n1/2θ̂2 in (3), 22·1(θ1,θ2)=22(θ1,θ2)21(θ1,θ2){11(θ1,θ2)}112(θ1,θ2), and ij(θ1,θ2)=E[2l{(θ1,θ2);X,Y}/(θiθjT)] for i,j=1,2.

Lumley and Scott (Citation2014) proposed to estimate the limiting distribution in (11) using a design-based estimator of P=Σθ,222·1(θ1,θ2(0)).

We consider an alternative test using a novel application of the bootstrap method discussed in Section 3. To do this, a bootstrap version of the quasi-likelihood ratio test statistic in (10) is obtained as (12) W*(θ̂2)=2n{lw*(θ̂1*(0),θ̂2)lw*(θ̂*)},(12) where θ̂1*(0)=argmaxθ1lw*(θ1,θ̂2), and θ̂2 is the second component of θ̂=(θ̂1T,θ̂2T)T. Under the conditions of Theorem 2, we can show that the limiting distribution for W*(θ̂2) is the same as that in (11). Thus, the bootstrap distribution using W*(θ̂2) in (12) can be used to obtain the reference distribution of the test statistic W(θ2(0)) in (10) under H0:θ2=θ2(0).

Remark 4.

In addition to the quasi-likelihood ratio test, the proposed bootstrap method is also applicable to quasi-score tests (Rao, Scott, and Skinner Citation1998). To develop a quasi-score test for H0:θ2=θ2(0), we first decompose the survey-weighted score function as (13) Ŝw(θ)=(Ŝw1(θ)Ŝw2(θ))=(N1iAwiu1(θ;xi,yi)N1iAwiu2(θ;xi,yi))(13) with uj(θ;xi,yi)=(yiμi){V0(μi)}1(μi/θj) for j = 1, 2, where μi=μ(xi;θ),μ(x;θ)=E(Y|x) is a known function with an unknown parameter θ, and V0(μi)=V(Y|x) is a known function of μi. Note that quasi-score tests clearly specify the model without requiring a correctly specified density function, so they are more important in practice. Let θ̂1(0) be the solution to Ŝw1(θ1,θ2(0))=0, where Ŝw1(θ) is defined in (13). The quasi-score test statistic for H0:θ2=θ2(0) is (14) XQS2(θ2(0))=Ŝw2T(θ̂(0)){Îw22·1(θ̂(0))}1Ŝw2(θ̂(0)),(14) where (15) Îw22·1(θ)=Îw22(θ)Îw21(θ){Îw11(θ)}1Îw12(θ)(15)

and (Îw11(θ)Îw12(θ)Îw21(θ)Îw22(θ))=(Ŝw1(θ)/θ1Ŝw1(θ)/θ2Ŝw2(θ)/θ1Ŝw2(θ)/θ2).

The bootstrap replicates of XQS2(θ2(0)) can be constructed by replacing the sampling weights wi and θ̂(0), respectively, by the bootstrap weights wi* and the bootstrap estimator that solves Ŝw*(θ)=N1iAwi*{yipi(θ)}(1,xiT)T=0 subject to θ2=θ̂2. Then, the bootstrap version of the quasi-score test statistic XQS2(θ2(0)) is (16) XQS2*(θ̂2)=Ŝw2*T(θ̂*(0)){Îw22·1*(θ̂*(0))}1Ŝw2*(θ̂*(0)),(16) where θ̂*(0)=(θ̂1*(0)T,θ̂2T)T, and Îw22·1*(θ̂*(0)) is computed similarly to (15) using bootstrap weights.

5 Test for Categorical Survey Data

5.1 Goodness-of-Fit Test

Suppose that a finite population U is partitioned into K categories with U=U(1)U(K). Define pk=Nk/N to be the population proportion of the kth category, where Nk is the size of U(k). Thus, we implicitly assume that the finite population is an iid realization of the super-population model following a multinomial distribution with K categories.

Suppose that we are interested in the simple goodness-of-fit test H0:pk=pk(0) for k=1,,K, where (p1(0),,pK(0))T is a pre-specified vector satisfying k=1Kpk(0)=1. From the sample A, we compute p̂k=N̂k/N̂ as an estimator of pk, where N̂k=iAkwi is a design-unbiased estimator of Nk, Ak=AU(k), and N̂=k=1KN̂k=iAwi. The estimated proportions can be obtained by p̂=argmaxpΘiAwil(p;xi), where l(p;xi)=k=1Kxiklog(pk), with xik = 1 if the ith element belongs to the kth category and 0 otherwise, p=(p1,,pK)T.

Then, the Pearson Chi-squared goodness-of-fit test statistic for H0 is X2(p(0))=nk=1K(p̂kpk(0))2/pk(0),

where p(0)=(p1(0),,pK1(0))T. Under regularity conditions, according to Rao and Scott (Citation1981), we have (17) X2(p(0))  G2=k=1K1λkZk2(17) in distribution as n under H0, and the reference distribution is the joint distribution of the super-population model and the sampling mechanism, where λ1λ2λK1>0 are the eigenvalues of the design effect matrix D=P01Σp,P0=diag(p(0))p(0)(p(0))T,Σp is the asymptotic variance of np̂, and Z1,,ZK1 are K – 1 independent random variables from the standard normal distribution. Under simple random sampling with replacement, the limiting distribution in (17) is a Chi-squared distribution with K – 1 degrees of freedom since P0=Σp.

We now apply the proposed bootstrap method to approximate the limiting distribution in (17), without computing the estimated covariance matrix Σ̂p. To describe the proposed method, let p̂* be the estimator of the population proportion p based on the bootstrap weights wi*. The proposed bootstrap statistic of X2(p(0)) is X2*(p̂)=ni=1K(p̂i*p̂i)2/p̂i.

We use p̂i in place of pi(0) in the bootstrap test statistics. Under the conditions of Theorem 2, we can show that the limiting distribution for X2*(p̂) is the same as that in (17).

5.2 Test of Independence in a Two-Way Table

The proposed bootstrap can also be used to test independence in a two-way count table. Let pij=Nij/N be the population proportion for cell (i, j) with margins pi+ and p+j for i=1,,R and j=1,,C, where R and C are the numbers of rows and columns, and {Nij;i=1,,R,j=1,,C} is the set of population counts with margins Ni+ and N+j. Let N̂ij be a design-unbiased estimator of Nij and p̂ij=N̂ij/N̂. By assuming a multinomial distribution for the super-population model, the Chi-squared statistic for testing independence H0:pij=pi+p+j for all i and j is (18) XI2=ni=1Rj=1C(p̂ijp̂i+p̂+j)2p̂i+p̂+j.(18)

The limiting distribution of XI2 is a mixture of χ2(1) distributions (Rao and Scott Citation1981).

To develop bootstrap replicates of the test statistics under H0, let p̂ij* be the bootstrap cell proportion computed using the bootstrap weights, p̂i+*=j=1Cp̂ij*, and p̂+j*=i=1Rp̂ij*. The proposed bootstrap version of XI2 is (19) XI2*=ni=1Rj=1C{(p̂ij*p̂i+*p̂+j*)(p̂ijp̂i+p̂+j)}2(p̂i+p̂+j).(19)

Under H0, the terms in the numerator of XI2 are identical to {(p̂ijp̂i+p̂+j)(pijpi+p+j)}2. That is, the bootstrap test statistic is calculated by simply replacing {p̂ij,p̂i+,p̂+j} and {pij,pi+,p+j} with {p̂ij*,p̂i+*,p̂+j*} and {p̂ij,p̂i+,p̂+j}, respectively. It can be shown that the limiting distribution of XI2* conditional on the sample is the same as that of XI2.

6 Simulation Study

6.1 Single-Stage Sampling

In this section, we test the performance of the proposed bootstrap method under the PPS (probability proportional to size) sampling with replacement. A finite population of size N is generated as follows. For i=1,,N, xiU(5,5) and yi=θ1+θ2xi+ϵi, where U(a,b) is a uniform distribution on the interval (a, b), (θ1,θ2)=(1,1), and ϵiN(0,22). To make the sampling design informative, the PPS sampling with replacement is carried out to generate a sample of size n with selection probabilities pi(6Ii+1) and i=1Npi=1, where Ii = 1 if xiϵi>0 and 0 otherwise; see Section S6 of the supplementary material for details about the probability proportional to size sampling and the bootstrap weights. The sampling design is informative (Pfeffermann Citation1993), since the selection probabilities are partially determined by the residuals {ϵi:i=1,,N}. A similar model setup was also used by Verret, Rao, and Hidiroglou (Citation2015) in the context of small area estimation. We consider two scenarios: (N,n)=(2,000,200) and (N,n)=(15,000,500).

We are interested in testing H0:θ2=θ2(0) with the nominal significance level of α=0.05 and consider three different values for θ2(0){1,1.1,1.2}. The following testing methods are compared:

  1. Naive likelihood ratio method with W(θ2(0)) in (10) and χ2(1) as test statistic and reference distribution, respectively.

  2. Naive quasi-score method with test statistic (14) and χ2(1) as the reference distribution.

  3. Lumley and Scott (Citation2014) method. The test statistic is W(θ2(0))/δ̂, where δ̂=nV̂(θ̂2)Îw,22·1,Îw,22·1 is defined in (15), and V̂(θ̂2) is obtained from Îw1(θ̂)V̂{Ŝw(θ)}{Îw1(θ̂)}T. The reference distribution is F1,k, where Fν1,ν2 is an F distribution with parameters ν1 and ν2, k is the degrees of freedom of the variance estimator based on the sampling design, and k is obtained by subtracting the number of parameters from the effective sample size associated with the sampling design.

  4. Bootstrap quasi-likelihood-ratio method with W(θ2(0)) in (10) being the test statistic, and the reference distribution is approximated by the empirical distribution of W*(θ̂2) in (12).

  5. Bootstrap quasi-score method with XQS2(θ2(0)) in (14) being the test statistic, and the reference distribution is approximated by the empirical distribution of the bootstrap test statistics,

When calculating the likelihood-ratio test statistics, we assume a normal distribution with a constant but unknown variance for the conditional distribution yi given xi. For each scenario, we generate 1000 Monte Carlo samples, and we consider M = 500 and M = 1000 iterations for both bootstrap methods. summarizes the simulation results. For both scenarios, the naive likelihood-ratio method and the naive quasi-score method have a significantly inflated Type I error rate when the null hypothesis is true. The corresponding design effects (Wu and Thompson Citation2020) are more than 2 for estimating θ2 under both scenarios; see Section S8.1 of the supplementary material for details. Thus, the distributions of W(θ2(0)) and the quasi-score test statistic (14) are more “dispersed” than χ2(1) under the null hypothesis, and due to the lack of design-effect corrections, the two naive methods are not theoretically valid. The Type I error rates of the Lumley and Scott (Citation2014) method and the two proposed bootstrap testing methods are similar under H0, regardless of the population and sample sizes. The Type I error rates of the three methods come closer to the nominal truth as the population and sample sizes increase. Furthermore, the powers of the two bootstrap testing methods are reasonable compared to the Lumley-Scott method. The naive methods have a slightly larger power at the expense of elevated Type I error rates. In addition, we get similar test results regardless of the number of bootstrap repetitions.

Table 1 Test power for the hypothesis test H0:θ2=θ2(0) based on 1 000 Monte Carlo simulations, the true parameter is θ2=1.00, and the nominal significance level is 0.05.

Besides, Section S9 of the supplementary material reports results of an additional simulation study for a stratified two-stage sampling design.

6.2 Test of Independence

In this section, we consider test of independence in a 3 × 3 table of counts to check the performance of the proposed bootstrap testing method. For a finite population U={yi:i=1,,N}, yi is generated by a multinomial distribution with one trial and a success probability vector p, where p=(p11,,pij,,p33)T, and pij is the success probability for the cell in the ith row and jth column for i,j=1,,3. That is, yi is a dummy variable consisting of eight 0’s and a 1. For the success probability vector p, we consider three cases:

  • Case I: p11=1/4,p12=p13=p21=p31=1/8,p22=p23=p32=p33=1/16.

  • Case II: p11=1/4,p12=p13=(1.4)/8,p21=p31=(0.6)/8,p22=p33=1/16,p23=(1.4)/16,p32=(0.6)/16.

  • Case III: p11=p2,3=p3,2=1/6,p12=p13=p21=p31=p22=p33=1/12.

Case I satisfies independence for the two-way table of counts, but Cases II–III do not. The level of nonindependence can be expressed by γ=i=13j=13(pijpi+p+j)2/(pi+p+j), and γ = 0 corresponds to independence. The values of γ are 0, 0.017 and 0.125 for Cases I–III, respectively.

For each yi, we generate an auxiliary variable xi=βTyi, where β=(β1,,β9)T,βj=0.5+ej for j=1,,9,ejEx(1). Probability proportional to size sampling with replacement is used to generate a sample of size n with selection probability proportional to xi. We consider two scenarios for the population and sample sizes: (N,n)=(2000,75) and (N,n)=(10,000,150).

We are interested in testing independence in the two-way table with α=0.05 nominal significance level. For each sample, we compare the following test methods:

  1. Naive Pearson method based on XI2 in (18) with χ2(4) being the reference distribution.

  2. The second-order Rao-Scott correction method using the Satterthwaite approximation (Thomas and Rao Citation1987). This method is widely used in analyzing two-way contingency table and is implemented by the command svychisq in the R package survey (Lumley Citation2020).

  3. Bootstrap Pearson method using XI2, and its distribution is approximated by that of XI2* in (19).

For each scenario, we generate 1000 Monte Carlo samples, and we consider M = 500 and M = 1000 iterations for both bootstrap methods. summarizes the simulation results. For Case I when the null hypothesis is true, the Type I error rate of the naive Pearson method is much larger than the nominal significance level 0.05 for different sample sizes, since the asymptotic distribution of the test statistic XI2 in (18) is more “dispersed” than a χ2(4) distribution under the null hypothesis; see Section S8.2 of the supplementary material for details. The performance of the Rao-Scott method with second-order correction is similar to the bootstrap testing method in terms of Type I error and power. The power of the bootstrap testing method increases with the value of γ. Similar to the previous simulation, the power of the naive method is larger at the expense of elevated Type I error rates. In addition, we get similar test results regardless of the number of bootstrap repetitions.

Table 2 Power of the test procedures for independence based on 1000 Monte Carlo simulation samples, Case I corresponds to the independence scenario, and the nominal significance level is 0.05.

7 Application

We present an analysis of the 2011 Private Education Expenditure Survey (PEES) in South Korea using the proposed bootstrap testing methods. The purpose of this survey is to study the relationship between private education expenditure and academic performance of students before entering college.

A stratified two-stage sampling design was used for the 2011 PEES, and the strata consist of 16 first-tier administrative divisions, including most provinces and metropolitan cities in South Korea. For each stratum, PPS sampling with replacement was conducted in the first stage, and the primary sampling unit was the school. The students were randomly selected from the sample school in the second stage. There are about 1000 sample schools and 45,000 students involved in this survey.

For the student i in the sample A, let yi be the academic performance assessed by the teacher, and it takes a value from 1 through 3 corresponding to low, middle and high academic performance, respectively. Associated with yi, let xi be the vector of covariates of interest. As discussed by Kim, Park, and Lee (Citation2017), we consider the following covariates: after-school education, hours taking lessons provided by the school after regular classes in a month; father’s education, 1 for college or higher and 0 otherwise; gender, 1 for female and 0 for male; household income per month; mother’s education, 1 for college or higher and 0 otherwise; private education, hours taking private lessons in a month.

In this section, we study the academic performance of students in middle school and high school separately and are interested in studying the conditional probability of achieving high academic performance. Specifically, consider the following logistic model (20) logit{pr(Y=1|x)}=(1,xT)θ,(20) where logit(x)=log(x)log(1x) for x(0,1), Y = 1 if high academic performance is achieved and 0 otherwise, x is a vector of six covariates, and θ=(θ0,θ1,,θ6)T. We are interested in testing the null hypotheses H0,i:θi=0 for i=1,,6 with α=0.05 nominal significance level. Since the Wald method is widely used in practice, the naive likelihood ratio method, naive quasi-score method, bootstrap quasi-likelihood ratio method, bootstrap quasi-score method with 1000 iterations and a two-sided Wald test are compared. The p-values for the two naive methods and the Wald test are obtained using reference distributions for simple random sampling. Specifically, the reference distribution for the two naive methods is χ2(1), and that of the Wald test is a normal distribution with estimated variance (Fuller Citation2009, sec. 1.2.8) for θ̂ using the sandwich formula. The p-values for the proposed bootstrap testing methods are obtained by bootstrap empirical distributions of the corresponding test statistics.

The results of our analysis are summarized in . The two bootstrap testing methods and the Wald test perform approximately the same in terms of p-values. The p-values of the two naive methods are approximately the same, but they differ from those of the bootstrap testing methods, especially for “after school education” at the middle school level and most covariates at the high school level, as the naive methods may not properly reflect the intra-cluster correlation due to cluster sampling.

Table 3 Estimates (Est) and the p-values (Unit: 102) for testing H0,i:θi=0, where i=1,,6, in (20) for the middle school and high school.

Based on the two bootstrap testing methods in , we have the following conclusions under α=0.05 nominal significance level. Controlling other covariates, the probability that female students achieve high academic performance is significantly higher than that of male students in middle school, but the gender effect is not significant in high school. Hours spent on private education and after-school education can significantly increase the probability of achieving high academic performance in both middle school and high school. The household income has a significantly positive influence on their child’s academic performance. However, the level of education of the father and mother has a significant influence during the middle school period only.

8 Concluding Remarks

Many statistical agencies provide microdata files containing survey weights and several sets of associated replication weights, in particular bootstrap weights. Standard statistical packages often permit the use of survey-weighted test statistics, and we have shown how to approximate their distributions under the null hypothesis by their bootstrap analogues computed from the bootstrap weights supplied in the data file. Using bootstrap weights, we can easily apply hypothesis testing without using specialized software designed for complex survey data, as noted by Beaumont and Bocci (Citation2009). The proposed bootstrap method is applied to quasi-likelihood ratio tests and quasi-score tests.

Our theories depend on establishing bootstrap central limit theorems under specified sampling designs, including stratified multi-stage sampling with clusters selected with replacement. Under some sampling designs, the central limiting theorem does not necessarily hold and the proposed method is not applicable. In the case of sampling designs that allow for the central limit theorem, the bootstrap central limit theorems can be established under some additional assumptions. Under the assumption that the bootstrap central limit theorem holds in the sampling design, the proposed hypothesis testing methods are applicable. The hypotheses we have discussed are special cases of the one considered in Section 4.2 of Jiang (Citation2019), so the theoretical properties of the proposed bootstrap method can also be investigated using the general method developed in Section 4.2 of Jiang (Citation2019). The proposed method can be applied to the case of categorical data by developing bootstrap procedures for testing simple goodness of fit and independence in a two-way table. We plan to extend our bootstrap method for categorical data to test hypotheses from multi-way tables of weighted counts or proportions, using a log-linear model approach proposed by Rao and Scott (Citation1984). Moreover, the bootstrap method can also be applied to nonresponse and weighting adjustments for variance estimation under regularity conditions, and extending the proposed bootstrap method for weight adjustments for hypothesis testing is also a promising project.

Supplementary Materials

Supplementary Material contains the following: comments on the regularity conditions (S1), proof of Theorem 1 (S2), proof of Theorem 2 (S3), verification of the proposed bootstrap method under stratified multi-stage sampling (S4), bootstrap method for Poisson sampling (S5) bootstrap method for probability proportional to size sampling with replacement (S6), stratified simple random sampling (S7), design effect (S8) and an additional simulation study: stratified two-stage sampling (S9).

Supplemental material

Supplemental Material

Download Zip (276 KB)

Acknowledgments

We are grateful to the associate editor and three anonymous referees for their constructive comments, which have greatly improved the article.

Additional information

Funding

Kim is partially supported by NSF (OAC-1931380). Rao is supported by a grant from the Natural Sciences and Engineering Research Council of Canada. Wang is partially supported by NSFC (11901487, 71988101, 72033002, 12231011) and National Key R&D Program of China (2022YFA10038002).

References

  • Antal, E., and Tillé, Y. (2011), “A Direct Bootstrap Method for Complex Sampling Designs from a Finite Population,” Journal of the American Statistical Association, 106, 534–543. DOI: 10.1198/jasa.2011.tm09767.
  • Antal, E., and Tillé, Y. (2014), “A New Resampling Method for Sampling Designs Without Replacement: The Doubled Half Bootstrap,” Computational Statistics, 29, 1345–1363.
  • Beaumont, J.-F., and Bocci, C. (2009), “A Practical Bootstrap Method for Testing Hypotheses from Survey Data,” Survey Methodology, 35, 25–35.
  • Beaumont, J.-F., and Patak, Z. (2012), “On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling,” International Statistical Review, 80, 127–148. DOI: 10.1111/j.1751-5823.2011.00166.x.
  • Berger, Y. G. (1998), “Rate of convergence to normal distribution for the Horvitz-Thompson estimator,” Journal of Statistical Planning and Inference, 67, 209–226. DOI: 10.1016/S0378-3758(97)00107-9.
  • Bertail, P., and Combris, P. (1997), “Bootstrap Généralisé d’un Sondage,” Annales d’Economie et de Statistique, 46, 49–83. DOI: 10.2307/20076068.
  • Bessonneau, P., Brilhaut, G., Chauvet, G., and Garcia, C. (2021), “With-Replacement Bootstrap Variance Estimation for Household Surveys: Principles, Examples and Implementation,” Survey Methodology, 47, 313–347.
  • Bickel, P. J., and Freedman, D. A. (1984), “Asymptotic Normality and the Bootstrap in Stratified Sampling,” Annals of Statistics, 12, 470–482.
  • Breidt, F. J., and Opsomer, J. D. (2000), “Local Polynomial Regression Estimators in Survey Sampling,” Annals of Statistics, 28, 1026–1053.
  • Chambers, R., and Skinner, C. J. (2003), Analysis of Survey Data, Chichester: Wiley.
  • Chao, M.-T., and Lo, S.-H. (1985), “A Bootstrap Method for Finite Population,” Sankhya: Series A, 47, 399–405.
  • Chatterjee, S., and Bose, A. (2005), “Generalized Bootstrap for Estimating Equations,” Annals of Statistics, 33, 414–436.
  • Chauvet, G. (2015), “Coupling Methods for Multistage Sampling,” Annals of Statistics, 43, 2484–2506.
  • Chen, J., and Rao, J. (2007), “Asymptotic Normality Under Two-Phase Sampling Designs,” Statistica Sinica, 17, 1047–1064.
  • Chipperfield, J., and Preston, J. (2007), “Efficient Bootstrap for Business Surveys,” Survey Methodology, 33, 167–172.
  • Fuller, W. A. (2009), Sampling Statistics, New Jersey: Wiley.
  • Hájek, J. (1964), “Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population,” Annals of Mathematical Statistics, 35, 1491–1523.
  • Han, Q., and Wellner, J. A. (2021), “Complex Sampling Designs: Uniform Limit Theorems and Applications,” Annals of Statistics, 49, 459–485.
  • Isaki, C. T., and Fuller, W. A. (1982), “Survey Design under the Regression Superpopulation Model,” Journal of the American Statistical Association, 77, 89–96. DOI: 10.1080/01621459.1982.10477770.
  • Jiang, J. (2019), Robust Mixed Model Analysis, Singapore: World Scientific.
  • Kim, J. K., Michael Brick, J., Fuller, W. A., and Kalton, G. (2006), “On the Bias of the Multiple-Imputation Variance Estimator in Survey Sampling,” Journal of the Royal Statistical Society, Series B, 68, 509–521. DOI: 10.1111/j.1467-9868.2006.00546.x.
  • Kim, J. K., Park, S., and Lee, Y. (2017), “Statistical Inference using Generalized Linear Mixed Models under Informative Cluster Sampling,” Canadian Journal of Statistics, 45, 479–497. DOI: 10.1002/cjs.11339.
  • Korn, E. L., and Graubard, B. I. (1999), Analysis of Health Surveys, New York: Wiley.
  • Lumley, T. (2020), Analysis of Complex Survey Samples, R package version 3.37.
  • Lumley, T., and Scott, A. J. (2014), “Tests for Regression Models Fitted to Survey Data,” Australian & New Zealand Journal of Statistics, 56, 1–14. DOI: 10.1111/anzs.12065.
  • Mashreghi, Z., Haziza, D., and Léger, C. (2016), “A Survey of Bootstrap Methods in Finite Population Sampling,” Statistics Surveys, 10, 1–52. DOI: 10.1214/16-SS113.
  • Pfeffermann, D. (1993), “The Role of Sampling Weights when Modeling Survey Data,” International Statistical Review, 61, 317–337. DOI: 10.2307/1403631.
  • Praestgaard, J., and Wellner, J. A. (1993), “Exchangeably Weighted Bootstraps of the General Empirical Process,” Annals of Probability, 21, 2053–2086.
  • Rao, J. N. K., and Scott, A. J. (1981), “The Analysis of Categorical Data from Complex Sample Surveys: Chi-Squared Tests for Goodness of Fit and Independence in Two-Way Tables,” Journal of the American Statistical Association, 76, 221–230. DOI: 10.1080/01621459.1981.10477633.
  • Rao, J. N. K., and Scott, A. J. (1984), “On Chi-Squared Tests for Multiway Contingency Tables with Cell Proportions Estimated from Survey Data,” Annals of Statistics, 12, 46–60.
  • Rao, J. N. K., Scott, A. J., and Skinner, C. J. (1998), “Quasi-Score Tests with Survey Data,” Statistica Sinica, 8, 1059–1070.
  • Rao, J. N. K., and Wu, C. F. J. (1988), “Resampling Inference with Complex Survey Data,” Journal of the American Statistical Association, 83, 231–241. DOI: 10.1080/01621459.1988.10478591.
  • Rao, J. N. K., Wu, C. F. J., and Yue, K. (1992), “Some Recent Work on Resampling Methods for Complex Surveys,” Survey Methodology, 18, 209–217.
  • Rosén, B. (1972a), “Asymptotic Theory for Successive Sampling with Varying Probabilities without Replacement, I,” Annals of Mathematical Statistics, 43, 373–397.
  • Rosén, B. (1972b), “Asymptotic Theory for Successive Sampling with Varying Probabilities without Replacement, II,” The Annals of Mathematical Statistics, 43, 748–776.
  • Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005), “On the Two-Phase Framework for Joint Model and Design-based Inference,” Annals of Statistics, 33, 2789–2810.
  • Särndal, C.-E., Swensson, B., and Wretman, J. (2003), Model Assisted Survey Sampling, New York: Springer.
  • Särndal, C.-E., Swensson, B., and Wretman, J. H. (1989), “The Weighted Residual Technique for Estimating the Variance of the General Regression Estimator of the Finite Population Total,” Biometrika, 76, 527–537. DOI: 10.1093/biomet/76.3.527.
  • Shao, J. (2003), Mathematical Statistics, New York: Springer.
  • Sverchkov, M., and Pfeffermann, D. (2004), “Prediction of Finite Population Totals based on the Sample Distribution,” Survey Methodology, 30, 79–92.
  • Thomas, D. R., and Rao, J. N. K. (1987), “Small-Sample Comparisons of Level and Power for Simple Goodness-of-Fit Statistics under Cluster Sampling,” Journal of the American Statistical Association, 82, 630–636. DOI: 10.1080/01621459.1987.10478476.
  • Toth, D., and Eltinge, J. L. (2011), “Building Consistent Regression Trees From Complex Sample Data,” Journal of the American Statistical Association, 106, 1626–1636. DOI: 10.1198/jasa.2011.tm10383.
  • Verret, F., Rao, J. N. K., and Hidiroglou, M. A. (2015), “Model-based Small Area Estimation Under Informative Sampling,” Survey Methodology, 41, 333–347.
  • Wu, C., and Thompson, M. E. (2020), Sampling Theory and Practice, Gewerbestrasse: Springer.