Full article: Hypotheses Testing from Complex Survey Data Using Bootstrap Weights: A Unified Approach

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

ABSTRACT

Standard statistical methods without taking proper account of the complexity of a survey design can lead to erroneous inferences when applied to survey data due to unequal selection probabilities, clustering, and other design features. In particular, the Type I error rates of hypotheses tests using standard methods can be much larger than the nominal significance level. Methods incorporating design features in testing hypotheses have been proposed, including Wald tests and quasi-score tests that involve estimated covariance matrices of parameter estimates. In this article, we present a unified approach to hypothesis testing without requiring estimated covariance matrices or design effects, by constructing bootstrap approximations to quasi-likelihood ratio statistics and quasi-score statistics and establishing its asymptotic validity. The proposed method can be easily implemented without a specific software designed for complex survey sampling. We also consider hypothesis testing for categorical data and present a bootstrap procedure for testing simple goodness of fit and independence in a two-way table. In simulation studies, the Type I error rates of the proposed approach are much closer to their nominal significance level compared with the naive likelihood ratio test and quasi-score test. An application to an educational survey under a logistic regression model is also presented. Supplementary materials for this article are available online.

KEYWORDS:

1 Introduction

Testing statistical hypotheses is one of the fundamental problems in statistics. In the parametric model approach, hypothesis testing can be implemented using Wald test, likelihood ratio test, or score test. In each case, a test statistic is calculated and then compared with the $100 (1 - α) %$ -quantile of the reference distribution, which is the limiting distribution of the test statistic under the null hypothesis, where α is the nominal significance level. The limiting distribution is often a Chi-squared distribution (Shao Citation2003).

However, statistical inference with survey data involves additional steps incorporating the sampling design features. Korn and Graubard (Citation1999) and Chambers and Skinner (Citation2003) provided comprehensive overviews of the methods for analyzing survey data from complex sampling designs. In hypothesis testing with survey data, the limiting distribution of the test statistic is generally not a Chi-squared distribution. Rather, it can be expressed as a weighted sum of several independent random variables from $χ^{2} (1)$ , which is a Chi-square distribution with one degree of freedom, and the weights depend on unknown parameters. To handle such problems, one may consider some corrections to the test statistics to obtain a Chi-square limiting distribution approximately. Such an approach usually involves computing “design effects” associated with test statistics (Lumley and Scott Citation2014).

In this article, we use a different approach to computing the limiting distribution using a bootstrap approximation. Beaumont and Bocci (Citation2009) investigated a general weighted bootstrap method for hypothesis testing under complex sampling designs in the context of Wald tests for linear regression analysis. In this article, we generalize the bootstrap testing idea of Beaumont and Bocci (Citation2009) and present a unified bootstrap weight approach to obtain the limiting distribution of test statistics under complex sampling designs, including stratified multi-stage sampling with clusters selected with replacement.

The proposed method is a bootstrap weight method (Mashreghi, Haziza, and Léger Citation2016) under complex survey sampling. Although Praestgaard and Wellner (Citation1993) and Chatterjee and Bose (Citation2005) rigorously investigate bootstrap weight methods under regularity conditions, both may provide erroneous inferences under complex survey sampling. Praestgaard and Wellner (Citation1993) established central limit theorems for the weighted bootstrap empirical process, but they assumed that the sample is independently and identically distributed (iid). Although it is common to assume that the finite population is iid, elements in a sample may not be due to complex sampling designs; see Pfeffermann (Citation1993) for details. In addition, Praestgaard and Wellner (Citation1993) only focused on mean estimation, but we are interested in hypothesis testing for parameters that solve a certain estimating equation. Chatterjee and Bose (Citation2005) proposed a bootstrap weight method for estimating functions, and their theoretical results are valid even for correlated random variables. However, the bootstrap weight method of Chatterjee and Bose (Citation2005) is not applicable to informative sampling designs, since survey weights are not taken into consideration; see Pfeffermann (Citation1993) for details.

There exist bootstrap weight methods under complex survey sampling, but the proposed bootstrap method differs from them in the following aspects. Rao, Wu, and Yue (Citation1992) proposed a bootstrap weight method to estimate the variance of a function of population total estimators under stratified random sampling, but did not investigate the theoretical properties of their bootstrap method. Chipperfield and Preston (Citation2007) proposed a without replacement scaled bootstrap to achieve the same goal as Rao, Wu, and Yue (Citation1992) under stratified random sampling, but their method is only applicable when the parameter of interest is a smooth function of population totals. Moreover, neither method is applicable to other complex survey sampling. Bertail and Combris (Citation1997) proposed a bootstrap weight method to make inferences for population means under general sampling designs, and the corresponding theoretical properties were investigated. However, their results are not applicable to parameters obtained by solving an estimating equation. Although Beaumont and Patak (Citation2012) developed a general bootstrap weight method for parameter solving an estimating equation in general sampling designs, they focused only on variance estimation without a rigorous discussion about hypothesis testing. Antal and Tillé (Citation2011) proposed a one-one bootstrap method for variance estimation under general sampling designs. Similar to the one-one bootstrap method, Antal and Tillé (Citation2014) developed a doubled half-bootstrap to estimate the variance of the estimator for population means. However, neither Antal and Tillé (Citation2011) nor Antal and Tillé (Citation2014) considered parameters solving an estimating equation. Moreover, existing work did not establish limiting distributions for the corresponding bootstrap estimator, especially for the one solving an estimating equation. Thus, up to our knowledge, there is no theoretical guarantee for existing methods when used for hypothesis testing.

Since the use of bootstrap for hypothesis testing is not fully investigated in the literature, our goal is to fill this important gap under general sampling designs. To do this, we first establish a bootstrap central limit theorem under regularity conditions. The sampling design is allowed to be informative in the sense that the sampling mechanism depends on the response variables being sampled (Pfeffermann Citation1993). Once the bootstrap central limit theorem is established, the proposed bootstrap method can be applied to the quasi-likelihood-ratio test and the quasi-score test. As long as the bootstrap central limit theorem holds for the sampling design, the bootstrap replicates of the test statistics can be used to approximate their distributions under the null hypothesis. Besides, this article also has a practical contribution to survey sampling, since bootstrap weights are commonly provided in agencies, including Statistics Canada.

The article is organized as follows. In Section 2, the basic setup is introduced. The proposed bootstrap method is presented in Section 3. In Section 4, the quasi-likelihood ratio test and the quasi-score test are introduced. Test for categorical survey data is briefly discussed in Section 5. The results of two simulation studies are presented in Section 5. An application to data from an educational survey under a logistic regression model is presented in Section 6. Concluding remarks are made in Section 7.

2 Basic Setup

Suppose that a finite population $F_{N} = {(z_{i}, x_{i}, y_{i}) : i = 1, \dots, N}$ is a random sample of size N from a super-population model (Isaki and Fuller Citation1982; Pfeffermann Citation1993; Sverchkov and Pfeffermann Citation2004; Rubin-Bleuer and Schiopu-Kratina Citation2005), where y_i is the response of interest, and $x_{i}$ is the corresponding auxiliary vector of length d, and $z_{i}$ is the vector of design variables such that the first-order inclusion probability for unit i can be obtained by $π_{i} = π (z_{i}; ϕ_{N})$ , where $ϕ_{N}$ is a design parameter determined by ${z_{1}, \dots, z_{N}}$ . In other literature, the inclusion probability is expressed as $π_{i} = π (F_{N}, i)$ ; see Sverchkov and Pfeffermann (Citation2004) for details.

We are interested in hypothesis testing for a parameter $θ_{0} = \arg \max_{θ \in Θ} E {l (θ; X, Y)}$ , where $Θ \subset R^{p}$ is the parameter space, $(X, Y)$ is the random variable associated with elements of $F_{N}$ , the expectation is taken with respect to the super-population model, and $l (θ; x, y)$ is a pre-defined objective function. For example, if the finite population is a random sample generated from a super-population model with a density function $f (x, y; θ_{0})$ and if $θ_{0}$ is the parameter of interest, then $l (θ; x, y) = \log f (x, y; θ)$ . For regression problems, $θ_{0}$ can be the parameter in the conditional expectation of the response of interest given the auxiliary vectors.

From the finite population $F_{N}$ , a probability sample ${(x_{i}, y_{i}) : i \in A}$ of size n is selected by a probability sampling design with the first-order inclusion probability π_i, where A is the set of sample indexes. The design variables $z_{i}$ and the design parameter $ϕ_{N}$ are not available at the sample level. From the sample, we can compute a pseudo M-estimator of $θ_{0}$ by $\hat{θ} = \arg \max_{θ \in Θ} l_{w} (θ),$ where $l_{w} (θ) = N^{- 1} \sum_{i \in A} w_{i} l (θ; x_{i}, y_{i})$ is the survey-weighted objective function and $w_{i} = π_{i}^{- 1}$ is the sampling weight for $i \in A$ . Note that $l_{w} (θ)$ is design-unbiased for $l_{N} (θ) = N^{- 1} \sum_{i = 1}^{N} l (θ; x_{i}, y_{i})$ in the sense that $E {l_{w} (θ) | F_{N}} = l_{N} (θ)$ , where the conditional expectation conditioning on $F_{N}$ is with respect to the sampling mechanism. To be rigorous, a subscript N should also be used to index the sample, the sampling weights, the estimator and other quantities, but it is suppressed in the sequel for simplicity.

Remark 1.

When discussing a finite population in the preceding paragraph, we implicitly assume that elements of $F_{N}$ are iid. Such an assumption is widely adopted in survey sampling; see Corollary 1.3.2.1 and Section 2.2.1 of Fuller (Citation2009) for details. Besides, under a linear regression model, (1) $Y = X^{T} β_{0} + ϵ$ (1) with $E (ϵ | x) = 0$ , the finite population ${(x_{i}, y_{i}) : i = 1, \dots, N}$ satisfies the iid assumption; see Särndal, Swensson, and Wretman (Citation1989, sec. 5.1), Breidt and Opsomer (Citation2000), Särndal, Swensson, and Wretman (Citation2003, sec. 6.4), Kim et al. (Citation2006, sec. 5), Toth and Eltinge (Citation2011), Wu and Thompson (Citation2020, sec. 5.1) and Han and Wellner (Citation2021) for more general setups about the super-population model. If we are interested in the regression parameter $β_{0}$ in (1), an objective function is $l (θ; x, y) = - {(y - x^{T} β)}^{2}$ .

Although we implicitly make an iid assumption for the finite population, elements in the sample A may not be iid due to complex survey designs. That is, the informative sampling no longer preserves the iid structure in the sample; see Pfeffermann (Citation1993) for details. Under certain sampling designs, we can relax the iid assumption, and the super-population model can involve cluster random effects (Rubin-Bleuer and Schiopu-Kratina Citation2005); refer to Section S4 of supplementary material for an extension to stratified multi-stage sampling with cluster random effects. We focus on element sampling mainly in this article.

Since the parameter of interest is the superpopulation parameter, the final sample can be viewed as a realization of the two-phase sampling design, where the finite population is the first-phase sample and the final sample is the second-phase sample. Chen and Rao (Citation2007) and Rubin-Bleuer and Schiopu-Kratina (Citation2005) present rigorous asymptotic setups for two-phase sampling.

Often, the pseudo M-estimator $\hat{θ}$ can be obtained by solving (2) ${\hat{S}}_{w} (θ) = \frac{\partial}{\partial θ} l_{w} (θ) = \frac{1}{N} \sum_{i \in A} w_{i} S (θ; x_{i}, y_{i}) = 0,$ (2) where $S (θ; x, y) = \partial l (θ; x, y) / \partial θ$ is the tangent function of $θ$ . To discuss the asymptotic properties of the pseudo M-estimator, we assume the following regularity conditions for a sequence of finite populations and samples (Isaki and Fuller Citation1982).

C1. The parameter space Θ is an open and convex set containing $θ_{0}$ as an interior point. For $θ \in Θ, l (θ; x, y)$ is concave and twice-continuously differentiable with respect to $θ, E {| l (θ; X, Y) |} < \infty$ , and $E {l (θ; X, Y)}$ is uniquely maximized at $θ_{0}$ .

C2. For $θ \in Θ, V {l_{w} (θ) | F_{N}} \to 0$ in probability with respect to the super-population model as $n \to \infty$ , where $V {l_{w} (θ) | F_{N}}$ is the design variance of $l_{w} (θ)$ conditional on the finite population $F_{N}$ . In addition, there exists a nonnegative design variance estimator $\hat{V} {l_{w} (θ) | F_{N}}$ , such that $E [\hat{V} {l_{w} (θ) | F_{N}} | F_{N}] = V {l_{w} (θ) | F_{N}}$ almost surely.

C3. There exists a compact set $B \subset Θ$ containing $θ_{0}$ as an interior point, such that $\sup_{θ \in B} ‖ n V {{\hat{S}}_{w} (θ) | F_{N}} - Σ_{S} (θ) ‖ \to 0$ in probability with respect to the super-population model as $n \to \infty$ , where $V {{\hat{S}}_{w} (θ) | F_{N}}$ is the design variance of ${\hat{S}}_{w} (θ)$ conditional on the finite population, $‖ M ‖ = \sup_{x \in R^{m}, ‖ x ‖ = 1} ‖ Mx ‖$ is the operator norm of an m × m matrix $M$ associated with the Euclidean vector norm $‖ \cdot ‖$ , and $Σ_{S} (θ)$ is a positive-definitive nonstochastic matrix.

C4. A central limit theorem holds for the survey-weighted score function ${\hat{S}}_{w} (θ_{0})$ in (2). Specifically, $n^{1 / 2} {\hat{S}}_{w} (θ_{0}) \to N (0, Σ_{S} (θ_{0}))$ in distribution as $n \to \infty$ with respect to the super-population model and the sampling mechanism.

C5. The weak convergence of ${\hat{I}}_{w} (θ) = N^{- 1} \sum_{i \in A} w_{i} I (θ; x_{i}, y_{i})$ is uniform in $θ \in B$ , where $I (θ; x, y) = - \partial S (θ; x, y) / \partial θ^{T}$ . Specifically, $\sup_{θ \in B} ‖ {\hat{I}}_{w} (θ) - I (θ) ‖ \to 0$ in probability with respect to the super-population model and the sampling mechanism as $n \to \infty$ , where $I (θ) = E {I (θ; X, Y)}$ , and $I (θ_{0})$ is invertible.

Conditions C1–C2 are sufficient conditions for the weak consistency of $\hat{θ}$ . Conditions C3–C5 are used to establish the asymptotic normality of $\hat{θ}$ . In Conditions C3–C4, we consider the case where $V [E {{\hat{S}}_{w} (θ) | F_{N}}]$ is asymptotically negligible compared with $E [V {{\hat{S}}_{w} (θ) | F_{N}}]$ , and it is often the case when $n / N \to 0$ as $n \to \infty$ . That is, we can safely ignore the variability for generating the finite population and focus on the design variance of ${\hat{S}}_{w} (θ)$ given the finite population. This assumption is a building block of the proposed bootstrap method in the next section, and a similar assumption is also proposed by Beaumont and Bocci (Citation2009); see Assumption 5 and its discussion of Beaumont and Bocci (Citation2009) for details.

As mentioned by Han and Wellner (Citation2021), “establishing a finite-dimensional CLT … can be a nontrivial problem for general sampling designs,” and it is beyond our scope to prove Condition C4 under general sampling designs. In fact, there is no guarantee that the central limit theorem holds for general sampling designs. Instead, we simply consider the set of sampling designs such that the central limit theorem holds; see Hájek (Citation1964), Rosén (Citation1972a), Rosén (Citation1972b), Berger (Citation1998), Rubin-Bleuer and Schiopu-Kratina (Citation2005), sections 1.3.3–1.3.4 of Fuller (Citation2009), Chauvet (Citation2015), and Han and Wellner (Citation2021) for details. For simplicity, we use the same notation, “ $‖ \cdot ‖$ ”, to denote the matrix norm and the vector norm. Detailed comments on these regularity conditions are presented in Section S1 of supplementary material. For the proposed conditions, we implicitly assume that $N \to \infty$ as $n \to \infty$ . The following theorem establishes the asymptotic normality of $\hat{θ}$ . Its proof can be found in Section S2 of supplementary material.

Theorem 1.

Under Conditions C1–C5, we have (3) $\sqrt{n} (\hat{θ} - θ_{0}) \to N (0, Σ_{θ})$ (3) in distribution as $n \to \infty$ with respect to the joint distribution of the super-population model and the sampling mechanism, where $Σ_{θ} = ℐ {(θ_{0})}^{- 1} Σ_{S} (θ_{0}) {ℐ {(θ_{0})}^{- 1}}^{T} .$

Now, by the second-order Taylor expansion, we obtain (4) $\begin{matrix} l_{w} (θ_{0}) & = l_{w} (\hat{θ}) + {\hat{S}}_{w}^{T} (\hat{θ}) (θ_{0} - \hat{θ}) - \frac{1}{2} {(\hat{θ} - θ_{0})}^{T} {\hat{I}}_{w} (\hat{θ}) (\hat{θ} - θ_{0}) \\ + o_{p} (n^{- 1}) \\ = l_{w} (\hat{θ}) - \frac{1}{2} {(\hat{θ} - θ_{0})}^{T} {\hat{I}}_{w} (\hat{θ}) (\hat{θ} - θ_{0}) + o_{p} (n^{- 1}), \end{matrix}$ (4) where ${\hat{I}}_{w} (θ) = N^{- 1} \sum_{i \in A} w_{i} I (θ; x_{i}, y_{i}) .$ Define (5) $W (θ_{0}) = - 2 n {l_{w} (θ_{0}) - l_{w} (\hat{θ})} .$ (5)

By (4), we obtain $W (θ_{0}) = n {(\hat{θ} - θ_{0})}^{T} {\hat{I}}_{w} (\hat{θ}) (\hat{θ} - θ_{0}) + o_{p} (1) .$

Thus, by Theorem 1, we have (6) $W (θ_{0}) \to G = \sum_{i = 1}^{p} c_{i} Z_{i}^{2}$ (6) in distribution as $n \to \infty$ , and the reference distribution is the joint distribution of the super-population model and the sampling mechanism, where $c_{1}, \dots, c_{p}$ are the eigenvalues of $D = Σ_{θ} ℐ (θ_{0})$ , and $Z_{1}, \dots, Z_{p}$ are p independent random variables from the standard normal distribution. Result (6) was established by Rao and Scott (Citation1984) and Lumley and Scott (Citation2014), and it can be regarded as a version of the Wilks’ theorem for survey sampling. For p = 1, we can use $c_{1}^{- 1} W (θ_{0})$ as the test statistic with $χ^{2} (1)$ distribution as the limiting distribution under the null hypothesis, where $c_{1} = σ_{S}^{2} (θ_{0}) / I (θ_{0})$ , and $σ_{S}^{2} (θ)$ and $I (θ)$ are the counterparts of $Σ_{s} (θ)$ and $I (θ)$ , respectively. Unless the sampling design is simple random sampling and the sampling fraction is negligible, the limiting distribution does not reduce to the standard Chi-squared distribution with p degrees of freedom.

3 Bootstrap Calibration

We propose using a bootstrap weight method to approximate the limiting distribution in (6) and other test statistics under complex survey sampling. Such a bootstrap calibration is very attractive in practice since there is no need to derive the analytic form of the limiting distribution of the test statistic. That is, standard software that takes account of the weights can be used to implement bootstrap hypothesis testing from data files reporting weights and associated sets of bootstrap weights, as done in Statistics Canada and some other agencies. Given the sample A, the proposed bootstrap method is as follows:

Step 1 Generate a rescaling vector $r^{*}$ of size $n = | A |$ from a bootstrap distribution with $E_{*} (r^{*}) = 1_{n}$ , and the covariance of $r^{*}$ is determined by the sampling design, where $| A |$ is the cardinality of set A, $E_{*} (\cdot)$ is the expectation with respect to the bootstrap distribution conditional on the sample A, and $1_{n}$ is a vector of 1 with length n. The bootstrap weights are chosen in such a way that $E_{*} {S_{w}^{*} (θ)} = {\hat{S}}_{w} (θ)$ and $V_{*} {S_{w}^{*} (θ)} = \hat{V} {{\hat{S}}_{w} (θ) | F_{N}}$ , where ${\hat{S}}_{w}^{*} (θ) = \sum_{i \in A} w_{i}^{*} S (θ; x_{i}, y_{i}), w_{i}^{*} = r_{i}^{*} w_{i}, V_{*} (\cdot)$ is the bootstrap variance conditional on the sample A, and $\hat{V} {{\hat{S}}_{w} (θ) | F_{N}}$ is a nonnegative design variance estimator of ${\hat{S}}_{w} (θ)$ conditional on the finite population $F_{N}$ .
Step 2 Obtain a bootstrap replicate of $\hat{θ}$ , denoted as ${\hat{θ}}^{*}$ , by solving ${\hat{S}}_{w}^{*} (θ) = 0$ .

Different distributions may be used to generate $r^{*}$ under different sampling designs, but the generation of $r^{*}$ only depends on the inclusion probabilities and not on the specific form of the estimating function for parameter estimation. Thus, once the bootstrap weights are generated, they can be used for various inference problems based on the sample A. However, the commonly used linearization technique is based on the specific form of the estimating function, so different linearization is required for different hypothesis testing problems. Thus, compared with the linearization technique, the proposed bootstrap method is more appealing to the practitioners.

We now present additional regularity conditions to validate the proposed bootstrap method.

C6. For $θ \in Θ$ , the bootstrap weights are nonnegative and satisfy $E_{*} {{\hat{S}}_{w}^{*} (θ)} = {\hat{S}}_{w} (θ)$ and $V_{*} {{\hat{S}}_{w}^{*} (θ)} = \hat{V} {{\hat{S}}_{w} (θ)}$ conditional on the sample A, where $\hat{V} {{\hat{S}}_{w} (θ)}$ satisfies $n \sup_{θ \in B} ‖ \hat{V} {{\hat{S}}_{w} (θ)} - V {{\hat{S}}_{w} (θ)} ‖ \to 0$ in probability with respect to the super-population model and the sampling mechanism as $n \to \infty$ .

C7. A central limit theorem holds for the bootstrap weighted score function ${\hat{S}}_{w}^{*} (θ)$ uniformly over $B$ . Specifically, (7) $n^{1 / 2} {{\hat{S}}_{w}^{*} (θ) - {\hat{S}}_{w} (θ)} | A \to N (0, Σ_{S} (θ))$ (7) in distribution uniformly over $B$ , and the reference distribution is the bootstrap sampling distribution conditional on the sample A.

C8. $\sup_{θ \in B} ‖ {\hat{I}}_{w}^{*} (θ) - {\hat{I}}_{w} (θ) ‖ \to 0$ in probability with respect to the bootstrap sampling distribution conditional on the sample A as $n \to \infty$ , where ${\hat{I}}_{w}^{*} (θ) = N^{- 1} \sum_{i \in A} w_{i}^{*} I (θ; x_{i}, y_{i})$ .

Condition C6 guarantees that the scaled bootstrap variance of ${\hat{S}}_{w}^{*} (θ)$ converges to $Σ_{S} (θ)$ in Condition C3, and they are also used to show the bootstrap central limit theorem in Condition C7. Condition C8 is needed to establish the asymptotic normality of the bootstrap estimator ${\hat{θ}}^{*}$ . To check Condition C7, we may use a bootstrap version of the Lyapounov condition. As discussed in Condition C4, we do not claim that Condition C7 holds under general sampling designs, but we can validate both under certain sampling designs. For example, in Section S4 of the supplementary material, we use a Lyapounov-type condition to prove the two central limit theorems under stratified multi-stage sampling. Detailed comments on these regularity conditions are presented in Section S1 of the supplementary material. The high-level conditions are also verifiable under other commonly used sampling designs, including Poisson sampling.

Theorem 2.

Let the assumptions for Theorem 1 hold. Then, under Conditions C6–C8, we have (8) $\sqrt{n} ({\hat{θ}}^{*} - \hat{θ}) | A \to N (0, Σ_{θ})$ (8) in distribution as $n \to \infty$ , and the reference distribution on the left side of (8) is the bootstrap sampling distribution conditional on the sample A, where $Σ_{θ}$ is defined in Theorem 1.

Theorem 2 is proved under general sampling designs; see Section S3 of the supplementary material for details. Theorem 2 establishes the bootstrap central limit theorem for ${\hat{θ}}^{*}$ , and the limiting distribution coincides with the one in (3) for $\hat{θ}$ . Thus, the proposed bootstrap method can be used to approximate the sampling distribution of $\hat{θ}$ .

For estimating the mean of a finite population, Chao and Lo (Citation1985) first proved a bootstrap central limit theorem under simple random sampling without replacement, and Bickel and Freedman (Citation1984) proved the bootstrap central limit theorem under stratified random sampling without replacement. We now present stratified multi-stage sampling with clusters selected with replacement as an example to validate the bootstrap central limit theorem in Theorem 2. Stratified multi-stage sampling with clusters selected with replacement is extensively used in socio-economic surveys when the first-stage sampling fractions within strata are negligible, as is often the case with large-scale sample surveys.

Example 1.

Under stratified multi-stage sampling, a popular approach is to assume that $n_{h} (\geq 2)$ clusters are selected with replacement from the hth stratum, with the selection probabilities p_hi satisfying $\sum_{i = 1}^{N_{h}} p_{h i} = 1$ for $h = 1, \dots, H$ (Rao and Wu Citation1988), where N_h is the number of clusters in the hth stratum. As mentioned in Section 2, the selection probabilities can be random with respect to the super-population model. Within each sampled cluster, denote ${\hat{S}}_{w, h i} (θ)$ as a design-unbiased estimator of the cluster mean under sampling at the second and subsequent stages. For simplicity, we still use ${\hat{S}}_{w, h i} (θ)$ for the case when the (hi)th cluster is fully observed. Then, the weighted score function is ${\hat{S}}_{w} (θ) = \sum_{h = 1}^{H} W_{h} {\hat{S}}_{w, h} (θ)$ , where $W_{h} = M_{h} / M, M = \sum_{h = 1}^{H} M_{h}, M_{h} = \sum_{i = 1}^{N_{h}} M_{h i}$ , M_hi is the size of the (hi)th cluster, ${\hat{S}}_{w, h} (θ) = n_{h}^{- 1} \sum_{i = 1}^{n_{h}} {\tilde{S}}_{w, h i} (θ), {\tilde{S}}_{w, h i} (θ) = d_{a (h i)} p_{a (h i)}^{- 1} {\hat{S}}_{w, a (h i)} (θ), d_{a (h i)} = M_{a (h i)} / M_{h}$ , and a(hi) is the index of the ith selected cluster in the hth stratum.

The bootstrap replicate of ${\hat{S}}_{w} (θ)$ can be obtained in the following way. For the hth stratum, generate $n_{h}^{*} = {(n_{h 1}^{*}, \dots, n_{h n_{h}}^{*})}^{T}$ by a multinomial distribution with $n_{h} - 1$ trials and a success probability vector $n_{h}^{- 1} {(1, \dots, 1)}^{T}$ of length n_h. Then, the bootstrap weighted score function is ${\hat{S}}_{w}^{*} (θ) = \sum_{h = 1}^{H} W_{h} {\hat{S}}_{w, h}^{*} (θ)$ , where ${\hat{S}}_{w, h}^{*} (θ) = n_{h}^{- 1} \sum_{i = 1}^{n_{h}} r_{h i}^{*} {\tilde{S}}_{w, h i} (θ)$ , $r_{h i}^{*} = k_{h} n_{h i}^{*}$ , and $k_{h} = n_{h} / (n_{h} - 1)$ ; Rao, Wu, and Yue (Citation1992) proposed a similar rescaling bootstrap method under stratified multi-stage sampling, but they did not investigate the corresponding theoretical properties with respect to hypothesis testing of super-population model parameters. By Lemma S7 in Section S4 of supplementary material, we conclude that $E_{*} {{\hat{S}}_{w}^{*} (θ)} = {\hat{S}}_{w} (θ)$ and $V_{*} {{\hat{S}}_{w}^{*} (θ)} = \hat{V} {{\hat{S}}_{w} (θ) | F_{N}}$ , where $\hat{V} {{\hat{S}}_{w} (θ) | F_{N}} = \sum_{h = 1}^{H} W_{h}^{2} {n_{h} (n_{h} - 1)}^{- 1} {\tilde{S}}_{h, vec} (θ) (I_{n_{h}} - P_{1, n_{h}}) {\tilde{S}}_{h, vec}^{T} (θ), P_{1, n_{h}} = 1_{n_{h}} {(1_{n_{h}}^{T} 1_{n_{h}})}^{- 1} 1_{n_{h}}^{T}$ is the projection matrix to the linear space spanned by $1_{n_{h}}$ , $1_{n_{h}} = {(1, \dots, 1)}^{T}$ is a vector consisting of n_h 1’s, $I_{n_{h}}$ is an $n_{h} \times n_{h}$ identity matrix, and ${\tilde{S}}_{h, vec} (θ) = (d_{a (h 1)} p_{a (h 1)}^{- 1} {\hat{S}}_{w, a (h 1)} (θ), \dots, d_{a (h n_{h})} p_{a (h n_{h})}^{- 1} {\hat{S}}_{w, a (h n_{h})} (θ)) .$

In Section S4 of supplementary material, we show that the bootstrap central limit theorem in Theorem 2 holds with $n = \sum_{h = 1}^{H} n_{h}$ .

Under stratified simple random sampling without replacement (Bickel and Freedman Citation1984), a bootstrap method is proposed in Section S7 of the supplementary material, and it is essentially the same as the one considered by Rao and Wu (Citation1988) when estimating the population mean; see Rao and Wu (Citation1988) and Beaumont and Patak (Citation2012) for details.

To establish a bootstrap version of Wilks’ Theorem in (6), we use $W^{*} (\hat{θ}) = - 2 n {l_{w}^{*} (\hat{θ}) - l_{w}^{*} ({\hat{θ}}^{*})}$

as the bootstrap version of $W (θ_{0})$ in (5). Under the assumptions for Theorem 2, we can show that (9) $W^{*} (\hat{θ}) | A \to G$ (9) in distribution as $n \to \infty$ , and the reference distribution is the bootstrap sampling distribution conditional on the sample A, where $G$ is the limiting distribution (6) of the quasi-likelihood ratio test statistic $W (θ_{0})$ in (5) under the null hypothesis.

By (9), we conclude that the corresponding test statistic generated by the proposed bootstrap method has the same asymptotic distribution as in (5). Thus, we can use the bootstrap distribution of $W^{*} (\hat{θ})$ to approximate the sampling distribution of $W (θ_{0})$ .

Remark 2. A

s mentioned in Section 1, most existing bootstrap methods were proposed for variance estimation under survey sampling. There indeed exist some papers discussing bootstrap confidence intervals under survey sampling, but these methods do not apply to general hypothesis testing problems. For example, Rao, Wu, and Yue (Citation1992) proposed a bootstrap-t confidence interval for the parameter of interest, but their bootstrap confidence interval does not apply when the parameter of interest is multi-dimensional. In addition, their bootstrap method is only valid under the stratified simple random sampling. Beaumont and Patak (Citation2012) and Bertail and Combris (Citation1997) discussed bootstrap confidence intervals in their simulation studies, but it is not clear how the corresponding intervals are constructed. Also, neither Bertail and Combris (Citation1997) nor Beaumont and Patak (Citation2012) investigated their bootstrap confidence intervals theoretically. Theorem 2, on the other hand, is a building block to show that the proposed bootstrap method can be used for hypothesis testing under general sampling designs. Although we adopt almost the same technique to prove Theorems 1–2, we are the first to establish that the bootstrap estimator ${\hat{θ}}^{*}$ has the same asymptotic distribution as the original $\hat{θ}$ . Then, we can show that the corresponding bootstrap statistic $W^{*} (\hat{θ})$ shares the same asymptotic distribution as the original one $W (θ_{0})$ under the null hypothesis, so the proposed bootstrap method can be used for hypothesis testing without deriving the corresponding variance estimator as in Lumley and Scott (Citation2014). In addition, the proposed bootstrap method can be widely applied in commonly used sampling designs; see Sections S4–S5 of the supplementary material for details.

Remark 3.

In many practical situations, the design weights $w_{i} = π_{i}^{- 1}$ are further modified to incorporate population-level constraints (as in calibration weighting) or to handle unit nonresponse (as in nonresponse weighting adjustment). In this case, the bootstrap design weights can be modified in the same way to obtain the bootstrap final weights. See Bessonneau et al. (Citation2021) for a detailed illustration of constructing bootstrap final weights in household surveys. When the test statistic is computed using the final weights rather than the design weights, the bootstrap final weights can be applied to the same test statistics to obtain the bootstrap distribution of the test statistic.

4 Quasi-Likelihood-Ratio Test

Without loss of generality, we denote $θ$ as the true parameter $θ_{0}$ in this section. Let $Θ_{0} \subset Θ$ be the parameter space under the null hypothesis, so the null hypothesis can be written as $H_{0} : θ \in Θ_{0}$ . In this section, we consider $H_{0} : θ_{2} = θ_{2}^{(0)}$ for some known vector $θ_{2}^{(0)}$ , where $θ = {(θ_{1}^{T}, θ_{2}^{T})}^{T}$ . Thus, we have $Θ_{0} = {θ \in Θ; θ_{2} = θ_{2}^{(0)}} .$ Let ${\hat{θ}}_{1}^{(0)}$ be the profile pseudo M-estimator of $θ_{1}$ under $H_{0} : θ_{2} = θ_{2}^{(0)}$ , which is obtained by maximizing $l_{w} (θ_{1}, θ_{2}^{(0)})$ with respect to $θ_{1}$ .

The quasi-likelihood ratio test statistic for testing $H_{0} : θ_{2} = θ_{2}^{(0)}$ is defined as (10) $W (θ_{2}^{(0)}) = - 2 n {l_{w} ({\hat{θ}}^{(0)}) - l_{w} (\hat{θ})},$ (10) where ${\hat{θ}}^{(0)} = {({\hat{θ}}_{1}^{(0)}^{T}, θ_{2}^{(0)}^{T})}^{T}$ . Under simple random sampling with replacement, $W (θ_{2}^{(0)})$ in (10) is asymptotically distributed as $χ^{2} (q)$ with $q = p - p_{0}$ and $p_{0} = dim (Θ_{0})$ , and the reference distribution is the joint distribution of the super-population model and the sampling mechanism, where $dim (Θ)$ is the dimension of Θ. The following lemma, proved by Lumley and Scott (Citation2014), presents the limiting distribution of the quasi-likelihood ratio test statistic in (10) for general sampling designs.

Lemma 1.

Under $H_{0} : θ_{2} = θ_{2}^{(0)}$ and the assumptions for Theorem 2, (11) $W (θ_{2}^{(0)}) \to G_{1} = \sum_{i = 1}^{q} c_{i} Z_{i}^{2}$ (11) in distribution as $n \to \infty$ , and the reference distribution is the joint distribution of the super-population model and the sampling mechanism, where $c_{1} \geq c_{2} \geq \dots \geq c_{q} > 0$ are the eigenvalues of $P = Σ_{θ, 2} ℐ_{22 \cdot 1} (θ_{1}, θ_{2}^{(0)})$ , $Z_{1}, \dots, Z_{q}$ are q independent random variables from the standard normal distribution, $Σ_{θ, 2}$ is the asymptotic variance of $n^{1 / 2} {\hat{θ}}_{2}$ in (3), $ℐ_{22 \cdot 1} (θ_{1}, θ_{2}) = ℐ_{22} (θ_{1}, θ_{2}) - ℐ_{21} (θ_{1}, θ_{2}) {ℐ_{11} (θ_{1}, θ_{2})}^{- 1} ℐ_{12} (θ_{1}, θ_{2}),$ and $ℐ_{i j} (θ_{1}, θ_{2}) = E [- \partial^{2} l {(θ_{1}, θ_{2}); X, Y} / (\partial θ_{i} \partial θ_{j}^{T})]$ for $i, j = 1, 2$ .

Lumley and Scott (Citation2014) proposed to estimate the limiting distribution in (11) using a design-based estimator of $P = Σ_{θ, 2} ℐ_{22 \cdot 1} (θ_{1}, θ_{2}^{(0)})$ .

We consider an alternative test using a novel application of the bootstrap method discussed in Section 3. To do this, a bootstrap version of the quasi-likelihood ratio test statistic in (10) is obtained as (12) $W^{*} ({\hat{θ}}_{2}) = - 2 n {l_{w}^{*} ({\hat{θ}}_{1}^{* (0)}, {\hat{θ}}_{2}) - l_{w}^{*} ({\hat{θ}}^{*})},$ (12) where ${\hat{θ}}_{1}^{* (0)} = \arg \max_{θ_{1}} l_{w}^{*} (θ_{1}, {\hat{θ}}_{2})$ , and ${\hat{θ}}_{2}$ is the second component of $\hat{θ} = {({\hat{θ}}_{1}^{T}, {\hat{θ}}_{2}^{T})}^{T}$ . Under the conditions of Theorem 2, we can show that the limiting distribution for $W^{*} ({\hat{θ}}_{2})$ is the same as that in (11). Thus, the bootstrap distribution using $W^{*} ({\hat{θ}}_{2})$ in (12) can be used to obtain the reference distribution of the test statistic $W (θ_{2}^{(0)})$ in (10) under $H_{0} : θ_{2} = θ_{2}^{(0)}$ .

Remark 4.

In addition to the quasi-likelihood ratio test, the proposed bootstrap method is also applicable to quasi-score tests (Rao, Scott, and Skinner Citation1998). To develop a quasi-score test for $H_{0} : θ_{2} = θ_{2}^{(0)}$ , we first decompose the survey-weighted score function as (13) ${\hat{S}}_{w} (θ) = (\begin{matrix} {\hat{S}}_{w 1} (θ) \\ {\hat{S}}_{w 2} (θ) \end{matrix}) = (\begin{matrix} N^{- 1} \sum_{i \in A} w_{i} u_{1} (θ; x_{i}, y_{i}) \\ N^{- 1} \sum_{i \in A} w_{i} u_{2} (θ; x_{i}, y_{i}) \end{matrix})$ (13) with $u_{j} (θ; x_{i}, y_{i}) = (y_{i} - μ_{i}) {V_{0} (μ_{i})}^{- 1} (\partial μ_{i} / \partial θ_{j})$ for j = 1, 2, where $μ_{i} = μ (x_{i}; θ), μ (x; θ) = E (Y | x)$ is a known function with an unknown parameter $θ$ , and $V_{0} (μ_{i}) = V (Y | x)$ is a known function of μ_i. Note that quasi-score tests clearly specify the model without requiring a correctly specified density function, so they are more important in practice. Let ${\hat{θ}}_{1}^{(0)}$ be the solution to ${\hat{S}}_{w 1} (θ_{1}, θ_{2}^{(0)}) = 0,$ where ${\hat{S}}_{w 1} (θ)$ is defined in (13). The quasi-score test statistic for $H_{0} : θ_{2} = θ_{2}^{(0)}$ is (14) $X_{Q S}^{2} (θ_{2}^{(0)}) = {\hat{S}}_{w 2}^{T} ({\hat{θ}}^{(0)}) {{\hat{I}}_{w 22 \cdot 1} ({\hat{θ}}^{(0)})}^{- 1} {\hat{S}}_{w 2} ({\hat{θ}}^{(0)}),$ (14) where (15) ${\hat{I}}_{w 22 \cdot 1} (θ) = {\hat{I}}_{w 22} (θ) - {\hat{I}}_{w 21} (θ) {{\hat{I}}_{w 11} (θ)}^{- 1} {\hat{I}}_{w 12} (θ)$ (15)

and $(\begin{matrix} {\hat{I}}_{w 11} (θ) & {\hat{I}}_{w 12} (θ) \\ {\hat{I}}_{w 21} (θ) & {\hat{I}}_{w 22} (θ) \end{matrix}) = - (\begin{matrix} {\hat{S}}_{w 1} (θ) / \partial θ^{'}_{1} & {\hat{S}}_{w 1} (θ) / \partial θ^{'}_{2} \\ {\hat{S}}_{w 2} (θ) / \partial θ^{'}_{1} & {\hat{S}}_{w 2} (θ) / \partial θ^{'}_{2} \end{matrix}) .$

The bootstrap replicates of $X_{Q S}^{2} (θ_{2}^{(0)})$ can be constructed by replacing the sampling weights w_i and ${\hat{θ}}^{(0)}$ , respectively, by the bootstrap weights $w_{i}^{*}$ and the bootstrap estimator that solves ${\hat{S}}_{w}^{*} (θ) = N^{- 1} \sum_{i \in A} w_{i}^{*} {y_{i} - p_{i} (θ)} {(1, x_{i}^{T})}^{T} = 0$ subject to $θ_{2} = {\hat{θ}}_{2}$ . Then, the bootstrap version of the quasi-score test statistic $X_{Q S}^{2} (θ_{2}^{(0)})$ is (16) $X_{Q S}^{2 *} ({\hat{θ}}_{2}) = {\hat{S}}_{w 2}^{* T} ({\hat{θ}}^{* (0)}) {{\hat{I}}_{w 22 \cdot 1}^{*} ({\hat{θ}}^{* (0)})}^{- 1} {\hat{S}}_{w 2}^{*} ({\hat{θ}}^{* (0)}),$ (16) where ${\hat{θ}}^{* (0)} = {({\hat{θ}}_{1}^{* (0)}^{T}, {\hat{θ}}_{2}^{T})}^{T}$ , and ${\hat{I}}_{w 22 \cdot 1}^{*} ({\hat{θ}}^{* (0)})$ is computed similarly to (15) using bootstrap weights.

5 Test for Categorical Survey Data

5.1 Goodness-of-Fit Test

Suppose that a finite population U is partitioned into K categories with $U = U^{(1)} \cup \dots \cup U^{(K)}$ . Define $p_{k} = N_{k} / N$ to be the population proportion of the kth category, where N_k is the size of $U^{(k)}$ . Thus, we implicitly assume that the finite population is an iid realization of the super-population model following a multinomial distribution with K categories.

Suppose that we are interested in the simple goodness-of-fit test $H_{0} : p_{k} = p_{k}^{(0)}$ for $k = 1, \dots, K$ , where ${(p_{1}^{(0)}, \dots, p_{K}^{(0)})}^{T}$ is a pre-specified vector satisfying $\sum_{k = 1}^{K} p_{k}^{(0)} = 1$ . From the sample A, we compute ${\hat{p}}_{k} = {\hat{N}}_{k} / \hat{N}$ as an estimator of p_k, where ${\hat{N}}_{k} = \sum_{i \in A_{k}} w_{i}$ is a design-unbiased estimator of N_k, $A_{k} = A \cap U^{(k)}$ , and $\hat{N} = \sum_{k = 1}^{K} {\hat{N}}_{k} = \sum_{i \in A} w_{i}$ . The estimated proportions can be obtained by $\hat{p} = \arg \max_{p \in Θ} \sum_{i \in A} w_{i} l (p; x_{i})$ , where $l (p; x_{i}) = \sum_{k = 1}^{K} x_{i k} \log (p_{k})$ , with x_ik = 1 if the ith element belongs to the kth category and 0 otherwise, $p = {(p_{1}, \dots, p_{K})}^{T}$ .

Then, the Pearson Chi-squared goodness-of-fit test statistic for H₀ is $X^{2} (p^{(0)}) = n \sum_{k = 1}^{K} {({\hat{p}}_{k} - p_{k}^{(0)})}^{2} / p_{k}^{(0)},$

where $p^{(0)} = {(p_{1}^{(0)}, \dots, p_{K - 1}^{(0)})}^{T}$ . Under regularity conditions, according to Rao and Scott (Citation1981), we have (17) $X^{2} (p^{(0)}) \to G_{2} = \sum_{k = 1}^{K - 1} λ_{k} Z_{k}^{2}$ (17) in distribution as $n \to \infty$ under H₀, and the reference distribution is the joint distribution of the super-population model and the sampling mechanism, where $λ_{1} \geq λ_{2} \geq \dots \geq λ_{K - 1} > 0$ are the eigenvalues of the design effect matrix $D = P_{0}^{- 1} Σ_{p}, P_{0} = diag (p^{(0)}) - p^{(0)} {(p^{(0)})}^{T}, Σ_{p}$ is the asymptotic variance of $\sqrt{n} \hat{p}$ , and $Z_{1}, \dots, Z_{K - 1}$ are K – 1 independent random variables from the standard normal distribution. Under simple random sampling with replacement, the limiting distribution in (17) is a Chi-squared distribution with K – 1 degrees of freedom since $P_{0} = Σ_{p}$ .

We now apply the proposed bootstrap method to approximate the limiting distribution in (17), without computing the estimated covariance matrix ${\hat{Σ}}_{p}$ . To describe the proposed method, let ${\hat{p}}^{*}$ be the estimator of the population proportion $p$ based on the bootstrap weights $w_{i}^{*}$ . The proposed bootstrap statistic of $X^{2} (p^{(0)})$ is $X^{2 *} (\hat{p}) = n \sum_{i = 1}^{K} {({\hat{p}}_{i}^{*} - {\hat{p}}_{i})}^{2} / {\hat{p}}_{i} .$

We use ${\hat{p}}_{i}$ in place of $p_{i}^{(0)}$ in the bootstrap test statistics. Under the conditions of Theorem 2, we can show that the limiting distribution for $X^{2 *} (\hat{p})$ is the same as that in (17).

5.2 Test of Independence in a Two-Way Table

The proposed bootstrap can also be used to test independence in a two-way count table. Let $p_{i j} = N_{i j} / N$ be the population proportion for cell (i, j) with margins $p_{i +}$ and $p_{+ j}$ for $i = 1, \dots, R$ and $j = 1, \dots, C$ , where R and C are the numbers of rows and columns, and ${N_{i j}; i = 1, \dots, R, j = 1, \dots, C}$ is the set of population counts with margins $N_{i +}$ and $N_{+ j}$ . Let ${\hat{N}}_{i j}$ be a design-unbiased estimator of N_ij and ${\hat{p}}_{i j} = {\hat{N}}_{i j} / \hat{N}$ . By assuming a multinomial distribution for the super-population model, the Chi-squared statistic for testing independence $H_{0} : p_{i j} = p_{i +} p_{+ j}$ for all i and j is (18) $X_{I}^{2} = n \sum_{i = 1}^{R} \sum_{j = 1}^{C} \frac{{({\hat{p}}_{i j} - {\hat{p}}_{i +} {\hat{p}}_{+ j})}^{2}}{{\hat{p}}_{i +} {\hat{p}}_{+ j}} .$ (18)

The limiting distribution of $X_{I}^{2}$ is a mixture of $χ^{2} (1)$ distributions (Rao and Scott Citation1981).

To develop bootstrap replicates of the test statistics under H₀, let ${\hat{p}}_{i j}^{*}$ be the bootstrap cell proportion computed using the bootstrap weights, ${\hat{p}}_{i +}^{*} = \sum_{j = 1}^{C} {\hat{p}}_{i j}^{*}$ , and ${\hat{p}}_{+ j}^{*} = \sum_{i = 1}^{R} {\hat{p}}_{i j}^{*}$ . The proposed bootstrap version of $X_{I}^{2}$ is (19) $X_{I}^{2 *} = n \sum_{i = 1}^{R} \sum_{j = 1}^{C} \frac{{({\hat{p}}_{i j}^{*} - {\hat{p}}_{i +}^{*} {\hat{p}}_{+ j}^{*}) - ({\hat{p}}_{i j} - {\hat{p}}_{i +} {\hat{p}}_{+ j})}^{2}}{({\hat{p}}_{i +} {\hat{p}}_{+ j})} .$ (19)

Under H₀, the terms in the numerator of $X_{I}^{2}$ are identical to ${({\hat{p}}_{i j} - {\hat{p}}_{i +} {\hat{p}}_{+ j}) - (p_{i j} - p_{i +} p_{+ j})}^{2}$ . That is, the bootstrap test statistic is calculated by simply replacing ${{\hat{p}}_{i j}, {\hat{p}}_{i +}, {\hat{p}}_{+ j}}$ and ${p_{i j}, p_{i +}, p_{+ j}}$ with ${{\hat{p}}_{i j}^{*}, {\hat{p}}_{i +}^{*}, {\hat{p}}_{+ j}^{*}}$ and ${{\hat{p}}_{i j}, {\hat{p}}_{i +}, {\hat{p}}_{+ j}}$ , respectively. It can be shown that the limiting distribution of $X_{I}^{2 *}$ conditional on the sample is the same as that of $X_{I}^{2}$ .

6 Simulation Study

6.1 Single-Stage Sampling

In this section, we test the performance of the proposed bootstrap method under the PPS (probability proportional to size) sampling with replacement. A finite population of size N is generated as follows. For $i = 1, \dots, N$ , $x_{i} \sim U (- 5, 5)$ and $y_{i} = θ_{1} + θ_{2} x_{i} + ϵ_{i}$ , where $U (a, b)$ is a uniform distribution on the interval (a, b), $(θ_{1}, θ_{2}) = (1, 1)$ , and $ϵ_{i} \sim N (0, 2^{2})$ . To make the sampling design informative, the PPS sampling with replacement is carried out to generate a sample of size n with selection probabilities $p_{i} \propto (6 I_{i} + 1)$ and $\sum_{i = 1}^{N} p_{i} = 1$ , where I_i = 1 if $x_{i} ϵ_{i} > 0$ and 0 otherwise; see Section S6 of the supplementary material for details about the probability proportional to size sampling and the bootstrap weights. The sampling design is informative (Pfeffermann Citation1993), since the selection probabilities are partially determined by the residuals ${ϵ_{i} : i = 1, \dots, N}$ . A similar model setup was also used by Verret, Rao, and Hidiroglou (Citation2015) in the context of small area estimation. We consider two scenarios: $(N, n) = (2, 000, 200)$ and $(N, n) = (15, 000, 500)$ .

We are interested in testing $H_{0} : θ_{2} = θ_{2}^{(0)}$ with the nominal significance level of $α = 0.05$ and consider three different values for $θ_{2}^{(0)} \in {1, 1.1, 1.2}$ . The following testing methods are compared:

Naive likelihood ratio method with $W (θ_{2}^{(0)})$ in (10) and $χ^{2} (1)$ as test statistic and reference distribution, respectively.
Naive quasi-score method with test statistic (14) and $χ^{2} (1)$ as the reference distribution.
Lumley and Scott (Citation2014) method. The test statistic is $W (θ_{2}^{(0)}) / \hat{δ}$ , where $\hat{δ} = n \hat{V} ({\hat{θ}}_{2}) {\hat{I}}_{w, 22 \cdot 1}, {\hat{I}}_{w, 22 \cdot 1}$ is defined in (15), and $\hat{V} ({\hat{θ}}_{2})$ is obtained from ${\hat{I}}_{w}^{- 1} (\hat{θ}) \hat{V} {{\hat{S}}_{w} (θ)} {{\hat{I}}_{w}^{- 1} (\hat{θ})}^{T}$ . The reference distribution is $F_{1, k}$ , where $F_{ν_{1}, ν_{2}}$ is an F distribution with parameters ν₁ and ν₂, k is the degrees of freedom of the variance estimator based on the sampling design, and k is obtained by subtracting the number of parameters from the effective sample size associated with the sampling design.
Bootstrap quasi-likelihood-ratio method with $W (θ_{2}^{(0)})$ in (10) being the test statistic, and the reference distribution is approximated by the empirical distribution of $W^{*} ({\hat{θ}}_{2})$ in (12).
Bootstrap quasi-score method with $X_{Q S}^{2} (θ_{2}^{(0)})$ in (14) being the test statistic, and the reference distribution is approximated by the empirical distribution of the bootstrap test statistics,

When calculating the likelihood-ratio test statistics, we assume a normal distribution with a constant but unknown variance for the conditional distribution y_i given x_i. For each scenario, we generate 1000 Monte Carlo samples, and we consider M = 500 and M = 1000 iterations for both bootstrap methods. summarizes the simulation results. For both scenarios, the naive likelihood-ratio method and the naive quasi-score method have a significantly inflated Type I error rate when the null hypothesis is true. The corresponding design effects (Wu and Thompson Citation2020) are more than 2 for estimating θ₂ under both scenarios; see Section S8.1 of the supplementary material for details. Thus, the distributions of $W (θ_{2}^{(0)})$ and the quasi-score test statistic (14) are more “dispersed” than $χ^{2} (1)$ under the null hypothesis, and due to the lack of design-effect corrections, the two naive methods are not theoretically valid. The Type I error rates of the Lumley and Scott (Citation2014) method and the two proposed bootstrap testing methods are similar under H₀, regardless of the population and sample sizes. The Type I error rates of the three methods come closer to the nominal truth as the population and sample sizes increase. Furthermore, the powers of the two bootstrap testing methods are reasonable compared to the Lumley-Scott method. The naive methods have a slightly larger power at the expense of elevated Type I error rates. In addition, we get similar test results regardless of the number of bootstrap repetitions.

Table 1 Test power for the hypothesis test $H_{0} : θ_{2} = θ_{2}^{(0)}$ based on 1 000 Monte Carlo simulations, the true parameter is $θ_{2} = 1.00$ , and the nominal significance level is 0.05.

Display Table

Besides, Section S9 of the supplementary material reports results of an additional simulation study for a stratified two-stage sampling design.

6.2 Test of Independence

In this section, we consider test of independence in a 3 × 3 table of counts to check the performance of the proposed bootstrap testing method. For a finite population $U = {y_{i} : i = 1, \dots, N}$ , $y_{i}$ is generated by a multinomial distribution with one trial and a success probability vector $p$ , where $p = {(p_{11}, \dots, p_{i j}, \dots, p_{33})}^{T}$ , and p_ij is the success probability for the cell in the ith row and jth column for $i, j = 1, \dots, 3$ . That is, $y_{i}$ is a dummy variable consisting of eight 0’s and a 1. For the success probability vector $p$ , we consider three cases:

Case I: $p_{11} = 1 / 4, p_{12} = p_{13} = p_{21} = p_{31} = 1 / 8, p_{22} = p_{23} = p_{32} = p_{33} = 1 / 16$ .
Case II: $p_{11} = 1 / 4, p_{12} = p_{13} = (1.4) / 8, p_{21} = p_{31} = (0.6) / 8, p_{22} = p_{33} = 1 / 16, p_{23} = (1.4) / 16, p_{32} = (0.6) / 16$ .
Case III: $p_{11} = p_{2, 3} = p_{3, 2} = 1 / 6, p_{12} = p_{13} = p_{21} = p_{31} = p_{22} = p_{33} = 1 / 12$ .

Case I satisfies independence for the two-way table of counts, but Cases II–III do not. The level of nonindependence can be expressed by $γ = \sum_{i = 1}^{3} \sum_{j = 1}^{3} {(p_{i j} - p_{i +} p_{+ j})}^{2} / (p_{i +} p_{+ j})$ , and γ = 0 corresponds to independence. The values of γ are 0, 0.017 and 0.125 for Cases I–III, respectively.

For each $y_{i}$ , we generate an auxiliary variable $x_{i} = β^{T} y_{i}$ , where $β = {(β_{1}, \dots, β_{9})}^{T}, β_{j} = 0.5 + e_{j}$ for $j = 1, \dots, 9, e_{j} \sim Ex (1)$ . Probability proportional to size sampling with replacement is used to generate a sample of size n with selection probability proportional to x_i. We consider two scenarios for the population and sample sizes: $(N, n) = (2000, 75)$ and $(N, n) = (10, 000, 150)$ .

We are interested in testing independence in the two-way table with $α = 0.05$ nominal significance level. For each sample, we compare the following test methods:

Naive Pearson method based on $X_{I}^{2}$ in (18) with $χ^{2} (4)$ being the reference distribution.
The second-order Rao-Scott correction method using the Satterthwaite approximation (Thomas and Rao Citation1987). This method is widely used in analyzing two-way contingency table and is implemented by the command svychisq in the R package survey (Lumley Citation2020).
Bootstrap Pearson method using $X_{I}^{2}$ , and its distribution is approximated by that of $X_{I}^{2 *}$ in (19).

For each scenario, we generate 1000 Monte Carlo samples, and we consider M = 500 and M = 1000 iterations for both bootstrap methods. summarizes the simulation results. For Case I when the null hypothesis is true, the Type I error rate of the naive Pearson method is much larger than the nominal significance level 0.05 for different sample sizes, since the asymptotic distribution of the test statistic $X_{I}^{2}$ in (18) is more “dispersed” than a $χ^{2} (4)$ distribution under the null hypothesis; see Section S8.2 of the supplementary material for details. The performance of the Rao-Scott method with second-order correction is similar to the bootstrap testing method in terms of Type I error and power. The power of the bootstrap testing method increases with the value of γ. Similar to the previous simulation, the power of the naive method is larger at the expense of elevated Type I error rates. In addition, we get similar test results regardless of the number of bootstrap repetitions.

Table 2 Power of the test procedures for independence based on 1000 Monte Carlo simulation samples, Case I corresponds to the independence scenario, and the nominal significance level is 0.05.

Display Table

7 Application

We present an analysis of the 2011 Private Education Expenditure Survey (PEES) in South Korea using the proposed bootstrap testing methods. The purpose of this survey is to study the relationship between private education expenditure and academic performance of students before entering college.

A stratified two-stage sampling design was used for the 2011 PEES, and the strata consist of 16 first-tier administrative divisions, including most provinces and metropolitan cities in South Korea. For each stratum, PPS sampling with replacement was conducted in the first stage, and the primary sampling unit was the school. The students were randomly selected from the sample school in the second stage. There are about 1000 sample schools and 45,000 students involved in this survey.

For the student i in the sample A, let y_i be the academic performance assessed by the teacher, and it takes a value from 1 through 3 corresponding to low, middle and high academic performance, respectively. Associated with y_i, let $x_{i}$ be the vector of covariates of interest. As discussed by Kim, Park, and Lee (Citation2017), we consider the following covariates: after-school education, hours taking lessons provided by the school after regular classes in a month; father’s education, 1 for college or higher and 0 otherwise; gender, 1 for female and 0 for male; household income per month; mother’s education, 1 for college or higher and 0 otherwise; private education, hours taking private lessons in a month.

In this section, we study the academic performance of students in middle school and high school separately and are interested in studying the conditional probability of achieving high academic performance. Specifically, consider the following logistic model (20) $logit {pr (Y = 1 | x)} = (1, x^{T}) θ,$ (20) where $logit (x) = \log (x) - \log (1 - x)$ for $x \in (0, 1)$ , Y = 1 if high academic performance is achieved and 0 otherwise, $x$ is a vector of six covariates, and $θ = {(θ_{0}, θ_{1}, \dots, θ_{6})}^{T}$ . We are interested in testing the null hypotheses $H_{0, i} : θ_{i} = 0$ for $i = 1, \dots, 6$ with $α = 0.05$ nominal significance level. Since the Wald method is widely used in practice, the naive likelihood ratio method, naive quasi-score method, bootstrap quasi-likelihood ratio method, bootstrap quasi-score method with 1000 iterations and a two-sided Wald test are compared. The p-values for the two naive methods and the Wald test are obtained using reference distributions for simple random sampling. Specifically, the reference distribution for the two naive methods is $χ^{2} (1)$ , and that of the Wald test is a normal distribution with estimated variance (Fuller Citation2009, sec. 1.2.8) for $\hat{θ}$ using the sandwich formula. The p-values for the proposed bootstrap testing methods are obtained by bootstrap empirical distributions of the corresponding test statistics.

The results of our analysis are summarized in . The two bootstrap testing methods and the Wald test perform approximately the same in terms of p-values. The p-values of the two naive methods are approximately the same, but they differ from those of the bootstrap testing methods, especially for “after school education” at the middle school level and most covariates at the high school level, as the naive methods may not properly reflect the intra-cluster correlation due to cluster sampling.

Table 3 Estimates (Est) and the p-values (Unit: $10^{- 2}$ ) for testing $H_{0, i} : θ_{i} = 0$ , where $i = 1, \dots, 6$ , in (20) for the middle school and high school.

Display Table

Based on the two bootstrap testing methods in , we have the following conclusions under $α = 0.05$ nominal significance level. Controlling other covariates, the probability that female students achieve high academic performance is significantly higher than that of male students in middle school, but the gender effect is not significant in high school. Hours spent on private education and after-school education can significantly increase the probability of achieving high academic performance in both middle school and high school. The household income has a significantly positive influence on their child’s academic performance. However, the level of education of the father and mother has a significant influence during the middle school period only.

8 Concluding Remarks

Many statistical agencies provide microdata files containing survey weights and several sets of associated replication weights, in particular bootstrap weights. Standard statistical packages often permit the use of survey-weighted test statistics, and we have shown how to approximate their distributions under the null hypothesis by their bootstrap analogues computed from the bootstrap weights supplied in the data file. Using bootstrap weights, we can easily apply hypothesis testing without using specialized software designed for complex survey data, as noted by Beaumont and Bocci (Citation2009). The proposed bootstrap method is applied to quasi-likelihood ratio tests and quasi-score tests.

Our theories depend on establishing bootstrap central limit theorems under specified sampling designs, including stratified multi-stage sampling with clusters selected with replacement. Under some sampling designs, the central limiting theorem does not necessarily hold and the proposed method is not applicable. In the case of sampling designs that allow for the central limit theorem, the bootstrap central limit theorems can be established under some additional assumptions. Under the assumption that the bootstrap central limit theorem holds in the sampling design, the proposed hypothesis testing methods are applicable. The hypotheses we have discussed are special cases of the one considered in Section 4.2 of Jiang (Citation2019), so the theoretical properties of the proposed bootstrap method can also be investigated using the general method developed in Section 4.2 of Jiang (Citation2019). The proposed method can be applied to the case of categorical data by developing bootstrap procedures for testing simple goodness of fit and independence in a two-way table. We plan to extend our bootstrap method for categorical data to test hypotheses from multi-way tables of weighted counts or proportions, using a log-linear model approach proposed by Rao and Scott (Citation1984). Moreover, the bootstrap method can also be applied to nonresponse and weighting adjustments for variance estimation under regularity conditions, and extending the proposed bootstrap method for weight adjustments for hypothesis testing is also a promising project.

Supplementary Materials

Supplementary Material contains the following: comments on the regularity conditions (S1), proof of Theorem 1 (S2), proof of Theorem 2 (S3), verification of the proposed bootstrap method under stratified multi-stage sampling (S4), bootstrap method for Poisson sampling (S5) bootstrap method for probability proportional to size sampling with replacement (S6), stratified simple random sampling (S7), design effect (S8) and an additional simulation study: stratified two-stage sampling (S9).

Supplemental material

Supplemental Material

Download Zip (276 KB)

Acknowledgments

We are grateful to the associate editor and three anonymous referees for their constructive comments, which have greatly improved the article.

Additional information

Funding

Kim is partially supported by NSF (OAC-1931380). Rao is supported by a grant from the Natural Sciences and Engineering Research Council of Canada. Wang is partially supported by NSFC (11901487, 71988101, 72033002, 12231011) and National Key R&D Program of China (2022YFA10038002).

References

Antal, E., and Tillé, Y. (2011), “A Direct Bootstrap Method for Complex Sampling Designs from a Finite Population,” Journal of the American Statistical Association, 106, 534–543. DOI: 10.1198/jasa.2011.tm09767.
Web of Science ®Google Scholar
Antal, E., and Tillé, Y. (2014), “A New Resampling Method for Sampling Designs Without Replacement: The Doubled Half Bootstrap,” Computational Statistics, 29, 1345–1363.
Web of Science ®Google Scholar
Beaumont, J.-F., and Bocci, C. (2009), “A Practical Bootstrap Method for Testing Hypotheses from Survey Data,” Survey Methodology, 35, 25–35.
Web of Science ®Google Scholar
Beaumont, J.-F., and Patak, Z. (2012), “On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling,” International Statistical Review, 80, 127–148. DOI: 10.1111/j.1751-5823.2011.00166.x.
Web of Science ®Google Scholar
Berger, Y. G. (1998), “Rate of convergence to normal distribution for the Horvitz-Thompson estimator,” Journal of Statistical Planning and Inference, 67, 209–226. DOI: 10.1016/S0378-3758(97)00107-9.
Web of Science ®Google Scholar
Bertail, P., and Combris, P. (1997), “Bootstrap Généralisé d’un Sondage,” Annales d’Economie et de Statistique, 46, 49–83. DOI: 10.2307/20076068.
Google Scholar
Bessonneau, P., Brilhaut, G., Chauvet, G., and Garcia, C. (2021), “With-Replacement Bootstrap Variance Estimation for Household Surveys: Principles, Examples and Implementation,” Survey Methodology, 47, 313–347.
Web of Science ®Google Scholar
Bickel, P. J., and Freedman, D. A. (1984), “Asymptotic Normality and the Bootstrap in Stratified Sampling,” Annals of Statistics, 12, 470–482.
Web of Science ®Google Scholar
Breidt, F. J., and Opsomer, J. D. (2000), “Local Polynomial Regression Estimators in Survey Sampling,” Annals of Statistics, 28, 1026–1053.
Web of Science ®Google Scholar
Chambers, R., and Skinner, C. J. (2003), Analysis of Survey Data, Chichester: Wiley.
Google Scholar
Chao, M.-T., and Lo, S.-H. (1985), “A Bootstrap Method for Finite Population,” Sankhya: Series A, 47, 399–405.
Google Scholar
Chatterjee, S., and Bose, A. (2005), “Generalized Bootstrap for Estimating Equations,” Annals of Statistics, 33, 414–436.
Web of Science ®Google Scholar
Chauvet, G. (2015), “Coupling Methods for Multistage Sampling,” Annals of Statistics, 43, 2484–2506.
Web of Science ®Google Scholar
Chen, J., and Rao, J. (2007), “Asymptotic Normality Under Two-Phase Sampling Designs,” Statistica Sinica, 17, 1047–1064.
Web of Science ®Google Scholar
Chipperfield, J., and Preston, J. (2007), “Efficient Bootstrap for Business Surveys,” Survey Methodology, 33, 167–172.
Web of Science ®Google Scholar
Fuller, W. A. (2009), Sampling Statistics, New Jersey: Wiley.
Google Scholar
Hájek, J. (1964), “Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population,” Annals of Mathematical Statistics, 35, 1491–1523.
Google Scholar
Han, Q., and Wellner, J. A. (2021), “Complex Sampling Designs: Uniform Limit Theorems and Applications,” Annals of Statistics, 49, 459–485.
Web of Science ®Google Scholar
Isaki, C. T., and Fuller, W. A. (1982), “Survey Design under the Regression Superpopulation Model,” Journal of the American Statistical Association, 77, 89–96. DOI: 10.1080/01621459.1982.10477770.
Web of Science ®Google Scholar
Jiang, J. (2019), Robust Mixed Model Analysis, Singapore: World Scientific.
Google Scholar
Kim, J. K., Michael Brick, J., Fuller, W. A., and Kalton, G. (2006), “On the Bias of the Multiple-Imputation Variance Estimator in Survey Sampling,” Journal of the Royal Statistical Society, Series B, 68, 509–521. DOI: 10.1111/j.1467-9868.2006.00546.x.
Google Scholar
Kim, J. K., Park, S., and Lee, Y. (2017), “Statistical Inference using Generalized Linear Mixed Models under Informative Cluster Sampling,” Canadian Journal of Statistics, 45, 479–497. DOI: 10.1002/cjs.11339.
Web of Science ®Google Scholar
Korn, E. L., and Graubard, B. I. (1999), Analysis of Health Surveys, New York: Wiley.
Google Scholar
Lumley, T. (2020), Analysis of Complex Survey Samples, R package version 3.37.
Google Scholar
Lumley, T., and Scott, A. J. (2014), “Tests for Regression Models Fitted to Survey Data,” Australian & New Zealand Journal of Statistics, 56, 1–14. DOI: 10.1111/anzs.12065.
Web of Science ®Google Scholar
Mashreghi, Z., Haziza, D., and Léger, C. (2016), “A Survey of Bootstrap Methods in Finite Population Sampling,” Statistics Surveys, 10, 1–52. DOI: 10.1214/16-SS113.
Google Scholar
Pfeffermann, D. (1993), “The Role of Sampling Weights when Modeling Survey Data,” International Statistical Review, 61, 317–337. DOI: 10.2307/1403631.
Web of Science ®Google Scholar
Praestgaard, J., and Wellner, J. A. (1993), “Exchangeably Weighted Bootstraps of the General Empirical Process,” Annals of Probability, 21, 2053–2086.
Web of Science ®Google Scholar
Rao, J. N. K., and Scott, A. J. (1981), “The Analysis of Categorical Data from Complex Sample Surveys: Chi-Squared Tests for Goodness of Fit and Independence in Two-Way Tables,” Journal of the American Statistical Association, 76, 221–230. DOI: 10.1080/01621459.1981.10477633.
Web of Science ®Google Scholar
Rao, J. N. K., and Scott, A. J. (1984), “On Chi-Squared Tests for Multiway Contingency Tables with Cell Proportions Estimated from Survey Data,” Annals of Statistics, 12, 46–60.
Web of Science ®Google Scholar
Rao, J. N. K., Scott, A. J., and Skinner, C. J. (1998), “Quasi-Score Tests with Survey Data,” Statistica Sinica, 8, 1059–1070.
Web of Science ®Google Scholar
Rao, J. N. K., and Wu, C. F. J. (1988), “Resampling Inference with Complex Survey Data,” Journal of the American Statistical Association, 83, 231–241. DOI: 10.1080/01621459.1988.10478591.
Web of Science ®Google Scholar
Rao, J. N. K., Wu, C. F. J., and Yue, K. (1992), “Some Recent Work on Resampling Methods for Complex Surveys,” Survey Methodology, 18, 209–217.
Google Scholar
Rosén, B. (1972a), “Asymptotic Theory for Successive Sampling with Varying Probabilities without Replacement, I,” Annals of Mathematical Statistics, 43, 373–397.
Google Scholar
Rosén, B. (1972b), “Asymptotic Theory for Successive Sampling with Varying Probabilities without Replacement, II,” The Annals of Mathematical Statistics, 43, 748–776.
Google Scholar
Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005), “On the Two-Phase Framework for Joint Model and Design-based Inference,” Annals of Statistics, 33, 2789–2810.
Web of Science ®Google Scholar
Särndal, C.-E., Swensson, B., and Wretman, J. (2003), Model Assisted Survey Sampling, New York: Springer.
Google Scholar
Särndal, C.-E., Swensson, B., and Wretman, J. H. (1989), “The Weighted Residual Technique for Estimating the Variance of the General Regression Estimator of the Finite Population Total,” Biometrika, 76, 527–537. DOI: 10.1093/biomet/76.3.527.
Web of Science ®Google Scholar
Shao, J. (2003), Mathematical Statistics, New York: Springer.
Google Scholar
Sverchkov, M., and Pfeffermann, D. (2004), “Prediction of Finite Population Totals based on the Sample Distribution,” Survey Methodology, 30, 79–92.
Google Scholar
Thomas, D. R., and Rao, J. N. K. (1987), “Small-Sample Comparisons of Level and Power for Simple Goodness-of-Fit Statistics under Cluster Sampling,” Journal of the American Statistical Association, 82, 630–636. DOI: 10.1080/01621459.1987.10478476.
Web of Science ®Google Scholar
Toth, D., and Eltinge, J. L. (2011), “Building Consistent Regression Trees From Complex Sample Data,” Journal of the American Statistical Association, 106, 1626–1636. DOI: 10.1198/jasa.2011.tm10383.
Web of Science ®Google Scholar
Verret, F., Rao, J. N. K., and Hidiroglou, M. A. (2015), “Model-based Small Area Estimation Under Informative Sampling,” Survey Methodology, 41, 333–347.
Web of Science ®Google Scholar
Wu, C., and Thompson, M. E. (2020), Sampling Theory and Practice, Gewerbestrasse: Springer.
Google Scholar

Hypotheses Testing from Complex Survey Data Using Bootstrap Weights: A Unified Approach

ABSTRACT

1 Introduction

2 Basic Setup

3 Bootstrap Calibration

4 Quasi-Likelihood-Ratio Test

5 Test for Categorical Survey Data

5.1 Goodness-of-Fit Test

5.2 Test of Independence in a Two-Way Table

6 Simulation Study

6.1 Single-Stage Sampling

Table 1 Test power for the hypothesis test $H_{0} : θ_{2} = θ_{2}^{(0)}$ based on 1 000 Monte Carlo simulations, the true parameter is $θ_{2} = 1.00$ , and the nominal significance level is 0.05.

6.2 Test of Independence

Table 2 Power of the test procedures for independence based on 1000 Monte Carlo simulation samples, Case I corresponds to the independence scenario, and the nominal significance level is 0.05.

7 Application

Table 3 Estimates (Est) and the p-values (Unit: $10^{- 2}$ ) for testing $H_{0, i} : θ_{i} = 0$ , where $i = 1, \dots, 6$ , in (20) for the middle school and high school.

8 Concluding Remarks

Supplementary Materials

Supplemental Material

Acknowledgments

Related Research Data

References

Information for

Open access

Opportunities

Help and information

Hypotheses Testing from Complex Survey Data Using Bootstrap Weights: A Unified Approach

ABSTRACT

1 Introduction

2 Basic Setup

3 Bootstrap Calibration

4 Quasi-Likelihood-Ratio Test

5 Test for Categorical Survey Data

5.1 Goodness-of-Fit Test

5.2 Test of Independence in a Two-Way Table

6 Simulation Study

6.1 Single-Stage Sampling

Table 1 Test power for the hypothesis test H0:θ2=θ2(0) based on 1 000 Monte Carlo simulations, the true parameter is θ2=1.00, and the nominal significance level is 0.05.

6.2 Test of Independence

Table 2 Power of the test procedures for independence based on 1000 Monte Carlo simulation samples, Case I corresponds to the independence scenario, and the nominal significance level is 0.05.

7 Application

Table 3 Estimates (Est) and the p-values (Unit: 10−2) for testing H0,i:θi=0, where i=1,…,6, in (20) for the middle school and high school.

8 Concluding Remarks

Supplementary Materials

Supplemental Material

Acknowledgments

Additional information

Funding

Related Research Data

References

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date

Table 1 Test power for the hypothesis test $H_{0} : θ_{2} = θ_{2}^{(0)}$ based on 1 000 Monte Carlo simulations, the true parameter is $θ_{2} = 1.00$ , and the nominal significance level is 0.05.

Table 3 Estimates (Est) and the p-values (Unit: $10^{- 2}$ ) for testing $H_{0, i} : θ_{i} = 0$ , where $i = 1, \dots, 6$ , in (20) for the middle school and high school.