Statistical Theory and Related Fields Volume 3, 2019 - Issue 2

A resampling approach to estimation of the linking variance in the Fay–Herriot model

Snigdhansu Chatterjee, School of Statistics, University of Minnesota, Minneapolis, MN, USA

ORCID: https://orcid.org/0000-0002-7986-0470

Pages 170-177 | Received 27 Dec 2018, Accepted 30 Sep 2019, Published online: 14 Oct 2019

https://doi.org/10.1080/24754269.2019.1675408

ABSTRACT

In the Fay–Herriot model, we consider estimators of the linking variance obtained using different types of resampling schemes. The usefulness of this approach is that even when the estimator from the original data falls below zero or any other specified threshold, several of the resamples can potentially yield values above the threshold. We establish asymptotic consistency of the resampling-based estimator of the linking variance for a wide variety of resampling schemes and show the efficacy of using the proposed approach in numeric examples.

KEYWORDS:

Linking variance
Prasad–Rao estimator
paired bootstrap
m-out-of-n bootstrap
Bayesian bootstrap

1. Introduction

A typical survey on any aspect of human behaviour or natural phenomena is limited in scope by feasibility, ethics and cost constraints. Consequently, data are not always obtained at as fine a scale as required by stakeholders. In such cases, small area models are often useful for gathering resources across various domains and variables to obtain estimators and inferences with high precision. A detailed description of small area methods, principles and procedures may be found in the book (Rao & Molina, 2015), and in the papers (Chatterjee, Lahiri, & Li, 2008; Das, Jiang, & Rao, 2004; Datta, Hall, & Mandal, 2011; Datta, Rao, & Smith, 2005; Jiang & Lahiri, 2006; Jiang, Lahiri, & Wan, 2002; Li & Lahiri, 2010; Molina, Rao, & Datta, 2015; Pfeffermann, 2013; Pfeffermann & Glickman, 2004; Rao, 2015; Yoshimori & Lahiri, 2014a, 2014b).

In small area studies, arguably the most popular choice of a model for the observed data is the Fay–Herriot model (Fay & Herriot, 1979). This model is described by the following two-layered framework:

Sampling model: Conditional on the unknown and unobserved area-level effects $θ = (θ_{1}, \dots, θ_{n})^{T}$ , the sampled and observed data $Y_{n} = (Y_{1}, \dots, Y_{n})^{T}$ follow an n-variate Normal distribution with mean $θ$ and covariance matrix $D$ with known diagonal entries $D_{i} > 0$ and off-diagonal entries 0. This layer models the sampling level variability and distribution in the observations, conditional on the inherent characteristics $θ$ of the different small areas.
Linking model: The unobserved area-level effects $θ$ follow an n-variate Normal distribution with mean $X β$ for a known and non-random $n \times p$ matrix X and unknown but fixed vector $β \in R^{p}$ . The covariance matrix is $ψ I_{n}$ , where the matrix $I_{n}$ is the n dimensional identity matrix and $ψ > 0$ is an unknown positive-valued constant. This second layer of the Fay–Herriot model links the various small areas together by requiring that all the small areas share a common set of regression parameters $β \in R^{p}$ and a common variance component. We assume henceforth that $ψ > ε$ for some small known number $ε$.

Suppose $E \sim N_{n} (0, D)$ and $U \sim N_{n} (0, ψ I_{n})$ are mutually independent n-dimensional Gaussian random variables, respectively, denoting the errors and random effects. Then we may express the Fay–Herriot model as $Y_{n} = X β + U + E \sim N_{n} (X β, D + ψ I_{n}) .$ (1) In the above Fay–Herriot model, the unknown parameters are the regression parameters $β \in R^{p}$ and a common linking variance component $ψ > 0$ . The distribution of the area-level effects $θ$ , conditional on the data $Y_{n}$ , is the primary quantity of interest in the Fay–Herriot model, and this distribution depends on β and ψ. One traditional approach is to estimate these parameters from data, and then use the estimators in subsequent prediction of $θ$ , conditional on the data $Y_{n}$ .
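As an illustration of model (1), the following minimal sketch draws one dataset by first generating the area-level effects and then adding sampling noise. The function name `simulate_fay_herriot` and the toy design matrix are assumptions made only for this example; they are not part of the paper.

```python
import random

def simulate_fay_herriot(X, beta, psi, D, seed=None):
    """Draw (Y, theta) from model (1): theta = X beta + U, Y = theta + E."""
    rng = random.Random(seed)
    theta, Y = [], []
    for x_i, D_i in zip(X, D):
        mean_i = sum(x * b for x, b in zip(x_i, beta))   # i-th entry of X beta
        theta_i = mean_i + rng.gauss(0.0, psi ** 0.5)    # U_i ~ N(0, psi)
        theta.append(theta_i)
        Y.append(theta_i + rng.gauss(0.0, D_i ** 0.5))   # E_i ~ N(0, D_i)
    return Y, theta

# Hypothetical toy design: intercept plus one covariate, n = 5 areas.
X = [[1.0, 0.1 * i] for i in range(5)]
D = [2.0, 0.6, 0.5, 0.4, 0.2]
Y, theta = simulate_fay_herriot(X, beta=[2.0, 3.0], psi=1.0, D=D, seed=0)
```

Setting `psi` and all `D` entries to zero returns `Y` equal to $X β$, which is a convenient sanity check of the two layers.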

To assess the quality of prediction in a small area context, we may compute the mean squared prediction error or obtain a prediction interval. Resampling, which is typically driven by Monte Carlo computation, is quite a popular method for obtaining such intervals and error bounds, and for a variety of other purposes; see, for example, Chatterjee et al. (2008), Datta et al. (2011), Kubokawa and Nagashima (2012). While most of these papers use either the parametric bootstrap or Efron's original bootstrap (Efron, 1979), other approaches to resampling are also viable, as discussed later in this paper.

Among the intermediate products of all such resampling computations are multiple copies of the estimators of β and ψ obtained from the resamples. The objective of this paper is to assess the quality of such resampling-driven estimators of the original parameters of the data, and we concentrate only on the linking variance parameter ψ here, since it is more critical in subsequent analysis. The immediate purpose of this study is to explore the possibility of using such resampling-based estimators of ψ in case the estimator from the original data takes too low a value.

We present new estimators of ψ, based on the resampling data in a Fay–Herriot model in Section 2, and related theoretical results in Section 3. Some simulation studies are presented in Section 4. Some concluding remarks are included in Section 5.

In the above description of the Fay–Herriot model and in the rest of this paper, we consider all vectors to be column vectors, and for any vector or matrix A, the notation $A^{T}$ denotes its transpose. For any vector a, the notation $| a |$ stands for its Euclidean norm. The notation $a_{i}$ will denote the ith element of vector a; similarly, $A_{i j}$ will denote the $(i, j)$ th element of matrix A. The notation $\mathrm{tr} (A)$ denotes the trace of a matrix A, $I_{A}$ is the indicator function of measurable set A, and $P, E, V$ are symbols denoting probability, expectation and variance generically. Other notations will be introduced and described as they arise.

We assume that p is fixed and p<n holds, and X is full column rank. We use the notation $x_{i}$ for the ith row of the matrix X, and $P_{x}$ for the projection matrix onto the columns of X. In fact, all our results are valid even when p increases with n as long as $p^{2} / n \to 0$ , but the proofs require some additional technicalities. We assume that $n^{- 1} X^{T} X$ tends towards a positive definite matrix. We also assume that $\max_{1 \leq i \leq n} P_{x_{i i}} = O (p / n),$ that is, the maximum of the diagonal entries of the projection matrix $P_{x}$ is of the order of $n^{- 1} p$ . These conditions are very traditional and standard. We use the notation $D_{x}$ for the diagonal matrix that has the ith diagonal element as $P_{x_{i i}}$ , i.e., $D_{x}$ and $P_{x}$ share the same diagonal elements but $D_{x}$ has all off-diagonal elements equal to zero. Additionally, we assume that the sampling variances $D_{i}$ are bounded above, and away from zero.

2. The resampling-based framework

In the Fay–Herriot model, we implement resampling by associating a non-negative random weight, generically referred to as a resampling weight $\mathbb{W}_{i}$ , with the ith observed data point $(Y_{i}, x_{i}, D_{i})$ , for $i = 1, \dots, n$ . We assume that these weights are exchangeable and independent of the data, and $E \mathbb{W}_{1} = μ > 0$ and $V \mathbb{W}_{1} = γ^{2} μ^{2} > 0$ , where $γ > 0$ . Henceforth we use the notation $E_{r}$ to denote resampling-expectation, that is, expectation with respect to the distribution of $\mathbb{W}$ conditional on the observed data.

Define $W_{i} = γ^{- 1} μ^{- 1} (\mathbb{W}_{i} - μ)$ for the centred and scaled version of these resampling weights. In minor abuses of notation, we also define $\mathbb{W}$ as a diagonal matrix with the ith diagonal element being $\mathbb{W}_{i}$ , and similarly $W$ is a diagonal matrix whose ith diagonal element is $W_{i}$ . Thus we may write $\mathbb{W} = μ I_{n} + γ μ W .$

Let $δ > 0$ be a pre-specified constant, and $A_{n}$ be the set on which the smallest eigenvalue of $X^{T} \mathbb{W} X$ is higher than δ. Under standard regularity conditions the probability of the set $A_{n}^{C}$ is exponentially small, see for example, Chatterjee and Bose (2000). Nevertheless, it is useful to consider the cases $A_{n}$ and $A_{n}^{C}$ separately, since differences arise occasionally with specific choices of $\mathbb{W}$ in numeric computations with finite samples.

Using the resampling weights along with the observed data, the estimator for the regression slope parameters is ${\hat{β}}_{n r} = \begin{cases} (X^{T} \mathbb{W} X)^{- 1} X^{T} \mathbb{W} Y_{n} & \text{on the set } A_{n}, \\ {\hat{β}}_{n} & \text{otherwise.} \end{cases}$ In the above, the second subscript denotes that this estimator is obtained in the resampling scenario. In the following we describe the case for the set $A_{n}$ only for brevity; the asymptotically negligible case $A_{n}^{C}$ can be addressed using routine algebra. The above leads to the resampling-based fitted values ${\hat{Y}}_{n r} = X {\hat{β}}_{n r}$ , and the resulting residuals $\begin{aligned} R_{n r} & = Y_{n} - {\hat{Y}}_{n r} = (I_{n} - X (X^{T} \mathbb{W} X)^{- 1} X^{T} \mathbb{W}) Y_{n} \\ & = (I_{n} - X (X^{T} \mathbb{W} X)^{- 1} X^{T} \mathbb{W}) (E + U) . \end{aligned}$

In the rest of the paper, we study two different resampling-driven estimators of ψ. Let $T_{n r} = \frac{E_{r} | R_{n r} |^{2} - \mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) D (I_{n} - P_{x})\}}{\mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x})\}} .$

The resampling-based Prasad–Rao estimator for ψ is defined as ${\hat{ψ}}_{n r} = T_{n r} I_{\{T_{n r} > ε\}} .$ (2)

The above estimator is based on Prasad and Rao (1990), where we fit an ordinary least squares regression on the original data $Y_{n}$ using the covariates X, thus obtaining ${\hat{β}}_{n} = (X^{T} X)^{- 1} X^{T} Y_{n}$ . Based on the above estimator for β, define the vector of residuals $R = (I_{n} - P_{x}) Y_{n}$ . We use the notation $R_{i}$ for the ith element of R. The PR estimator for ψ is given by ${\hat{ψ}}_{n} = (n - p)^{- 1} \{| R |^{2} - \sum_{i = 1}^{n} D_{i} + \sum_{i = 1}^{n} (P_{x} D)_{i i}\} .$ (3)
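A minimal NumPy sketch of equation (3) is given below. It reads the last sum in (3) as the trace of $P_{x} D$, which is an interpretation of the notation; the function name `prasad_rao` and the toy data are assumptions for illustration only.

```python
import numpy as np

def prasad_rao(Y, X, D):
    """Prasad-Rao moment estimator of psi, equation (3); it may be negative."""
    n, p = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)   # projection onto the columns of X
    R = Y - P @ Y                           # OLS residuals (I - P_x) Y
    trace_PD = np.sum(np.diag(P) * D)       # tr(P_x D), D given as a vector
    return (R @ R - D.sum() + trace_PD) / (n - p)

# Toy check with an intercept-only design and equal sampling variances.
Y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.ones((4, 1))
D = np.full(4, 0.5)
psi_hat = prasad_rao(Y, X, D)   # (5 - 2 + 0.5) / 3
```

In this toy case the estimator reduces to the sample variance of `Y` minus the common sampling variance, which makes the possibility of a negative value easy to see.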

The second resampling-based estimator of ψ that we study is a variant of the above. Fix a small $ϵ_{0} > 0$ that is smaller than the lower bound on ψ. Define the set $A_{n r} = \{n^{- 1} | R_{n r} |^{2} \geq n^{- 1} \mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) D (I_{n} - P_{x})\} + ϵ_{0} / 2\} .$ The alternative resampling-based estimator that we propose is ${\tilde{ψ}}_{n r} = \frac{E_{r} | R_{n r} |^{2} I_{A_{n r}} - \mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) D (I_{n} - P_{x})\}}{\mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x})\}} .$ Note that on the set $A_{n r}$ , $| R_{n r} |^{2}$ is greater than $\mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) D (I_{n} - P_{x})\}$ , consequently ${\tilde{ψ}}_{n r} > 0$ almost surely. Essentially, we discard the cases where the rth resample has $n^{- 1} | R_{n r} |^{2} < n^{- 1} \mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) D (I_{n} - P_{x})\} + ϵ_{0} / 2$ and take an average over the rest of the cases.
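The two resampling-based estimators can be approximated by Monte Carlo, replacing $E_{r}$ with an average over resamples. The sketch below uses paired-bootstrap (multinomial) weights with $μ = 1$ and $γ^{2} = 1 - n^{- 1}$. The function name `resampled_psi`, the default values of `eps`, `eps0` and `delta`, and the reading of the second estimator as an average over the retained resamples only (following the verbal description above) are all assumptions of this example rather than the author's implementation.

```python
import numpy as np

def resampled_psi(Y, X, D, n_resamples=1000, eps=1e-4, eps0=1e-4,
                  delta=1e-8, seed=0):
    """Monte Carlo sketch of psi-hat_nr and psi-tilde_nr (paired bootstrap)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma2 = 1.0 - 1.0 / n                        # multinomial weights, mu = 1
    P = X @ np.linalg.solve(X.T @ X, X.T)         # projection matrix P_x
    M = np.eye(n) - P
    A = np.eye(n) + gamma2 * np.diag(np.diag(P))  # I_n + gamma^2 D_x
    bias = np.trace(A @ M @ np.diag(D) @ M)       # numerator correction term
    denom = np.trace(A @ M)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

    sq_norms = np.empty(n_resamples)
    for r in range(n_resamples):
        w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        G = (X.T * w) @ X                         # X^T W X with raw weights
        if np.linalg.eigvalsh(G)[0] > delta:      # resample lies in A_n
            beta_r = np.linalg.solve(G, (X.T * w) @ Y)
        else:                                     # otherwise fall back to OLS
            beta_r = beta_ols
        resid = Y - X @ beta_r                    # R_nr, over all n areas
        sq_norms[r] = resid @ resid

    T = (sq_norms.mean() - bias) / denom
    psi_hat = T if T > eps else 0.0               # equation (2)
    keep = sq_norms >= bias + n * eps0 / 2.0      # resamples in A_nr
    psi_tilde = ((sq_norms[keep].mean() - bias) / denom
                 if keep.any() else float("nan"))
    return psi_hat, psi_tilde
```

When every resample is retained, the two returned values coincide; they differ only when some resamples fall below the threshold.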

This PR estimator is positive almost surely when the intercept is the only covariate but may return a negative value under some circumstances. In fact, an often-encountered challenge in small area studies is to get an estimate of ψ that is positive since standard estimation techniques like the Prasad–Rao method discussed above may result in a potential negative value as an estimate of variance. To address this issue, several techniques have been proposed, see for example Li and Lahiri (2010), Yoshimori and Lahiri (2014a), Hirose and Lahiri (2017), Chatterjee (2018).

Our resampling-based proposed estimators achieve such positive values several times even when ${\hat{ψ}}_{n} < ε$ . We present some numeric examples in Section 4.

Multiple traditional resampling schemes may be realised as special cases of the above framework. Consider a probability experiment where we have n boxes, and n balls, and we allocate balls to boxes randomly and independently, with each ball having the equal probability of 1/n of being allocated to any of the n boxes. The n-dimensional random vector obtained by counting the number of balls in the ith box, for $i = 1, \dots, n$ may be considered as an example of $(\mathbb{W}_{1}, \dots, \mathbb{W}_{n})$ , with $μ = 1$ and $γ = (1 - n^{- 1})^{1 / 2}$ . This example is mathematically identical to the famous paired bootstrap of Efron.

If we modify the above probability experiment slightly, and consider the case where we have only m balls, the above framework obtains $(\mathbb{W}_{1}, \dots, \mathbb{W}_{n})$ that corresponds to the m-out-of-n bootstrap (referred to as moon-bootstrap in the rest of this paper). If we consider the case where $\mathbb{W}_{i}$ are independent and identically distributed exponential random variables, we obtain variations of the Bayesian bootstrap of Rubin. Other interesting special cases and additional discussion on using resampling weights may be found in Præstgaard and Wellner (1993), Barbe and Bertail (1995), Chatterjee and Bose (2005) and multiple references therein, and in the related chapter of Shao and Tu (1995). The latter classic text, along with the other classic (Efron & Tibshirani, 1993) may be consulted for details about resampling in general.
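The three weighting schemes just described can be generated with the standard library alone. The helper names below are hypothetical; the comments record the mean and relative variance of each scheme, which follow from the multinomial and exponential distributions.

```python
import random
from collections import Counter

def multinomial_weights(n, m, rng):
    """Counts of m balls dropped uniformly at random into n boxes."""
    counts = Counter(rng.randrange(n) for _ in range(m))
    return [counts.get(i, 0) for i in range(n)]

def paired_bootstrap_weights(n, rng):
    # mu = 1 and gamma^2 = 1 - 1/n
    return multinomial_weights(n, n, rng)

def moon_bootstrap_weights(n, m, rng):
    # mu = m/n and gamma^2 = (n/m)(1 - 1/n), which grows when m/n shrinks
    return multinomial_weights(n, m, rng)

def bayesian_bootstrap_weights(n, rng):
    # i.i.d. Exponential(1) weights: mu = 1 and gamma = 1
    return [rng.expovariate(1.0) for _ in range(n)]

rng = random.Random(0)
n = 15
m = int(n ** 0.8)                     # moon-bootstrap size used in Section 4
w_paired = paired_bootstrap_weights(n, rng)
w_moon = moon_bootstrap_weights(n, m, rng)
w_bayes = bayesian_bootstrap_weights(n, rng)
```

The multinomial weights always sum to the number of balls, whereas the exponential weights sum to a random total with mean n.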

3. Asymptotic consistency of proposed estimators

We assume that the cross moments on the centred and scaled resampling weights, as in $E_{r} \prod_{j = 1}^{J} W_{j}^{k_{j}}$ for small values of J and $\sum_{j = 1}^{J} k_{j}$ , do not grow very fast with n. Precise rates at which these grow are available in Chatterjee (1998), Chatterjee and Bose (2000), Chatterjee and Bose (2005) and in other places, and it can be shown easily that such conditions hold for the standard choices of resampling weights. Naturally, when the resampling weights are independent, the above cross moments are bounded and are often zero, and hence easily satisfy such requirements. We omit detailed discussion on these conditions. We assume that γ is bounded, which is also satisfied by several resampling schemes, but not by others like the moon-bootstrap. Our theoretical results are valid even when we use a controlled rate of growth of γ with n that is applicable to the moon-bootstrap, and our numeric studies validate this later in this paper. However, we discuss only the case of bounded γ to simplify some of the proofs. We now present our main results below.

Theorem 3.1

Under the conditions stated above, the resampling-based Prasad–Rao estimator is asymptotically consistent, that is ${\hat{ψ}}_{n r} \to ψ$ in probability, as $n \to \infty$ .

The different resampling-based estimator ${\tilde{ψ}}_{n r}$ , constructed by only considering those resamples where $| R_{n r} |^{2}$ exceeds a threshold, is consistent as well, apart from being always positive. We present this result in the following Theorem:

Theorem 3.2

Under the conditions stated above, ${\tilde{ψ}}_{n r} \to ψ$ in probability, as $n \to \infty$ .

Proof of Theorem 3.1

We omit several algebraic details below, especially those relating to proving that asymptotically negligible terms are indeed negligible. Thus, the proof below is a sketch of the main arguments. Below, we use $R_{n k}$ for $k = 1, 2, 3, \dots$ as generic notation for negligible remainder terms. Also, we use c and K, with or without subscripts, as generic notation for constants.

Define $D = (X^{T} X)^{- 1 / 2} X^{T} W X (X^{T} X)^{- 1 / 2}$ , where $W$ is the diagonal matrix of centred and scaled weights and $\mathbb{W} = μ (I_{n} + γ W)$ is the diagonal matrix of raw resampling weights; within the factor $(I_{p} + γ D)$ below, $D$ is this $p \times p$ matrix and not the sampling covariance matrix. Then $\begin{aligned} X^{T} \mathbb{W} X & = μ (X^{T} X + γ X^{T} W X) \\ & = μ (X^{T} X)^{1 / 2} \{I_{p} + γ (X^{T} X)^{- 1 / 2} X^{T} W X (X^{T} X)^{- 1 / 2}\} (X^{T} X)^{1 / 2} \\ & = μ (X^{T} X)^{1 / 2} (I_{p} + γ D) (X^{T} X)^{1 / 2} . \end{aligned}$ Thus, $(X^{T} \mathbb{W} X)^{- 1} = μ^{- 1} (X^{T} X)^{- 1 / 2} (I_{p} + γ D)^{- 1} (X^{T} X)^{- 1 / 2} .$ Now notice that $(I_{p} + γ D)^{- 1} = I_{p} - γ D + γ^{2} D (I_{p} + γ D)^{- 1} D .$ Thus $\begin{aligned} (X^{T} \mathbb{W} X)^{- 1} & = μ^{- 1} (X^{T} X)^{- 1 / 2} (I_{p} + γ D)^{- 1} (X^{T} X)^{- 1 / 2} \\ & = μ^{- 1} (X^{T} X)^{- 1 / 2} \{I_{p} - γ D + γ^{2} D (I_{p} + γ D)^{- 1} D\} (X^{T} X)^{- 1 / 2} \\ & = μ^{- 1} [(X^{T} X)^{- 1} - γ (X^{T} X)^{- 1} X^{T} W X (X^{T} X)^{- 1} \\ & \quad + γ^{2} (X^{T} X)^{- 1} (X^{T} W X) (X^{T} X)^{- 1 / 2} (I_{p} + γ D)^{- 1} (X^{T} X)^{- 1 / 2} (X^{T} W X) (X^{T} X)^{- 1}] . \end{aligned}$ Thus we have $\begin{aligned} X (X^{T} \mathbb{W} X)^{- 1} X^{T} & = μ^{- 1} [P_{x} - γ P_{x} W P_{x} + γ^{2} P_{x} W X (X^{T} X)^{- 1 / 2} (I_{p} + γ D)^{- 1} (X^{T} X)^{- 1 / 2} X^{T} W P_{x}] \\ & = μ^{- 1} [P_{x} - γ P_{x} W P_{x} + γ^{2} R_{n 1}] . \end{aligned}$
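The expansion of $(I_{p} + γ D)^{- 1}$ used in this step can be checked numerically. The sketch below uses an arbitrary small symmetric matrix in place of the $p \times p$ matrix of the proof; the matrix `Dm` and the value of `gamma` are made up purely for the check.

```python
import numpy as np

gamma = 0.7
# Hypothetical symmetric 3 x 3 matrix standing in for the p x p matrix D.
Dm = np.array([[0.20, 0.10, 0.00],
               [0.10, -0.30, 0.05],
               [0.00, 0.05, 0.10]])
I = np.eye(3)

inv = np.linalg.inv(I + gamma * Dm)
# Right-hand side: I - gamma D + gamma^2 D (I + gamma D)^{-1} D
rhs = I - gamma * Dm + gamma ** 2 * Dm @ inv @ Dm
max_abs_diff = np.max(np.abs(inv - rhs))
```

The identity follows from applying $(I_{p} + γ D)^{- 1} = I_{p} - γ D (I_{p} + γ D)^{- 1}$ twice and using the fact that the matrix commutes with the inverse of $I_{p} + γ D$.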

Here, we have $R_{n 1} = P_{x} W X (X^{T} X)^{- 1 / 2} (I_{p} + γ D)^{- 1} (X^{T} X)^{- 1 / 2} X^{T} W P_{x} .$ Note that for any non-zero $c \in R^{n}$ , $E_{r} \frac{c^{T} R_{n 1} c}{c^{T} c} \leq K \frac{c^{T} P_{x} D_{x} P_{x} c}{c^{T} c} = O (n^{- 2}) .$ Thus all eigenvalues of the expectation of $R_{n 1}$ are $O (n^{- 2})$ .

This leads us to $\begin{aligned} & I_{n} - X (X^{T} \mathbb{W} X)^{- 1} X^{T} \mathbb{W} \\ & \quad = I_{n} - μ^{- 1} [P_{x} - γ P_{x} W P_{x} + γ^{2} R_{n 1}] \{μ (I_{n} + γ W)\} \\ & \quad = I_{n} - [P_{x} - γ P_{x} W P_{x} + γ^{2} R_{n 1}] (I_{n} + γ W) \\ & \quad = I_{n} - P_{x} + γ P_{x} W P_{x} - γ^{2} R_{n 1} - γ P_{x} W + γ^{2} P_{x} W P_{x} W - γ^{3} R_{n 1} W \\ & \quad = (I_{n} - P_{x}) - γ P_{x} W (I_{n} - P_{x}) - γ^{2} (R_{n 1} + γ R_{n 1} W - P_{x} W P_{x} W) \\ & \quad = (I_{n} - γ P_{x} W) (I_{n} - P_{x}) - γ^{2} R_{n 2}, \end{aligned}$ where $\mathbb{W} = μ (I_{n} + γ W)$ is the diagonal matrix of raw resampling weights and $R_{n 2} = R_{n 1} + γ R_{n 1} W - P_{x} W P_{x} W$ . We can show that for any non-zero $c \in R^{n}$ with $| c | = 1$ , we have $E c^{T} R_{n 2}^{T} R_{n 2} c = O (n^{- 4})$ . The computation is purely algebraic, and we omit the details.

Eventually we get $R_{n r} = (I_{n} - γ P_{x} W) (I_{n} - P_{x}) (U + E) - γ^{2} R_{n 2} (U + E) .$

Hence we can write $\begin{aligned} | R_{n r} |^{2} & = \mathrm{tr} \{(I_{n} - γ P_{x} W)^{T} (I_{n} - γ P_{x} W) (I_{n} - P_{x}) (U + E) (U + E)^{T} (I_{n} - P_{x})\} \\ & \quad - 2 γ^{2} (U + E)^{T} R_{n 2}^{T} (I_{n} - γ P_{x} W) (I_{n} - P_{x}) (U + E) + γ^{4} (U + E)^{T} R_{n 2}^{T} R_{n 2} (U + E) \\ & = \mathrm{tr} \{(I_{n} - γ P_{x} W)^{T} (I_{n} - γ P_{x} W) (I_{n} - P_{x}) (U + E) (U + E)^{T} (I_{n} - P_{x})\} + γ^{2} R_{n 3} \\ & = \mathrm{tr} [\{I_{n} - γ P_{x} W - γ W P_{x} + γ^{2} W P_{x} W\} \{(I_{n} - P_{x}) (U + E) (U + E)^{T} (I_{n} - P_{x})\}] + γ^{2} R_{n 3} . \end{aligned}$ The exact expression of $R_{n 3}$ is available from the above expansion; we do not re-write it to avoid repetition.

The resampling-expectation of the above is $\begin{aligned} E_{r} | R_{n r} |^{2} & = \mathrm{tr}\, E_{r} [\{I_{n} - γ P_{x} W - γ W P_{x} + γ^{2} W P_{x} W\} \{(I_{n} - P_{x}) (U + E) (U + E)^{T} (I_{n} - P_{x})\}] + γ^{2} E_{r} R_{n 3} \\ & = \mathrm{tr} [\{I_{n} + γ^{2} D_{x}\} \{(I_{n} - P_{x}) (U + E) (U + E)^{T} (I_{n} - P_{x})\}] + γ^{2} E_{r} R_{n 3} \\ & = \mathrm{tr} [\{I_{n} + γ^{2} D_{x}\} (I_{n} - P_{x}) \{D + ψ I_{n} + (U U^{T} - ψ I_{n}) + (E E^{T} - D) + U E^{T} + E U^{T}\} (I_{n} - P_{x})] + γ^{2} E_{r} R_{n 3} . \end{aligned}$

Also notice that $\begin{aligned} \mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x})\} & = \mathrm{tr} \{I_{n} - P_{x} + γ^{2} D_{x}\} - γ^{2} \sum_{i = 1}^{n} P_{x_{i i}}^{2} \\ & = O (n), \end{aligned}$ with the first term contributing $O (n)$ , the second and third terms contributing $O (p) = O (1)$ and the last term contributing $O (n^{- 1} p^{2}) = O (n^{- 1})$ .
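The trace decomposition above can be verified numerically for any full-rank design. The following sketch uses a randomly generated design matrix with the dimensions of Section 4; the seed and the covariate distribution are arbitrary choices made for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 15, 3
gamma2 = 1.0 - 1.0 / n                       # paired bootstrap value of gamma^2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

P = X @ np.linalg.solve(X.T @ X, X.T)        # projection matrix P_x
h = np.diag(P)                               # diagonal entries P_{x_ii}
lhs = np.trace((np.eye(n) + gamma2 * np.diag(h)) @ (np.eye(n) - P))
# n - p from tr(I_n - P_x), then the two terms involving D_x
rhs = (n - p) + gamma2 * h.sum() - gamma2 * np.sum(h ** 2)
```

Since the diagonal entries of $P_{x}$ sum to p, the leading contribution is n - p, in line with the $O (n)$ statement.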

We can show, using the computations on $R_{n 1}$ and $R_{n 2}$ partially presented above, that $E (E_{r} R_{n 3})^{2} = O (n) .$ This computation is algebraic; we present a few instances of the main arguments we use for this step.

Consider the term $T_{n 1} = \mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) U E^{T} (I_{n} - P_{x})\} .$ We have $\begin{aligned} T_{n 1} & = E^{T} (I_{n} - P_{x}) (I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) U \\ & = \sum_{a, b, c = 1}^{n} (I_{n} - P_{x})_{a b} (I_{n} + γ^{2} D_{x})_{b b} (I_{n} - P_{x})_{b c} E_{a} U_{c} \\ & = \sum_{a, i, c = 1}^{n} (I_{n} - P_{x})_{a i} (1 + γ^{2} P_{x_{i i}}) (I_{n} - P_{x})_{i c} E_{a} U_{c} . \end{aligned}$ Thus we have $\begin{aligned} E T_{n 1}^{2} & = E \Big\{\sum_{i = 1}^{n} (1 + γ^{2} P_{x_{i i}}) \sum_{a = 1}^{n} (I_{n} - P_{x})_{i a} E_{a} \sum_{c = 1}^{n} (I_{n} - P_{x})_{i c} U_{c}\Big\}^{2} \\ & = E \Big\{\sum_{i, j = 1}^{n} (1 + γ^{2} P_{x_{i i}}) (1 + γ^{2} P_{x_{j j}}) \sum_{a_{1}, a_{2} = 1}^{n} (I_{n} - P_{x})_{i a_{1}} (I_{n} - P_{x})_{j a_{2}} E_{a_{1}} E_{a_{2}} \\ & \quad \times \sum_{c_{1}, c_{2} = 1}^{n} (I_{n} - P_{x})_{i c_{1}} (I_{n} - P_{x})_{j c_{2}} U_{c_{1}} U_{c_{2}}\Big\} \\ & = ψ E \Big\{\sum_{i, j = 1}^{n} (1 + γ^{2} P_{x_{i i}}) (1 + γ^{2} P_{x_{j j}}) \sum_{a = 1}^{n} (I_{n} - P_{x})_{i a} (I_{n} - P_{x})_{j a} D_{a} \sum_{c = 1}^{n} (I_{n} - P_{x})_{i c} (I_{n} - P_{x})_{j c}\Big\} . \end{aligned}$

As an example, consider the second component in the last term above. First, let us deal with the case i=j. Then, using $\sum_{c = 1}^{n} P_{x_{i c}}^{2} = P_{x_{i i}}$ for a projection matrix, this term becomes $\begin{aligned} \sum_{c = 1}^{n} (I_{n} - P_{x})_{i c}^{2} & = (1 - P_{x_{i i}})^{2} + \sum_{c = 1, c \neq i}^{n} P_{x_{i c}}^{2} \\ & = (1 - P_{x_{i i}})^{2} - P_{x_{i i}}^{2} + \sum_{c = 1}^{n} P_{x_{i c}}^{2} \\ & = 1 - P_{x_{i i}} = O (1) . \end{aligned}$

For the $i \neq j$ case, we have $\begin{aligned} \sum_{c = 1}^{n} (I_{n} - P_{x})_{i c} (I_{n} - P_{x})_{j c} & = - (1 - P_{x_{i i}}) P_{x_{i j}} - (1 - P_{x_{j j}}) P_{x_{i j}} + \sum_{c = 1, c \neq i, j}^{n} P_{x_{i c}} P_{x_{j c}} \\ & = - P_{x_{i j}} - P_{x_{i j}} + \sum_{c = 1}^{n} P_{x_{i c}} P_{x_{j c}} \\ & = - P_{x_{i j}} - P_{x_{i j}} + P_{x_{i j}} \\ & = - P_{x_{i j}} . \end{aligned}$ Thus, we essentially repeatedly leverage the properties of a projection matrix for the algebraic computations. We omit further algebraic details; they are typically routine.
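Both cases are instances of the fact that $I_{n} - P_{x}$ is symmetric and idempotent, so that $\sum_{c} (I_{n} - P_{x})_{i c} (I_{n} - P_{x})_{j c}$ is simply the $(i, j)$ th entry of $I_{n} - P_{x}$. A short numerical check, on an arbitrary simulated design chosen only for illustration, is sketched below.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # hypothetical design

P = X @ np.linalg.solve(X.T @ X, X.T)    # projection matrix P_x
M = np.eye(n) - P
S = M @ M.T                              # S[i, j] = sum_c M[i, c] * M[j, c]

same_as_M = np.allclose(S, M)            # idempotence and symmetry
off_diag_ok = np.isclose(S[0, 1], -P[0, 1])
```

The off-diagonal entries of `S` therefore equal the negated off-diagonal entries of `P`, as derived above.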

If we now write $R_{n 4} = \frac{E_{r} | R_{n r} |^{2} - \mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) D (I_{n} - P_{x})\}}{\mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x})\}} - ψ,$ the above computations establish that $E R_{n 4}^{2} \to 0,$ as $n \to \infty$ , thus establishing the result.

Proof of Theorem 3.2

This proof contains several calculations in common with the proof of Theorem 3.1, consequently we only discuss the main ideas, and omit most algebraic details.

Recall from the proof of Theorem 3.1 that $\begin{aligned} n^{- 1} | R_{n r} |^{2} & = n^{- 1} \mathrm{tr} [\{I_{n} + γ^{2} D_{x}\} (I_{n} - P_{x}) \{D + ψ I_{n} + (U U^{T} - ψ I_{n}) + (E E^{T} - D) + U E^{T} + E U^{T}\} (I_{n} - P_{x})] \\ & \quad + n^{- 1} γ^{2} E_{r} R_{n 3} \\ & = X_{1} + Y_{1}, \text{ (say)} . \end{aligned}$

For convenience, let us use the notation $ν = n^{- 1} \mathrm{tr} \{(I_{n} + γ^{2} D_{x}) (I_{n} - P_{x}) D (I_{n} - P_{x})\}$ . We thus have $\begin{aligned} P [A_{n r}^{C}] & = P [n^{- 1} | R_{n r} |^{2} \leq ν + ϵ_{0} / 2] \\ & \leq P [n^{- 1} | R_{n r} |^{2} \leq ν + ϵ_{0} / 2, | Y_{1} | > ϵ_{0} / 2] + P [n^{- 1} | R_{n r} |^{2} \leq ν + ϵ_{0} / 2, | Y_{1} | \leq ϵ_{0} / 2] \\ & \leq P [| Y_{1} | > ϵ_{0} / 2] + P [X_{1} \leq ν + ϵ_{0}] . \end{aligned}$

Our previous computations from the proof of Theorem 3.1 now ensure that $E (Y_{1}^{2}) = O (n^{- 1})$ as well as $V (X_{1}) = O (n^{- 1})$ , and hence both the above probabilities tend to zero; we omit some of the algebraic details here. Some additional algebraic details are then required to establish that the expectation of the square of the difference between ${\hat{ψ}}_{n r}$ and ${\tilde{ψ}}_{n r}$ is also $O (n^{- 1})$ , thus concluding the proof.

4. Some simulation studies

To understand the finite sample performance of the proposed estimator of ψ, we conducted a simulation study. In this study, we consider a framework based closely on the studies of Datta et al. (2005). We use n=15 observations, and p=3 covariates, one of which is the intercept term, and the other two covariates are randomly generated in each replication of the experiment and then held fixed. We fix $ψ = 1$ and $β = (2, 3, 4)^{T}$ . We conduct the experiments with two different choices of the sampling variances. In Experiment 1, we fix the $D_{i}$ to be 2.0, 0.6, 0.5, 0.4, 0.2, each repeated 3 times. In Experiment 2, we fix the $D_{i}$ to be 20.0, 0.6, 0.5, 0.4, 0.2, each repeated 3 times.

We consider three different resampling schemes in this study. First, we consider $\mathbb{W}$ to be the multinomial random variable corresponding to randomly assigning n balls in n boxes, each with equal probability. This corresponds to the paired bootstrap of Efron. Next, we consider $\mathbb{W}$ to be the multinomial random variable corresponding to randomly assigning $m = [n^{0.8}]$ balls in n boxes, each with equal probability, which corresponds to the moon-bootstrap. Third, we consider each element of $\mathbb{W}$ to have an exponential distribution with mean 1, thus realising the Bayesian bootstrap. The resampling Monte Carlo size is fixed at 1000 in all cases. The simulation experiment was replicated 200 times, and the results presented below are based on these 200 replications.

First, we enumerate the percentage of times estimators of ψ took values less than $ε = 10^{- 4}$ . This number varies with the parameters used in the simulation experiment, and can approach even 80–90% when we change only a single $D_{i}$ value or a parameter value within a reasonable range.

In Experiment 1, it can be seen from Table 1 that in approximately 5% of the simulated datasets, the Prasad–Rao estimator based on the original data obtained a value less than ε. In those simulated datasets where the Prasad–Rao estimator took a value less than ε, different resampling schemes obtained between 19% and 25% cases where the Prasad–Rao estimator based on the resampling data was above ε. In the other 95% of cases, where the Prasad–Rao estimator from the original data was greater than ε, its resampling data-based versions were typically greater than ε 93–99% of the time as well. This suggests that using the Prasad–Rao estimator based on the resampling data may significantly improve the numeric performance of estimates of the conditional distribution of $θ$ given $Y_{n}$ , or related prediction intervals, conditional expectations and so on. In Experiment 2, it can be seen from Table 2 that in approximately 52% of the simulated datasets, the Prasad–Rao estimator based on the original data obtained a value less than ε. Other details are consistent with the findings from Experiment 1.

Table 1. Percentages of cases where different estimators of ψ were below or above $ε = 10^{- 4}$ in Experiment 1.

Table 2. Percentages of cases where different estimators of ψ were below or above $ε = 10^{- 4}$ in Experiment 2.

We obtain several plots from Experiment 2. In the top panel of Figure 1, we present in plot (1) the boxplot of ${\hat{ψ}}_{n}$ from the 200 replications of the simulation experiment. We also select two such replications at random, one corresponding to a dataset where ${\hat{ψ}}_{n} \leq ε$ and the other where ${\hat{ψ}}_{n} > ε$ . In the top panel of Figure 1, plots (2) and (3) are the resampling estimator ${\hat{ψ}}_{n r}$ obtained using multinomial weights corresponding to the paired bootstrap in the randomly selected dataset for which ${\hat{ψ}}_{n} \leq ε$ . Plot (2) is from all resamples, and plot (3) is based on only the resamples where ${\hat{ψ}}_{n r} > ε$ . Plots (4) and (5) are ${\hat{ψ}}_{n r}$ obtained using multinomial weights corresponding to the paired bootstrap in the randomly selected dataset for which ${\hat{ψ}}_{n} > ε$ , respectively, based on all resamples and only the resamples where ${\hat{ψ}}_{n r} > ε$ . Plot (6) is from the ‘Mix’ estimator proposed by Rubin-Bleuer and You (2016), which is always guaranteed to be positive. While the Mix estimator is always positive, notice that it also seems quite biased.

Figure 1. Boxplots of the Prasad–Rao (PR) estimators from Experiment 2. In the top panel, plot (1) is the boxplot of ${\hat{ψ}}_{n}$ from the 200 replications of the simulation experiment. The rest of the boxplots are from two randomly selected instances of these replications. Plots (2) and (3) are ${\hat{ψ}}_{n r}$ obtained using multinomial weights corresponding to the paired bootstrap in a case where ${\hat{ψ}}_{n} \leq ε$ . Plots (4) and (5) are ${\hat{ψ}}_{n r}$ obtained using multinomial weights corresponding to the paired bootstrap in a case where ${\hat{ψ}}_{n} > ε$ . Plot (6) is the Mix estimator. In the bottom panel, we consider a randomly selected case where ${\hat{ψ}}_{n} > ε$ . The first two boxplots are from the multinomial bootstrap, corresponding respectively to cases where we use all the resamples, and only the ones where ${\hat{ψ}}_{n r} > ε$ . The third and fourth are from the moon bootstrap and the fifth and sixth are from the Bayesian bootstrap, for all resamples and for resamples with ${\hat{ψ}}_{n r} > ε$ , respectively. Plot (7) is the Mix estimator.

In the bottom panel of Figure 1, we consider a randomly selected case where ${\hat{ψ}}_{n} > ε$ . The first two boxplots are from the multinomial bootstrap, corresponding, respectively, to cases where we use all the resamples, and only the ones where ${\hat{ψ}}_{n r} > ε$ . The third and fourth are from the moon bootstrap and the fifth and sixth are from the Bayesian bootstrap, for all resamples and for resamples with ${\hat{ψ}}_{n r} > ε$ , respectively. The last plot is from the Mix estimator, which is again quite biased. The red horizontal line is at the value of ${\hat{ψ}}_{n}$ for that dataset. It can be seen that all the resampling estimators perform well, especially in cases where ${\hat{ψ}}_{n} > ε$ . For the cases where ${\hat{ψ}}_{n} \leq ε$ , the resampling-based estimators offer the alternative of obtaining a higher value of the estimator, while retaining their theoretical consistency properties. Plots from Experiment 1, which we do not present for brevity, are consistent with the above findings.

5. Conclusions

In this paper, we have studied the resampling-based estimator only for the Prasad–Rao formulation; further studies are needed for other choices of estimators. Considerable additional numerical studies are also needed, as are studies, from both theoretical and computational viewpoints, of the effect of choosing a resampling-based ψ on the prediction of $θ$. One interesting special case of using resampling weights that needs additional exploration is when γ tends to zero with n, as in the case discussed in Chatterjee and Bose (Citation2002).

Acknowledgments

The author thanks the reviewers and editors for their comments, which helped improve the paper.

Disclosure statement

No potential conflict of interest was reported by the author.

ORCID

Snigdhansu Chatterjee http://orcid.org/0000-0002-7986-0470

Additional information

Funding

This research is partially supported by the National Science Foundation (NSF) [grant numbers # DMS-1622483 and # DMS-1737918].

Notes on contributors

Snigdhansu Chatterjee

Dr. Snigdhansu Chatterjee is Professor in the School of Statistics, and the Director of the Institute for Research in Statistics and its Applications (IRSA, http://irsa.stat.umn.edu/) at the University of Minnesota.

References

Barbe, P., & Bertail, P. (1995). The weighted bootstrap, volume 98 of Lecture Notes in Statistics. New York, NY: Springer-Verlag.
Chatterjee, S. (1998). Another look at the jackknife: Further examples of generalized bootstrap. Statistics and Probability Letters, 40(4), 307–319.
Chatterjee, S. (2018). On modifications to linking variance estimators in the Fay-Herriot model that induce robustness. Statistics And Applications, 16(1), 289–303.
Chatterjee, S., & Bose, A. (2000). Variance estimation in high dimensional regression models. Statistica Sinica, 10(2), 497–516.
Chatterjee, S., & Bose, A. (2002). Dimension asymptotics for generalised bootstrap in linear regression. Annals of the Institute of Statistical Mathematics, 54(2), 367–381.
Chatterjee, S., & Bose, A. (2005). Generalized bootstrap for estimating equations. The Annals of Statistics, 33(1), 414–436.
Chatterjee, S., Lahiri, P., & Li, H. (2008). Parametric bootstrap approximation to the distribution of EBLUP and related prediction intervals in linear mixed models. The Annals of Statistics, 36(3), 1221–1245.
Das, K., Jiang, J., & Rao, J. N. K. (2004). Mean squared error of empirical predictor. The Annals of Statistics, 32(2), 818–840.
Datta, G. S., Hall, P., & Mandal, A. (2011). Model selection by testing for the presence of small-area effects, and application to area-level data. Journal of the American Statistical Association, 106(493), 362–374.
Datta, G. S., Rao, J. N. K., & Smith, D. D. (2005). On measuring the variability of small area estimators under a basic area level model. Biometrika, 92(1), 183–196.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1–26.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Boca Raton, FL: Chapman & Hall/CRC press.
Fay III, R. E., & Herriot, R. A. (1979). Estimates of income for small places: an application of James-Stein procedures to census data. Journal of the American Statistical Association, 74(366), 269–277.
Hirose, M. Y., & Lahiri, P. (2017). A new model variance estimator for an area level small area model to solve multiple problems simultaneously. arXiv preprint arXiv:1701.04176.
Jiang, J., & Lahiri, P. (2006). Mixed model prediction and small area estimation (with discussion). Test, 15(1), 1–96.
Jiang, J., Lahiri, P., & Wan, S.-M. (2002). A unified jackknife theory for empirical best prediction with M-estimation. The Annals of Statistics, 30(6), 1782–1810.
Kubokawa, T., & Nagashima, B. (2012). Parametric bootstrap methods for bias correction in linear mixed models. Journal of Multivariate Analysis, 106, 1–16.
Li, H., & Lahiri, P. (2010). An adjusted maximum likelihood method for solving small area estimation problems. Journal of Multivariate Analysis, 101(4), 882–892.
Molina, I., Rao, J. N. K., & Datta, G. S. (2015). Small area estimation under a Fay–Herriot model with preliminary testing for the presence of random area effects. Survey Methodology, 41(1), 1–19.
Pfeffermann, D. (2013). New important developments in small area estimation. Statistical Science, 28(1), 40–68.
Pfeffermann, D., & Glickman, H. (2004). Mean square error approximation in small area estimation by use of parametric and nonparametric bootstrap. ASA Section on Survey Research Methods Proceedings, Alexandria, VA, pp. 4167–4178.
Prasad, N. G. N., & Rao, J. N. K. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409), 163–171.
Præstgaard, J., & Wellner, J. A. (1993). Exchangeably weighted bootstraps of the general empirical process. The Annals of Probability, 21(4), 2053–2086.
Rao, J. N. K. (2015). Inferential issues in model-based small area estimation: Some new developments. Statistics in Transition, 16(4), 491–510.
Rao, J. N. K., & Molina, I. (2015). Small area estimation. New York, NY: Wiley.
Rubin-Bleuer, S., & You, Y. (2016). Comparison of some positive variance estimators for the Fay–Herriot small area model. Survey Methodology, 42(1), 63–85.
Shao, J., & Tu, D. (1995). The jackknife and bootstrap. New York, NY: Springer-Verlag.
Yoshimori, M., & Lahiri, P. (2014a). A new adjusted maximum likelihood method for the Fay–Herriot small area model. Journal of Multivariate Analysis, 124, 281–294.
Yoshimori, M., & Lahiri, P. (2014b). A second-order efficient empirical Bayes confidence interval. The Annals of Statistics, 42(4), 1233–1261.
