Simultaneous Inference for Empirical Best Predictors With a Poverty Study in Small Areas


Abstract

Today, generalized linear mixed models (GLMM) are broadly used in many fields. However, the development of tools for performing simultaneous inference has been largely neglected in this domain. A framework for joint inference is indispensable to carry out statistically valid multiple comparisons of parameters of interest between all or several clusters. We therefore develop simultaneous confidence intervals and multiple testing procedures for empirical best predictors under GLMM. In addition, we implement our methodology to study widely employed examples of mixed models, that is, the unit-level binomial, the area-level Poisson-gamma and the area-level Poisson-lognormal mixed models. The asymptotic results are accompanied by extensive simulations. A case study on predicting poverty rates illustrates applicability and advantages of our simultaneous inference tools.


1 Introduction

Generalized linear mixed models (GLMM) are suitable for modeling clustered and correlated data with categorical or count outcomes. They are ubiquitous in applied statistics, for example, in biometrics or small area estimation (SAE). In the latter, they serve to analyze surveys at a disaggregated level. Despite increasing interest, for example, in guiding resource allocation, methods for simultaneous inference for predictors are missing. This is surprising, as only such methods make joint considerations of clusters valid. Available $(1-\alpha)$-confidence intervals (CI) for mixed parameters (except the credibility intervals of Ganesh 2009) are constructed such that, in each study, at least $100\alpha\%$ of them do not contain the true value. Undoubtedly, practitioners do compare clusters, but so far without valid statistical tools. We aim to close this distressing gap, not to improve any existing method.

Specifically, we introduce simultaneous confidence intervals (SCI) and multiple testing procedures (MTP) for the empirical best predictor (EBP) of Jiang (2003). They are based on max-type statistics combined with extreme value theory. We prove the asymptotic convergence of SCI and MTP for nested (or hierarchical) GLMM within the exponential family. We study the numerical performance of our SCI and MTP for two area-level and one unit-level mixed models that are widely used, for example, for studying local poverty rates (Pratesi 2016). All introduced methods perform satisfactorily within the considered modeling frameworks. Even though our estimates under the area-level models appear to be less volatile, one can argue that the EBPs are not directly comparable because different methods and model classes are used. Finally, under the area-level Poisson-gamma model, we derive a new mean squared error (MSE) estimator, which is of crucial interest in SAE.

The literature on estimation and testing under GLMM is considerable; see, among others, the review of Tuerlinckx et al. (2006), the monograph of Jiang (2007), and the article of Ghosh et al. (1998), which is particularly relevant in the context of SAE. Furthermore, researchers have put forward several methodologies broadly used in the analysis of count data. Molina, Saei, and Lombardía (2007) and Scealy (2010) studied the estimation of labor force status using multinomial logistic models, whereas Saei and Taylor (2012) focused on the same target parameter and examined the performance of a bivariate random components model. Chandra, Chambers, and Salvati (2012) and Franco and Bell (2015) provided extensions for modeling proportions using logistic unit- and area-level models. Hobza and Morales (2016) implemented the EBP for unit-level GLMM, and Boubeta, Lombardía, and Morales (2016) for area-level GLMM, to study poverty in small areas. Chambers, Salvati, and Tzavidis (2012) and Tzavidis et al. (2015) extended M-quantile inference to the robust estimation and prediction of count data. Yet, to the best of our knowledge, no one has addressed simultaneous inference for cluster-level parameters under GLMM. Likewise, little research has been carried out on simultaneous inference for cluster-level parameters in linear mixed models (LMM). Ganesh (2009) developed credibility intervals for a mixed parameter in a particular area-level model. Reluga, Lombardía, and Sperlich (2019) proposed bootstrap SCI and MTP for mixed parameters under LMM, whereas Kramlinger, Krivobokova, and Sperlich (2018) developed a framework for marginal and conditional inference with quadratic forms.

After introducing the model and estimators in Section 2, we propose the construction of SCI and MTP for a general EBP, together with their theoretical justification, in Section 3. Sections 4 and 5 present simulations and a case study. Conclusions are drawn in Section 6. Further details are deferred to the appendix and our supplementary material (SM).

2 Best Prediction for GLMM

Let $D$ be the number of clusters or areas, indexed by $d \in [D]$, let $n_d$ be the number of sampled units in area $d$, indexed by $j \in [n_d]$, with $n = \sum_{d=1}^{D} n_d$, and let $N_d$ be the known population sizes with $N = \sum_{d=1}^{D} N_d$, where $[A] := \{1, \dots, A\}$. Since in our context the notions of cluster and area can be used synonymously, we proceed with the latter. Suppose that $\{v_d : d = 1, \dots, D\}$ is a set of independent and identically distributed (iid) random effects with unknown variance $\delta^2$, $\delta > 0$, which is often parameterized as $v_d = \delta u_d$ with $u_d \sim N(0, 1)$. The target variable $y_{dj}$ represents the $j$th sample observation from the $d$th area. Furthermore, we consider nested data structures such that $y_{dj} \neq y_{d'j}$ for $d \neq d'$. In full generality, we assume that the random variables $Y_{dj}$, conditionally on a random effect $u_d$, are independent with a probability density function (pdf) from the exponential family, $y_{dj} \mid u_d \sim \text{Exp.Family}(\theta)$:

$$y_{dj} \mid u_d \overset{\text{indep.}}{\sim} g_{Y_{dj} \mid u_d}(y_{dj} \mid u_d), \qquad g_{dj}(y_{dj} \mid u_d, \theta) = \exp\{\phi^{-1}[y_{dj}\gamma_{dj} - b(\gamma_{dj})] + c(y_{dj}, \phi)\},$$

where $\theta = (\beta^t, \delta, \phi)^t$ with $\delta$ the variability parameter, $\beta = (\beta_1, \dots, \beta_p)^t$ the regression parameters of the auxiliary variables $x_{dj} = (x_{dj1}, \dots, x_{djp})^t$, for which typically $x_{dj1} = 1$ for all $j \in [n_d]$ and $d \in [D]$, and $\gamma_{dj}$ and $\phi$ are the canonical and scale parameters, respectively. A link function $M$ relates $E(Y_{dj} \mid u_d)$ to the linear mixed predictor such that $\gamma_{dj} = M\{E(Y_{dj} \mid u_d)\} = x_{dj}^t\beta + \delta u_d$.

2.1 Estimation and Computation

Let $y_d = (y_{d1}, \dots, y_{dn_d})^t$, $d \in [D]$, be the vector of outcomes of area $d$, and $y = (y_1^t, \dots, y_D^t)^t$. With $h$ denoting the pdf of $u_d$, the pdf of $y_d$ given $\theta$, that is, the likelihood contribution of area $d$, is

$$L_d(\theta) := f_d(y_d \mid \theta) = \int g_d(y_d \mid u_d, \theta)\, h(u_d)\, du_d = \int \prod_{j=1}^{n_d} g_{dj}(y_{dj} \mid u_d, \theta)\, h(u_d)\, du_d, \tag{1}$$

and $\theta$ can be estimated by maximizing

$$L(\theta) := \prod_{d=1}^{D} L_d(\theta) = \prod_{d=1}^{D} \int \prod_{j=1}^{n_d} g_{dj}(y_{dj} \mid u_d, \theta)\, h(u_d)\, du_d.$$

In the case of area-level models, $n_d = 1$, and Equation (1) simplifies accordingly. For a concise presentation, we assume that there is a single random effect for each area, such that the integral in Equation (1) is one-dimensional. Extensions to multidimensional random effects follow immediately with some changes of notation and more complicated computation. Finding an analytical solution to Equation (1) is difficult unless the integral can be simplified. Often one evaluates the integral numerically by Laplace approximation (LA) (De Bruijn 1981), Gaussian quadrature (GQ) (Naylor and Smith 1982), or adaptive GQ (AGQ) (Pinheiro and Bates 1995). In what follows, we proceed with AGQ, as it is a higher-order version of LA, that is, it gives smaller approximation errors (Bianconcini 2014). Alternatives are the quasi-likelihood (Breslow and Clayton 1993), which suffers from a nondecreasing bias (Tuerlinckx et al. 2006), and the method of moments (Jiang 1998). In addition, researchers have made considerable advances in computing maximum likelihood (ML) estimators under GLMM. Jiang (2007, sec. 4.1) proposed an expectation–maximization algorithm, whereas Lele, Nadeem, and Schmuland (2010) developed so-called data cloning, which was subsequently implemented by Torabi (2012).
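To illustrate the AGQ step, the following Python sketch approximates a single area contribution $L_d(\theta)$ in (1) for the area-level Poisson-lognormal model of Section 2.2.2, with the offset absorbed into the linear predictor `eta` $= \log\nu_d + x_d^t\beta$. It is a minimal illustration under our own assumptions, not the implementation used in this article: the mode of the log-concave integrand is located by Newton's method, NumPy's Gauss–Hermite nodes (`hermgauss`) are recentred at the mode and rescaled by the curvature there, and the function names `log_integrand` and `agq_area_likelihood` are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def log_integrand(u, y, eta, delta):
    """log of Poisson(y | exp(eta + delta*u)) times the N(0, 1) density of u."""
    lin = eta + delta * u
    return (y * lin - np.exp(lin) - math.lgamma(y + 1)
            - 0.5 * u ** 2 - 0.5 * math.log(2.0 * math.pi))


def agq_area_likelihood(y, eta, delta, n_nodes=10, newton_steps=50):
    """Adaptive Gauss-Hermite approximation of L_d = int g(y_d | u) h(u) du."""
    # 1) Mode of the log-concave integrand via Newton's method.
    u_hat = 0.0
    for _ in range(newton_steps):
        mu = math.exp(eta + delta * u_hat)
        grad = y * delta - delta * mu - u_hat
        hess = -(delta ** 2) * mu - 1.0
        step = grad / hess
        u_hat -= step
        if abs(step) < 1e-12:
            break
    # 2) Curvature at the mode sets the local scale of the integrand.
    mu = math.exp(eta + delta * u_hat)
    sigma_hat = 1.0 / math.sqrt(delta ** 2 * mu + 1.0)
    # 3) Recentre and rescale the Gauss-Hermite nodes around the mode.
    x, w = hermgauss(n_nodes)
    u = u_hat + math.sqrt(2.0) * sigma_hat * x
    vals = np.exp(log_integrand(u, y, eta, delta) + x ** 2)
    return math.sqrt(2.0) * sigma_hat * float(np.sum(w * vals))


print(agq_area_likelihood(y=3, eta=0.5, delta=0.5))
```

With `delta = 0` the random effect drops out and the approximation reproduces the Poisson probability exactly, which is a convenient sanity check.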

Since we consider the prediction of possibly nonlinear mixed effects $\zeta_d = \zeta_d(\beta, u_d)$, we use the best predictor (BP) $\tilde{\zeta}_d$ in the sense of minimizing the area-specific MSE defined in Equation (3) below, which is actually the area-specific mean squared prediction error:

$$\tilde{\zeta}_d = \tilde{\zeta}_d(\theta) := E\{\zeta_d(\beta, u_d) \mid y\} = E\{\zeta_d(\beta, u_d) \mid y_d\} = \frac{\int \zeta_d(\beta, u_d)\, g_d(y_d \mid u_d, \theta)\, h(u_d)\, du_d}{\int g_d(y_d \mid u_d, \theta)\, h(u_d)\, du_d}. \tag{2}$$

Equation (2) simplifies if the pdf of $u_d$ is chosen accordingly, as for the conjugate Poisson-gamma model in Section 2.2.1. If we replace $\theta$ by a consistent estimator $\hat{\theta}$, we obtain the EBP $\hat{\zeta}_d := \tilde{\zeta}_d(\hat{\theta})$. Note that, in order to obtain consistency for the random effects, one needs to assume that $n_d \to \infty$ for each $\hat{\zeta}_d$, $d = 1, \dots, D$ (Jiang and Lahiri 2001).
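Equation (2) is a ratio of two one-dimensional integrals with the same weight $h(u_d)$, so a single quadrature rule can be applied to numerator and denominator at once. The sketch below does this with plain (non-adaptive) Gauss–Hermite nodes for $\zeta_d = \rho_d = \exp(x_d^t\beta + \delta u_d)$ under the Poisson-lognormal model of Section 2.2.2. It is our own illustration, not the formulas of Boubeta, Lombardía, and Morales (2016); the function name `bp_ratio` and its inputs `log_nu` $= \log\nu_d$ and `eta` $= x_d^t\beta$ are hypothetical. Passing $\hat{\theta}$ instead of $\theta$ turns the BP into the EBP.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def bp_ratio(y, log_nu, eta, delta, n_nodes=60):
    """BP (2) of rho_d = exp(eta + delta*u_d) under the Poisson-lognormal model.

    log_nu = log(nu_d), eta = x_d' beta, and u_d ~ N(0, 1).
    """
    x, w = hermgauss(n_nodes)
    u = math.sqrt(2.0) * x                   # nodes for a N(0, 1) weight
    lin = eta + delta * u
    # log g_d(y_d | u_d, theta) without the terms that do not involve u_d
    loglik = y * lin - np.exp(log_nu + lin)
    k = w * np.exp(loglik - loglik.max())    # rescaled to avoid under/overflow
    # the constant of the N(0, 1) weight cancels in the ratio
    return float(np.sum(np.exp(lin) * k) / np.sum(k))


print(bp_ratio(y=5, log_nu=1.0, eta=0.2, delta=0.5))
```

Dropping the factors that do not depend on $u_d$ is safe here precisely because they cancel between numerator and denominator.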

Regarding the variability of the EBP $\hat{\zeta}_d$, the MSE is by far the most popular measure. Well-known techniques to estimate it are the analytical approximation based on a Taylor expansion (Jiang 2003) and parametric bootstrap approaches (Boubeta, Lombardía, and Morales 2016; Hobza and Morales 2016). Consider the following MSE decomposition:

$$\begin{aligned} \mathrm{MSE}(\hat{\zeta}_d) &= E[\{\tilde{\zeta}_d(\hat{\theta}) - \zeta_d\}^2] \\ &= E[\{\tilde{\zeta}_d(\hat{\theta}) - \tilde{\zeta}_d(\theta)\}^2] + E[\{\tilde{\zeta}_d(\theta) - \zeta_d\}^2] =: g_{2d} + g_{1d}, \end{aligned} \tag{3}$$

which can be derived by applying the law of iterated expectations (for details, see Jiang 2003, and our SM). The analytical formulas of MSE estimators are model dependent, whereas the bootstrap can be applied in the same way regardless of the assumed model. In what follows, we denote by $E^*$, $\mathrm{Var}^*$, and $\mathrm{MSE}^*$ the bootstrap operators for the expected value, variance, and MSE, and define

$$\mathrm{MSE}_B^*(\hat{\zeta}_d) = E^*\{(\hat{\zeta}_d^* - \zeta_d^*)^2\} \approx B_1^{-1}\sum_{b_1=1}^{B_1}(\hat{\zeta}_d^{*(b_1)} - \zeta_d^{*(b_1)})^2 =: \mathrm{mse}_B(\hat{\zeta}_d), \tag{4}$$

which is the bootstrap analogue of Equation (3).
Hall and Maiti (2006) pointed out that (4) tends to underestimate the MSE and proposed the double-bootstrap bias correction

$$\mathrm{MSE}_{BC}^*(\hat{\zeta}_d) = 2\,\mathrm{MSE}_B^*(\hat{\zeta}_d) - \mathrm{MSE}_{B2}^*(\hat{\zeta}_d) \approx 2\,\mathrm{mse}_B(\hat{\zeta}_d) - \mathrm{mse}_{B2}(\hat{\zeta}_d), \tag{5}$$

where $\mathrm{MSE}_{B2}^*(\hat{\zeta}_d)$ is the second-stage bootstrap MSE estimator, that is,

$$\mathrm{MSE}_{B2}^*(\hat{\zeta}_d) = E^{**}\{(\hat{\zeta}_d^{**} - \zeta_d^{**})^2\} \approx B_1^{-1}B_2^{-1}\sum_{b_1=1}^{B_1}\sum_{b_2=1}^{B_2}(\hat{\zeta}_d^{**(b_1,b_2)} - \zeta_d^{**(b_1,b_2)})^2 =: \mathrm{mse}_{B2}(\hat{\zeta}_d).$$
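Once the bootstrap replicates exist, (4) and (5) reduce to averages of squared prediction errors. The sketch below assumes that the first-stage pairs $(\hat{\zeta}_d^{*(b_1)}, \zeta_d^{*(b_1)})$ and second-stage pairs $(\hat{\zeta}_d^{**(b_1,b_2)}, \zeta_d^{**(b_1,b_2)})$ have already been generated and stored as NumPy arrays; the function name `bootstrap_mse` and the array layout are our own choices.

```python
import numpy as np


def bootstrap_mse(zeta_hat_1, zeta_1, zeta_hat_2, zeta_2):
    """mse_B from (4), mse_B2, and the bias-corrected estimator from (5).

    zeta_hat_1, zeta_1 : (B1, D) first-stage bootstrap EBPs and mixed parameters
    zeta_hat_2, zeta_2 : (B1, B2, D) second-stage counterparts
    """
    mse_b = np.mean((zeta_hat_1 - zeta_1) ** 2, axis=0)          # average over b1
    mse_b2 = np.mean((zeta_hat_2 - zeta_2) ** 2, axis=(0, 1))    # average over b1 and b2
    return mse_b, mse_b2, 2.0 * mse_b - mse_b2


rng = np.random.default_rng(0)
z1 = rng.normal(size=(200, 3))
z2 = rng.normal(size=(200, 50, 3))
print(bootstrap_mse(z1 + 0.2 * rng.standard_normal(z1.shape), z1,
                    z2 + 0.2 * rng.standard_normal(z2.shape), z2)[2])
```

Note that the plain correction $2\,\mathrm{mse}_B - \mathrm{mse}_{B2}$ becomes negative whenever $\mathrm{mse}_{B2} > 2\,\mathrm{mse}_B$; the sketch does not implement any safeguard against this.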

Computing $\mathrm{MSE}_{B2}^*(\hat{\zeta}_d)$ involves drawing $B_2$ second-stage bootstrap samples from each first-stage bootstrap sample. In this article, we do not aim at a precise estimation of the variability of the EBP, but at the construction of narrow SCI and reliable MTP. It turns out that, for this purpose, using an estimate of $g_{1d}$ from Equation (3) yields better results than using an estimate of the entire MSE (see Section 4), similarly to Chatterjee, Lahiri, and Li (2008) under LMM.

2.2 Popular Examples of GLMM and Their Properties

2.2.1 Poisson-Gamma Area-Level Model

The Poisson-gamma model is widely applied for modeling counts in the presence of overdispersion (see Cameron and Trivedi 2013, Section 4.2.2). In the SAE context, Chen, Jiang, and Nguyen (2015) investigated observed best prediction and bootstrap MSE estimation for small area mean counts; among other specifications, they also considered a Poisson-gamma model. We propose a different model formulation, focus on the EBP of $\zeta_d := \mu_d^{PG}$, and develop a plug-in MSE estimator. Let $y_d \mid u_d \sim \mathrm{Poiss}(\mu_d^{PG})$, $d = 1, \dots, D$, where $\mu_d^{PG} > 0$ and $n_d = 1$ for all $d \in [D]$, with canonical parameter $\log\mu_d^{PG} = x_d^t\beta + u_d$, and $w_d := \exp(u_d) \sim \mathrm{Gamma}(\delta, \delta)$ (shape and rate $\delta$, so that $E(w_d) = 1$), such that $E(y_d \mid u_d) = \mu_d^{PG} = \lambda_d w_d = \exp(x_d^t\beta)\, w_d = \exp(x_d^t\beta + u_d)$. Since the Gamma pdf is conjugate to the Poisson pdf, their mixture yields a negative binomial $y_d \sim NB(\lambda_d, \delta^{-1})$ with likelihood

$$L^{PG}(\theta) := f^{PG}(y \mid \theta) = \prod_{d=1}^{D}\frac{\Gamma(y_d + \delta)}{\Gamma(y_d + 1)\,\Gamma(\delta)}\left(\frac{\delta}{\delta + \lambda_d}\right)^{\delta}\left(\frac{\lambda_d}{\delta + \lambda_d}\right)^{y_d}, \tag{6}$$

where $E(y_d) = \lambda_d$ and $\mathrm{Var}(y_d) = \lambda_d + \delta^{-1}\lambda_d^2$. The marginal mean of $y_d$ is the same as in the Poisson case, but the random effect increases the variance. Suppose that this model holds for all areas of a population $P$ of size $N$ partitioned into subpopulations $P_1, P_2, \dots, P_D$ of sizes $N_1, N_2, \dots, N_D$. One can show that the BP for counts, $\tilde{\mu}_d^{PG}(\theta) := E(\mu_d^{PG} \mid y_d)$, is

$$E(\mu_d^{PG} \mid y_d) = \frac{\int_0^\infty \lambda_d w_d\, g(y_d \mid w_d)\, h(w_d)\, dw_d}{\int_0^\infty g(y_d \mid w_d)\, h(w_d)\, dw_d} = \frac{A_d^{PG}(y_d, \theta)}{C_d^{PG}(y_d, \theta)} = \frac{\lambda_d(y_d + \delta)}{\lambda_d + \delta} =: \psi_d^{PG}(y_d, \theta). \tag{7}$$

Equation (7) follows from the conjugacy of the Gamma pdf to the Poisson pdf, with

$$A_d^{PG} = \int_0^\infty \lambda_d w_d\,\frac{\exp(-\lambda_d w_d)\,\lambda_d^{y_d} w_d^{y_d}\,\delta^{\delta} w_d^{\delta-1}\exp(-w_d\delta)}{y_d!\,\Gamma(\delta)}\, dw_d = \frac{\lambda_d^{y_d+1}\,\delta^{\delta}\,\Gamma(y_d + 1 + \delta)}{\Gamma(\delta)\, y_d!\,(\lambda_d + \delta)^{y_d+1+\delta}},$$

and, analogously, $C_d^{PG} = \lambda_d^{y_d}\delta^{\delta}\Gamma(y_d + \delta)/\{\Gamma(\delta)\, y_d!\,(\lambda_d + \delta)^{y_d+\delta}\}$, so that $A_d^{PG}/C_d^{PG} = \lambda_d(y_d + \delta)/(\lambda_d + \delta)$ because $\Gamma(y_d + 1 + \delta) = (y_d + \delta)\,\Gamma(y_d + \delta)$.
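The closed form in (7) can be checked numerically by evaluating both integrals on a grid. In the sketch below, the helper names are hypothetical, $h$ is the Gamma density with shape and rate $\delta$ as above, and the common grid step cancels in the ratio. The trapezoidal rule used here assumes the integrand is bounded at $w_d = 0$, that is, $y_d + \delta \ge 1$, and that `w_max` is large relative to $1/(\lambda_d + \delta)$.

```python
import math

import numpy as np


def bp_poisson_gamma(y, lam, delta):
    """Closed-form BP (7): lam * (y + delta) / (lam + delta)."""
    return lam * (y + delta) / (lam + delta)


def bp_poisson_gamma_numeric(y, lam, delta, w_max=60.0, n_grid=600_001):
    """Ratio of integrals in (7) by the trapezoidal rule on (0, w_max]."""
    w = np.linspace(1e-12, w_max, n_grid)
    # log of Poisson(y | lam*w) times the Gamma(shape=delta, rate=delta) pdf of w
    log_k = (y * np.log(lam * w) - lam * w - math.lgamma(y + 1)
             + delta * math.log(delta) + (delta - 1.0) * np.log(w)
             - delta * w - math.lgamma(delta))
    k = np.exp(log_k)
    tw = np.ones(n_grid)
    tw[0] = tw[-1] = 0.5                     # trapezoid weights; the step cancels
    return float(np.sum(tw * lam * w * k) / np.sum(tw * k))


print(bp_poisson_gamma(3, 1.5, 2.0), bp_poisson_gamma_numeric(3, 1.5, 2.0))
```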

The EBP $\hat{\mu}_d^{PG}$ is obtained by replacing the vector of unknown parameters $\theta$ in Equation (7) with a consistent estimator $\hat{\theta}$. Under the Poisson-gamma model, $\phi = 1$ and $\theta = (\beta^t, \delta)^t$. We derive an analytical plug-in MSE estimator to measure the variability of our EBP.

Proposition 1.

Let $\mathrm{Var}_d(\theta) = D\,E\{(\hat{\theta} - \theta)(\hat{\theta} - \theta)^t\}$. An analytical MSE decomposition and its corresponding practical plug-in estimator are given by

$$\mathrm{MSE}_{PG}(\tilde{\mu}_d^{PG}) = g_{PG1d} + \frac{1}{D}c_d(\theta) + o(1/D) \quad\text{and}\quad \mathrm{mse}_{PG}(\hat{\mu}_d^{PG}) = \hat{g}_{PG1d} + \frac{1}{D}\hat{c}_d(\hat{\theta}), \tag{8}$$

where, for $d \in [D]$,

$$\begin{aligned} g_{PG1d} &= \kappa_{1d}(\theta) - \kappa_{2d}(\theta), \qquad \hat{g}_{PG1d} = \kappa_{1d}(\hat{\theta}) - \hat{\kappa}_{2d}(\hat{\theta}), \\ \kappa_{1d}(\theta) &= \frac{\lambda_d^2(\delta + 1)}{\delta}, \qquad \kappa_{2d}(\theta) = \sum_{j=0}^{\infty}\frac{\lambda_d^2(j + \delta)^2}{(\lambda_d + \delta)^2}\,P(y_d = j), \\ c_d(\theta) &= \sum_{j=0}^{\infty}\Big\{\frac{\partial}{\partial\theta}\psi_d^{PG}(j, \theta)\Big\}^t\,\mathrm{Var}_d(\theta)\,\Big\{\frac{\partial}{\partial\theta}\psi_d^{PG}(j, \theta)\Big\}\,P(y_d = j). \end{aligned} \tag{9}$$

Here, $\hat{c}_d(\hat{\theta})$ is a Monte Carlo approximation of $c_d(\theta)$, and $\hat{\kappa}_{2d}$ is $\kappa_{2d}$ with the infinite series truncated at a large term and $\theta$ replaced by $\hat{\theta}$. To estimate $\kappa_{1d}$, only the latter substitution is needed.

One can estimate $\mathrm{Var}_d(\theta)$ using any reasonable method; in Section 4, we use the bootstrap estimators defined in (24). Details on the derivation of Equations (6) and (8) are deferred to our SM.
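The sketch below evaluates $g_{PG1d} = \kappa_{1d} - \kappa_{2d}$ from (9), truncating the series at `j_max` and computing $P(y_d = j)$ from the negative binomial pmf in (6) on the log scale. As a sanity check of our own (not stated in the text): since $w_d \mid y_d \sim \mathrm{Gamma}(y_d + \delta, \lambda_d + \delta)$, one has $g_{PG1d} = E\{\mathrm{Var}(\mu_d^{PG} \mid y_d)\} = \lambda_d^2/(\lambda_d + \delta)$, which the truncated series should reproduce. The function names `nb_pmf` and `g1_poisson_gamma` are hypothetical.

```python
import math

import numpy as np


def nb_pmf(j_max, lam, delta):
    """P(y_d = j), j = 0..j_max, for the negative binomial in (6), on the log scale."""
    j = np.arange(j_max + 1, dtype=float)
    log_ratio = np.array([math.lgamma(v + delta) - math.lgamma(v + 1.0) for v in j])
    logp = (log_ratio - math.lgamma(delta)
            + delta * math.log(delta / (delta + lam))
            + j * math.log(lam / (delta + lam)))
    return j, np.exp(logp)


def g1_poisson_gamma(lam, delta, j_max=2000):
    """g_{PG1d} = kappa_1d - kappa_2d from (9), truncating kappa_2d at j_max."""
    kappa1 = lam ** 2 * (delta + 1.0) / delta
    j, p = nb_pmf(j_max, lam, delta)
    kappa2 = float(np.sum(lam ** 2 * (j + delta) ** 2 / (lam + delta) ** 2 * p))
    return kappa1 - kappa2


lam, delta = 4.0, 1.5
print(g1_poisson_gamma(lam, delta), lam ** 2 / (lam + delta))
```

The plug-in $\hat{g}_{PG1d}$ is obtained by passing $\hat{\lambda}_d = \exp(x_d^t\hat{\beta})$ and $\hat{\delta}$ instead of the true values.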

2.2.2 Poisson-Lognormal Area-Level Model

The Poisson-lognormal model has been thoroughly examined by, among others, Cameron and Trivedi (2013, Section 4.2.4), Franco and Bell (2015), and Boubeta, Lombardía, and Morales (2016). For $u_d \sim N(0, 1)$, let $y_d \mid u_d \sim \mathrm{Poiss}(\mu_d^{PL})$, $d = 1, \dots, D$, where $\mu_d^{PL} > 0$ and $n_d = 1$ for all $d \in [D]$. In addition, $\mu_d^{PL} = \nu_d\rho_d$, where $\nu_d$ is a known size variable and $\rho_d$ a binomial probability. The canonical parameter is $\log\mu_d^{PL} = \log\nu_d + x_d^t\beta + \delta u_d$ for all $d \in [D]$. Typically, $\zeta_d := \rho_d$, for which we have $\rho_d = \exp(x_d^t\beta + \delta u_d)$ with $\theta = (\beta^t, \delta)^t$. In this case, the likelihood is

$$L^{PL}(\theta) := f^{PL}(y \mid \theta) = (2\pi)^{-D/2}\prod_{d=1}^{D}\int_{\mathbb{R}}\frac{\exp(-\nu_d\rho_d)\,\nu_d^{y_d}\exp\{y_d(x_d^t\beta + \delta u_d)\}}{y_d!}\exp\Big(-\frac{u_d^2}{2}\Big)\, du_d.$$

The BPs $\tilde{\mu}_d^{PL}$ and $\tilde{\rho}_d$ and, once $\theta$ is estimated, the EBPs $\hat{\mu}_d^{PL}$ and $\hat{\rho}_d$ are obtained from the formulas of Boubeta, Lombardía, and Morales (2016). In Section 4.1, we estimate their MSE by bootstrap.

2.2.3 Logit Unit-Level Model

The unit-level logit model is a popular choice for binary responses and is comprehensively discussed by Hobza and Morales (2016). Under this setting, $y_{dj} \mid u_d \sim \mathrm{Bin}(m_{dj}, p_{dj})$, $u_d \sim N(0, 1)$, with $m_{dj}$ a known size parameter. The natural parameter is $\log\{p_{dj}/(1 - p_{dj})\} = x_{dj}^t\beta + \delta u_d$, $d \in [D]$, $j \in [n_d]$, so that $p_{dj} = \exp(x_{dj}^t\beta + \delta u_d)/\{1 + \exp(x_{dj}^t\beta + \delta u_d)\}$. We assume that the unit-level logit model holds for all units of a population $P$ of size $N$, partitioned into $D$ subpopulations $P_d$ of sizes $N_d$, $d \in [D]$. Let $\zeta_d := \mu_d^U = \sum_{j=1}^{N_d} p_{dj}$. As for the Poisson models, $\phi = 1$ and therefore $\theta = (\beta^t, \delta)^t$. The likelihood is given by

$$L^U(\theta) := f^U(y \mid \beta, \delta) = (2\pi)^{-D/2}\prod_{d=1}^{D}\int_{\mathbb{R}}\exp\Big[\sum_{j=1}^{n_d}\log\binom{m_{dj}}{y_{dj}} + \sum_{j=1}^{n_d} y_{dj}(x_{dj}^t\beta + \delta u_d) - \frac{u_d^2}{2} - \sum_{j=1}^{n_d} m_{dj}\log\{1 + \exp(x_{dj}^t\beta + \delta u_d)\}\Big]\, du_d. \tag{10}$$

Computing the BP $\tilde{p}_{dj}(\theta)$ and $\tilde{\mu}_d^U = \sum_{j=1}^{N_d}\tilde{p}_{dj}$ requires access to the auxiliary information of each population unit. In practice, however, this information is available only for the sample units. Following the suggestion of Hobza and Morales (2016), we can then still estimate the population quantity of interest by using only categorical covariates. Suppose that they take a finite number of values, $x_{dj} \in \{z_1, \dots, z_L\}$ for $d \in [D]$ and $j \in [n_d]$, with $z_l$ denoting the resulting covariate class. We then define

$$\bar{\mu}_d^U = \frac{\mu_d^U}{N_d}, \qquad \mu_d^U = \sum_{j=1}^{N_d} p_{dj} = \sum_{l=1}^{L} N_{dl}\, r_{dl}, \qquad r_{dl} = \frac{\exp(z_l^t\beta + \delta u_d)}{1 + \exp(z_l^t\beta + \delta u_d)}, \tag{11}$$

where $N_{dl} = \#\{j \in P_d : x_{dj} = z_l\}$ is the known size of class $z_l$ in area $d$. Hobza and Morales (2016) derived the BP $\tilde{\mu}_d^U(\theta)$ and the EBP $\hat{\mu}_d^U(\hat{\theta})$ for $\mu_d^U$, as well as for the other quantities in Equation (11). Due to the computational burden of the analytical MSE estimator, in Section 4.2 we use the bootstrap to estimate the MSE.
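Because $p_{dj}$ depends on unit $j$ only through its covariate class, applying (2) to (11) gives $\tilde{\mu}_d^U = \sum_{l=1}^{L} N_{dl}\,E(r_{dl} \mid y_d)$, where each conditional expectation is again a ratio of one-dimensional integrals. The sketch below evaluates it by plain Gauss–Hermite quadrature. It is our own illustration under model (10)–(11), not the implementation of Hobza and Morales (2016); the function name `bp_area_total` and its argument layout are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def bp_area_total(y, m, X, N_dl, Z, beta, delta, n_nodes=40):
    """BP of mu_d^U = sum_l N_dl r_dl in (11), i.e. sum_l N_dl E(r_dl | y_d).

    y, m : (n_d,) sampled successes and numbers of trials in area d
    X    : (n_d, p) sample covariates;  Z : (L, p) covariate classes z_l
    N_dl : (L,) known class sizes in area d
    """
    x, w = hermgauss(n_nodes)
    u = np.sqrt(2.0) * x                                     # nodes for u_d ~ N(0, 1)
    lin = (X @ beta)[:, None] + delta * u[None, :]           # (n_d, n_nodes)
    # log prod_j g_dj(y_dj | u_d); binomial coefficients cancel in the ratio
    loglik = np.sum(y[:, None] * lin - m[:, None] * np.logaddexp(0.0, lin), axis=0)
    k = w * np.exp(loglik - loglik.max())                    # unnormalized posterior weights
    r = 1.0 / (1.0 + np.exp(-((Z @ beta)[:, None] + delta * u[None, :])))  # (L, n_nodes)
    post_r = (r * k).sum(axis=1) / k.sum()                   # E(r_dl | y_d)
    return float(N_dl @ post_r)


Z = np.array([[1.0, 0.0], [1.0, 1.0]])
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
print(bp_area_total(np.array([1.0, 2.0, 3.0]), np.full(3, 5.0), X,
                    np.array([30.0, 70.0]), Z, np.array([0.2, -0.5]), 0.7))
```

Replacing $\theta$ by $\hat{\theta}$ in the inputs yields the corresponding EBP, and dividing by $N_d$ gives $\bar{\mu}_d^U$.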

3 Simultaneous Intervals and Multiple Testing

To construct CIs for the $\zeta_d$ that account for the estimates of all other areas, we need to find a region $I_{1-\alpha}$ such that $P(\zeta_d \in I_{1-\alpha}\ \forall d \in [D]) = 1 - \alpha$. Define

$$S_0 = \max_{d=1,\dots,D}|S_{0d}|, \quad\text{with}\quad S_{0d} = \frac{\hat{\zeta}_d - \zeta_d}{\hat{\sigma}(\hat{\zeta}_d)}, \quad \forall d \in [D], \tag{12}$$

$$q_{S_0}^{(1-\alpha)} = \inf\{t \in \mathbb{R} : P(S_0 \le t) \ge 1 - \alpha\}, \tag{13}$$

with $\hat{\sigma}(\hat{\zeta}_d)$ being an estimate of the variability of the EBP $\hat{\zeta}_d$. If $S_0$ is continuously distributed, we then have

$$\alpha = P\big(|\hat{\zeta}_d - \zeta_d| > q_{S_0}^{(1-\alpha)}\,\hat{\sigma}(\hat{\zeta}_d) \text{ for some } d \in [D]\big) = P\Big(\max_{d=1,\dots,D}\Big|\frac{\hat{\zeta}_d - \zeta_d}{\hat{\sigma}(\hat{\zeta}_d)}\Big| > q_{S_0}^{(1-\alpha)}\Big). \tag{14}$$

Constructing SCI boils down to estimating $q_{S_0}^{(1-\alpha)}$, as one can then define

$$I_{1-\alpha}^S = \mathop{\times}_{d=1}^{D} I_{d,1-\alpha}^S, \quad\text{with}\quad I_{d,1-\alpha}^S = \{\hat{\zeta}_d \pm q_{S_0}^{(1-\alpha)} \times \hat{\sigma}(\hat{\zeta}_d)\}, \tag{15}$$

where $\times$ denotes a generalized Cartesian product. $I_{1-\alpha}^S$ covers all $\zeta_d$ with probability $1 - \alpha$, that is, its joint confidence level is $1 - \alpha$. In contrast, with $q_{S_{0d}}^{(1-\alpha)}$ defined analogously to $q_{S_0}^{(1-\alpha)}$ but with $S_0$ replaced by $|S_{0d}|$, the individual area CIs (iCI) are given by

$$I_{d,1-\alpha}^{iCI} = \{\hat{\zeta}_d \pm q_{S_{0d}}^{(1-\alpha)} \times \hat{\sigma}(\hat{\zeta}_d)\}, \quad \forall d \in [D]. \tag{16}$$

By construction, the iCIs do not contain $\zeta_d$ for at least $100\alpha\%$ of all areas.

Remark 1.

$I_{d,1-\alpha}^{iCI}$ is designed to cover $\zeta_d$ at an individual confidence level. Consequently, the joint coverage probability of the iCIs decreases as $D$ grows, which highlights the need to construct SCI. Maintaining the $1 - \alpha$ simultaneous confidence level of the SCI $I_{1-\alpha}^S$, however, makes its constituents $I_{d,1-\alpha}^S$ wider than the corresponding iCIs $I_{d,1-\alpha}^{iCI}$. This is not surprising, because $I_{d,1-\alpha}^{iCI}$ and $I_{d,1-\alpha}^S$ are constructed to cover different sets, which serve distinct inferential purposes. It is worth mentioning that the length of $I_{d,1-\alpha}^S$ stabilizes because, for growing $D$, we observe two opposite trends: the increasing number of area parameters to cover and the decreasing MSE (see Tables 1 and 3).
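The cumulative loss of joint coverage is easy to quantify in an idealized setting. Assuming, purely for illustration, $D$ independent standard normal statistics $S_{0d}$ (in practice the EBPs are dependent, for example, through $\hat{\theta}$), the iCIs cover jointly with probability $(1 - \alpha)^D$, while the critical value that restores joint level $1 - \alpha$ is the Šidák-type quantile $\Phi^{-1}\{(1 + (1 - \alpha)^{1/D})/2\}$. This textbook device is used here only for illustration and is not the bootstrap SCI of this article; the function names are hypothetical.

```python
from statistics import NormalDist


def joint_coverage_icis(D, alpha=0.05):
    """Joint coverage of D independent individual (1 - alpha) intervals."""
    return (1.0 - alpha) ** D


def sidak_critical_value(D, alpha=0.05):
    """Two-sided normal critical value giving joint coverage 1 - alpha for D independent areas."""
    per_area = (1.0 - alpha) ** (1.0 / D)                # required individual coverage
    return NormalDist().inv_cdf(0.5 + per_area / 2.0)


for D in (1, 10, 50, 500):
    print(D, round(joint_coverage_icis(D), 4), round(sidak_critical_value(D), 3))
```

For $\alpha = 0.05$, the joint coverage of the iCIs falls below 0.08 already at $D = 50$, whereas the critical value grows only slowly with $D$, from about 1.96 at $D = 1$ to about 2.80 at $D = 10$.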

Table 1 ECP, AIW, and AIWV of SCI under area-level models.


Table 3 ECP, AIW and AIWV of 95% SCI under the unit-level model.


The SCI defined in Equation (15) is not operational, as the distribution of $S_0$ is unknown. The problem can be circumvented by a bootstrap approximation: for $b_1 = 1, \dots, B_1$, set

$$S_B^{(b_1)} = \max_{d=1,\dots,D}|S_{Bd}^{(b_1)}|, \qquad S_{Bd}^{(b_1)} = \frac{\hat{\zeta}_d^{*(b_1)} - \zeta_d^{*(b_1)}}{\hat{\sigma}^{*(b_1)}(\hat{\zeta}_d^*)}, \tag{17}$$

and approximate the critical value $q_{S_B}^{(1-\alpha)} = \inf\{t \in \mathbb{R} : P(S_B \le t \mid (y, X)) \ge 1 - \alpha\}$ by the $[(1-\alpha)B_1 + 1]$th order statistic of the $S_B^{(b_1)}$. Then the bootstrap equivalent of Equation (15) is

$$I_{1-\alpha}^B = \mathop{\times}_{d=1}^{D} I_{d,1-\alpha}^B, \quad\text{where}\quad I_{d,1-\alpha}^B = \{\hat{\zeta}_d \pm q_{S_B}^{(1-\alpha)} \times \hat{\sigma}(\hat{\zeta}_d)\}. \tag{18}$$
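Once the studentized bootstrap errors $S_{Bd}^{(b_1)}$ in (17) are available, the SCI (18) requires only a row-wise maximum and one order statistic. The sketch below assumes these errors are stored in a $B_1 \times D$ array; the name `bootstrap_sci` is hypothetical, and the tiny tolerance in the index computation only guards against floating-point rounding of $(1-\alpha)B_1$.

```python
import math

import numpy as np


def bootstrap_sci(zeta_hat, sigma_hat, S_star, alpha=0.05):
    """Bootstrap SCI (18) from studentized bootstrap errors.

    zeta_hat  : (D,) EBPs;  sigma_hat : (D,) scale estimates, e.g. sqrt of g_1d estimates
    S_star    : (B1, D) studentized bootstrap errors S_Bd^(b1) from (17)
    """
    S_max = np.max(np.abs(S_star), axis=1)                   # S_B^(b1), one per replicate
    B1 = S_max.shape[0]
    k = min(B1, math.floor((1.0 - alpha) * B1 + 1.0 + 1e-9))
    q = float(np.sort(S_max)[k - 1])                         # [(1 - alpha)B1 + 1]th order statistic
    return q, zeta_hat - q * sigma_hat, zeta_hat + q * sigma_hat


rng = np.random.default_rng(0)
q, lower, upper = bootstrap_sci(np.zeros(10), np.ones(10), rng.standard_normal((20_000, 10)))
print(round(q, 2))
```

In this usage example the errors are simulated as independent standard normals rather than generated from a fitted GLMM, so the critical value should be close to 1.96 for $D = 1$ and to the Šidák value of about 2.80 for $D = 10$.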

An alternative to Equation (12) is to use computationally simpler nonstudentized statistics. However, DiCiccio and Efron (1996) already pointed out that the lack of studentization results in slower convergence rates. Since nonstudentized SCI did not yield satisfactory results, we decided not to include them.

Our methodology is also applicable to hypothesis testing. Consider the test problem

$$H_0: B\zeta = b \quad\text{vs.}\quad H_1: B\zeta \neq b, \tag{19}$$

where $B \in \mathbb{R}^{D' \times D}$, $D' \le D$, and $b \in \mathbb{R}^{D'}$. We consider the max-type statistics

$$\begin{aligned} t_H &= \max_{d=1,\dots,D'}|t_{H_d}|, & t_{H_d} &= \frac{\hat{\zeta}_d^H - b_d}{\hat{\sigma}(\hat{\zeta}_d^H)}, \\ S_{H_0} &= \max_{d=1,\dots,D'}|S_{H_0 d}|, & S_{H_0 d} &= \frac{\hat{\zeta}_d^H - \zeta_d^H}{\hat{\sigma}(\hat{\zeta}_d^H)}, \end{aligned} \tag{20}$$

where $\zeta^H = (\zeta_1^H, \dots, \zeta_{D'}^H)^t = B\zeta \in \mathbb{R}^{D'}$, with $\hat{\zeta}^H$ being its estimator. One rejects $H_0$ at level $\alpha$ if $t_H \ge q_{H_0}^{(1-\alpha)}$, with $q_{H_0}^{(1-\alpha)} = \inf\{t \in \mathbb{R} : P(S_{H_0} \le t) \ge 1 - \alpha\}$.

In practice, such a test can be used to examine differences between area characteristics; for instance, a row $(1, -1, 0, \dots, 0)$ of $B$ together with $b_1 = 0$ tests the equality of $\zeta_1$ and $\zeta_2$. Similarly as for SCI, we approximate $q_{H_0}^{(1-\alpha)}$ by applying the bootstrap to a modified version of the statistic $S_B$, namely

$$q_{BH_0}^{(1-\alpha)} = \inf\{t \in \mathbb{R} : P(S_{BH_0} \le t \mid (y, X)) \ge 1 - \alpha\}, \tag{21}$$

where $S_{BH_0}$ in the $b_1$th bootstrap sample is

$$S_{BH_0}^{(b_1)} = \max_{d=1,\dots,D'}|S_{BH_0 d}^{(b_1)}|, \qquad S_{BH_0 d}^{(b_1)} = \frac{\hat{\zeta}_d^{*H(b_1)} - \zeta_d^{*H(b_1)}}{\hat{\sigma}^*(\hat{\zeta}_d^{*H(b_1)})}, \tag{22}$$

with $\zeta^{*H(b_1)} = (\zeta_1^{*H(b_1)}, \dots, \zeta_{D'}^{*H(b_1)})^t = B\zeta^{*(b_1)} \in \mathbb{R}^{D'}$ and $\hat{\zeta}^{*H(b_1)} = (\hat{\zeta}_1^{*H(b_1)}, \dots, \hat{\zeta}_{D'}^{*H(b_1)})^t$ its estimated counterpart.
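The test (20)–(22) can be sketched in the same way. As a simplification made only for this illustration, the sketch uses one common scale per contrast, namely the bootstrap root-MSE of $\hat{\zeta}_d^{*H} - \zeta_d^{*H}$, in place of the replicate-specific $\hat{\sigma}^*(\hat{\zeta}_d^{*H(b_1)})$ in (22), and uses the same scale to studentize $t_H$. The function and argument names are hypothetical.

```python
import math

import numpy as np


def max_type_test(zeta_hat, zeta_hat_star, zeta_star, Bmat, b, alpha=0.05):
    """Bootstrap max-type test of H0: B zeta = b in the spirit of (20)-(22).

    zeta_hat      : (D,) EBPs from the original sample
    zeta_hat_star : (B1, D) EBPs recomputed on each bootstrap sample
    zeta_star     : (B1, D) mixed parameters of each bootstrap population
    Bmat, b       : (D', D) contrast matrix and (D',) hypothesized values
    """
    err_star = (zeta_hat_star - zeta_star) @ Bmat.T          # (B1, D') contrast errors
    sigma = np.sqrt(np.mean(err_star ** 2, axis=0))          # assumed common scale per contrast
    t_H = float(np.max(np.abs((Bmat @ zeta_hat - b) / sigma)))
    S_star = np.max(np.abs(err_star / sigma), axis=1)        # S_BH0^(b1), one per replicate
    B1 = S_star.shape[0]
    k = min(B1, math.floor((1.0 - alpha) * B1 + 1.0 + 1e-9))
    q = float(np.sort(S_star)[k - 1])                        # [(1 - alpha)B1 + 1]th order statistic
    return t_H, q, t_H >= q
```

A contrast matrix with rows $(1, -1, 0, 0)$, $(0, 1, -1, 0)$, and $(0, 0, 1, -1)$ and $b = 0$, for example, tests whether four areas share the same value of $\zeta_d$.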

We now establish the consistency of our bootstrap-based CI and tests, as well as their asymptotic convergence and coverage probability. Proofs are deferred to Appendices A.2 and A.3. Suppose that $\hat{\theta}$ is consistent such that $\|\hat{\theta} - \theta\| = O_P(n^{-c})$, $c > 0$. Since, for GLMM with clustered random effects, the log-likelihood can be expressed as a sum of independent random components, the consistency of the ML estimator $\hat{\theta}$ follows from classical theory. Consistency under a general GLMM had been an open problem for many years until it was solved by Jiang (2013). Bianconcini (2014) and Huber, Ronchetti, and Victoria-Feser (2004) investigated the consistency of $\hat{\theta}$ when it is computed by AGQ and LA, respectively. For our purpose, we need to prove bootstrap consistency.

Proposition 2.

Under Assumptions 1–5 from Appendix A.1, it holds that

$$E^*(y_{dj}^*) - E(y_{dj}) = o_{P^*}(1), \qquad \mathrm{Var}^*(y_d^*) - \mathrm{Var}(y_d) = [o_{P^*}(1)]_{n_d \times n_d}, \qquad \|\hat{\theta}^* - \hat{\theta}\| = O_{P^*}(n^{-c}).$$

Given Proposition 2, we can derive the consistency of $I_{1-\alpha}^B$ based on results from extreme value theory and asymptotic expansions of the standardized statistics, using ideas from Chatterjee, Lahiri, and Li (2008). Let us assume $\hat{\sigma}(\hat{\zeta}_d) = \sqrt{\hat{g}_{1d}(\hat{\zeta}_d)}$, though similar results are immediate for $\hat{\sigma}(\hat{\zeta}_d) = \sqrt{\mathrm{mse}_{(\cdot)}(\hat{\mu}_d^{(\cdot)})}$, where $(\cdot)$ stands for the different types of estimators. We write $q := q_{S_0}^{(1-\alpha)}$ where unambiguous, and denote the cumulative distribution functions (cdf) of $S_{0d}$ and $S_{Bd}$ by $G_d(w) = P(S_{0d} \le w)$ and $G_{Bd}(w) = P(S_{Bd} \le w)$. In Appendix A.3, we provide asymptotic expansions for both. Define $(S_{0(D+1)}, \dots, S_{0(2D)}) = (-S_{01}, \dots, -S_{0D})$ and observe that $\max_{d=1,\dots,D}|S_{0d}| = \max_{d=1,\dots,2D} S_{0d}$. From Equation (14), we have

$$T_D(q) = P(S_0 \le q) = P(S_{01} \le q, \dots, S_{0D} \le q, -S_{01} \le q, \dots, -S_{0D} \le q) = \prod_{d=1}^{2D} G_d(q). \tag{23}$$

As $D \to \infty$, unless standardized, the distribution in Equation (23) converges to 0 or 1. In Appendix A.3, we show that $G_d(w)$ is asymptotically normal, such that $P(S_0 \le q) \approx \Phi^{2D}(q)$. Since the cdf of the maximum of standard normal random variables is in the domain of attraction of the Gumbel law, it follows that $\lim_{D \to \infty}\Phi^{2D}(q/b_D + b_D) = \exp\{-\exp(-q)\} = T_0(q)$ for all $q \in \mathbb{R}$, where $b_D$ is a sequence of constants (see Leadbetter, Lindgren, and Rootzén 2012, Theorem 1.5.3). Unfortunately, this approximation has a poor convergence rate, but the bootstrap is again a remedy. Notice that a similar representation holds for $S_B$, substituting $P$ with $P^*$ and replacing the true parameters by their estimates. Applying Pólya's theorem, which turns convergence in distribution to a continuous limit into convergence in the sup norm, yields our next proposition.
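The slow convergence of the Gumbel approximation can be seen numerically. The sketch below solves $\Phi^{2D}(q) = 1 - \alpha$, the approximation to (23) stated above, and compares the solution with the Gumbel-based value $b_n + x_\alpha/a_n$ for $n = 2D$, where $x_\alpha = -\log\{-\log(1 - \alpha)\}$. We assume the standard normalizing constants $a_n = (2\log n)^{1/2}$ and $b_n = a_n - (\log\log n + \log 4\pi)/(2a_n)$ for maxima of iid standard normal variables; the function names are hypothetical, and only Python's standard library is used.

```python
import math
from statistics import NormalDist


def exact_max_quantile(D, alpha=0.05):
    """Solve Phi(q)^(2D) = 1 - alpha, the normal approximation to (23)."""
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / (2 * D)))


def gumbel_max_quantile(D, alpha=0.05):
    """Gumbel approximation of the same quantile for the maximum of n = 2D normals."""
    n = 2 * D
    a_n = math.sqrt(2.0 * math.log(n))
    b_n = a_n - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * a_n)
    x_alpha = -math.log(-math.log(1.0 - alpha))   # (1 - alpha)-quantile of exp(-exp(-x))
    return b_n + x_alpha / a_n


for D in (5, 50, 500, 50_000):
    print(D, round(exact_max_quantile(D), 3), round(gumbel_max_quantile(D), 3))
```

For $\alpha = 0.05$, the gap between the two values shrinks only slowly, from roughly 0.18 at $D = 5$ to about 0.01 at $D = 50{,}000$, consistent with the poor convergence rate noted above.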

Proposition 3.

Define $T_{D}^{*} (w) = P^{*} (S_{B} \leq w)$, which is a bootstrap analogue of $T_{D} (w)$ in Equation (23). Under Assumptions 1–5 from Appendix A.1, it holds that $\sup_{w \in \mathbb{R}} | T_{D} (w) - T_{D}^{*} (w) | = o_{P} (1) .$

Corollary 1.

Proposition 3 implies that under the same assumptions, $P (ζ_{d} \in I_{1 - α}^{B} \forall d \in [D]) \to 1 - α .$
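The slow convergence of the Gumbel approximation mentioned above can be checked numerically. The following Python sketch compares, for the approximation $P (S_{0} \leq q) \approx Φ^{2 D} (q)$, the exact solution of $Φ^{2 D} (q) = 1 - α$ with the critical value implied by the Gumbel limit $\exp (- \exp (- x))$. We assume the classical normalizing constants for Gaussian maxima, $a_{n} = (2 \log n)^{1/2}$ and $b_{n} = a_{n} - (\log \log n + \log 4π) / (2 a_{n})$ with $n = 2 D$; the text only refers to a generic sequence $b_{D}$, so this choice is ours, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def exact_max_critical_value(D, alpha=0.05):
    """Solve Phi(q)^(2D) = 1 - alpha, i.e. the quantile of the max of 2D iid N(0, 1)."""
    return PHI.inv_cdf((1.0 - alpha) ** (1.0 / (2 * D)))


def gumbel_max_critical_value(D, alpha=0.05):
    """Critical value implied by the Gumbel limit exp(-exp(-x)) for n = 2D maxima."""
    n = 2 * D
    a_n = math.sqrt(2.0 * math.log(n))
    b_n = a_n - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * a_n)
    x = -math.log(-math.log(1.0 - alpha))  # (1 - alpha)-quantile of the Gumbel law
    return x / a_n + b_n


for D in (26, 52, 78, 10**6):
    q_exact = exact_max_critical_value(D)
    q_gumbel = gumbel_max_critical_value(D)
    print(f"D={D}: exact={q_exact:.3f}, Gumbel={q_gumbel:.3f}, gap={q_gumbel - q_exact:+.3f}")
```

At D = 52 the two critical values differ by roughly 0.06, and the gap closes only slowly as D grows, which illustrates why a bootstrap calibration is preferable at realistic numbers of areas.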

Since we use almost identical max-type statistics in Equations (12) and (20), the construction of the MTP follows almost immediately from the correspondence between tests and CI. In fact, the acceptance region of our test is $I_{1 - α}^{H_{0}} = \times_{d = 1}^{D} I_{d, 1 - α}^{H_{0}}$, where $I_{d, 1 - α}^{H_{0}} = \{{\hat{ζ}}_{d} - q_{S_{0}}^{(1 - α)} \times \hat{σ} ({\hat{ζ}}_{d}) \leq b_{d} \leq {\hat{ζ}}_{d} + q_{S_{0}}^{(1 - α)} \times \hat{σ} ({\hat{ζ}}_{d})\}$, that is, we reject H₀ if $b \notin I_{1 - α}^{H_{0}}$. We can write $P (h_{d} \in I_{1 - α}^{H_{0}} \forall d \in [D]) = 1 - α$. Since this probability statement is true for any $h_{d} = ζ_{d}$, we obtain the CI defined in Equation (15) by inverting the test.
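The duality between the test and the SCI can be illustrated with a minimal Python sketch. It assumes $D' = D$ and $B$ equal to the identity, so that $t_{H_{d}} = ({\hat{ζ}}_{d} - b_{d}) / \hat{σ} ({\hat{ζ}}_{d})$, and it takes the critical value $q$ as given (in the paper it is a bootstrap quantile); the function names and numbers are hypothetical. Up to ties on the boundary, rejecting when $t_{H} > q$ and rejecting when some $b_{d}$ leaves its interval ${\hat{ζ}}_{d} \pm q \hat{σ} ({\hat{ζ}}_{d})$ give the same decision.

```python
import numpy as np


def max_type_statistic(zeta_hat, sigma_hat, b):
    """t_H = max_d |zeta_hat_d - b_d| / sigma_hat_d, with B taken as the identity."""
    return float(np.max(np.abs(zeta_hat - b) / sigma_hat))


def simultaneous_ci(zeta_hat, sigma_hat, q):
    """Intervals zeta_hat_d -/+ q * sigma_hat_d for every area d."""
    return zeta_hat - q * sigma_hat, zeta_hat + q * sigma_hat


def reject_h0(zeta_hat, sigma_hat, b, q):
    """Reject H0: zeta = b iff some b_d falls outside its interval."""
    lower, upper = simultaneous_ci(zeta_hat, sigma_hat, q)
    return bool(np.any((b < lower) | (b > upper)))


zeta_hat = np.array([0.18, 0.22, 0.25, 0.31])
sigma_hat = np.array([0.02, 0.03, 0.02, 0.04])
q = 2.5  # hypothetical critical value

for b in (np.full(4, 0.24), np.array([0.18, 0.22, 0.25, 0.25])):
    t_h = max_type_statistic(zeta_hat, sigma_hat, b)
    # Both decisions coincide: t_H > q  <=>  b lies outside the SCI.
    print(round(t_h, 3), t_h > q, reject_h0(zeta_hat, sigma_hat, b, q))
```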

Corollary 2.

Let H₀ be the null hypothesis defined in Equation (19), $H_{0} : Bζ = b$ vs. $H_{1} : Bζ \neq b$, and let $α \in (0, 1)$. Under Proposition 3, we have $P (t_{H} > q_{{BH}_{0}}^{(1 - α)}) \leq α + o (1) .$

Remark 2.

Our single-step testing procedure for Equation (19), with the bootstrap critical value in Equation (21), $q_{{BH}_{0}}^{(1 - α)} = \inf \{t \in \mathbb{R} : P (S_{{BH}_{0}} \leq t | (y, X)) \geq 1 - α\}$, controls the family-wise error rate (FWER) in the weak sense, but its ability to detect false null hypotheses may be limited when $D'$ is large. Yet, we can readily extend our test to the bootstrap-based step-down procedure of Romano, Shaikh, and Wolf (2008), which controls the false discovery rate and has better power to detect false $H_{0 d}$ than FWER-controlling procedures.

4 Empirical Reliability Study

We performed extensive simulation studies to assess the reliability of our methods. SCI and MTP for EBP were constructed with different estimators of variability under the models presented in Sections 2.2.1–2.2.3. First, we examined the relative bias and relative root-MSE of the fixed effects $\hat{β}$ and the variability parameter $\hat{δ}$. Then, the performance of EBP was evaluated by comparing bias, average absolute bias and MSE for $D \in \{26, 52, 78\}$. Since these showed no atypical pattern, the results under the Poisson area-level models and the logistic unit-level model are deferred to the SM. Regarding SCIs, we calculated the empirical coverage probability (ECP), the average interval width (AIW), and the AIW variation (AIWV):
$\begin{matrix} ECP = \frac{1}{K} \sum_{k = 1}^{K} 1 \{ζ_{d}^{(k)} \in I_{1 - α}^{S} \forall d \in [D]\}, \\ AIW = \frac{1}{D K} \sum_{d = 1}^{D} \sum_{k = 1}^{K} ω_{d}^{(k)}, \quad ω_{d}^{(k)} = 2 q_{(\cdot)}^{(1 - α)(k)} {\hat{σ}}^{(k)} ({\hat{ζ}}_{d}), \\ AIWV = \frac{1}{D (K - 1)} \sum_{d = 1}^{D} \sum_{k = 1}^{K} {(ω_{d}^{(k)} - {\bar{ω}}_{d})}^{2}, \\ {\bar{ω}}_{d} = \frac{1}{K} \sum_{k = 1}^{K} ω_{d}^{(k)}, \quad d = 1, \dots, D . \end{matrix}$

For each simulation run k, we record the widths of the SCI and check whether it covers all true parameters $ζ_{d}^{(k)}$. ECP is then computed by averaging over the K simulation runs and should be close to $1 - α$. AIW is obtained by averaging over simulation runs and areas. Narrower intervals are preferable if their ECP is close to the nominal level. These are standard measures to assess the quality of interval estimators (Chatterjee, Lahiri, and Li 2008; Ganesh 2009). Lower AIWV values indicate that the interval length is stable and does not depend on the simulation run.
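A minimal Python sketch of these three measures, assuming the simulation output is stored as $K \times D$ arrays of true values and interval bounds (the function name and array layout are our own choices). The widths are taken as upper minus lower bound, which equals $2 q_{(\cdot)}^{(1 - α)(k)} {\hat{σ}}^{(k)} ({\hat{ζ}}_{d})$ for the symmetric intervals used here.

```python
import numpy as np


def interval_metrics(true_values, lower, upper):
    """Return (ECP, AIW, AIWV) for simultaneous intervals.

    All inputs are array-like with shape (K, D): K simulation runs, D areas.
    """
    true_values, lower, upper = (np.asarray(a, dtype=float) for a in (true_values, lower, upper))
    K, D = true_values.shape
    # A run counts as covered only if all D areas are covered at the same time.
    covered_all = np.all((lower <= true_values) & (true_values <= upper), axis=1)
    ecp = float(covered_all.mean())
    widths = upper - lower                  # omega_d^(k), shape (K, D)
    aiw = float(widths.mean())              # average over runs and areas
    mean_width = widths.mean(axis=0)        # bar omega_d, shape (D,)
    aiwv = float(((widths - mean_width) ** 2).sum() / (D * (K - 1)))
    return ecp, aiw, aiwv


# Toy example with K = 3 runs and D = 2 areas; run 2 misses area 1.
truth = np.tile([0.10, 0.20], (3, 1))
lower_bounds = np.array([[0.05, 0.15], [0.12, 0.15], [0.05, 0.10]])
upper_bounds = np.array([[0.15, 0.25], [0.20, 0.25], [0.15, 0.30]])
print(interval_metrics(truth, lower_bounds, upper_bounds))
```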

4.1 Finite Sample Performance Under Area-Level Models

Under the Poisson-gamma model we set $y_{d} \sim Poisson (μ_{d}^{PG})$, $μ_{d}^{PG} = λ_{d} w_{d}$. Covariates, parameters and sample sizes are taken from our case study in Section 5, that is, we set $θ = (β^{t}, δ) = {(10.038, 7.747, - 3.136, 11.317, - 2.466, 2.480)}^{t}$, $D \in \{26, 52, 78\}$, $n_{d} = 1$ for all $d \in [D]$, and $n = D$. For D = 52 we take the covariates from the original sample; for D = 26 we select the areas by simple random sampling without replacement; and for D = 78 we take the original sample plus 26 randomly selected areas, so that each area enters at most twice. The parameter of interest is the area proportion of individuals below the poverty line, ${\bar{μ}}_{d}^{PG} = μ_{d}^{PG} / N_{d}$. The EBP for $μ_{d}^{PG}$ is given in Equation (7), $ψ_{d}^{PG} (y_{d}, θ) = λ_{d} (y_{d} + δ) / (λ_{d} + δ)$. Since $N_{d}$ is usually unknown, in practice it is replaced by its estimate; see Equation (25) in Section 5. We apply a double bootstrap with $B_{1} = 1000$ first-stage and $B_{2} = 1$ second-stage bootstrap replicates (the choice of the latter is motivated by Erciulescu and Fuller 2014). We generate K = 1000 samples with the same areas and fixed covariates, but randomly drawn $w_{d}$ and $y_{d}$. SCIs and iCIs are constructed as follows:

1. Fit the model to the data and obtain estimates $\hat{θ} = (\hat{β}, \hat{δ})$.
2. For $b_{1} = 1, \dots, B_{1}$ bootstrap samples, generate iid $w_{d}^{* (b_{1})} \sim Gamma (\hat{δ}, \hat{δ})$ and set $μ_{d}^{PG * (b_{1})} = {\hat{λ}}_{d} w_{d}^{* (b_{1})}$ and $y_{d}^{* (b_{1})} \sim Poisson (μ_{d}^{PG * (b_{1})})$.
3. For each bootstrap sample, calculate ${\hat{θ}}^{* (b_{1})}$, ${\hat{μ}}_{d}^{PG * (b_{1})} ({\hat{θ}}^{* (b_{1})})$ and $A D_{PG, d}^{* (b_{1})} = | {\hat{μ}}_{d}^{PG * (b_{1})} - μ_{d}^{PG * (b_{1})} |$.
   (a) For $b_{2} = 1, \dots, B_{2}$, generate iid $w_{d}^{* * (b_{1}, b_{2})} \sim Gamma ({\hat{δ}}^{* (b_{1})}, {\hat{δ}}^{* (b_{1})})$ and set $μ_{d}^{PG * * (b_{1}, b_{2})} = {\hat{λ}}_{d}^{* (b_{1})} w_{d}^{* * (b_{1}, b_{2})}$ and $y_{d}^{* * (b_{1}, b_{2})} \sim Poisson (μ_{d}^{PG * * (b_{1}, b_{2})})$.
   (b) For each second-stage bootstrap sample, calculate ${\hat{θ}}^{* * (b_{1}, b_{2})}$ and ${\hat{μ}}_{d}^{PG * * (b_{1}, b_{2})} ({\hat{θ}}^{* * (b_{1}, b_{2})})$.
   (c) Set ${mse}_{d}^{(b_{1})} = \frac{1}{B_{2}} \sum_{b_{2} = 1}^{B_{2}} {({\hat{μ}}_{d}^{PG * * (b_{1}, b_{2})} - μ_{d}^{PG * * (b_{1}, b_{2})})}^{2}$.
4. Calculate the bootstrap estimates ${\hat{g}}_{PG 1 d} ({\hat{θ}}^{* (b_{1})})$ as in Equation (9), as well as
   $\begin{matrix} {mse}_{B} ({\hat{μ}}_{d}^{PG}) = \frac{1}{B_{1}} \sum_{b_{1} = 1}^{B_{1}} {({\hat{μ}}_{d}^{PG * (b_{1})} - μ_{d}^{PG * (b_{1})})}^{2}, \\ {mse}_{BC} ({\hat{μ}}_{d}^{PG}) = 2 {mse}_{B} ({\hat{μ}}_{d}^{PG}) - \frac{1}{B_{1}} \sum_{b_{1} = 1}^{B_{1}} {mse}_{d}^{(b_{1})} . \end{matrix}$
5. Calculate the statistic $S_{PG, B}$ with the critical value $q_{PG, S_{B}}^{(1 - α)}$ obtained from the bootstrap sample $S_{PG, B} = {(S_{PG, B}^{(1)}, \dots, S_{PG, B}^{(B_{1})})}^{t}$, where
   $\begin{matrix} S_{PG, B}^{(b_{1})} = \max_{d = 1, \dots, D} A D_{PG, d}^{* (b_{1})} / {\hat{σ}}^{* (b_{1})} ({\hat{μ}}_{d}^{PG * (b_{1})}) \quad and \\ q_{PG, S_{B}}^{(1 - α)} = Q_{1 - α} (S_{PG, B}), \end{matrix}$
   as well as a variance estimate for $\hat{θ}$,
   $\begin{matrix} \hat{var} (\hat{θ}) = \frac{1}{B_{1}} \sum_{b_{1} = 1}^{B_{1}} ({\hat{θ}}^{* (b_{1})} - \bar{θ}) {({\hat{θ}}^{* (b_{1})} - \bar{θ})}^{t} \quad with \\ \bar{θ} = \frac{1}{B_{1}} \sum_{b_{1} = 1}^{B_{1}} {\hat{θ}}^{* (b_{1})} . \end{matrix}$ (24)
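Steps 4 and 5 only post-process the bootstrap output, so they can be sketched independently of the model fit. The Python helpers below assume that the first-stage quantities ${\hat{μ}}_{d}^{PG * (b_{1})}$, $μ_{d}^{PG * (b_{1})}$ and ${\hat{σ}}^{* (b_{1})} (\cdot)$ have already been computed and stored as $B_{1} \times D$ arrays (fitting ${\hat{θ}}^{* (b_{1})}$ is not shown). $Q_{1 - α}$ is implemented with the infimum definition of a quantile, as in Equation (21); all names are hypothetical.

```python
import math

import numpy as np


def empirical_quantile_inf(sample, level):
    """Smallest t with ECDF(t) >= level (infimum definition of a quantile)."""
    s = np.sort(np.asarray(sample, dtype=float))
    k = max(math.ceil(level * s.size - 1e-9), 1)  # how many draws must be <= t
    return float(s[k - 1])


def max_type_critical_value(mu_hat_star, mu_star, sigma_star, alpha=0.05):
    """Step 5: S^(b1) = max_d AD_d^(b1) / sigma*_d^(b1), then its (1 - alpha)-quantile.

    All arrays have shape (B1, D), one row per first-stage bootstrap replicate.
    """
    ad = np.abs(np.asarray(mu_hat_star) - np.asarray(mu_star))  # AD_{PG,d}^{*(b1)}
    s_boot = np.max(ad / np.asarray(sigma_star), axis=1)         # S_{PG,B}^{(b1)}
    return empirical_quantile_inf(s_boot, 1.0 - alpha)


def bootstrap_mse(mu_hat_star, mu_star, mse_second_stage=None):
    """Step 4: mse_B per area and, given the (B1, D) array of mse_d^(b1), also mse_BC."""
    mse_b = np.mean((np.asarray(mu_hat_star) - np.asarray(mu_star)) ** 2, axis=0)
    if mse_second_stage is None:
        return mse_b, None
    mse_bc = 2.0 * mse_b - np.mean(np.asarray(mse_second_stage), axis=0)
    return mse_b, mse_bc


def bootstrap_cov(theta_star):
    """Equation (24): covariance of the (B1, p) bootstrap estimates, divisor B1."""
    theta_star = np.asarray(theta_star, dtype=float)
    centred = theta_star - theta_star.mean(axis=0)
    return centred.T @ centred / theta_star.shape[0]
```

In the Poisson-gamma simulation, `sigma_star` would hold either $\sqrt{{\hat{g}}_{PG 1 d} ({\hat{θ}}^{* (b_{1})})}$ or the square root of a bootstrap MSE, depending on which variability estimate is being compared.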

We compare the performance of SCI and MTP for different variability estimates $\hat{σ} ({\hat{μ}}_{d}^{PG})$ and their bootstrap equivalents ${\hat{σ}}^{*} ({\hat{μ}}_{d}^{* PG})$, namely for $\hat{σ} ({\hat{μ}}_{d}^{PG}) = \sqrt{{\hat{g}}_{PG 1 d}}$ and $\hat{σ} ({\hat{μ}}_{d}^{PG}) = \sqrt{{mse}_{(\cdot)} ({\hat{μ}}_{d}^{PG})}$. Here, ${mse}_{(\cdot)}$ refers to either the plug-in ${mse}_{P}$, the ${mse}_{B}$ or the ${mse}_{BC}$, defined in Equations (8), (4), and (5), respectively. Steps 3(a)–(c) of the algorithm refer to the second-stage bootstrap, which is only necessary to obtain the bias-corrected ${mse}_{BC}$. Under the Poisson-gamma model, we are interested in the estimation of poverty rates. We thus consider ${\hat{\bar{μ}}}_{d}^{PG} = {\hat{μ}}_{d}^{PG} / N_{d}$, ${\hat{\bar{g}}}_{PG 1 d} = {\hat{g}}_{PG 1 d} / N_{d}^{2}$ and ${mse}_{(\cdot)} ({\hat{\bar{μ}}}_{d}^{PG}) = {mse}_{(\cdot)} ({\hat{μ}}_{d}^{PG}) / N_{d}^{2}$.
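For the Poisson-gamma model, $g_{PG 1 d}$ in Equation (9) can be evaluated directly. The Python sketch below assumes $w_{d} \sim Gamma (δ, δ)$ with shape and rate δ (mean 1), as in Step 2 of the algorithm, so that $y_{d}$ is marginally negative binomial; it computes $κ_{1 d} - κ_{2 d}$ with the infinite sum truncated and compares the result with $λ_{d}^{2} / (λ_{d} + δ)$. That closed form is our own derivation, used only as a check: the posterior of $w_{d}$ given $y_{d}$ is $Gamma (δ + y_{d}, δ + λ_{d})$, so $E \{Var (μ_{d} | y_{d})\} = λ_{d}^{2} E (δ + y_{d}) / (δ + λ_{d})^{2} = λ_{d}^{2} / (λ_{d} + δ)$. Here $λ_{d}$ is treated as a given positive number, and the names and values are illustrative.

```python
import math


def nb_pmf(j, lam, delta):
    """P(y_d = j) when y_d | w_d ~ Poisson(lam * w_d) and w_d ~ Gamma(shape=delta, rate=delta)."""
    log_p = (math.lgamma(j + delta) - math.lgamma(delta) - math.lgamma(j + 1)
             + delta * math.log(delta / (lam + delta))
             + j * math.log(lam / (lam + delta)))
    return math.exp(log_p)


def ebp_pg(y, lam, delta):
    """psi_d^PG(y_d, theta) = lam * (y + delta) / (lam + delta), cf. Equation (7)."""
    return lam * (y + delta) / (lam + delta)


def g1_pg(lam, delta, j_max=None):
    """g_{PG1d} = kappa_1d - kappa_2d (Equation (9)), truncating the sum over j at j_max."""
    if j_max is None:
        sd = math.sqrt(lam + lam ** 2 / delta)  # marginal standard deviation of y_d
        j_max = int(lam + 40 * sd) + 100        # generous truncation point
    kappa1 = lam ** 2 * (delta + 1) / delta
    kappa2 = sum(ebp_pg(j, lam, delta) ** 2 * nb_pmf(j, lam, delta)
                 for j in range(j_max + 1))
    return kappa1 - kappa2


lam, delta = 3.0, 2.48  # illustrative; delta equals the case-study estimate
print(g1_pg(lam, delta), lam ** 2 / (lam + delta))  # should agree up to rounding
```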

For the Poisson-lognormal model with $y_{d} \sim Poisson (μ_{d}^{PL})$, $μ_{d}^{PL} = ν_{d} ρ_{d}$, the parameter of interest is $ρ_{d}$ with $N_{d} = ν_{d}$, estimated by the EBP derived by Boubeta, Lombardía, and Morales (2016). We take the fixed parameters from Section 5, that is, $(β^{t}, δ) = {(- 2.264, 3.480, - 0.870, 4.842, 0.125, 0.322)}^{t}$. Covariates, sample sizes, number of simulation runs and bootstrap replicates are the same as for the Poisson-gamma model. The variability of ${\hat{ρ}}_{d}$ was estimated using bootstrap MSEs, that is, ${mse}_{B}$ and ${mse}_{BC}$. To obtain SCI and iCI estimates, one can use almost the same algorithm as above, changing the way in which $y_{d}^{* (b_{1})}$ is generated.

Table 1 summarizes the performance of 95% SCIs for ${\hat{\bar{μ}}}_{d}^{PG}$ constructed with ${mse}_{B}$ (B), ${mse}_{BC}$ (BC), the plug-in ${mse}_{P}$ (P) and ${\hat{g}}_{PG 1 d}$ (G). For ${\hat{ρ}}_{d}$, they were constructed using ${mse}_{B}$ (B) and ${mse}_{BC}$ (BC). All methods perform very well in terms of ECP, even for D = 26. In contrast, SCIs constructed using a Bonferroni procedure yield unacceptably low ECP. For instance, for D = 52 and ${mse}_{B}$ it equals 78% for the Poisson-gamma and 88% for the Poisson-lognormal model. We therefore do not report them further.

Figure 1 presents 95% SCI and iCI estimates for a randomly selected simulation run under the Poisson-gamma model. The plot is divided into five panels according to the number of units ${\hat{N}}_{d}^{dir}$ defined in Equation (25), with the first panel presenting the results for the areas with the fewest observations. The black and red dots represent the true proportions, which are known in a simulation. Red indicates true parameters not covered by their iCIs. In Figure 1, this holds for four of the true values ($\approx 7.7\%$). This illustrates well the difference between individual and simultaneous inference, as well as the particular relevance of the latter. We obtain similar figures for other simulation runs (see our SM).

Fig. 1 95% iCI and SCI for proportions with D = 52. Red dots indicate true parameters outside iCI, whereas black dots indicate true parameters inside their iCI.

Finally, we studied the performance of our test of (19) under the Poisson-gamma and Poisson-lognormal models. Results for the latter are in our SM, as they reveal the same features. Consider $H_{0} : {\bar{μ}}^{PG} = b$ vs. $H_{1} : {\bar{μ}}^{PG} = b + 1_{D} Δ$, where $b := \bar{μ}$, for the same data-generating processes as before. Critical values are obtained from the bootstrap analogues of $S_{H_{0}}$, calculated similarly as in Step 5 of the algorithm above. Figure 2 shows the power functions of our test based on different variability estimates. They are visually indistinguishable, which is not surprising given the similar ECPs and AIWs in Table 1. For D = 52, that is, the sample size of the real data, the nominal level of 5% is attained almost exactly under H₀.
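The qualitative shape of such power curves can be reproduced with a simplified Python sketch. It is not the paper's bootstrap procedure: it assumes that the standardized estimators are exactly independent $N (0, 1)$ variables, so that the critical value of $\max_{d} | Z_{d} |$ is available in closed form, and the standard errors and shifts $Δ$ below are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def max_abs_critical_value(D, alpha=0.05):
    """(1 - alpha)-quantile of max_d |Z_d| for D independent N(0, 1) variables."""
    return NormalDist().inv_cdf(0.5 * (1.0 + (1.0 - alpha) ** (1.0 / D)))


def simulated_power(delta, sigma, alpha=0.05, n_rep=20_000, seed=0):
    """Rejection rate of the max-type test of H0: mu = b when the truth is b + delta.

    Simplifying assumption: standardized estimators are exactly N(0, 1) and
    independent across areas, so no bootstrap is needed for the critical value.
    """
    rng = np.random.default_rng(seed)
    sigma = np.asarray(sigma, dtype=float)
    q = max_abs_critical_value(sigma.size, alpha)
    # zeta_hat - b = delta + sigma * Z under the alternative b + 1_D * delta
    diffs = delta + sigma * rng.standard_normal((n_rep, sigma.size))
    t_h = np.max(np.abs(diffs) / sigma, axis=1)
    return float(np.mean(t_h > q))


sigma = np.full(52, 0.02)  # hypothetical standard errors for D = 52 areas
for shift in (0.0, 0.01, 0.02, 0.04):
    print(shift, simulated_power(shift, sigma))
```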

Fig. 2 Simulated powers for multiple test $H_{0} : {\bar{μ}}^{PG} = b$ versus $H_{1} : {\bar{μ}}^{PG} = b + 1_{D} Δ$ under the area-level Poisson-gamma model; (left) D = 26, (middle) D = 52, (right) D = 78.


4.2 Finite Sample Performance Under the Unit-Level Model

Under the unit-level model we assume $y_{d j} \sim Bin (m_{d j}, p_{d j})$ with $p_{d j} = \exp (x_{d j}^{t} β + δ u_{d}) / \{1 + \exp (x_{d j}^{t} β + δ u_{d})\}$, $m_{d j} = 1$, and $u_{d} \sim N (0, 1)$. In our context, $y_{d j}$ is binary, and the value 1 indicates an individual below the poverty threshold defined in Section 5. The regression parameters are taken from our case study: $θ = (β^{t}, δ) = {(- 2.048, 0.989, 0.172, 0.760, 0.100, 0.348)}^{t}$. Four binary covariates result in 16 covariate classes $x_{d j} \in \{z_{1}, \dots, z_{16}\}$, for which we need to estimate $N_{d l}$, $l = 1, \dots, 16$, using Equation (25). We considered $D \in \{26, 52, 78\}$ areas containing unit-level information, with $n = 11{,}423$, $23{,}628$, and $35{,}818$, respectively. Summary statistics for all samples are presented in Table 2. Furthermore, for D = 52, $n_{d}$, $N_{d}$, $x_{d j}$ and $z_{l}$ are the same as in our case study. The areas for $D \in \{26, 78\}$ were selected in the same way as in Section 4.1. In addition, for D = 78, within each additional area we sampled $n_{d}$ units with replacement (i.e., the 26 newly sampled areas contained different units compared to the original sample). The parameter of interest is the area poverty proportion ${\bar{μ}}_{d}^{U} = μ_{d}^{U} / N_{d}$ defined in Equation (11), with $μ_{d}^{U} = \sum_{l = 1}^{L} N_{d l} r_{d l}$ and $r_{d l} = \exp (z_{l} β + δ u_{d}) / \{1 + \exp (z_{l} β + δ u_{d})\}$.
Given that the original sample size was n = 23,628, under the unit-level model we restrict our simulations to K = 200, $B_{1} = 500$, and $B_{2} = 1$. The algorithm for constructing SCI and iCI follows almost the same steps as in Section 4.1; the exact algorithm can be found in the SM.
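For a single area, the target ${\bar{μ}}_{d}^{U} = μ_{d}^{U} / N_{d}$ of Equation (11), with $μ_{d}^{U} = \sum_{l} N_{d l} r_{d l}$ and $r_{d l}$ the logistic function of $z_{l} β + δ u_{d}$, can be computed as in the Python sketch below. We assume $N_{d} = \sum_{l} N_{d l}$ and that each class $z_{l}$ is a row vector that includes an intercept; the class counts are hypothetical, and mapping the case-study coefficients to the four binary covariates in this order is an assumption made only for illustration.

```python
import itertools

import numpy as np


def area_poverty_proportion(N_dl, Z, beta, delta, u_d):
    """bar mu_d^U = sum_l N_dl r_dl / N_d, with r_dl = exp(eta_l) / (1 + exp(eta_l)).

    N_dl : (L,) population counts per covariate class in area d (N_d = sum of N_dl)
    Z    : (L, p) covariate classes z_1, ..., z_L, one row each
    """
    N_dl = np.asarray(N_dl, dtype=float)
    eta = np.asarray(Z, dtype=float) @ np.asarray(beta, dtype=float) + delta * u_d
    r_dl = 1.0 / (1.0 + np.exp(-eta))  # equal to exp(eta) / (1 + exp(eta))
    return float(N_dl @ r_dl / N_dl.sum())


# 16 classes from an intercept plus four binary covariates (ordering assumed).
Z = np.array([(1.0, *bits) for bits in itertools.product((0.0, 1.0), repeat=4)])
beta = np.array([-2.048, 0.989, 0.172, 0.760, 0.100])
N_dl = np.full(16, 100.0)  # hypothetical class counts for one area
print(area_poverty_proportion(N_dl, Z, beta, delta=0.348, u_d=0.0))
```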

Table 2 Summary statistics of n_d under different scenarios in the simulation study under the unit-level model.


Table 3 presents the performance of SCI constructed using ${mse}_{B}$ (B) and ${mse}_{BC}$ (BC). The coverage probability is somewhat lower than the nominal level. In addition, it slightly decreases with increasing D, whereas the increase in AIW levels off, as expected (see Remark 1). The undercoverage might be related to the simulation design. Even though this design is popular in SAE, it is suboptimal for random effects from the asymptotic point of view ($n_{d} \not\to \infty$, $d \in [D]$; recall Section 2.1). The results in Table 3 neither demonstrate any inconsistencies with respect to the theoretical developments nor exhibit unexpected findings. Due to their limited impact, the equivalents of Figures 1 and 2 for this simulation are deferred to our SM. In comparison to the area-level models, the coverage probability is worse and the average width of SCIs is much larger (this is also the case for the iCI, see our SM). Moreover, fitting unit-level models is computationally more expensive: in our case, the estimation of MSE and the construction of intervals took about 900–1000 times longer. Since the data-generating processes are different, the numerical results of our simulations are not directly comparable. However, our empirical studies suggest giving some preference to area-level modeling in the considered GLMM settings.

Our simulations lead us to the following conclusions. First, for a given sample size and data, our SCI attains the nominal coverage probability almost independently of the choice of the variability estimator. In particular, the area-level models yield very accurate results even for small samples. Second, the distinction between SCI and iCI is crucial, and the latter should not be employed in comparative studies. Third, the numerical performance of our test for comparative studies is satisfactory. Given the simplicity of SCI and tests based on $\sqrt{{\hat{g}}_{1 d}}$, we restrict further presentations to them.

Remark 3.

In our simulation study, we do not analyze the performance of direct estimators for proportions, because our goal is to study the numerical performance of our MTP and SCIs and to compare them to existing iCIs. Since MTP and SCI are the first tools for simultaneous inference on GLMM-based mixed parameters, we concentrate on their implementation and application to well-known model-based estimators. These have been thoroughly examined in comparative analyses that included direct estimators (see, for instance, Boubeta, Lombardía, and Morales 2016; Hobza and Morales 2016). In our case study in Section 5, we include direct estimators in order to have an almost model-free benchmark.

5 Predicting Poverty Rates in Galicia

Poverty prediction is of great interest for statistical offices. It provides a basis on which local or central authorities can decide about resource allocation and related policies. The interest is not in individual, randomly chosen small areas but in the overall picture. Resource distribution requires comparative statistics, and one should thus provide SCI instead of iCI. We illustrate our methodology by calculating point estimates, iCIs and SCIs for the poverty rates in each county of Galicia, that is, the proportions of inhabitants who live below a poverty line. We make use of a general part of the Structural Survey for Homes (SSH) in Galicia in 2015, with 23,628 individuals within 9203 households located in 52 counties (small areas). The survey does not produce official estimates at the area level, but we managed to recover the direct estimates of the totals of people below the poverty line $(Y_{d})$, as well as the number of inhabitants $(N_{d})$ for each county. For the area-level models, we need to calculate the number of units that fall into a particular category $(X_{d i})$, for example, the number of employees or of graduates in each county of Galicia, $i = 1, \dots, p$. These are used to obtain the proportions of individuals in each category, ${\bar{X}}_{d i} = X_{d i} / N_{d}$. For the unit-level model, we need to obtain the number of units $N_{d l}$ falling into artificially created categories $z_{d l}$, $d = 1, \dots, D$, $l = 1, \dots, L$; see Section 2.2.3.
The explicit formulas are
$\begin{matrix} {\hat{Y}}_{d}^{dir} = \sum_{j \in R_{d}} w_{j} y_{j}, \quad {\hat{N}}_{d}^{dir} = \sum_{j \in R_{d}} w_{j}, \\ {\hat{N}}_{d l}^{dir} = \sum_{j \in R_{d}} w_{j} x_{j} 1_{\{x_{j} = z_{l}\}}, \\ {\hat{X}}_{d i}^{dir} = \sum_{j \in R_{d}} w_{j} x_{j i}, \quad and \quad {\hat{\bar{X}}}_{d i}^{dir} = {\hat{X}}_{d i}^{dir} / {\hat{N}}_{d}^{dir}, \end{matrix}$ (25)
where $R_{d} \subset P_{d}$ are the sample elements belonging to area d, $d \in [D]$, $w_{j}$ are sampling weights, and $y_{j}$ are binary variables with 1 indicating that an individual is below the poverty line. The poverty threshold is calculated from the survey. It is set to 0.6 of the median household income per capita in Galicia, that is, we do not use county-specific poverty lines. This income is calculated for each household according to the scale developed by the Organisation for Economic Co-operation and Development (the same technique is used by Eurostat). Following López-Vizcaíno, Lombardía, and Morales (2015), the model-based approach of this paper treats the estimates in Equation (25) as known, nonrandom quantities. SSH provides many categorical auxiliary variables. Under the unit-level model these are binary variables with 1 indicating that a person belongs to a particular category, whereas under area-level models we use the county proportions. We considered four variables for labor status: children (ls0), employed (ls1), unemployed (ls2), inactive (ls3), and four covariates for education: less than primary (ed0), primary (ed1), first- and second-level secondary (ed2), higher education (ed3).
Furthermore, we analyzed three variables for the size of municipality: less than 10,000 (sm1), 10,000–50,000 (sm2), more than 50,000 (sm3). We also investigated the effect of two variables indicating nationality, that is, Spanish (n1) and not Spanish (n2). Finally, we examined five age variables: < 15 (age1), 15–24 (age2), 25–49 (age3), 50–64 (age4), $\geq 65$ (age5). We are interested in ${\bar{μ}}_{d}^{(\cdot)} := μ_{d}^{(\cdot)} / N_{d}$, with $(\cdot)$ standing for PG or U in the case of the Poisson-gamma and binomial models, respectively, and in $ρ_{d}$ in the case of the Poisson-lognormal model. We first compute estimates of proportions and their variances using the same formulas as Boubeta, Lombardía, and Morales (2016):
$\begin{matrix} {\hat{p}}_{d}^{dir} = {\hat{\bar{Y}}}_{d}^{dir} = \frac{{\hat{Y}}_{d}^{dir}}{{\hat{N}}_{d}^{dir}}, \\ \hat{var} ({\hat{p}}_{d}^{dir}) = \frac{1}{{({\hat{N}}_{d}^{dir})}^{2}} \sum_{j \in R_{d}} w_{j} (w_{j} - 1) {(y_{j} - {\hat{p}}_{d}^{dir})}^{2} . \end{matrix}$ (26)
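The direct estimators can be computed area by area with weighted sums. The Python sketch below implements ${\hat{Y}}_{d}^{dir}$, ${\hat{N}}_{d}^{dir}$, ${\hat{p}}_{d}^{dir}$ and the variance estimator, using the factor $w_{j} (w_{j} - 1)$, which is nonnegative for design weights $w_{j} \geq 1$; the function name and the encoding of areas as integers $0, \dots, D - 1$ are our own assumptions.

```python
import numpy as np


def direct_estimates(y, w, area, D):
    """Direct estimators per area (Equations (25)-(26)).

    y    : (n,) binary poverty indicators
    w    : (n,) sampling weights
    area : (n,) integer area labels in {0, ..., D-1}
    Returns Y_hat, N_hat, p_hat and var_hat, each of shape (D,).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    area = np.asarray(area, dtype=int)
    Y_hat = np.bincount(area, weights=w * y, minlength=D)
    N_hat = np.bincount(area, weights=w, minlength=D)
    p_hat = Y_hat / N_hat
    # sum_j w_j (w_j - 1) (y_j - p_hat_d)^2 / N_hat_d^2
    resid_sq = (y - p_hat[area]) ** 2
    var_hat = np.bincount(area, weights=w * (w - 1.0) * resid_sq, minlength=D) / N_hat ** 2
    return Y_hat, N_hat, p_hat, var_hat


print(direct_estimates([1, 0, 1, 0], [2.0, 2.0, 3.0, 3.0], [0, 0, 1, 1], D=2))
```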

We used the estimates in Equation (26) to construct the design-based iCIs (Dir) displayed in Figure 3. Following López-Vizcaíno, Lombardía, and Morales (2015), we then proceeded with a variable selection inspired by the simulation results. More specifically, under the Poisson-gamma model we checked whether any of the levels of the categorical variables for labor status, education and age are significant at the $α = 0.05$ level. We examined these covariates first because they turned out to be important in earlier studies on poverty rates (see, for instance, Boubeta, Lombardía, and Morales 2016). In this way, we selected ls2, ed2, and age2. Afterwards, we tested the levels of the variables for nationality and size of municipality, and additionally retained sm1, which was significant after the selection of ls2, ed2, and age2. The same categories were then used for the other models; see Table 4. As we do not carry out a causality analysis, we refrain from discussing the magnitude or signs of the estimates. We only note that under the Poisson-gamma model, the signs are consistent with our expectations: unemployment and young age are associated with higher poverty rates, whereas a higher level of education or living in a small municipality is associated with lower poverty rates.

Fig. 3 Design and model-based 95% iCIs.

Table 4 Estimates of regression parameters under the area- and the unit-level models with ${\hat{δ}}^{PG} = 2.48, {\hat{δ}}^{PL} = 0.32$ and ${\hat{δ}}^{U} = 0.35$ , respectively.


Figure 3 shows point and iCI estimates of proportions under the Poisson-gamma (PG), Poisson-lognormal (PL), and binomial (Unit) models, together with direct estimates (Dir). In this plot, we compare the point estimates of the four approaches within each area; we do not compare them across different areas within the same model. First, the variability reflected by the width of the iCIs decreases with the number of units ${\hat{N}}_{d}^{dir}$ in each area, defined in Equation (25). Second, even though the area sample sizes $n_{d}$, $d \in [D]$, are not that small, the iCIs of direct estimates are wider than those of model-based estimates, which is in accordance with the literature. The width difference is especially pronounced when comparing area-level model-based with design-based direct estimates: the latter entirely cover the former. Unit-level model-based point and interval estimates differ from the area-level ones, with much wider iCIs, but still overlap with the direct estimates. Only in one case (the sixth area in the third panel) do the iCIs under the area-level models not overlap with the iCI under the unit-level model, which indicates a possible bias in one of the approaches. In contrast, both area-level models produce almost identical estimates.

Figure 4 presents bootstrap iCI and SCI for ${\hat{\bar{μ}}}_{d}^{PG}$, $d = 1, \dots, D$, constructed with $\hat{σ} ({\hat{ζ}}_{d}) = \sqrt{{\hat{g}}_{PG 1 d}}$ as defined in Equation (9). The plot is divided into five panels according to the number of units in each area, obtained from the direct estimates of county inhabitants ${\hat{N}}_{d}^{dir}$ in (25). Figure 4 illustrates the differences between individual and simultaneous inference. When comparing iCI and SCI, in many cases (e.g., the first and second county of the first panel) the iCIs would suggest statistically different poverty rates, whereas the SCIs do not support this claim. Such multiple comparisons are valid only if we use SCI. In addition, by construction, about 5% of the true poverty rates can be expected to lie outside their iCIs. Analogous figures under the Poisson-lognormal and binomial models lead to the same conclusions and are thus deferred to the SM. Further model selection and specification testing might be interesting, but they are beyond the scope of this article.

Fig. 4 95% bootstrap iCI and SCI estimates for poverty rates in counties of Galicia.

Since we do not know which model is closer to the real data-generating process, we proceed with the Poisson-gamma area-level model, as it is reliable and the least computationally intensive. The left and middle panels of Figure 5 depict the resulting maps of the counties with the corresponding lower and upper bounds of our SCI. We observe higher poverty rates in the interior and the south-western part of the region, whereas lower levels are typical for the northern part. These conclusions are similar to those drawn by Boubeta, Lombardía, and Morales (2017).

Fig. 5 SCI of EBP poverty proportions: (left) lower boundary, (middle) upper boundary; (right) significant differences in poverty rates between women (F) and men (M).

Finally, we investigate whether men and women are equally affected by poverty. We wish to test for equality at the county level across Galicia. Testing each county individually at the $α = 5\%$ level would, even if all hypotheses of no difference were true, be expected to reject about 5% of them. We thus use our MTP and consider clusters created from the cross section of sex and county, such that $ζ \in \mathbb{R}^{104}$. We test $H_{0} : Bζ = 0_{52}$ vs. $H_{1} : Bζ \neq 0_{52}$, where $B \in \mathbb{R}^{52 \times 104}$ has rows with 1 in position $2 d - 1$, –1 in position $2 d$, and 0 elsewhere. The max-type test statistic yields $t_{H} = \max_{d = 1, \dots, 52} | (B \hat{ζ})_{d} | / \hat{σ} ((B \hat{ζ})_{d}) \approx 20.489$, while the bootstrap critical value under H₀ is $q_{{BH}_{0}}^{(1 - α)} \approx 2.999$. Thus, we strongly reject H₀. However, our test does not support the hypothesis that women are more affected than men, or vice versa; see the right panel of Figure 5. Additional results are deferred to our SM.
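The contrast matrix and the max-type statistic of this test can be sketched in Python as follows. We assume that the 104 clusters are ordered county by county, with the two sexes in positions $2 d - 1$ and $2 d$, and that the standard errors $\hat{σ} ((B \hat{ζ})_{d})$ of the contrasts are supplied by the caller (for instance from the bootstrap); which sex comes first is an assumption, and the numbers are illustrative.

```python
import numpy as np


def sex_contrast_matrix(D):
    """B in R^{D x 2D}: row d has +1 at position 2d-1 and -1 at position 2d (1-based)."""
    B = np.zeros((D, 2 * D))
    rows = np.arange(D)
    B[rows, 2 * rows] = 1.0       # first sex in county d (0-based column 2d)
    B[rows, 2 * rows + 1] = -1.0  # second sex in county d (0-based column 2d + 1)
    return B


def contrast_test_statistic(zeta_hat, sigma_contrast, B, b=None):
    """t_H = max_d |(B zeta_hat)_d - b_d| / sigma_hat((B zeta_hat)_d)."""
    contrasts = B @ np.asarray(zeta_hat, dtype=float)
    if b is None:
        b = np.zeros(B.shape[0])  # H0: B zeta = 0
    return float(np.max(np.abs(contrasts - b) / np.asarray(sigma_contrast, dtype=float)))


# Toy example with 3 counties (6 clusters); contrasts are 0.02, 0.00 and 0.08.
B3 = sex_contrast_matrix(3)
zeta_hat = np.array([0.20, 0.18, 0.25, 0.25, 0.30, 0.22])
print(contrast_test_statistic(zeta_hat, [0.01, 0.01, 0.02], B3))
```

The resulting $t_{H}$ would then be compared with a bootstrap critical value $q_{{BH}_{0}}^{(1 - α)}$ as in Equation (21).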

Remark 4.

Imagine that the Galician counties were considered part of a macro region, for example, Spain with $D_{S}$ counties, and consider two inferential problems: (a) the calculation of SCI for the poverty rates in all $D_{S}$ Spanish counties, and (b) the calculation of SCI only for the D Galician counties, but using all data. Following Remark 1, we expect that the widths of our SCIs would increase in case (a) to maintain the joint coverage probability of 95% for all $D_{S} > D$ counties. In contrast, they would most likely decrease slightly in case (b): the simultaneous coverage probability of 95% would be required only for the set of D counties, but the SCIs would be constructed using a more precise estimate of the MSE computed from a larger dataset with $D_{S}$ counties.

6 Conclusions

We developed a methodology that allows for statistically valid simultaneous inference for EBP under GLMM. We constructed SCI and MTP by combining max-type statistics with consistent bootstrap estimation of their distribution. These tools enable practitioners to make comparisons between areas. In contrast, iCIs are not suitable for such comparative analyses because they are constructed at the individual confidence level and disregard the additional variation that arises in joint studies. We do not claim that SCI and MTP are better than iCIs or t-tests. The former simply complete the toolbox for statistical inference on the mixed parameters $ζ_{d}$, just as simultaneous inference complements individual inference for fixed parameters. We introduced various versions of statistics to construct SCI and MTP. Within our framework, all of them exhibited similar performance, without a clear winner.

Our methodology can be extended to more complicated data structures, such as GLMM with spatial or temporal correlation (see, e.g., Hobza, Morales, and Santamaría 2018; Chandra, Chambers, and Salvati 2019). One could also build SCI under spatio-temporal or nonparametric models by adjusting the statistic $S_0$ and choosing the bootstrap procedure accordingly. Apart from the mathematical challenge of developing a valid asymptotic theory, these extensions would require an appropriate bootstrap scheme and a computationally efficient implementation of it.

Supplementary Materials

The supplementary materials consist of: (a) a document with further developments, in particular an additional MSE decomposition, a proof of Proposition 1, the derivations of the estimators under the area-level Poisson and unit-level binomial models, additional numerical results, and a data analysis that complements the case study in Section 5; (b) code for replicating the results in the main document; and (c) a document that contains additional information on the data set and a description of the code.

Ministerio de Economía y Competitividad (Spain)

Additional information

Funding

The authors gratefully acknowledge the support from the Swiss National Science Foundation for the project 200021-192345. In addition, they acknowledge the support from the MINECO grants MTM2017-82724-R and MTM2014-52876-R, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the ERDF. The computations were performed at the University of Geneva on the Baobab cluster.

References

Bianconcini, S. (2014), “Asymptotic Properties of Adaptive Maximum Likelihood Estimators in Latent Variable Models,” Bernoulli, 20, 1507–1531. DOI: 10.3150/13-BEJ531.
Boubeta, M., Lombardía, M. J., and Morales, D. (2016), “Empirical Best Prediction Under Area-Level Poisson Mixed Models,” Test, 25, 548–569. DOI: 10.1007/s11749-015-0469-8.
Boubeta, M., Lombardía, M. J., and Morales, D. (2017), “Poisson Mixed Models for Studying the Poverty in Small Areas,” Computational Statistics & Data Analysis, 107, 32–47.
Breslow, N. E., and Clayton, D. G. (1993), “Approximate Inference in Generalized Linear Mixed Models,” Journal of the American Statistical Association, 88, 9–25.
Cameron, A. C., and Trivedi, P. K. (2013), Regression Analysis of Count Data, Cambridge: Cambridge University Press.
Chambers, R., Salvati, N., and Tzavidis, N. (2012), “M-Quantile Regression for Binary Data With Application to Small Area Estimation,” Centre for Statistical and Survey Methodology, University of Wollongong, Working Paper, 1–24.
Chandra, H., Chambers, R., and Salvati, N. (2012), “Small Area Estimation of Proportions in Business Surveys,” Journal of Statistical Computation and Simulation, 82, 783–795. DOI: 10.1080/00949655.2011.554834.
Chandra, H., Chambers, R., and Salvati, N. (2019), “Small Area Estimation of Survey Weighted Counts Under Aggregated Level Spatial Model,” Survey Methodology, 45, 31–59.
Chatterjee, S., Lahiri, P., and Li, H. (2008), “Parametric Bootstrap Approximation to the Distribution of EBLUP and Related Prediction Intervals in Linear Mixed Models,” Annals of Statistics, 36, 1221–1245.
Chen, S., Jiang, J., and Nguyen, T. (2015), “Observed Best Prediction for Small Area Counts,” Journal of Survey Statistics and Methodology, 3, 136–161. DOI: 10.1093/jssam/smv001.
De Bruijn, N. G. (1981), Asymptotic Methods in Analysis, New York: Dover Publications, Inc.
DiCiccio, T. J., and Efron, B. (1996), “Bootstrap Confidence Intervals,” Statistical Science, 11, 189–228. DOI: 10.1214/ss/1032280214.
Erciulescu, A. L., and Fuller, W. A. (2014), “Parametric Bootstrap Procedures for Small Area Prediction Variance,” in Proceedings of the Joint Statistical Meeting-Survey Research Methods Section, Boston, pp. 3307–3318.
Franco, C., and Bell, W. R. (2015), “Borrowing Information Over Time in Binomial/Logit Normal Models for Small Area Estimation,” Statistics in Transition New Series, 16, 563–584. DOI: 10.21307/stattrans-2015-033.
Ganesh, N. (2009), “Simultaneous Credible Intervals for Small Area Estimation Problems,” Journal of Multivariate Analysis, 100, 1610–1621. DOI: 10.1016/j.jmva.2009.01.009.
Ghosh, M., Natarajan, K., Stroud, T. W. F., and Carlin, B. P. (1998), “Generalized Linear Models for Small-Area Estimation,” Journal of the American Statistical Association, 93, 273–282. DOI: 10.1080/01621459.1998.10474108.
Hall, P., and Maiti, T. (2006), “On Parametric Bootstrap Methods for Small Area Prediction,” Journal of the Royal Statistical Society, Series B, 68, 221–238. DOI: 10.1111/j.1467-9868.2006.00541.x.
Hobza, T., and Morales, D. (2016), “Empirical Best Prediction Under Unit-Level Logit Mixed Models,” Journal of Official Statistics, 32, 661–692. DOI: 10.1515/jos-2016-0034.
Hobza, T., Morales, D., and Santamaría, L. (2018), “Small Area Estimation of Poverty Proportions Under Unit-Level Temporal Binomial-Logit Mixed Models,” Test, 27, 270–294. DOI: 10.1007/s11749-017-0545-3.
Huber, P., Ronchetti, E., and Victoria-Feser, M.-P. (2004), “Estimation of Generalized Linear Latent Variable Models,” Journal of the Royal Statistical Society, Series B, 66, 893–908. DOI: 10.1111/j.1467-9868.2004.05627.x.
Jiang, J. (1998), “Consistent Estimators in Generalized Linear Mixed Models,” Journal of the American Statistical Association, 93, 720–729. DOI: 10.1080/01621459.1998.10473724.
Jiang, J. (2003), “Empirical Best Prediction for Small-Area Inference Based on Generalized Linear Mixed Models,” Journal of Statistical Planning and Inference, 111, 117–127.
Jiang, J. (2007), Linear and Generalized Linear Mixed Models and Their Applications, New York: Springer.
Jiang, J. (2013), “The Subset Argument and Consistency of MLE in GLMM: Answer to an Open Problem and Beyond,” Annals of Statistics, 41, 177–195.
Jiang, J., and Lahiri, P. (2001), “Empirical Best Prediction for Small Area Inference With Binary Data,” Annals of the Institute of Statistical Mathematics, 53, 217–243. DOI: 10.1023/A:1012410420337.
Kramlinger, P., Krivobokova, T., and Sperlich, S. (2018), “Marginal and Conditional Multiple Inference in Linear Mixed Models,” arXiv:1812.09250.
Leadbetter, M. R., Lindgren, G., and Rootzén, H. (2012), Extremes and Related Properties of Random Sequences and Processes, New York: Springer.
Lele, S. R., Nadeem, K., and Schmuland, B. (2010), “Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning,” Journal of the American Statistical Association, 105, 1617–1625. DOI: 10.1198/jasa.2010.tm09757.
López-Vizcaíno, E., Lombardía, M. J., and Morales, D. (2015), “Small Area Estimation of Labour Force Indicators Under a Multinomial Model With Correlated Time and Area Effects,” Journal of the Royal Statistical Society, Series A, 178, 535–565. DOI: 10.1111/rssa.12085.
Molina, I., Saei, A., and Lombardía, M. J. (2007), “Small Area Estimates of Labour Force Participation Under a Multinomial Logit Mixed Model,” Journal of the Royal Statistical Society, Series A, 170, 975–1000. DOI: 10.1111/j.1467-985X.2007.00493.x.
Naylor, J. C., and Smith, A. F. (1982), “Applications of a Method for the Efficient Computation of Posterior Distributions,” Journal of the Royal Statistical Society, Series C, 31, 214–225. DOI: 10.2307/2347995.
Pinheiro, J. C., and Bates, D. M. (1995), “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model,” Journal of Computational and Graphical Statistics, 4, 12–35.
Pratesi, M. (2016), Analysis of Poverty Data by Small Area Estimation, New Jersey: Wiley.
Reluga, K., Lombardía, M. J., and Sperlich, S. A. (2019), “Simultaneous Inference for Mixed and Small Area Parameters,” arXiv:1903.02774.
Romano, J. P., Shaikh, A. M., and Wolf, M. (2008), “Control of the False Discovery Rate Under Dependence Using the Bootstrap and Subsampling,” Test, 17, 417–442. DOI: 10.1007/s11749-008-0126-6.
Saei, A., and Taylor, A. (2012), “Labour Force Status Estimates Under a Bivariate Random Components Model,” Journal of the Indian Society of Agricultural Statistics, 66, 187–201.
Scealy, J. (2010), “Small Area Estimation Using a Multinomial Logit Mixed Model With Category Specific Random Effects,” Research Paper, Australian Bureau of Statistics.
Torabi, M. (2012), “Likelihood Inference in Generalized Linear Mixed Models With Two Components of Dispersion Using Data Cloning,” Computational Statistics & Data Analysis, 56, 4259–4265.
Tuerlinckx, F., Rijmen, F., Verbeke, G., and De Boeck, P. (2006), “Statistical Inference in Generalized Linear Mixed Models: A Review,” British Journal of Mathematical and Statistical Psychology, 59, 225–255. DOI: 10.1348/000711005X79857.
Tzavidis, N., Ranalli, M. G., Salvati, N., Dreassi, E., and Chambers, R. (2015), “Robust Small Area Prediction for Counts,” Statistical Methods in Medical Research, 24, 373–395. DOI: 10.1177/0962280214520731.

Appendix A: Technical details

A.1 Regularity conditions

In this section, we state the regularity conditions used in our derivations.

1. $\hat{u}_{d} = \arg\max_{u_{d} \in \mathbb{R}} \{\log g_{d}(y_{d} \mid u_{d}, θ) + \log h(u_{d})\}$.
2. $l(θ)$ exists and is well-defined if: (a) $l(θ)$ is continuous and uniquely maximized, and $θ_{0} \in Θ$, where $θ_{0}$ is the true parameter value; (b) $l(θ)$ and $\hat{l}(θ)$ are concave; (c) $θ_{0}$ is an interior point of the parameter space and the estimator $\hat{θ}$ is an interior point of a neighborhood of $θ_{0}$; (d) $\hat{l}(θ)$ converges uniformly in probability to $l(θ)$.
3. The covariates $x_{dj}$ are bounded and $E(y_{dj}^{m}) < \infty$ for all $d \in [D]$, $j \in [n_{d}]$, where $m$ is suitably large.
4. For each fixed $y$, the score function $R(θ)$ is continuously differentiable and $E\{R(θ_{0})\} = 0$.
5. $\liminf_{n} λ[n^{-1} \operatorname{Var}\{R(θ)\}] > 0$ and $\liminf_{n} λ[-n^{-1} E\{\nabla R(θ)\}] > 0$, where $\nabla R(θ) = \partial R(θ) / \partial θ$ and $λ[A]$ denotes the smallest eigenvalue of a matrix $A$.

The first two conditions refer to the log-likelihood function (see, e.g., Bianconcini 2014), whereas conditions 3–5 are needed for the derivation of the MSE estimators.
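Condition 1 involves the maximizer $\hat{u}_d$ of $\log g_d(y_d \mid u_d, θ) + \log h(u_d)$. As a hedged illustration, the sketch below computes it by Newton–Raphson under an assumed area-level Poisson-lognormal parameterization, $y_d \mid u_d \sim \text{Poisson}(e^{η_d + u_d})$ with $u_d \sim N(0, σ^2)$ and a fixed linear predictor $η_d$; the paper's exact parameterization may differ, and the function name is hypothetical. Under this assumption the objective is strictly concave in $u_d$, so the maximizer is unique.

```python
import math


def posterior_mode_poisson_lognormal(y, eta, sigma2, tol=1e-12, max_iter=100):
    """Maximise log g(y | u) + log h(u) over u for an assumed Poisson-lognormal model.

    Assumed model (illustrative): y | u ~ Poisson(exp(eta + u)), u ~ N(0, sigma2).
    Up to constants the objective is  y*(eta + u) - exp(eta + u) - u**2 / (2*sigma2),
    which is strictly concave in u, so Newton's method has a unique target.
    """
    u = 0.0
    for _ in range(max_iter):
        mu = math.exp(eta + u)
        grad = y - mu - u / sigma2
        hess = -mu - 1.0 / sigma2  # always negative, hence a concave objective
        step = grad / hess
        u -= step
        if abs(step) < tol:
            return u
    raise RuntimeError("Newton iteration did not converge")


u_hat = posterior_mode_poisson_lognormal(y=7, eta=1.0, sigma2=0.5)
```

Because the gradient is decreasing and concave in $u$, the Newton iterates approach the root monotonically after at most one overshoot, which is why a plain Newton loop suffices here.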

A.2 Proof of Proposition 2

Let $y_{dj}^{*}$ be generated from the exponential family model of the GLMM with parameter $\hat{θ}$. If $u_{d}^{*}$ is sampled from a suitable distribution, then $γ_{dj}^{*} = M\{E^{*}(y_{dj}^{*} \mid u_{d}^{*})\} = x_{dj}^{t}\hat{β} + u_{d}^{*}$. Furthermore, $\operatorname{Var}^{*}(y_{d}^{*}) = \operatorname{Var}^{*}\{E^{*}(y_{d}^{*} \mid u_{d}^{*})\} + E^{*}\{\operatorname{Var}^{*}(y_{d}^{*} \mid u_{d}^{*})\}$. The first part of the Proposition follows from the way we generate the random effects as well as from the consistency of $\hat{θ}$. To show the second part, we consider a general score equation in the bootstrap world: replacing $y$ by $y^{*}$, the bootstrap score is $R^{*}(θ) = \partial l^{*}(θ) / \partial θ = \sum_{d=1}^{D} \partial \log f_{d}(y_{d}^{*} \mid θ) / \partial θ$, and $\hat{θ}^{*}$ solves $R^{*}(θ) = 0$. Since the bootstrap data are generated under $\hat{θ}$, we have $E^{*}\{R^{*}(θ)\} = 0$ at $θ = \hat{θ}$; that is, $\hat{θ}$ plays the role of $θ_{0}$ in condition 4, which yields the consistency of $\hat{θ}^{*}$. $□$
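Proposition 2 concerns bootstrap data generated from the fitted model. As an illustration only, the sketch below draws one parametric bootstrap replicate under an assumed unit-level logit mixed model with normal random effects, $u_d^* \sim N(0, \hat{σ}^2)$ and $y_{dj}^* \mid u_d^* \sim \text{Bernoulli}(p_{dj}^*)$ with $\text{logit}(p_{dj}^*) = x_{dj}^t\hat{β} + u_d^*$. It also returns the area means of $p_{dj}^*$ as a hypothetical bootstrap target. The function name and the choice of target are assumptions, not the authors' implementation.

```python
import numpy as np


def parametric_bootstrap_logit(X, area, beta_hat, sigma_hat, rng):
    """One parametric bootstrap replicate under an assumed unit-level logit mixed model.

    X         : (n, p) covariates x_dj
    area      : (n,) integer area labels in {0, ..., D-1}
    beta_hat  : (p,) estimated fixed effects
    sigma_hat : estimated standard deviation of the random effects
    Returns bootstrap responses y*_dj, random effects u*_d and area means of p*_dj.
    """
    n_areas = int(area.max()) + 1
    u_star = rng.normal(0.0, sigma_hat, size=n_areas)   # u*_d ~ N(0, sigma_hat^2)
    linear_predictor = X @ beta_hat + u_star[area]       # x_dj^t beta_hat + u*_d
    p_star = 1.0 / (1.0 + np.exp(-linear_predictor))     # inverse logit
    y_star = rng.binomial(1, p_star)                     # y*_dj | u*_d ~ Bernoulli(p*_dj)
    # hypothetical bootstrap "true" area parameter: mean of p*_dj within each area
    zeta_star = np.bincount(area, weights=p_star) / np.bincount(area)
    return y_star, u_star, zeta_star


rng = np.random.default_rng(0)
area = np.repeat(np.arange(4), 5)  # 4 areas with 5 units each
X = np.column_stack((np.ones(20), rng.normal(size=20)))
y_star, u_star, zeta_star = parametric_bootstrap_logit(
    X, area, np.array([-1.0, 0.5]), 0.3, rng)
```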

A.3 Proof of Proposition 3

Let $\hat{ζ}_d$ be a general EBP of $ζ_d$ and $\tilde{ζ}_d$ the corresponding best predictor, that is, the EBP with $\hat{θ}$ replaced by the true $θ$. Further, let $g_{d} := g_{d}(θ)$ and $\hat{g}_{d} := \hat{g}_{d}(\hat{θ})$, and assume that $\|\hat{g}_{d} - g_{d}\| = O_{P}(n^{-c})$ for some $c > 0$. The proof uses ideas of Chatterjee, Lahiri, and Li (2008). We investigate the properties of $G_{d}(a)$:

$$\begin{aligned}
G_{d}(a) &= P\left(\frac{\hat{ζ}_{d} - ζ_{d}}{\sqrt{\hat{g}_{d}}} \leq a\right) \\
&= E\left(P\left[\frac{\tilde{ζ}_{d} - ζ_{d}}{\sqrt{g_{d}}} \leq a + \frac{a(\sqrt{\hat{g}_{d}} - \sqrt{g_{d}}) + \tilde{ζ}_{d} - \hat{ζ}_{d}}{\sqrt{g_{d}}} \,\middle|\, y_{d}\right]\right) \\
&= E[Φ\{a + Q(a, y_{d})\}] \\
&= Φ(a) + ϕ(a) E\{Q(a, y_{d})\} - 2^{-1} a ϕ(a) E\{Q^{2}(a, y_{d})\} \\
&\quad + 2^{-1} E\left[\int_{a}^{a + Q(a, y_{d})} \{a + Q(a, y_{d}) - x\}^{2} (x^{2} - 1) ϕ(x)\, dx\right].
\end{aligned}$$

Applying some classical results and the triangle inequality, one finds that the last term is bounded by a constant multiple of $E|Q|^{3}$ (since $|(x^{2} - 1)ϕ(x)|$ is bounded) and is therefore of smaller order than the first three terms. Hence, the first step toward the consistency of the SCI is to derive asymptotic expansions of $E\{Q(a, y_{d})\}$ and $E\{Q^{2}(a, y_{d})\}$. We decompose $Q(a, y_{d})$ as $Q(a, y_{d}) = g_{d}^{-1/2}(\tilde{ζ}_{d} - \hat{ζ}_{d}) + a g_{d}^{-1/2}(\hat{g}_{d}^{1/2} - g_{d}^{1/2}) = Q_{1} + Q_{2}$.
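The expansion of $E[Φ\{a + Q\}]$ above is a second-order Taylor expansion of $Φ$ with integral remainder, using $Φ'' = -xϕ$ and $Φ''' = (x^{2} - 1)ϕ$. Since $|(x^{2} - 1)ϕ(x)| \leq ϕ(0)$, the remainder is at most $ϕ(0)|Q|^{3}/6$ in absolute value. The short check below, which uses only the standard library's `statistics.NormalDist` and treats $Q$ as a fixed number rather than a random variable, confirms this bound numerically; it is a sanity check of the deterministic step, not part of the proof.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
# |(x^2 - 1) * phi(x)| is maximised at x = 0, where it equals phi(0).
THIRD_DERIVATIVE_BOUND = STD_NORMAL.pdf(0.0)


def second_order_expansion(a: float, q: float) -> float:
    """Phi(a) + phi(a) q - a phi(a) q^2 / 2, the first three terms of the expansion."""
    phi_a = STD_NORMAL.pdf(a)
    return STD_NORMAL.cdf(a) + phi_a * q - 0.5 * a * phi_a * q ** 2


def remainder_bound(q: float) -> float:
    """Upper bound phi(0) |q|^3 / 6 for the integral remainder term."""
    return THIRD_DERIVATIVE_BOUND * abs(q) ** 3 / 6.0


for a in (-2.0, 0.0, 1.5):
    for q in (0.5, 0.1, -0.1, 0.01):
        error = abs(STD_NORMAL.cdf(a + q) - second_order_expansion(a, q))
        print(f"a={a:+.1f} q={q:+.2f} error={error:.2e} bound={remainder_bound(q):.2e}")
```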

Let $y_{d.} = y_{d1} + \dots + y_{dn_{d}}$ for all $d \in [D]$, and let $ψ_d$ be a function of $(y_{d.}, θ)$, twice differentiable with respect to $θ$, such that $\tilde{ζ}_d = ψ_d(y_{d.}, θ)$ and $\hat{ζ}_d = ψ_d(y_{d.}, \hat{θ})$. Observe that $y_{d.} = y_{d}$ under an area-level model. The specific form of $ψ_d$ depends on the choice of the GLMM; for instance, under the Poisson-gamma model, $ψ_{d}^{PG}(y_{d}, θ) = λ_{d}(y_{d} + δ)/(λ_{d} + δ)$, see Equation (7). A second-order Taylor expansion around $θ$ gives

$$\begin{aligned}
\hat{ζ}_{d} - \tilde{ζ}_{d} &= ψ_{d}(y_{d.}, \hat{θ}) - ψ_{d}(y_{d.}, θ) = \left\{\frac{\partial}{\partial θ} ψ_{d}(y_{d.}, θ)\right\}^{t} (\hat{θ} - θ) \\
&\quad + \frac{1}{2} (\hat{θ} - θ)^{t} \left\{\frac{\partial^{2}}{\partial θ \partial θ^{t}} ψ_{d}(y_{d.}, θ)\right\} (\hat{θ} - θ) + o_{P}(\|\hat{θ} - θ\|^{2}).
\end{aligned} \tag{A.1}$$
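For the Poisson-gamma case, $ψ_{d}^{PG}(y_{d}, θ) = λ_{d}(y_{d} + δ)/(λ_{d} + δ)$ is explicit, so the behaviour of the expansion (A.1) can be checked numerically. The sketch below treats $θ$ simply as the pair $(λ_d, δ)$, which is a simplifying assumption (in the model, $λ_d$ itself depends on $β$), and the function names are hypothetical. It uses the gradient $\partial ψ/\partial λ_d = (y_d + δ)δ/(λ_d + δ)^2$ and $\partial ψ/\partial δ = λ_d(λ_d - y_d)/(λ_d + δ)^2$, and shows that the error of the first-order term shrinks quadratically in the perturbation, as (A.1) predicts.

```python
def psi_pg(y, lam, delta):
    """psi^{PG}_d(y_d, theta) = lambda_d (y_d + delta) / (lambda_d + delta)."""
    return lam * (y + delta) / (lam + delta)


def grad_psi_pg(y, lam, delta):
    """Gradient of psi^{PG} with respect to (lambda_d, delta)."""
    denom = (lam + delta) ** 2
    return (y + delta) * delta / denom, lam * (lam - y) / denom


def linearisation_error(y, lam, delta, h):
    """|psi(theta + h(1, 1)) - psi(theta) - grad^t h(1, 1)|: the remainder of the first-order term."""
    d_lam, d_delta = grad_psi_pg(y, lam, delta)
    exact = psi_pg(y, lam + h, delta + h) - psi_pg(y, lam, delta)
    linear = d_lam * h + d_delta * h
    return abs(exact - linear)


err_1 = linearisation_error(3, 2.0, 1.5, 1e-2)
err_2 = linearisation_error(3, 2.0, 1.5, 5e-3)
ratio = err_1 / err_2  # close to 4: halving the perturbation quarters the error
```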

Let $C = 2c$ with $c > 0$. Since we assume $\|\hat{θ} - θ\| = O_{P}(n^{-c})$, we have

$$E[\{\hat{ζ}_{d}(\hat{θ}) - \tilde{ζ}_{d}(θ)\}^{2}] = \frac{1}{n^{C}} E\left(\left[\left\{\frac{\partial}{\partial θ} ψ_{d}(y_{d.}, θ)\right\}^{t} n^{c} (\hat{θ} - θ)\right]^{2}\right) + o(n^{-C}). \tag{A.2}$$

As for $Q_1$, it follows from Equation (A.1) that $E(\hat{ζ}_{d} - \tilde{ζ}_{d}) = n^{-c} E[\{\frac{\partial}{\partial θ} ψ_{d}(y_{d.}, θ)\}^{t} n^{c}(\hat{θ} - θ)] + o(n^{-c})$, and from Equation (A.2) that $E\{(\hat{ζ}_{d} - \tilde{ζ}_{d})^{2}\} = O(n^{-C})$. Furthermore, $g_d$ is of order one, so that $g_d^{-1/2} = O(1)$, which leads to $E(Q_{1}) = O(n^{-c})$ as well as $E(Q_{1}^{2}) = O(n^{-C})$. Turning to $Q_2$, we have the immediate simplification $Q_{2} = a g_{d}^{-1/2}(\hat{g}_{d}^{1/2} - g_{d}^{1/2}) = a\{(\hat{g}_{d}/g_{d})^{1/2} - 1\}$. Let $g_d$ be twice differentiable with respect to $θ$. Similarly to the computations above, we have the expansion

$$\begin{aligned}
\hat{g}_{d}(\hat{θ}) &= g_{d}(θ) + \left\{\frac{\partial}{\partial θ} g_{d}(θ)\right\}^{t} (\hat{θ} - θ) \\
&\quad + \frac{1}{2} (\hat{θ} - θ)^{t} \left\{\frac{\partial^{2}}{\partial θ \partial θ^{t}} g_{d}(θ)\right\} (\hat{θ} - θ) + o_{P}(\|\hat{θ} - θ\|^{2}).
\end{aligned}$$

Therefore, we obtain $E\{\hat{g}_{d}(\hat{θ})\} = g_{d}(θ) + n^{-c} E[\{\frac{\partial}{\partial θ} g_{d}(θ)\}^{t} n^{c}(\hat{θ} - θ)] + O(n^{-C})$.

It follows that $E\{(\hat{g}_{d}/g_{d})^{1/2}\} - 1 = O(n^{-c})$, so that $E(Q_{2}) = O(n^{-c})$ and $E(Q_{2}^{2}) = O(n^{-C})$. We deduce that $G_{d}(a)$ admits the asymptotic expansion $G_{d}(a) = Φ(a) + n^{-c} γ(a, θ) + O(n^{-C})$. A similar expansion can be established for $G_{d}^{*}(a)$ by replacing $θ$ with $\hat{θ}$ and $P$ with $P^{*}$.
