2,649
Views
7
CrossRef citations to date
0
Altmetric
Theory and Methods

Simultaneous Inference for Empirical Best Predictors With a Poverty Study in Small Areas

ORCID Icon, &
Pages 583-595 | Received 08 Apr 2020, Accepted 08 Jun 2021, Published online: 09 Aug 2021

Abstract

Today, generalized linear mixed models (GLMM) are broadly used in many fields. However, the development of tools for performing simultaneous inference has been largely neglected in this domain. A framework for joint inference is indispensable to carry out statistically valid multiple comparisons of parameters of interest between all or several clusters. We therefore develop simultaneous confidence intervals and multiple testing procedures for empirical best predictors under GLMM. In addition, we implement our methodology to study widely employed examples of mixed models, that is, the unit-level binomial, the area-level Poisson-gamma and the area-level Poisson-lognormal mixed models. The asymptotic results are accompanied by extensive simulations. A case study on predicting poverty rates illustrates applicability and advantages of our simultaneous inference tools.

1 Introduction

Generalized linear mixed models (GLMM) are suitable for modeling clustered and correlated data with categorical or count outcomes. They are ubiquitous in applied statistics, for example, in biometrics or small area estimation (SAE). In the latter, they serve to analyze surveys on a disaggregated level. Despite an increasing interest, for example, to guide resource allocation, the development of methods for simultaneous inference for predictors is missing. It is surprising as only those would make joint considerations of clusters valid. Available (1α)-confidence intervals (CI) for mixed parameters (except the credibility intervals of Ganesh Citation2009) are constructed such that for each study at least α100% of them do not contain the true value. Undoubtedly, practitioners do compare, but so far without valid statistical tools. We aim to close this distressing gap, not to improve any existing method.

Specifically, we introduce simultaneous confidence interval (SCI) and multiple test procedure (MTP) for the empirical best predictor (EBP) of Jiang (Citation2003). They are based on max-type statistics combined with extreme value theory. We prove asymptotic convergence of SCI and MTP for nested (or hierarchical) GLMM within the exponential family. We study the numerical performance of our SCI and MTP for two area-level and one unit-level mixed models that are widely used, for example, for studying local poverty rates (Pratesi Citation2016). All introduced methods show a satisfactory performance within considered modeling frameworks. Even though our estimates under the area-level models appear to be less volatile, one can argue that EBPs are not directly comparable because different methods and model classes are used. Finally, under area-level Poisson-gamma model, we derive a new mean squared error (MSE) estimator which is of crucial interest in SAE.

The amount of literature on estimation and testing under GLMM is considerable, see, i.a., the review of Tuerlinckx et al. (Citation2006), the monograph of Jiang (Citation2007), and the article of Ghosh et al. (Citation1998) which is particularly interesting within the context of SAE. Furthermore, researchers put forward several methodologies broadly used in the analysis of count data. Molina, Saei, and Lombardía (Citation2007) and Scealy (Citation2010) studied the estimation of labor force status using multinomial logistic models, whereas Saei and Taylor (Citation2012) focused on the same target parameter, and examined the performance of a bivariate random components model. Chandra, Chambers, and Salvati (Citation2012) and Franco and Bell (Citation2015) provided extensions for modeling proportions using logistic unit- and area-level models. Hobza and Morales (Citation2016) implemented the EBP for unit-level, and Boubeta, Lombardía, and Morales (Citation2016) for area-level GLMM to study poverty in small areas. Chambers, Salvati, and Tzavidis (Citation2012) and Tzavidis et al. (Citation2015) extended the M-quantile inference for robust estimation and prediction of count data. Yet, to the best of our knowledge, no one addresses the issue of simultaneous inference for clusters-level parameters when applying GLMM. Likewise, little research has been carried out on simultaneous inference for cluster level parameters in linear mixed effects models (LMM). Ganesh (Citation2009) developed credibility intervals for a mixed parameter in a particular area-level model. Reluga, Lombardía, and Sperlich (Citation2019) proposed bootstrap SCI and MTP for mixed parameter under LMM, whereas Kramlinger, Krivobokova, and Sperlich (Citation2018) developed a framework for marginal and conditional inference with quadratic forms.

After an introduction of a model and estimators in Section 2, we propose the construction of SCI and MTP for a general EBP, followed by the theoretical justifications in Section 3. Sections 4 and 5 present simulations and a case study. Conclusions are drawn in Section 6. More details are deferred to appendix and our supplementary material (SM).

2 Best Prediction for GLMM

Let D be the number of clusters or areas with d[D], nd the number of sampled units in each area j[nd] with n=d=1Dnd, Nd the known population sizes with N=d=1DNd,[A]={1,,A}. Since in our context the notion of cluster and area can be used synonymously, we proceed with the latter. Suppose that {vd:d=1,,D} is a set of independent and identically distributed (iid) random effects with unknown variance δ2,δ>0, which is often parameterized as vd=δud with udN(0,1). The target variable ydj represents the jth sample observation from the dth area. Furthermore, we consider nested data structures such that ydjydj for dd. In full generalization, we assume that random variables Ydj, conditionally on a random effect ud, are independent with a probability density function (pdf) from the exponential family ydj|udExp.Family(θ) ydj|udindep.gYdj|ud(ydj|ud),gdj(ydj|ud,θ)=exp{φ1[ydjγdjb(γdj)]+c(ydj,φ)},where θ=(βt,δ,φ)t with δ the variability parameter, β=(β1,,βp)t regression parameters of auxiliary variables xdj=(xdj1,,xdjp)t for which typically xdj1=1,j[nd] d[D], and γdj, φ are canonical and scale parameters, respectively. Link function M relates E(Ydj|ud) to a linear mixed model such that γdj=M{E(Ydj|ud)}=xdjtβ+δud.

2.1 Estimation and Computation

Let yd=(yd1,,ydnd)t for all d[D] be the vector of outcomes, and y=(y1t,,yDt)t. A conditional pdf of y and the likelihood contribution from each area d are given by(1) Ld(θ):=fd(yd|θ)=gd(yd|ud,θ)h(ud)dud=j=1ndgdj(ydj|ud,θ)h(ud)dud,(1) where θ can be derived fromL(θ):=d=1DLd(θ)=d=1Dj=1ndgdj(ydj|ud,θ)h(ud)dud.

In case of area-level models, nd = 1, and EquationEquation (1) simplifies accordingly. For a concise presentation, we assume that there is a single random effect for each area such that the integral in EquationEquation (1) is one-dimensional. Extensions to multidimensional random effects follow immediately with some changes of notations and more complicated computation. Finding an analytical solution to EquationEquation (1) is difficult unless the integral can be simplified. Often one evaluates the integral numerically by Laplace approximation (LA) (De Bruijn Citation1981), Gaussian quadrature (GQ) (Naylor and Smith Citation1982) or adaptive GQ (AGQ) (Pinheiro and Bates Citation1995). In what follows, we proceed with AGQ as it is a higher order version of LA, that is, it gives smaller approximation errors (Bianconcini Citation2014). An alternative is the quasi-likelihood (Breslow and Clayton Citation1993) which suffers from a nondecreasing bias (Tuerlinckx et al. Citation2006), and the method of moments (Jiang Citation1998). In addition, researchers considerably advanced in developing methods to compute maximum likelihood (ML) estimators under GLMM. (Jiang Citation2007, sec. 4.1) proposed an expectation–maximization algorithm, whereas Lele, Nadeem, and Schmuland (Citation2010) developed the so-called data cloning subsequently implemented by Torabi (Citation2012).

Since we consider a prediction problem of possibly nonlinear mixed effects ζd=ζd(β,ud), we use the best predictor (BP) ζd˜ in the sense of minimizing the area-specific MSE in EquationEquation (3) which is actually the area-specific mean squared prediction error:(2) ζ˜d=ζd˜(θ):=E{ζd(β,ud)|y}=E{ζd(β,ud)|yd}=ζd(β,ud)gd(yd|ud,θ)h(ud)dudgd(yd|ud,θ)h(ud)dud.(2)

Simplification of EquationEquation (2) is possible by choosing the pdf of ud accordingly. If we replace θ by a consistent estimator, then we obtain EBP ζ̂d:=ζ˜d(θ̂). Note that in order to obtain the consistency for random effects, one needs to assume that nd for each ζ̂d,d=1,,D (Jiang and Lahiri Citation2001).

Regarding the estimation of the variability of the EBP ζ̂d, MSE is by far the most popular measure. Well known techniques to estimate MSE are the analytical approximation based on a Taylor expansion (Jiang Citation2003), and parametric bootstrap approaches (Boubeta, Lombardía, and Morales Citation2016; Hobza and Morales Citation2016). Consider the following MSE decomposition:(3) MSE(ζ̂d)=E[{ζ˜d(θ̂)ζd}2]=E[{ζ˜d(θ̂)ζ˜d(θ)}2]+E[{ζ˜d(θ)ζd}2]=:g2d+g1d,(3) which can be derived applying the law of iterated expectations (for details, see Jiang Citation2003, and our SM). The analytical formulas of MSE estimators are model dependent. Bootstrapping permits to obtain estimators that do not vary with the model assumed. In what follows, we denote with E*, Var*, and MSE*, the corresponding bootstrap operators for expected value, variance and MSE and define(4) MSEB*(ζ̂d)=E*{(ζ̂d*ζd*)2}B11b1=1B1(ζ̂d*(b1)ζd*(b1))2=:mseB(ζ̂d),(4) which is a bootstrap equivalent of EquationEquation (3). In their article, Hall and Maiti (Citation2006) pointed out that (4) tends to underestimate the MSE, and propose a double-bootstrap bias-correction(5) MSEBC*(ζ̂d)=2MSEB*(ζ̂d)MSEB2*(ζ̂d)2mseB(ζ̂d)mseB2(ζ̂d),(5) where MSEB2*(ζ̂d) is the second-stage bootstrap MSE estimator, that isMSEB2*(ζ̂d)=E**{(ζ̂d**ζd**)2}B11B21b1=1B1b2=1B2(ζ̂d**(b1,b2)ζd**(b1,b2))2=:mseB2(ζ̂d).

The computation of MSEB2*(ζ̂d) involves selecting B2 bootstrap replicates from each first-stage bootstrap sample. In this article we do not aim for a precise estimation of the variability of EBP, but the construction of narrow SCI and reliable MTPs. It turns out that for doing this, the use of an estimate of g1d as in EquationEquation (3) yields better results than using an estimate of the entire MSE (see Section 4), similarly as in Chatterjee, Lahiri, and Li (2008) under LMM.

2.2 Popular Examples of GLMM and Their Properties

2.2.1 Poisson-Gamma Area-Level Model

The Poisson-gamma model is widely applied for modeling counts in the presence of overdispersion (see Cameron and Trivedi Citation2013, Section 4.2.2). Within the SAE context, Chen, Jiang, and Nguyen (Citation2015) investigated the observed best prediction and bootstrap MSE estimation for small area mean counts. Among others, they also consider a Poisson-gamma specification. We propose a different model formulation, focus on the EBP of ζd:=μdPG and develop a plug-in MSE estimator. Let yd|udPoiss(μdPG), d=1,,D, where μdPG>0,nd=1 d[D], with canonical parameter logμdPG=xdtβ+ud, and wd:=exp(ud)Gamma(δ,δ) such that E(yd|ud)=μdPG=λdwd=exp(xdtβ)wd=exp(xdtβ+ud). Since Gamma pdf is conjugate to the Poisson, their mixture yields a negative binomial ydNB(λd,δ1) with likelihood(6) LPG(θ):=fPG(y|θ)=d=1DΓ(yd+δ)Γ(yd+1)Γ(δ)×(δδ+λd)δ(λdδ+λd)yd,(6) where E(yd)=λd and Var(yd)=λd+δ1λd2. The marginal mean of yd is the same as in the Poisson case, but the random effect increases the variance. Suppose that this model holds for all areas of population P of size N partitioned into subpopulations P1,P2,,PD of sizes N1,N2,,ND. We can show that the BP for counts μ˜dPG(θ):=E(μdPG|yd) is(7) E(μdPG|yd)=0λdwdg(yd|wd)h(wd)dwd0g(yd|wd)h(wd)dwd=AdPG(yd,θ)CdPG(yd,θ)=λd(yd+δ)(λd+δ)=:ψdPG(yd,θ).(7)

EquationEquation (7) follows from the conjugation of the Gamma pdf to the Poisson pdf, whileAdPG=0λdwdexp(λdwd)λdydwdydδδwdδ1exp(wdδ)yd!Γ(δ)dwd=λdyd+1δδΓ(yd+1+δ)Γ(δ)yd!(λd+δ)yd+1+δ.

The EBP μ̂dPG is obtained by replacing the vector of unknown parameters θ in EquationEquation (7) with a consistent estimator θ̂. Under the Poisson-gamma model φ=1 and θ=(β,δ). We derive an analytical plug-in MSE estimator to measure the variability of our EBP.

Proposition 1.

Let Vard(θ)=DE{(θ̂θ)(θ̂θ)t}. An analytical MSE decomposition with its corresponding practical plug-in estimator are given by(8) MSEPG(μ˜dPG)=gPG1d+1Dcd(θ)+o(1/D)andmsePG(μ̂dPG)=ĝPG1d+1Dĉd(θ̂),(8) (9) gPG1d=κ1d(θ)κ2d(θ),ĝPG1d=κ1d(θ̂)κ̂2d(θ̂),d[D], with κ1d(θ)=λd2(δ+1)δ and κ2d(θ)=j=0λd2(j+δ)2(λd+δ)2P(yd=j), as well as cd(θ)=j=0{θψdPG(yd,θ)}t× Vard(θ){θψdPG(yd,θ)}P(yd=j).(9)

ĉd(θ) is a Monte Carlo approximation of cd(θ),κ̂2d refers to κ2d with an infinite series truncated at a large term and θ replaced by θ̂. To estimate κ1d we need only the latter.

One can estimate Vard(θ) using any reasonable method. In Section 4 we use bootstrap estimators defined in (24). Details on the derivation of EquationEquations (6) and Equation(8) are deferred to our SM.

2.2.2 Poisson-Lognormal Area-Level Model

The Poisson-lognormal model has been thoroughly examined by, among others, Cameron and Trivedi (Citation2013), Section 4.2.4, Franco and Bell (Citation2015) and Boubeta, Lombardía, and Morales (Citation2016). For udN(0,1), let yd|udPoiss(μdPL),d=1,,D, where μdPL>0, nd = 1 for all d[D]. In addition, μdPL=νdρd, where νd is a known size variable and ρd a binomial probability. The canonical parameter is logμdPL=logνd+xdtβ+δud for all d[D]. Typically ζd:=ρd for which we have ρd=exp(xdtβ+δud) with θ=(βt,δ). In this case, the likelihood isLPL(θ):=fPL(y|θ)=(2π)D/2d=1DRexp(νdρd)νdydexp{yd(xdtβ+δud)}yd!exp(ud22)du.

Once θ is estimated, we obtain BPs μ˜dPL,ρ˜d, and EBPs μ̂dPL,ρ̂d using the formulas from Boubeta, Lombardía, and Morales (Citation2016). In Section 4.1 we estimate their MSE by bootstrap.

2.2.3 Logit Unit-Level Model

The unit-level logit model is a popular choice for binary responses, comprehensively discussed by Hobza and Morales (Citation2016). Under this setting, ydj|udBin(mdj,pdj),udN(0,1) with mdj a known size parameter for a logistic regression. The natural parameter is pdj/(1pdj)=xdjtβ+δud,d[D], j[nd] where pdj={exp(xdjtβ+δud)}/{1+exp(xdjtβ+δud)}. We assume that the unit-level logit model holds for all units of population P of size N, partitioned into D subpopulations Pd of sizes Nd, d[D]. Let ζd:=μdU=j=1Ndpdj. As for the Poisson models, we have φ=1 and therefore θ=(βt,δ). The likelihood is given by(10) LU(θ):=fU(y|β,δ)=(2π)D/2d=1DRexp[j=1ndlog(mdjydj)+j=1ndydj(xdjtβ+δud)ud22j=1ndmdjlog{1+exp(xdjtβ+δud)}]dud.(10)

We can proceed with the estimation of the BP p˜dj(θ) and μ˜dU=j=1Ndp˜dj only if we have access to the information on each population unit. In practice, however, the auxiliary information is available only for the sample units. Then, following the suggestion of Hobza and Morales (Citation2016), we can still estimate the population quantity of interest by using only categorical covariates. Suppose that they take a finite number of values xdj{z1,,zL} for d[D] and j[nd] with zl denoting the resulting covariate class. We then define(11) μ¯dU=μdUNd,μdU=j=1Ndpdj=l=1LNdlrdl,withrdl=exp(zlβ+δud)1+exp(zlβ+δud),(11) where Ndl=#{lPd:xdj=zl} is the known size of class zl in area d. Hobza and Morales (Citation2016) derived BP μ˜dU(θ) and EBP μ̂dU(θ̂) for μdU as well as for other quantities in EquationEquation (11). Due to the computational burden of the analytical estimator, in Section 4.2 we use bootstrap for obtaining an estimate of MSE.

3 Simultaneous Intervals and Multiple Testing

To construct CI for ζd that account for the effect of estimates from other areas, we need to find a region I1α such that P(ζdI1αd[D])=1α. Define(12) S0=maxd=1,,D|S0d|, with S0d=ζ̂dζdσ̂(ζ̂d),d[D],(12) (13) qS0(1α)=inf{tR:P(S0  t)  1α},(13) with σ̂(ζ̂d) being an estimate of the variability of EBP ζ̂d. We then consider(14) α=P(|ζ̂dζd|>qS0(1α)σ̂(ζ̂d) for  some d[D])=P(maxd=1,,D|ζ̂dζdσ̂(ζ̂d)|>qS0(1α)).(14)

Constructing SCI boils down to the estimation of qS0(1α), as one can define then(15) I1αS=×d=1DId,1αS,withId,1αS={ζ̂d±qS0(1α)×σ̂(ζ̂d)},(15) where × denotes a generalized Cartesian product. I1αS covers all ζd with probability 1α, that is, its joint confidence level is 1α. In contrast, for each qS0d(1α) defined analogously to qS0(1α), with S0 replaced by |S0d|, individual area CI (iCI) are given by(16) Id,1αiCI={ζ̂d±qS0d(1α)×σ̂(ζ̂d)}d[D].(16)

By construction, iCI does not contain ζd for at least 100α% of all areas.

Remark 1.

Id,1αiCI is designed to cover ζd at an individual confidence level. Consequently, the joint coverage probability of iCIs decreases in a cumulative way for increasing D. This highlights the need to construct SCI. Nevertheless, maintaining 1α simultaneous confidence level of SCI I1αS makes its constituents Id,1αS wider than corresponding iCIs Id,1αiCI. This is not surprising because Id,1αiCI and Id,1αS were constructed to cover different sets which serve distinct inferential purposes. It is worth mentioning that the length of Id,1αS stabilizes as for growing D we observe two opposite trends: the increase of area parameters to cover and the decrease of MSE (see and ).

Table 1 ECP, AIW, and AIWV of SCI under area-level models.

Table 3 ECP, AIW and AIWV of 95% SCI under the unit-level model.

The SCI defined in EquationEquation (15) is not operational as the distribution of S0 is unknown. The problem can be circumvented by bootstrap approximation: for b1=1,,B1 set(17) SB(b1)=maxd=1,,D|SBd(b1)|,SBd(b1)=ζ̂d*(b1)ζd*(b1)σ̂*(b1)(ζ̂d*),(17) and approximate the critical value qSB(1α)=inf{tR:P(SB  t|(y,X))  1α}, by a [(1α)B1+1]th-order statistic of the SB(b1). Then the bootstrap equivalent of EquationEquation (15) is(18) I1αB=×d=1DId,1αB,whereId,1αB={ζ̂d±qSB(1α)×σ̂(ζ̂d)}.(18)

An alternative approach to EquationEquation (12) could be to take computationally simpler nonstudentized statistics. Yet, already DiCiccio and Efron (Citation1996) pointed out that the lack of studentization results in slower convergence rates. Since the application of nonstudentized SCI did not yield satisfactory results, we decided not to include them.

Our methodology is also applicable for hypothesis testing. Consider the test problem(19) H0:=bvs.H1:b,(19)

where BRD×D,D  D,bRD. We are interested in max-type statistic tH such that(20) tH=maxd=1,,D|tHd|,tHd=ζ̂dHbdσ̂(ζ̂dH),SH0=maxd=1,,D|SH0d|,SH0d=ζ̂dHζdHσ̂(ζ̂dH),(20) where ζH=(ζ1H,,ζDH)t=RD with ζ̂H being its estimator. One rejects H0 at the α-level if tH  qH0(1α) with qH0(1α)=inf{tR:P(SH0  t)  1α}.

In practice, we might use such a test to examine differences between area characteristics. Similarly as for SCI, we approximate qH0(1α) applying bootstrap to a modified version of statistic SB, namely(21) qBH0(1α)=inf{tR:P(SBH0  t|(y,X))  1α},(21) where SBH0 in the b1th bootstrap sample is(22) SBH0(b1)=maxd=1,,D|SBH0d(b1)|,SBH0d(b1)=ζ̂d*H(b1)ζd*H(b1)σ̂*(ζ̂d*H(b1)),(22) with ζ*H(b1)=(ζ1*H(b1),,ζD*H(b1))=Bζ*(b1)RD and ζ̂*H(b1)=(ζ̂1*H(b1),,ζ̂D*H(b1)) its corresponding estimated version.

We provide the consistency of our bootstrap-based CI and tests, as well as asymptotic convergence and coverage probability. Proofs are deferred to Appendix A.2 and A.3. Suppose that θ̂ is consistent such that ||θ̂θ||=OP(nc), c > 0. Since for the GLMM with clustered random effects the log-likelihood can be expressed as the sum of independent random components, the consistency of θ̂ estimated by ML follows assuming a classical theory. The consistency under a general GLMM had been an open problem for many years until it was solved by Jiang (Citation2013). Bianconcini (Citation2014) and Huber, Ronchetti, and Victoria-Feser (Citation2004) investigated the consistency of θ̂ once we compute it using AGQ and LA respectively. For our purpose, we need to prove the bootstrap consistency

Proposition 2.

Under Assumptions 1–5 from Appendix A.1 it holds thatE*(ydj*)E(ydj)=oP*(1),Var*(yd*)Var(yd)=[oP*(1)]nd×nd,||θ̂*θ̂||=OP*(nc).

Given Proposition 2, we can derive the consistency of I1αB based on results from extreme value theory and asymptotic expansions of the standardized statistics using ideas from Chatterjee, Lahiri, and Li (2008). Let us assume σ̂(ζ̂d)=ĝ1d(ζ̂d), though similar results are immediate for σ̂(ζ̂d)=mse(·)(μ̂d(·)) where (·) stands for different types of estimators. We use q:=qS0(1α) where unambiguous, and denote the cumulative distribution function (cdf) of S0d and SBd by Gd(w)=P(S0d  w) and GBd(w)=P(SBd  w). In Appendix A.3 we provide asymptotic expansions for both. Define (S0(d+1),S0(2D))=(S01,,S0D), and observe that maxd=1,,D|S0d|=maxd=1,,2D(S01,,S0D,S01,,S0D). From EquationEquation (14), we have(23) TD(q)=P(S0  q)=P(S01  q,,S0D  q,S01  q,,S0D  q)=d=12DGd(q).(23)

As D, unless standardized, the distribution in EquationEquation (23) converges to 0 or 1. In Appendix A.3, we show that Gd(w) is asymptotically normal, such that P(S0  q)Φ2D(q). Since the cdf of the maxima of the standard normal random variables is in the domain of attraction of the Gumbel law, it follows that limDΦ2D(q/bD+bD)=exp(exp(q))=T0(q), for all qR where bD is a sequence of constants (see Leadbetter, Lindgren, and Rootzén Citation2012, theor.1.5.3). Unfortunately, this approximation has a poor convergence rate, but bootstrap is again a remedy. Notice that a similar representation holds for SB, substituting P with P* and replacing the true parameters by their estimates. Application of Poyla’s theorem that combines the convergence in distribution with a convergence in sup norm results in our next proposition.

Proposition 3.

Define TD*(w)=P*(SB  q) which is a bootstrap analogue of TD(w) in EquationEquation (23). Under Assumptions 1–5 from Appendix A.1 it holds thatsupwR|TD(w)TD*(w)|=oP(1).

Corollary 1.

Proposition 3 implies that under the same assumptions,P(ζdI1αB d[D])1α.

Since we use almost identical max-type statistics in EquationEquations (12) and Equation(20), the construction of MTP follows almost immediately from the correspondence between tests and CI. In fact, the acceptance region of our test is I1αH0=×d=1DId,1αH0, whereId,1αH0={ζ̂dqS0(1α)×σ̂(ζ̂d)bdζ̂d+qS0(1α)×σ̂(ζ̂d)},that is, we reject H0 if bI1αH0. We can write P(hdI1αH0d[D])=1α. Since this probability statement is true for any hd=ζd, we obtain the CI defined in EquationEquation (15) by inverting the test.

Corollary 2.

Let H0 be the null hypothesis defined in EquationEquation (19) and α(0,1). Under Proposition 3, we haveP(tH>qBH0)  α+o(1).

Remark 2.

Our single-step testing procedure in EquationEquation (19) with a bootstrap critical value in EquationEquation (21) controls weakly for the family-wise error rate (FWER), and might be limited in detecting false null hypotheses once we deal with a large D. Yet, we can readily extend our test to a bootstrap-based step-down procedure of Romano, Shaikh, and Wolf (Citation2008) which controls the false discovery rate with a better power to detect false H0d than FWER.

4 Empirical Reliability Study

We performed intensive simulation studies to assess the reliability of our methods. SCI and MTP for EBP were constructed with different estimators of variability under the models presented in Sections 2.2.1–2.2.3. First, we examined the relative bias and relative root-MSE of fixed effects β̂ and variability parameter δ̂. Then, the performance of EBP was evaluated comparing bias, average absolute bias and MSE for D=26,52,and78. Since they did not show any atypical pattern, the results under Poisson area-level models and logistic unit-level model were deferred to the SM. Regarding SCIs, we calculated empirical coverage probability (ECP), average interval width (AIW), and the AIW variation (AIWV):ECP=1Kk=1K1{ζd(k)I1αS  d[D]},AIW=1DKd=1Dk=1Kωd(k),ωd(k)=2q(·)(1α)(k)σ̂(k)(ζ̂d),AIWV=1D(K1)d=1Dk=1K(ωd(k)ω¯d)2,ω¯d=1Kk=1Kωd(k),d=1,D.

For each simulation run k we record the widths of the SCI and check whether they cover all EBPs. ECP is then computed by averaging over K simulation runs and is aimed to be close to 1α. AIW is obtained by averaging over the simulation runs and areas. Narrower intervals are preferable if its ECP is close to the nominal level. These are standard measures to assess the quality of interval estimators (Chatterjee, Lahiri, and Li 2008; Ganesh Citation2009). Lower AIWV values indicate that the length is stable and does not depend on the simulation run.

4.1 Finite Sample Performance Under Area-Level Models

Under the Poisson-gamma model we set ydPoiss(μdPL),μdPL=λdwd. Covariates, parameters and sample sizes are taken from our case study in Section 5, that is, we set θ=(βt,δ)=(10.038,7.747,3.136,11.317,2.466,2.480)t, and D={26,52,78}, nd = 1, d[D], n = D. For D = 52 we take covariates from the original sample, for D = 26 we randomly select the areas using simple random sampling without replacement, and for D = 78, we take the original sample plus 26 randomly selected areas, that is, these areas enter at most twice. Parameter of interest is the area proportion of individuals below the poverty line, μ¯dPL=μdPL/Nd. The EBP for μdPL is given in EquationEquation (7). Since Nd is usually unknown, in practice it is replaced by its estimate, see EquationEquation (25) in Section 5. We apply double bootstrap with B1=1000 first-stage and B2=1 second-stage bootstrap replicates (the choice of the latter is motivated by Erciulescu and Fuller Citation2014). We generate K = 1000 samples with the same areas and fixed covariates, but randomly drawn wd and yd. SCIs and iCIs are constructed as follows:

  1. Fit the model to the data and obtain estimates θ̂=(β̂,δ̂).

  2. For b1=1,,B1 bootstrap samples, generate wd*(b1)Gamma(δ̂,δ̂) iid and setμdPG*(b1)=λ̂dwd*(b1)andyd*(b1)Poisson(μdPG*(b1)).

  3. For each bootstrap sample calculate θ̂*(b1),μ̂dPG*(b1)(θ̂*(b1)), ADPG,d(b1)=|μ̂dPG*(b1)μdPG*(b1)|.

  4. For b2=1,,B2 generate samples wd**(b1,b2)Gamma(δ̂*(b1),δ̂*(b1)) iid andμdPG**(b1,b2)=λ̂d*(b1)wd**(b1,b2),yd**(b1,b2)Poisson(μdPG**(b1,b2)).

  5. For each bootstrap sample calculate θ̂**(b1,b2) and μ̂dPG**(b1,b2)(θ̂**(b1,b2)).

  6. Set msed(b1)=1B2b2=1B2(μ̂dPG**(b1,b2)μdPG**(b1,b2))2.

  7. Calculate bootstrap estimates ĝPG1d(θ̂*(b1)) as in EquationEquation (9) as well asmseB(μ̂dPG)=1B1b1=1B1(μ̂dPG*(b1)μdPG*(b1))2,mseBC(μ̂dPG)=2mseB(μ̂dPG)1B1b1=1B1msed(b1).

  8. Calculate statistic SPG,B with the critical value qPG,SB(1α) obtained from the bootstrap sample SPG,B=(SPG,B(1),SPG,B(B1))t, whereSPG,B(b1)=maxd=1,,DADPG,d*(b1)/σ̂*(b1)(μ̂dPG*(b1))andqPG,SB(1α)=Q1α(SPG,B)

as well as a variance estimate for θ̂:(24) var̂(θ̂)=1B1b1=1B1(θ̂*(b1)θ¯)(θ̂*(b1)θ¯)t withθ¯=1B1b1=1B1θ̂*(b1).(24)

We compare the performance of SCI and MTP for different variability estimates σ̂(μ̂dPG) and their bootstrap equivalents σ̂*(μ̂d*PG), namely for σ̂(μ̂dPG)=ĝPG1d and σ̂(μ̂dPG)=mse(·)(μ̂dPG). Here, mse(·) refers to either the plug-in mseP, the mseB or the mseBC, defined in EquationEquations (8), Equation(4), and Equation(5). Steps 3(a)–(c) of the algorithm refer to the second-stage bootstrap which is only necessary to obtain bias-corrected mseBC. Under the Poisson-gamma model, we are interested in the estimation of poverty rates. We thus consider μ¯̂dPG=μ̂dPG/Nd, g¯̂PG1d=ĝPG1d/Nd2 and mse(·)(μ¯̂dPG)=mse(·)(μ¯̂dPG)/Nd2.

For the Poisson-lognormal model with ydPoisson(μdPL),μdPL=νdρd, the parameter of interest is ρd with Nd=νd estimated by EBP derived by Boubeta, Lombardía, and Morales (Citation2016). We take the fixed parameters from Section 5, that is, (βt,δ)=(2.264,3.480,0.870,4.842,0.125,0.322)t. Covariates, sample sizes, number of simulation runs and bootstrap replicates are the same as in case of the Poisson-gamma model. The variability of ρ̂d was estimated using bootstrap MSEs, that is mseB and mseBC. To obtain estimates of SCI and iCI one can use almost the same algorithm as above by changing the way we generate yd*(b1).

summarizes the performance of 95% SCI for μ¯̂dPG constructed with mseB (B), mseBC (BC), plug-in mseP (P) and ĝPG1d (G). For ρ̂d, they were constructed using mseB (B) and mseBC (BC). All methods perform very well regarding the coverage ECP, even for D = 26. In contrast, SCIs constructed using a Bonferroni procedure yield unacceptably low ECP. For instance, for D = 52 and mseB it equals 78% for the Poisson-gamma, and 88% for the Poisson-lognormal model. Therefore, we do not further report them.

presents 95% SCI and iCI estimates for a randomly selected simulation under the Poisson-gamma model. The plot is divided into five panels according to the number of units N̂ddir defined in EquationEquation (25) with the first presenting the results for the areas with the fewest observations. The black and red dots represent the true proportions known in a simulation. The color red indicates true parameters not covered by theirs iCIs. In , that holds for four of the true values (7.7%). This illustrates well the difference between individual and simultaneous inference as well as a particular relevance of the latter. We obtain similar figures for other simulations (see our SM).

Fig. 1 95% iCI and SCI for proportions with D = 52. Red dots indicate true parameters outside iCI, whereas black dots indicate true parameters inside their iCI.

Fig. 1 95% iCI and SCI for proportions with D = 52. Red dots indicate true parameters outside iCI, whereas black dots indicate true parameters inside their iCI.

Finally, we studied the performance of our test (19) under Poisson-gamma and Poisson-lognormal models. Results for the latter are in our SM as they reveal the same features. Consider H0:μ¯PG=b vs. H1:μ¯PG=b+1DΔ, where b:=μ¯ for the same data-generating processes as before. Critical values are obtained from the bootstrap analogues of SH0 calculated similarly as in Step 5 of the algorithm above. shows the power functions of our test based on different variability estimates. They are visibly indistinguishable, which is not surprising given the similar ECPs and AIWs in . For D = 52, that is, the sample size of the real data, the nominal level of 5% is attained almost exactly under H0.

Fig. 2 Simulated powers for multiple test H0:μ¯PG=b versus H1:μ¯PG=b+1DΔ under the area-level Poisson-gamma model; (left) D = 26, (middle) D = 52, (right) D = 78.

Fig. 2 Simulated powers for multiple test H0:μ¯PG=b versus H1:μ¯PG=b+1DΔ under the area-level Poisson-gamma model; (left) D = 26, (middle) D = 52, (right) D = 78.

4.2 Finite Sample Performance Under the Unit-Level Model

Under the unit-level model we assume ydjBin(mdj,pdj) with pdj={exp(xdjtβ+δud)}/{1+exp(xdjtβ+δud)}, mdj = 1, udN(0,1). In our context, ydj is binary and value 1 indicates an individual below the poverty threshold defined in Section 5. The regression parameters are taken from our case study: θ=(βt,δ)=(2.048,0.989,0.172,0.760,0.100,0.348)t. Four categorical covariates result in 16 covariate classes xdj{z1,,z16} for which we need to estimate Ndl using EquationEquation (25), l=1,,16. We considered D={26,52,78} containing unit-level information with n={11423,23628,35818}, respectively. Summary statistics for all samples are presented in . Furthermore, for D = 52, nd, Nd, xdj, zl are the same as in our case study. The areas for D={26,78} were selected in the same way as in Section 4.1. In addition, for D = 78, within each of the additional area we sampled with replacement nd units (i.e., 26 newly sampled areas contained different units in comparison to the original sample). The parameter of interest is the area poverty proportion μ¯dU defined in EquationEquation (11). Given that the original sample size was n = 23, 628, under the unit-level model we restrict our simulations to K = 200, B1=500, and B2=1. As far as the algorithm for constructing SCI and iCI is concerned, it follows almost the same steps as in Section 4.1. The exact algorithm can be found in the SM.

Table 2 Summary statistics of nd under different scenarios in the simulation study under the unit-level model.

presents the performance of SCI constructed using mseB (B) and mseBC (BC). The coverage probability is somewhat lower than the nominal level. In addition, it slightly decreases with increasing D, whereas the AIW increase stabilizes as expected (see Remark 1). The undercoverage might be related to the simulation design. Even though the latter is popular in SAE, it is suboptimal for random effects from the asymptotic point of view (nd,d[D], recall Section 2.1). The results in do not demonstrate any inconsistencies with respect to the theoretical developments, nor they exhibit unexpected findings. Due to their limited impact, the equivalents of and for this simulation are deferred to our SM. In comparison to the area-level models, the coverage probability is worse and the average width of SCIs is much larger (it is also the case for the iCI, see our SM). Moreover, fitting unit-level models is computationally more expensive. In our case the estimation of MSE and construction of intervals took about 900–1000 times longer. Since the data-generation processes are different, the numerical results in our simulations are not directly comparable. However, our empirical studies suggest to give some preference to the area-level modeling in the considered GLMM settings.

Our simulations lead us to following conclusions. First, for a given sample size and data, our SCI attains the nominal coverage probability, almost independently from the choice of the estimator of variability. In particular, the area-level models yield very accurate results even for small samples. Second, the distinction between SCI and iCI is crucial, and the latter should not be employed in comparative studies. Third, the numerical performance of our test for comparative studies is satisfying. Given the simplicity of SCI and tests based on ĝ1d, we restrict further presentations to them.

Remark 3.

In our simulation study, we do not analyze the performance of direct estimators for proportions, because our goal is to study the numerical performance of our MTP and SCIs, and to compare them to existing iCIs. Since MTP and SCI are the first tools for simultaneous inference with GLMM-based mixed parameter, we concentrate on their implementation and application to the well-known model-based estimators. These have been thoroughly examined in comparative analyses which included direct estimators (see for instance Boubeta, Lombardía, and Morales Citation2016; Hobza and Morales Citation2016). In our case study in Section 5, we include direct estimators in order to have an almost model-free benchmark.

5 Predicting Poverty Rates in Galicia

Poverty prediction is of great interest for statistical offices. It provides a basis on which local or central authorities can decide about resource allocation and related polices. The interest is not in individual, randomly chosen small areas but in the total picture. Resource distribution requires comparative statistics, and one would thus provide SCI instead of iCI. We illustrate our methodology calculating point estimates, iCIs and SCIs for the poverty rates in each county of Galicia, that is, the proportions of inhabitants who live under a poverty line. We make use of a general part of the Structural Survey for Homes (SSH) in Galicia in 2015 with 23,628 individuals within 9203 households located in 52 counties (small areas). The survey does not produce official estimates at the area level, but we managed to recover the direct estimates of the totals of people below the poverty line (Yd), as well as the number of inhabitants (Nd) for each county. For the area-level models, we need to calculate the number of units which fall into a particular category (Xdi), for example, number of employees or of graduates in each county of Galicia, i=1,,p. The latter are used to obtain the proportions of individuals in each category X¯di=Xdi/Nd. For the unit-level model, we need to obtain the number of units Ndl falling into artificially created categories zdl,d=1,,D,l=1,,L, see Section 2.2.3. The explicit formulas are(25) Ŷddir=jRdwjyj,N̂ddir=jRdwj,N̂dldir=jRdwjxj1{xj=zl},X̂didir=jRdwjxji,andX¯̂didir=X̂didir/N̂ddir,(25) where RdPd are the sample elements belonging to area d, d[D], wdj sampling weights, and ydj binary variables with 1 indicating that an individual is below the poverty line. The poverty threshold is calculated from the survey. It is set to 0.6 of the median household income per capita in Galicia, that is, we do not use county specific poverty lines. This income is calculated in each household according to scale developed by the Organisation for Economic Co-operation and Development (the same technique is used by Eurostat). The model-based approach of this paper assumes that the estimates in EquationEquation (25) are considered to be known, nonrandom quantities, following López-Vizcaíno, Lombardía, and Morales (Citation2015). SSH provides many categorical, auxiliary variables. Under the unit-level model these are binary variables with 1 indicating that a person belongs to a particular category, whereas under area-level models we use the county proportions. We considered four variables for labor status: children (ls0), employed (ls1), unemployed (ls2), inactive (ls3), and four covariates for education: less than primary (ed0), primary (ed1), first- and second-level secondary (ed2), higher education (ed3). Furthermore, we analyzed three variables for the size of municipality: less than 10,000 (sm1), 10,000–50,000 (sm2), more than 50,000 (sm3). We have also investigated the effect of two variables indicating the nationality, that is, Spanish (n1), not Spanish (n2). Finally, we examined five age variables: < 15 (age1), 1524 (age2), 2549 (age3), 5064 (age4), >=65 (age5). We are interested in μ¯d(·):=μd(·)/Nd with (·) standing for PG or U in case of Poisson-gamma and binomial model, respectively, and in ρd in case of the Poisson-lognormal model. We first compute estimates of proportions and their variances using the same formulas as Boubeta, Lombardía, and Morales (Citation2016)(26) p̂ddir=Y¯̂ddir=Y¯ddirN̂ddir,var̂(p̂ddir)=1(N̂ddir)2jRdwj(1wj)(yjp̂ddir)2.(26)

We used estimates in EquationEquation (26) to construct design-based iCI intervals (Dir) displayed in . Following López-Vizcaíno, Lombardía, and Morales (Citation2015), we then proceed with a variable selection inspired by the simulation results. More specifically, under the Poisson-gamma model we check if any of the levels of categorical variables for labor status, education and age are significant at the α=0.05 level. We examined these covariates in the first place, because they turned out to be important in earlier studies on poverty rates (see, for instance, Boubeta, Lombardía, and Morales Citation2016). In this way, we selected ls2, ed2, and age2. Afterwards, we tested the levels of variables nationality and the size of the municipality and we additionally retained sm1 which was significant after the selection of ls2, ed2, and age2. The same categories were then used to other models, see . As we do not carry out a causality analysis, we refrain ourselves from a discussion of the magnitude or signs of estimates. We only notice that under the Poisson-gamma model, the signs are consistent with our expectations; unemployment and young age are associated with higher poverty rates, whereas higher level of studies or living in a small municipality is associated with lower poverty rates.

Fig. 3 Design and model-based 95% iCIs.

Fig. 3 Design and model-based 95% iCIs.

Table 4 Estimates of regression parameters under the area- and the unit-level models with δ̂PG=2.48,δ̂PL=0.32 and δ̂U=0.35, respectively.

shows point and iCI estimates of proportions under Poisson-gamma (PG), Poisson-lognormal (PL), and binomial (Unit) models together with direct estimates (Dir). In this plot, we compare point estimates within four modeling frameworks; we do not compare them across different areas within the same model. First, the variability reflected by the width of iCIs decreases with the number of units in each area N̂ddir defined in EquationEquation (25). Second, even though the area sample sizes nd, d[D] are not that small, the iCI of direct estimates are wider than model-based estimates, which is in accordance with the literature. The width difference is especially pronounced when comparing area-level-based with design-based direct estimates—the latter entirely cover the former. Unit-level model-based point and interval estimates are different with much wider iCIs than under area-level models, but still overlapping with the direct estimates. Only in one case (sixth area in the third panel), the iCIs under area-level models do not overlap with the iCI under the unit-level model which indicates a possible bias in one of the approaches. In contrast, both area-level models produce almost identical estimates.

presents bootstrap iCI and SCI for μ¯̂dPG,d=1,,D constructed with σ̂(ζ̂d)=ĝPG1d as defined in EquationEquation (9). The plot is divided into five panels according to the numbers of units in each area obtained by direct estimates of county inhabitants N̂ddir in (25). serves as an illustration of the differences between individual and simultaneous inference. When comparing iCI and SCI, in many cases (e.g., first and second county of the first panel in ) iCI would insinuate statistically different poverty rates, whereas SCI does not confirm this claim. Such multiple comparisons are valid only if we use SCI. In addition, at least 5% of the true poverty rates are not contained in their iCIs. Analogous figures under the Poisson-lognormal and binomial models lead to the same conclusions. They are thus deferred to the SM. Further model selection and specification testing might be interesting but they are beyond the scope of this article.

Fig. 4 95% bootstrap iCI and SCI estimates for poverty rates in counties of Galicia.

Fig. 4 95% bootstrap iCI and SCI estimates for poverty rates in counties of Galicia.

Since we do not know which model is closer to the real data-generating process, we proceed with the Poisson-gamma area-level model, as it is reliable and the least computer intensive. Left and middle panel of depict the resulting maps of the counties with the corresponding lower and upper bounds of our SCI. We observe a higher rate of poverty in the interior and a south-western part of the region whereas a lower level is typical for the northern part. These conclusions are similar to those drawn by Boubeta, Lombardía, and Morales (Citation2017).

Fig. 5 SCI of EBP poverty proportions: (left) lower boundary, (middle) upper boundary; (right) significant differences in poverty rates between women (F) and men (M).

Fig. 5 SCI of EBP poverty proportions: (left) lower boundary, (middle) upper boundary; (right) significant differences in poverty rates between women (F) and men (M).

Finally, we investigate whether men and women are equally affected by poverty. We wish to test for equality on the county level across Galicia. Testing for each county individually at α=5% error level results in rejection of at least 5% of the hypotheses of no significant difference. We thus use our MTP and consider clusters created from the cross section of sex and county such that ζR104. We test H0:=052 vs. H1:052 where BR52×104 with rows being vectors with 1 on the 2d1 place, –1 on 2d place, and 0 elsewhere. The max-type test statistic yields tH=maxd=1,,104|Bζ̂|/σ̂(ζ̂)20.489 while the bootstrap critical value under H0 is qBH0(1α)2.999. Thus, we strongly reject H0. However, our test does not support the hypothesis that women are more affected than men, or vice versa, see the right panel of . Additional results are deferred to our SM.

Remark 4.

Imagine that Galician counties were considered as a part of a macro region, for example, Spain with DS counties, and consider two inferential problems: (a) the calculation of SCI for the poverty rates in all DS Spanish counties, (b) the calculation of SCI only for D Galician counties, but using all data. Following Remark 1, we expect that the widths of our SCI in would increase in case (a) to maintain the joint coverage probability of 95% for all DS>D counties. In contrast, they would most likely slightly decrease in case (b). In fact, the simultaneous coverage probability of 95% would be requested for the set of D counties, but SCI would be constructed using a more precise estimate of MSE computed using a larger dataset with DS counties.

6 Conclusions

We developed a methodology that allows for statistically valid simultaneous inference for EBP under GLMM. We constructed SCI and MTP applying a combination of max-type statistics and consistent bootstrap estimation of its distribution. These tools enable practitioners to make comparisons between areas. In contrast, the iCIs are not suitable for such comparative analyses because they are constructed at individual confidence level and disregard an additional variation which arises in joint studies. We do not claim that SCI and MTP are better than iCIs or t-tests. The former simply complete the toolbox for statistical inference for mixed parameter ζd. Similarly, the simultaneous inference completes the individual inference for fixed parameters. We introduced various versions of statistics to construct SCI and MTP. Within our framework, all of them exhibited similar performances without indicating a clear winner.

Our methodology can be extended to more complicated data structures such as GLMM with spatial or temporal correlation (see, e.g., Hobza, Morales, and Santamaría Citation2018; Chandra, Chambers, and Salvati Citation2019). One could also consider spatio-temporal or nonparametric models to build SCI by adjusting the statistic S0 and choosing a bootstrap procedure accordingly. Apart from a mathematical challenge to develop a valid asymptotic theory, these extensions would require a construction of an appropriate bootstrap scheme and its computationally efficient implementation.

Supplemental material

Supplemental Material

Download Zip (601.1 KB)

Supplementary Materials

The supplementary materials consist of: (a) a document with further developments, in particular additional MSE decomposition, a proof of Proposition 1, the derivations of estimators under area-level Poisson and unit-level binomial models, additional numerical results and a data analysis which completes the case study in Section 5, (b) codes for replicating the results in the main document, and (c) a document which contains additional information on the data set and the description of the codes.

Ministerio de Econom?a y Competitividad (Spain);

Additional information

Funding

The authors gratefully acknowledge the support from the Swiss National Science Foundation for the project 200021-192345. In addition, they acknowledge the support from the MINECO grants MTM2017-82724-R and MTM2014-52876-R, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the ERDF. The computations were performed at the University of Geneva on the Baobab cluster.

References

  • Bianconcini, S. (2014), “Asymptotic Properties of Adaptive Maximum Likelihood Estimators in Latent Variable Models,” Bernoulli, 20, 1507–1531. DOI: 10.3150/13-BEJ531.
  • Boubeta, M., Lombardía, M. J., and Morales, D. (2016), “Empirical Best Prediction Under Area-Level Poisson Mixed Models,” Test, 25, 548–569. DOI: 10.1007/s11749-015-0469-8.
  • Boubeta, M., Lombardía, M. J., and Morales, D. (2017), “Poisson Mixed Models for Studying the Poverty in Small Areas,” Computational Statistics & Data Analysis, 107, 32–47.
  • Breslow, N. E., and Clayton, D. G. (1993), “Approximate Inference in Generalized Linear Mixed Models,” Journal of the American Statistical Association, 88, 9–25.
  • Cameron, A. C., and Trivedi, P. K. (2013), Regression Analysis of Count Data, Cambridge: Cambridge University Press.
  • Chambers, R., Salvati, N., and Tzavidis, N. (2012), “M-Quantile Regression for Binary Data With Application to Small Area Estimation,” Centre for Statistical and Survey Methodology, University of Wollongong, Working Paper, 1–24.
  • Chandra, H., Chambers, R., and Salvati, N. (2012), “Small Area Estimation of Proportions in Business Surveys,” Journal of Statistical Computation and Simulation, 82, 783–795. DOI: 10.1080/00949655.2011.554834.
  • Chandra, H., Chambers, R., and Salvati, N. (2019), “Small Area Estimation of Survey Weighted Counts Under Aggregated Level Spatial Model,” Survey Methodology, 45, 31–59.
  • Chatterjee, S., Lahiri, P., and Li, H. (2008), “Parametric Bootstrap Approximation to the Distribution of EBLUP and Related Prediction Intervals in Linear Mixed Models,” Annals of Statistics, 36, 1221–1245.
  • Chen, S., Jiang, J., and Nguyen, T. (2015), “Observed Best Prediction for Small Area Counts,” Journal of Survey Statistics and Methodology, 3, 136–161. DOI: 10.1093/jssam/smv001.
  • De Bruijn, N. G. (1981), Asymptotic Methods in Analysis, New York: Dover Publications, Inc.
  • DiCiccio, T. J., and Efron, B. (1996), “Bootstrap Confidence Intervals,” Statistical Science, 11, 189–228. DOI: 10.1214/ss/1032280214.
  • Erciulescu, A. L., and Fuller, W. A. (2014), “Parametric Bootstrap Procedures for Small Area Prediction Variance,” in Proceedings of the Joint Statistical Meeting-Survey Research Methods Section, Boston, pp. 3307–3318.
  • Franco, C., and Bell, W. R. (2015), “Borrowing Information Over Time in Binomial/Logit Normal Models for Small Area Estimation,” Statistics in Transition New Series, 16, 563–584. DOI: 10.21307/stattrans-2015-033.
  • Ganesh, N. (2009), “Simultaneous Credible Intervals for Small Area Estimation Problems,” Journal of Multivariate Analysis, 100, 1610–1621. DOI: 10.1016/j.jmva.2009.01.009.
  • Ghosh, M., Natarajan, K., Stroud, T. W. F., and Carlin, B. P. (1998), “Generalized Linear Models for Small-Area Estimation,” Journal of the American Statistical Association, 93, 273–282. DOI: 10.1080/01621459.1998.10474108.
  • Hall, P., and Maiti, T. (2006), “On Parametric Bootstrap Methods for Small Area Prediction,” Journal of the Royal Statistical Society, Series B, 68, 221–238. DOI: 10.1111/j.1467-9868.2006.00541.x.
  • Hobza, T., and Morales, D. (2016), “Empirical Best Prediction Under Unit-Level Logit Mixed Models,” Journal of Official Statistics, 32, 661–692. DOI: 10.1515/jos-2016-0034.
  • Hobza, T., Morales, D., and Santamaría, L. (2018), “Small Area Estimation of Poverty Proportions Under Unit-Level Temporal Binomial-Logit Mixed Models,” Test, 27, 270–294. DOI: 10.1007/s11749-017-0545-3.
  • Huber, P., Ronchetti, E., and Victoria-Feser, M.-P. (2004), “Estimation of Generalized Linear Latent Variable Models,” Journal of the Royal Statistical Society, Series B, 66, 893–908. DOI: 10.1111/j.1467-9868.2004.05627.x.
  • Jiang, J. (1998), “Consistent Estimators in Generalized Linear Mixed Models,” Journal of American Statistical Association, 93, 720–729. DOI: 10.1080/01621459.1998.10473724.
  • Jiang, J. (2003), “Empirical Best Prediction for Small-Area Inference Based on Generalized Linear Mixed Models,” Journal of Statistical Planning Inference, 111, 117–127.
  • Jiang, J. (2007), Linear and Generalized Linear Mixed Models and Their Applications, Springer Science & Business Media, New York: Springer.
  • Jiang, J. (2013), “The Subset Argument and Consistency of MLE in GLMM: Answer to an Open Problem and Beyond,” Annals of Statistics, 41, 177–195.
  • Jiang, J., and Lahiri, P. (2001), “Empirical Best Prediction for Small Area Inference With Binary Data,” Annals of the Institute of Statistical Mathematics, 53, 217–243. DOI: 10.1023/A:1012410420337.
  • Kramlinger, P., Krivobokova, T., and Sperlich, S. (2018), “Marginal and Conditional Multiple Inference in Linear Mixed Models,” arXiv:1812.09250.
  • Leadbetter, M. R., Lindgren, G., and Rootzén, H. (2012), Extremes and Related Properties of Random Sequences and Processes, Springer Science & Business Media. New York: Springer.
  • Lele, S. R., Nadeem, K., and Schmuland, B. (2010), “Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning,” Journal of the American Statistical Association, 105, 1617–1625. DOI: 10.1198/jasa.2010.tm09757.
  • López-Vizcaíno, E., Lombardía, M. J., and Morales, D. (2015), “Small Area Estimation of Labour Force Indicators Under a Multinomial Model With Correlated Time and Area Effects,” Journal of Royal of Statistical Society, Series A, 178, 535–565. DOI: 10.1111/rssa.12085.
  • Molina, I., Saei, A., and Lombardía, M. J. (2007), “Small Area Estimates of Labour Force Participation Under a Multinomial Logit Mixed Model,” Journal of the Royal Statistical Society, Series A, 170, 975–1000. DOI: 10.1111/j.1467-985X.2007.00493.x.
  • Naylor, J. C., and Smith, A. F. (1982),“Applications of a Method for the Efficient Computation of Posterior Distributions,” Journal of Royal Statistical Society, Series C, 31, 214–225. DOI: 10.2307/2347995.
  • Pinheiro, J. C., and Bates, D. M. (1995), “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model,” Journal of Computational and Graphical Statistics, 4, 12–35.
  • Pratesi, M. (2016), Analysis of Poverty Data by Small Area Estimation, New Jersey: Wiley.
  • Reluga, K., Lombardía, M. J., and Sperlich, S. A. (2019), “Simultaneous Inference for Mixed and Small Area Parameters,” arXiv:1903.02774.
  • Romano, J. P., Shaikh, A. M., and Wolf, M. (2008), “Control of the False Discovery Rate Under Dependence Using the Bootstrap and Subsampling,” Test, 17, 417–442. DOI: 10.1007/s11749-008-0126-6.
  • Saei, A., and Taylor, A. (2012), “Labour Force Status Estimates Under a Bivariate Random Components Model,” Journal of the Indian Society of Agricultural Statistics, 66, 187–201.
  • Scealy, J. (2010), “Small Area Estimation Using a Multinomial Logit Mixed Model With Category Specific Random Effects,” Research Paper, Australian Bureau of Statistics.
  • Torabi, M. (2012), “Likelihood Inference in Generalized Linear Mixed Models With Two Components of Dispersion Using Data Cloning,” Computational Statistics & Data Analysis, 56, 4259–4265.
  • Tuerlinckx, F., Rijmen, F., Verbeke, G., and De Boeck, P. (2006), “Statistical Inference in Generalized Linear Mixed Models: A Review,” British Journal of Mathematical and Statistical Psychology, 59, 225–255. DOI: 10.1348/000711005X79857.
  • Tzavidis, N., Ranalli, M. G., Salvati, N., Dreassi, E., and Chambers, R. (2015), “Robust Small Area Prediction for Counts,” Statistical Methods in Medical Research, 24, 373–395. DOI: 10.1177/0962280214520731.

Appendix A:

Technical details

A.1 Regularity conditions

In this section, we state the regularity conditions used in our derivations.

  1. ûd=arg maxudR{loggd(yd|ud,θ)+logh(ud)}.

  2. l(θ) exists and is well-defined if: (a) l(θ) is continuous, uniquely maximized and θ0Θ, where θ0 is a true parameter value; (b) l(θ) and l̂(θ) are concave; (c) θ0 is an interior point of the parameter space and the estimator θ̂ is an interior point of the neighborhood of θ0; (d) l̂(θ) converges uniformly in probability to l(θ).

  3. xdj are bounded and E(ydjm)< for all d[D],j[nd], where m is suitable large.

  4. For each fixed y, a score equation is continuously differentiable and E{R(θ0)}=0.

  5. liminfnλ[n1Var{R(θ)}]>0 and liminfnλ[n1E{R(θ)}]>0 where R(θ)=R(θ)θ and λ[A] indicates the smallest eigenvalue of matrix A.

The first two conditions refer to the log-likelihood function (see, e.g., Bianconcini Citation2014), whereas conditions 3–5 are needed for the derivation of the MSE estimators.

A.2 Proof of Proposition 2

Let ydj*Exp.Family(θ). If ud* is sampled from a suitable distribution, then we have γdj*=M{E(ydj*|ud)}=xdjtβ̂+ud*. Furthermore, Var*(yd*)=Var*(E*(yd*|ud*))+E*(Var*(yd*|ud*)). The first part of the Proposition follows from the way we generate the random effects as well as the results on the consistency of θ̂. To show the second part we consider a general score equation. Replace y by y* and set θ=θ̂, that is, R*(θ)=l*(θ̂)θ̂=d=1Dlogfd(yd*|θ̂)θ̂=0. Then E*{R*(θ)}=0 at θ=θ̂ which yields consistency of θ̂*.

A.3 Proof of Proposition 3

Let ζd be a general EBP, gd:=gd(θ) and ĝd:=ĝd(θ̂). Assume that ||ĝdgd||=OP(nc),c>0. The proof uses ideas of Chatterjee, Lahiri, and Li (2008). We investigate the properties of Gd(a).Gd(a)=P(ζ̂dζdĝd  a)=E(P[ζ˜dζdgd  a+{a(ĝdgd)+ζ˜dζd̂gd}]|yd)=E[Φ{a+Q(a,yd)}]=Φ(a)+ϕ(a)E{Q(a,yd)}21aϕ(a)E{Q2(a,yd)}+21E[aa+Q(a,yd){a+Q(a,yd)x}2(x21)ϕ(x)dx].

Applying some classical results and a triangle inequality, it follows that the last term is bounded by E|Q|3, and is of smaller order than the first three terms. Therefore, the first step toward the consistency of SCI is to quantify the asymptotic expansions of E{Q(a,yd)} and E{Q2(a,yd)}. We decompose Q(a,yd) intoQ(a,yd)=gd1/2(ζ˜dζd̂)+agd1/2(ĝd1/2gd1/2)=Q1+Q2.

Let ψ be a twice differentiable function with respect to θ,yd.=yd1++ydnd,d[D]. Observe that yd.=yd under an area-level model. The specific form of ψ depends on the choice of the GLMM (for instance, under the Poisson-gamma model we spelled it out in EquationEquation (7)). Function ψ satisfies the decomposition(A.1) ζ̂dζ˜d=ψd(yd.,θ̂)ψd(yd.,θ)={θψd(yd.,θ)}t(θ̂θ)+12(θ̂θ)t{22θψd(yd.,θ)}(θ̂θ)+oP(||θ̂θ||2).(A.1)

Let C=2c where c > 0. Since we assume ||θ̂θ||=OP(nc), we have(A.2) E[{ζ̂d(θ̂)ζ˜d(θ)}2]=1nCE([{θψd(yd.,θ)}tnc(θ̂θ)]2)+o(nC).(A.2)

As for Q1, it has been found in EquationEquation (A.1) thatE(ζ̂dζ˜d)=1ncE[{θψd(yd.,θ)}tnc(θ̂θ)]+o(nc),and E{(ζ̂dζ˜d)2}=O(nC), thanks to the result in EquationEquation (A.2). Furthermore, observe that gd is of order O(1) which leads to E(Q1)=O(Dc) as well as E(Q12)=O(DC). When we turn to Q2, we have an immediate simplification Q2=agd1/2(ĝd1/2gd1/2)=a{(ĝd/gd)1/21}. Let gd be twice differentiable with respect to θ. Similarly to the computations above, we have the expansionĝd(θ̂)=gd(θ)+(θgd(θ))t(θ̂θ)+12(θ̂θ)t×(22θgd(θ))(θ̂θ)+oP(||θ̂θ||2).

Therefore, we obtainE{ĝd(θ̂)}=gd(θ)+1nCE[{θgd(θ)}tnC(θ̂θ)]+O(nC).

It follows that E{(ĝ/g)1/2}=O(nc),E(Q2)=O(nC) and E(Q22)=O(nC). We can deduce that Gd(a) attains the asymptotic expansion Gd(a)=Φ(a)+ncγ(a,θ)+O(nC). A similar expansion can be established for Gd*(a) if we replace θ with θ̂ and P with P*.