Abstract
Small area estimation techniques have received considerable attention in recent decades due to their important applications in survey studies. In this article, mixed linear models and reduced rank regression analysis are jointly used for small area estimation. Estimators of the parameters are presented, as well as predictors of the random effects and of unobserved area measurements.
1. Introduction
Small area estimation is an active research area in statistics with many applications. Today there exist several approaches to deal with small area problems. For recent contributions to the topic, see Chandra et al. (2018) for discrete data and Baldermann (2018) for continuous data. In both articles many works of other authors are presented and discussed. For those who want a solid introduction to the subject, the book by Rao and Molina (2015) can be recommended. A common basis for small area estimation problems is that a survey study (finite population case) has been conducted and, based on the survey, one intends to extract information concerning small domains by adding specific local information which was not accounted for in the comprehensive survey.
In this article a case is considered where the units of a survey sample are investigated several times, leading to dependent observations. It is assumed that observed background information exists, which is implemented via a linear model, giving a model which includes variance components. Moreover, it is assumed that latent processes exist, which often show up when different types of systems are studied, for example when weather impact is studied. In such a case a huge number of variables is often studied but we do not know the effects of the individual variables; for example, we cannot directly relate temperature to plant growth. This leads us to exploit reduced rank regression models. Thus, the model which will be discussed in this article includes time-dependent fixed effects, i.e. structured mean responses, random parameters and rank restrictions on parameters, which to our knowledge is a new type of model. In Section 2 more motivating details are given, including basic references, and in Section 3 the assumptions are summarized together with all technical details. Briefly, in Section 4 the estimation of all unknown parameters is considered and finally two prediction results are presented in Section 5.
2. Background
In this section, via an example, we will introduce our way of thinking about extracting information from a survey sample. In particular, a number of different sources of uncertainty are presented which will then be included in a statistical model. It turns out that the model will comprise three different types of uncertainty. Section 3 includes all details of the model and in the subsequent section likelihood-based estimators are derived. In the last section these estimators are utilized when finding predictors.
Example 2.1.
Let there be a national survey which collects data about some specific production from a certain type of company. The survey is followed up once per year and is repeated for, say, five years.
Altogether there are 20 regions and each region consists of 10 subregions. The survey sample (units) consists of four companies from each subregion. Thus, in the survey, in total 800 companies are followed for five years. With the models presented in this article the aim is to obtain information about the companies in the subregions. Note that in reality the companies should be classified into strata and then at least one company should be sampled from each stratum and each subregion. To simplify the presentation we will only have one stratum. However, in each subregion we assume that there is a lot of background information which we would like to take into account when predicting production in subregions or predicting production of unobserved units.
Small area models, also called small domain models, can be classified into two categories. One category is the area-level models category (Fay and Herriot 1979) and the other is the unit-level models category (Battese et al. 1988). Both model categories are discussed in detail in the well-written book by Rao and Molina (2015). In this article we are thinking of area levels, but if unit-level information were available the proposed model could also be used.
Example 2.1 indicates the following use of notation. Let
be the direct estimates from the survey for the jth subarea within the ith area. The estimates are based on the survey design where units have been sampled on the subarea level. Note that we will not discuss the underlying survey in this article, only use the outcome of the survey, i.e. the direct estimators of the subareas.
Statistics is partly about quantifying uncertainty. Often this is carried out by assuming that an estimate is a representation of an estimator which follows some distribution. However, we also have cases where uncertainty is not described via distributions. Examples are rounding errors, specific types of truncations, identification of influential observations, outlier detection, model comparisons, the number of observations in a sample, etc. One choice of distribution for the estimator would be the one generated by the survey design, i.e. how many and under what model the units have been sampled would determine the distribution. However, for our purposes it is difficult to work with the probability space generated by the survey design. Instead, we now think of the direct estimates $\hat{\theta}_{ij}$ as misspecified quantities, since the survey sample distribution does not take into account those covariables we are going to use in the adjustment of $\hat{\theta}_{ij}$. The covariables can be used to correct for bias as well as to diminish the uncertainty in the estimate. For example, returning to our example, if the survey design does not take into account the size of companies, small companies can be overrepresented in the sample and then $\theta_{ij}$ is wrongly estimated.
In order to derive an applicable model, let us start with
$$y_{ij} = \theta_{ij} + e_{ij}, \qquad (2.1)$$
where for notational convenience $\hat{\theta}_{ij}$ has been replaced by $y_{ij}$. The joint distribution for $e = (e_{ij})$ is specified via
$$e \sim N_n(0, V), \qquad (2.2)$$
where $e$ is a row vector and $V$ is supposed to be known and determined by the survey design. Often $V$ is solely a function of the number of sampled units. Moreover, $e$ is defined on another probability space than the one generated by the survey design. However, the uncertainty in the direct estimates, due to the survey, is incorporated in the dispersion of $e$.
Furthermore, one basic assumption in this article is that some of the $\theta_{ij}$ will be the same, i.e. the survey estimators are the same for clusters of subregions. Such an assumption is often realistic but has to be checked in practice. Under this assumption, instead of (2.1), the model can be written in matrix notation as
$$y = \theta C + e, \qquad (2.3)$$
where $y = (y_{ij})$ and $\theta: 1 \times k$ consists of unknown parameters, and $C: k \times n$ is a usual design matrix describing the relations among the elements $\theta_{ij}$. Usually C consists of blocks of one-vectors but in principle it can be arbitrary as long as it fits together with the covariate structures which will be presented later. For example, with three areas and four subareas in each area, C equals
$$C = \begin{pmatrix} \mathbf{1}_4' & \mathbf{0}' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{1}_4' & \mathbf{0}' \\ \mathbf{0}' & \mathbf{0}' & \mathbf{1}_4' \end{pmatrix},$$
where $\mathbf{1}_4$ denotes a vector of four ones.
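As an illustration (a sketch of ours, not code from the article), the block-of-ones structure of such a C can be generated with a Kronecker product; the dimensions (three areas with four subareas each) follow the example above:

```python
import numpy as np

# Sketch: C (k x n) linking n = 12 subareas to k = 3 area-level parameters.
# Each row is a block of four ones, one block per area, as in the example.
k, m = 3, 4                              # areas, subareas per area
C = np.kron(np.eye(k), np.ones((1, m)))  # 3 x 12 block design matrix
```

Each row sums to the number of subareas in the corresponding area, which is what makes C a design matrix of one-vector blocks.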
The error in Equation (2.3) is defined by Equation (2.2). The model means that a probability space is only connected to the modeling of $\hat{\theta}_{ij}$, i.e. the set of estimated $\theta_{ij}$, but we include in the model extra variation due to the survey design.
In the introductory example it was stated that the survey was repeated five times. When survey studies are repeated it will, for this article, be assumed that if there exists a linear trend it is of the same form for all subareas. In this case there will be a matrix Y where each row follows Equation (2.3), apart from an unknown scalar multiplier. Thus, Equation (2.3) implies
$$Y = ABC + E, \qquad (2.4)$$
where A is a design matrix which models the p repeated surveys (within-subarea design matrix), B is a matrix of unknown parameters and $E \sim N_{p,n}(0, \Sigma, V)$, i.e. E is matrix normally distributed with the separable dispersion structure $D[\mathrm{vec}\,E] = V \otimes \Sigma$, where ⊗ stands for the Kronecker product and $\Sigma$ is an unknown positive definite matrix. The parameter $\Sigma$ is used to model the dependency due to the repeated measurements, which exists because the observed units of the design have been repeatedly measured. No particular structure in $\Sigma$ is assumed, since that would only make sense if we had precise knowledge about the mean structure, which is not the case in this article. For example, it does not make sense to assume an autocorrelated structure if the mean has not been established. The model in Equation (2.4) is often called the Growth Curve model, GMANOVA and, recently, bilinear regression model (see von Rosen 2018, for basic results and references).
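To make the separable dispersion concrete, the following sketch (our illustration, with hypothetical Σ and V) draws a matrix normally distributed error E with Cov(vec E) = V ⊗ Σ by pre- and post-multiplying an i.i.d. standard normal matrix with square roots of Σ and V:

```python
import numpy as np

def matnorm_sample(Sigma, V, rng):
    """Draw E ~ N_{p,n}(0, Sigma, V), i.e. Cov(vec E) = V (x) Sigma,
    via E = Sigma^{1/2} X V^{1/2} with X having i.i.d. N(0,1) entries."""
    Ls = np.linalg.cholesky(Sigma)   # one choice of square root of Sigma
    Lv = np.linalg.cholesky(V)
    X = rng.standard_normal((Sigma.shape[0], V.shape[0]))
    return Ls @ X @ Lv.T

rng = np.random.default_rng(0)
p, n = 5, 12
Sigma = 0.7 * np.eye(p) + 0.3 * np.ones((p, p))  # hypothetical within-unit covariance
V = np.diag(np.linspace(0.5, 1.5, n))            # hypothetical known survey dispersion
E = matnorm_sample(Sigma, V, rng)
```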
As an example of A we can have
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ 1 & 5 & 25 \end{pmatrix},$$
meaning that there is a second-order trend which is observed over five years.
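As a sketch (ours, not from the article), such a within-subarea design matrix for a second-order trend over p = 5 years has columns 1, t and t²:

```python
import numpy as np

# Sketch: A (p x q) with p = 5 yearly measurements and q = 3 trend parameters.
t = np.arange(1, 6)                          # years 1,...,5
A = np.vstack([np.ones_like(t), t, t**2]).T  # columns: intercept, linear, quadratic
```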
When expressing $\theta_{ij}$ as a linear function of an unknown parameter vector, as was done in Equation (2.3), we usually cannot say that the model is completely true for all subareas. Instead we believe that there is some random variation around a true imaginary model, i.e. we introduce a variance component. Thus, instead of Equation (2.4), the following model is set up:
$$Y = ABC + UZ + E, \qquad (2.5)$$
where $U \sim N_{p,l}(0, \Sigma_u, I_l)$ is independent of E, and Z is related to C because, in the model, usually via C, the observations are linked to the subareas and Z helps to describe the random variation (errors) in the subareas. Therefore, formally, $\mathcal{C}(Z') \subseteq \mathcal{C}(C')$, where $\mathcal{C}(\cdot)$ denotes the column vector space. Additionally, we will standardize these random effects by assuming $ZVZ' = I_l$, meaning that the model has been adjusted for a different number of subareas within an area, taking into account the uncertainty in the survey estimates. This has two important implications. One is that it simplifies the mathematical treatment of the model and the second is that it becomes easier to compare predicted random effects.
Now observed covariables will be introduced in the model. There exist different types of covariables but we are not going to distinguish between them. For example, one type of covariables are those which are accounted for in the survey design (e.g. variables defining different strata), another type can be covariables observed in registries, and a third type can be baseline data which were available when a survey study with repeated measurements on units was initiated. The effect of the covariates will be modeled by a term $\Theta X$, where $\Theta$ stands for the unknown effect and $X$ collects the observed covariables, leading to the model
$$Y = ABC + \Theta X + UZ + E. \qquad (2.6)$$
Moreover, suppose that there are a number of latent variables. These can be large clusters of variables, e.g. weather variables, socio-economic variables, chemical characteristics of soil and water, etc. A typical phenomenon of these clusters is that one knows that there should be an effect on the survey estimates but it is difficult to express this via some functional relationship. Instead we will model the relationship via rank restrictions on parameter matrices and will consider the following model:
$$Y = ABC + \Theta X + \Psi F + UZ + E, \qquad (2.7)$$
where $F: t \times n$ is known and $\Psi: p \times t$ is unknown of rank $r(\Psi) = r$. The idea is that all clustered variables appear in F and these variables are governed by r unobserved latent variables (processes) which are incorporated in the model via the rank restrictions on $\Psi$. It is important to note that p has to be larger than r, meaning that the number of underlying latent variables which can be modeled depends on how many times the survey has been repeated. Thus, we really see an advantage of performing repeated surveys when latent variables are supposed to exist, which indeed we believe is a common phenomenon.
3. Detailed model specification
Let Y consist of the direct estimates of the subareas where each column corresponds to a subarea including p estimators from the p repeated measurements. When performing likelihood inference the direct estimates are used but in the presentation we will not distinguish between estimators and estimates.
Definition 3.1.
Let
$$Y = ABC + \Theta X + \Psi F + UZ + E,$$
where $A: p \times q$, $C: k \times n$, $X: s \times n$ and $F: t \times n$ are all known matrices, $B$, $\Theta$ and $\Psi$, with $r(\Psi) = r$, are unknown parameter matrices, $Z: l \times n$, $U \sim N_{p,l}(0, \Sigma_u, I_l)$ and $E \sim N_{p,n}(0, \Sigma, V)$, where U and E are independently distributed. The parameters $\Sigma_u$ and $\Sigma$ are supposed to be unknown and positive definite, whereas V is known and determined from the survey.
When in Definition 3.1 $\Theta = 0$, $\Psi = 0$ and $U = 0$, then we have the classical growth curve model (GMANOVA) (see Potthoff and Roy 1964; von Rosen 2018). When $\Psi = 0$ and $U = 0$, then the growth curve model with covariate information appears, which sometimes is identified as a mixture of the GMANOVA and MANOVA models. For references to this model, as well as more general models, see Chinchilli and Elswick (1985), Verbyla and Venables (1988), von Rosen (1989), and Bai and Shi (2007). If $\Theta = 0$ and $\Psi = 0$, we say that we have the growth curve model with random effects (see Ip et al. 2007), whereas if only $\Psi = 0$ holds, see Yokoyama and Fujikoshi (1992) and Yokoyama (1995), where similar models are considered and where references to earlier works can be found. However, usually in these works one puts structures on the covariance matrices, which will not be discussed in this article.
4. Estimation of parameters
Let Q be a matrix of basis vectors such that where
is a symmetric square root of V in
Further, let
be any matrix of full rank satisfying
We assume
where
v > l.
A one-to-one transformation of the model in Definition 3.1, using Q, yields
(4.8)
(4.9)
The matrix of size v × v is idempotent since
and
and therefore
where
v × v is a known orthogonal matrix. The identity in Equation (4.8) is post-multiplied by
leading to the model
(4.10)
However, since
and this leads to the model in Equation (4.10) being split into two models. Let
Then we have three models which will be used when finding estimators:
(4.11)
(4.12)
(4.13)
The idea is to utilize Equations (4.12) and (4.13) when estimating B,
and
To estimate
these estimators are inserted in Equation (4.11). Suppose that the estimators
and
have been obtained, and let
Under the assumption of no randomness in
and
Thus, if
, implying
Here it is assumed that the difference is positive definite. If $\widehat{\Sigma}_u$ is not positive definite, which occurs with some positive probability, this indicates that the data do not support the model presented in Definition 3.1, or that the estimation procedure has some deficiencies. It can be noted that the problem of negative variance components in mixed linear models has a long history; an early reference is Nelder (1954). A more recent contribution to the discussion of the problem is provided by Molenberghs and Verbeke (2011). If we were to start with the model in Equation (2.5) and only consider
we could avoid assuming
to be positive definite. However, in our model derivation, via
has to be positive definite. Thus, if obtaining a non-positive definite
we have to either reformulate the model or change the estimator. One way to modify
is to work with the eigenvalues of
If there are a few “small” (in absolute value) non-positive eigenvalues, these can be replaced by small positive quantities. This should of course be done with caution and common sense but, as far as we know, there is no general recipe for how to handle a non-positive definite
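The eigenvalue repair mentioned above can be sketched as follows (an ad hoc illustration of ours, not a prescription from the article): compute the spectral decomposition and replace non-positive eigenvalues by a small positive floor.

```python
import numpy as np

def repair_pd(S, floor=1e-6):
    """Replace non-positive eigenvalues of a symmetric matrix S by a small
    positive floor; an ad hoc repair for a covariance estimate that is not
    positive definite. To be used with caution, as noted in the text."""
    w, P = np.linalg.eigh(S)
    return P @ np.diag(np.where(w <= 0, floor, w)) @ P.T

# Symmetric matrix with one small negative eigenvalue (approx. -0.045).
S = np.array([[1.0, 0.6],
              [0.6, 0.3]])
S_pd = repair_pd(S)
```

Only the offending eigenvalue is altered, so the dominant part of the estimated covariance structure is preserved.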
To estimate
and
Equations (4.12) and (4.13) are merged:
(4.14)
where
Put
Equation (4.14) is identical to
(4.15)
which is an extended Growth Curve model (GMANOVA + MANOVA) with a reduced rank regression component. Furthermore, the likelihood function which corresponds to the model in Equation (4.15) equals
(4.16)
where we have used the convention that instead of
it is written
for any arbitrary matrix expression H. Our first observation is that the likelihood in Equation (4.16) is smaller than or equal to
(4.17)
with equality if and only if
which, under some full rank conditions on
determines
as a function of B and
Thus, if B and
can be estimated the covariate effect described by
can be estimated. The density in EquationEquation (4.17)
(4.17)
(4.17) corresponds to the model
(4.18)
which is a Growth Curve model with latent variables (a reduced rank effect). The model has been considered by Reinsel and Velu (1998) and von Rosen and von Rosen (2017), where also other references can be found. Now it is briefly described how B and
can be estimated. For notational convenience the model in Equation (4.18) is written
(4.19)
Note that because
which is essential for being able to obtain explicit estimators.
Let
and then the likelihood function for the model in Equation (4.19) equals
(4.20)
The upper bound of the likelihood function in Equation (4.20) is well known (see Srivastava and Khatri 1979, Theorem 1.10.4):
(4.21)
and in Equation (4.21) equality holds if and only if
Hence, if B and
are estimated then
is also estimated. Maximizing the log-likelihood function is equivalent to minimizing the determinant
which will now be carried out.
It is worth noting that, due to the rank restriction, the matrix
can be factored into two terms, i.e.
where
and
are of size p × r and size r × t, respectively. Depending on the knowledge about the model
and
can be interpreted but there are also cases where the factorization has no clear interpretation.
In the following, let
with
denoting an arbitrary generalized inverse.
Following the approach presented in Kollo and von Rosen (2005, Chapter 4), we have
(4.22)
where
(4.23)
(4.24)
and equality in Equation (4.22) holds if and only if
Throughout the article it is supposed that
is positive definite, which holds with probability 1. Thus, B as a function of
can be obtained. Moreover,
(4.25)
where
(4.26)
and the equality in Equation (4.25) holds if and only if
(4.27)
The linear system of equations in Equation (4.27) is consistent and hence
can always be estimated as a function of
(4.28)
under the assumption that
and
The only parameter on the right-hand side of Equation (4.25) which is left to estimate is
The right-hand side of Equation (4.25) can be factored as
where
with
is such that
is semi-orthogonal of rank r and U is a positive definite matrix. The square root in M is supposed to be symmetric. Due to the Poincaré separation theorem (see Rao 1979)
where
are the eigenvalues of U which do not depend on any unknown parameter. The lower bound is then attained if M consists of the corresponding eigenvectors
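A numerical check of this bound (our illustration, with a hypothetical positive definite U): by the Poincaré separation theorem, the determinant of M′UM over semi-orthogonal p × r matrices M is bounded below by the product of the r smallest eigenvalues of U, and the bound is attained when M consists of the corresponding eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
p, r = 6, 2
X = rng.standard_normal((p, p))
U = X @ X.T + p * np.eye(p)      # hypothetical positive definite matrix

w, P = np.linalg.eigh(U)         # eigenvalues in ascending order
M_opt = P[:, :r]                 # eigenvectors of the r smallest eigenvalues
d_opt = np.linalg.det(M_opt.T @ U @ M_opt)

# Any other semi-orthogonal M gives a determinant at least as large.
M_rand, _ = np.linalg.qr(rng.standard_normal((p, r)))
d_rand = np.linalg.det(M_rand.T @ U @ M_rand)
```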
Let
Therefore, a maximum likelihood estimator of
is found if we can find
satisfying the following equality:
(4.29)
Since the following choice of
is appropriate:
(4.30)
There remains an unresolved problem of presenting all solutions to Equation (4.29), but this is not considered here.
Together with Equation (4.28) this yields that
The obtained results will now be stated in the following theorem.
Theorem 4.1.
Let the direct estimates of a survey follow the model given in Definition 3.1. All matrices in the statements have been defined earlier in this section.
If
If additionally
If
If additionally
and
If additionally
where
Suppose
is positive definite. Then
5. Prediction
Small area estimation is very often about prediction of unobserved units (subareas). For our model, due to the normality assumptions, it is relatively straightforward to construct predictors, in particular if the explicit estimators of the previous section are utilized. The strategy will be to first predict U based on the observed data, and thereafter predict the unobserved units.
For prediction of U the conditional mean is crucial and thus the joint distribution of U and Y will be derived. According to Definition 3.1, U and E are independently distributed
is normally distributed, i.e.
where
Thus,
and the next proposition can be stated.
Proposition 5.1.
Let the model be given in Definition 3.1 and use the notation introduced earlier in Section 4.
The predicted value
is given by
where
and
are presented in Theorem 4.1 and Y consists of subarea estimates.
Let
consist of the unobserved direct estimates, corresponding to the non-sampled units. The corresponding covariables are assumed to be available and are denoted
and
Then
where
is given in (i).
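The conditional-mean predictor underlying Proposition 5.1 can be sketched generically (our illustration, with hypothetical small dimensions): for jointly normal U and Y, E[U | Y = y] = μ_U + Σ_{UY} Σ_{YY}⁻¹ (y − μ_Y).

```python
import numpy as np

def predict_u(y, mu_y, mu_u, S_uy, S_yy):
    """Conditional mean of U given Y = y for a jointly normal vector:
    E[U | Y = y] = mu_u + S_uy @ inv(S_yy) @ (y - mu_y)."""
    return mu_u + S_uy @ np.linalg.solve(S_yy, y - mu_y)

# Toy numbers (all hypothetical), vectorized as in vec(U), vec(Y).
mu_u, mu_y = np.zeros(2), np.zeros(3)
S_uy = np.array([[0.5, 0.0, 0.0],
                 [0.0, 0.5, 0.0]])   # Cov(U, Y)
S_yy = np.eye(3)                     # Var(Y)
y = np.array([1.0, 2.0, 3.0])
u_hat = predict_u(y, mu_y, mu_u, S_uy, S_yy)   # -> [0.5, 1.0]
```

In practice the unknown covariance matrices are replaced by the estimators of Theorem 4.1, which is what makes the resulting predictors empirical.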
References
- Bai, P., and L. Shi. 2007. Exact distributions of MLEs of regression coefficients in GMANOVA-MANOVA model. Journal of Multivariate Analysis 98 (9):1840–52. doi:10.1016/j.jmva.2007.06.001.
- Battese, G. E., R. M. Harter, and W. A. Fuller. 1988. An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association 83 (401):28–36. doi:10.1080/01621459.1988.10478561.
- Chinchilli, V., and R. Elswick. 1985. A mixture of the MANOVA and GMANOVA models. Communications in Statistics - Theory and Methods 14 (12):3075–89. doi:10.1080/03610928508829096.
- Fay, R. E., and R. A. Herriot. 1979. Estimates of income for small places: an application of James-Stein procedures to census data. Journal of the American Statistical Association 74 (366a):269–77. doi:10.2307/2286322.
- Ip, W.-C., M.-X. Wu, S.-G. Wang, and H. Wong. 2007. Estimation for parameters of interest in random effects growth curve models. Journal of Multivariate Analysis 98 (2):317–27. doi:10.1016/j.jmva.2006.04.001.
- Kollo, T., and D. von Rosen. 2005. Advanced Multivariate Statistics with Matrices. Dordrecht: Springer.
- Molenberghs, G., and G. Verbeke. 2011. A note on a hierarchical interpretation for negative variance components. Statistical Modelling: An International Journal 11 (5):389–408. doi:10.1177/1471082X1001100501.
- Nelder, J. A. 1954. The interpretation of negative components of variance. Biometrika 41 (3–4):544–8. doi:10.2307/2332734.
- Potthoff, R. F., and S. N. Roy. 1964. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51 (3–4):313–26. doi:10.2307/2334137.
- Rao, C. R. 1979. Separation theorems for singular values of matrices and their applications in multivariate analysis. Journal of Multivariate Analysis 9 (3):362–77. doi:10.1016/0047-259X(79)90094-0.
- Rao, J. N. K., and I. Molina. 2015. Small Area Estimation. 2nd ed. Hoboken, NJ: Wiley.
- Reinsel, G. C., and R. P. Velu. 1998. Multivariate Reduced-Rank Regression: Theory and Applications. Lecture Notes in Statistics 136. New York: Springer-Verlag.
- von Rosen, D. 1989. Maximum likelihood estimators in multivariate linear normal models. Journal of Multivariate Analysis 31 (2):187–200. doi:10.1016/0047-259X(89)90061-4.
- von Rosen, D. 2018. Bilinear Regression Analysis: An Introduction. Lecture Notes in Statistics 220. New York: Springer-Verlag.
- von Rosen, T., and D. von Rosen. 2017. On estimation in some reduced rank extended growth curve models. Mathematical Methods of Statistics 26 (4):299–310. doi:10.3103/S1066530717040044.
- Srivastava, M. S., and C. G. Khatri. 1979. An Introduction to Multivariate Statistics. New York: North-Holland.
- Verbyla, A. P., and W. N. Venables. 1988. An extension of the growth curve model. Biometrika 75 (1):129–38. doi:10.1093/biomet/75.1.129.
- Yokoyama, T. 1995. Statistical inference on some mixed MANOVA - GMANOVA models with random effects. Hiroshima Mathematical Journal 25 (3):441–74. doi:10.32917/hmj/1206127624.
- Yokoyama, T., and Y. Fujikoshi. 1992. Tests for random-effects covariance structures in the growth curve model with covariates. Hiroshima Mathematical Journal 22 (1):195–202. doi:10.32917/hmj/1206128618.