Statistical Theory and Related Fields Volume 3, 2019 - Issue 2

Generalised variance functions for longitudinal survey data

Guoyi Zhang, Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM, USA

Yang Cheng, Substance Abuse and Mental Health Administration, Rockville, MD, USA

Yan Lu (corresponding author), Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM, USA

Pages 150-157 | Received 02 Nov 2018, Accepted 03 Sep 2019, Published online: 13 Sep 2019

https://doi.org/10.1080/24754269.2019.1664372

ABSTRACT

In this research, we propose longitudinal generalised variance functions (LGVFs) to produce convenient estimates of variances by incorporating a time effect into the modelling. Asymptotic properties of a certain type of estimator are investigated. Simulation studies and an application of the proposed methods to Current Population Survey (CPS) data show that LGVFs work well in producing standard error estimates.

KEYWORDS:

CPS
design effect
generalised variance function
longitudinal generalised variance function
simulation

1. Introduction

In many large-scale sample surveys such as the CPS or the Canadian Labour Force Survey (CLFS), thousands of estimates need to be reported. Calculating a standard error for each published estimate involves a large amount of work. In addition, standard error estimates that are not provided with public-use files may also be needed. In the generalised variance function (GVF) approach, we first estimate variances for totals of a group of variables by using balanced repeated replication (BRR), Taylor series linearisation (TSL) or other methods. Readers interested in variance estimation for sample surveys can refer to Cohen (1979), Burt and Cohen (1984), Rao and Wu (1988), Rao (1988), and Wolter (2007). Next, we postulate a regression model relating the variances to the estimated totals and derive a fitted regression line for predicting the standard errors of other survey statistics. The GVF method saves considerable time in producing government reports.

Johnson and King (1987) studied GVF estimators using a national survey of reading ability among young adults and found that one way to markedly improve upon the GVF model is to use prior information about the design effect (deff) of an individual estimator. Valliant (1987) proved that the GVF model produces consistent estimates of the variance for a certain class of superpopulation models. He also noted that if the deffs for the group of estimated totals are similar, the GVF variances are often more stable than the direct estimates, as they smooth out some of the variability from variable to variable.

Many current surveys follow the same households at regular time intervals. A GVF could be applied to longitudinal data by treating the population total as constant over the years. However, as Figure 1 shows, the Census population from 1900 to 2010 exhibits a linear or slightly exponential growth trend. Thompson (2015) discussed approaches to incorporating complex designs in longitudinal data inference, as well as the complications introduced by time-in-sample effects. On the other hand, fitting a separate GVF for each year is no longer a sensible choice once longitudinal data are available. As Shook-Sa, Heller, Williams, Couzens, and Berzofsky (2013) mentioned, separate GVFs are currently needed for each year in the National Crime Victimization Survey (NCVS), which makes the analysis difficult to manage. All of this calls for a new method that provides a convenient formula for estimating standard errors for longitudinal data. The fitted longitudinal model is expected to be used to predict standard errors of variables of interest in future years without re-estimating the GVF parameters.

Figure 1. U.S. population from 1900 to 2010.

In this research, we propose longitudinal generalised variance functions (LGVFs) by incorporating time effect into modelling. In Section 2, we review the GVF model. In Section 3, we set up a framework, propose LGVFs and derive asymptotic properties of the proposed estimators. Section 4 gives simulation studies and implementation of LGVFs with CPS data. Section 5 gives the conclusion of the research.

2. Generalised variance function model

In this section, we briefly review GVF models. More detailed descriptions can be found in the textbooks by Wolter (2007) and Lohr (2010).

Let $\hat{T}$ be a survey statistic, for example, the estimated number of persons employed. Let $\hat{p}$ be an estimated proportion of employment, with $\hat{p} = \hat{T} / M$, where M is the population total from the U.S. Census Bureau. Let d be the design effect of $\hat{p}$ and m be the sample size. We have $var(\hat{p}) = d \times p (1 - p) / m$. Define the relative variance (relvar) of $\hat{p}$ as $relvar(\hat{p}) = \frac{var(\hat{p})}{[E(\hat{p})]^{2}} = a + \frac{b}{E(\hat{T})}$, where $a = - d / m$ and $b = M d / m$. Let υ be the estimate of the relvar of $\hat{p}$, i.e., $υ = \hat{var}(\hat{p}) / \hat{p}^{2}$. Postulate a regression model relating a set of $υ_{i}$ to $\hat{T}_{i}, i = 1, 2, \dots, m$ by $υ_{i} = a + b / \hat{T}_{i}$. Let $\hat{a}$ and $\hat{b}$ be the regression estimates of a and b. The GVF relative variance is predicted by the fitted regression function $\hat{a} + \hat{b} / \hat{T}_{i}$. A GVF estimate for $var(\hat{T})$ is given by the following function: $\hat{var}(\hat{T}) = \hat{a} \hat{T}^{2} + \hat{b} \hat{T}.$ (1)
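The fitting step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `fit_gvf` and `gvf_variance`, and the values of d, m, M and the totals, are made up for the example and do not come from the CPS. The relvars are generated exactly on the curve $a + b / \hat{T}$, so the least-squares fit should recover $a = -d/m$ and $b = Md/m$.

```python
def fit_gvf(totals, relvars):
    """Least-squares fit of relvar_i = a + b / T_i; returns (a_hat, b_hat)."""
    x = [1.0 / t for t in totals]            # regressor is the inverse total
    n = len(x)
    x_bar = sum(x) / n
    v_bar = sum(relvars) / n
    s_xv = sum((xi - x_bar) * (vi - v_bar) for xi, vi in zip(x, relvars))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = s_xv / s_xx
    a_hat = v_bar - b_hat * x_bar
    return a_hat, b_hat


def gvf_variance(total, a_hat, b_hat):
    """Equation (1): predicted variance of an estimated total."""
    return a_hat * total ** 2 + b_hat * total


# Hypothetical inputs: design effect d, sample size m, population size M.
d, m, M = 3.0, 2000.0, 1_000_000.0
totals = [5e4, 1e5, 2e5, 4e5, 8e5]
relvars = [-d / m + (M * d / m) / t for t in totals]   # exact GVF curve

a_hat, b_hat = fit_gvf(totals, relvars)
se_hat = gvf_variance(1e5, a_hat, b_hat) ** 0.5        # predicted standard error
```

With real data the relvars would come from direct variance estimates (for example BRR or TSL), and the fitted pair $(\hat{a}, \hat{b})$ would then be reused for totals that have no direct variance estimate.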

3. Longitudinal generalised variance functions

GVFs have been widely used for a long time by many large-scale surveys because they save time and yield stable estimators. For example, they have been used by the CPS since 1947 (U.S. Census Bureau, 2006). In this section, we introduce the framework of our research and propose longitudinal generalised variance functions (LGVFs) by incorporating time effects. Properties of a certain type of estimator are investigated.

3.1. Framework

Much of the notation in this section follows Valliant (1987). The main difference is that we have added the index $t, t = 1, 2, \dots, τ$ for time periods 1 to τ. In stratified two-stage cluster sampling, we define h as the index for stratum, i as the index for primary sampling unit (psu), and j as the index for secondary sampling units (ssu) within the psu. At the psu level, let $N_{t}$ be the number of psus in the population at time t and $N_{t h}$ be the number of psus in stratum h at time t, so that $N_{t} = \sum_{h = 1}^{H} N_{t h}$. At the ssu level, let $M_{t h i}$ be the number of ssus in psu i within stratum h at time t, so that the total number of units in stratum h at time t is $M_{t h} = \sum_{i = 1}^{N_{t h}} M_{t h i}$, and the total number of ssus in the population at time t is $M_{t} = \sum_{h = 1}^{H} M_{t h}$.

Accordingly, at time t, let $n_{t}$ be the number of psus in the sample, with $n_{t} = \sum_{h = 1}^{H} n_{t h}$, where $n_{t h}$ is the number of psus in the sample within stratum h. Assume that $n_{t} = n$ for $t = 1, 2, \dots, τ$. Let $m_{t h i}$ be the number of elements in the sample from the ith psu within stratum h. As a result, $m_{t h} = \sum_{i = 1}^{n_{t h}} m_{t h i}$, and the total number of units in the sample over all strata is $m_{t} = \sum_{h = 1}^{H} m_{t h}$. At time t, let $S_{t h}$ be the set of sampled psus in stratum h, $R_{t h}$ be the set of nonsampled psus in stratum h, and $S_{t h i}$ and $R_{t h i}$ be the sets of sampled and nonsampled units within psu i in stratum h.

Using a combined inference framework, assume a random variable $y_{t h i j}$ is associated with each unit in the population at time t. The finite population total at time t is $T_{t} = \sum_{h = 1}^{H} \sum_{i = 1}^{N_{t h}} \sum_{j = 1}^{M_{t h i}} y_{t h i j}$. A general estimator of $T_{t}$ can be written as $\hat{T}_{t} = \sum_{h} \sum_{i \in S_{t h}} γ_{t h i} \hat{T}_{t h i},$ (2) where $γ_{t h i}$ is the coefficient, $\bar{y}_{t h i} = \sum_{j \in S_{t h i}} y_{t h i j} / m_{t h i}$ and $\hat{T}_{t h i} = M_{t h i} \bar{y}_{t h i}$, which estimates $T_{t h i} = \sum_{j = 1}^{M_{t h i}} y_{t h i j}$. For example, when psus are selected with probabilities proportional to $M_{t h i}$ and an equal probability sample is selected within each sampled psu at time t, the Horvitz–Thompson estimator can be written as $\hat{T}_{t, H T} = \sum_{h} \sum_{i \in S_{t h}} [M_{t h} (n_{t h} M_{t h i})^{- 1} \hat{T}_{t h i}],$ (3) where $γ_{t h i} = M_{t h} (n_{t h} M_{t h i})^{- 1}$.

The following model assumptions can be applied for prediction purposes: $\begin{aligned} E (y_{t h i j}) & = μ_{t h}, \\ cov (y_{t h i j}, y_{t h^{'} i^{'} j^{'}}) & = \begin{cases} σ_{t h i}^{2} & \text{if } h = h^{'}, i = i^{'}, j = j^{'}, \\ ρ_{t h i} σ_{t h i}^{2} & \text{if } h = h^{'}, i = i^{'}, j \neq j^{'}, \\ 0 & \text{otherwise.} \end{cases} \end{aligned}$ (4) Similar formulations can be found in Scott and Smith (1969), Royall (1976, 1986), and Burdick and Sielken (1979). We can also apply more complex models such as the one in Cook and Pocock (1983), time series models, or stochastic models. The general variance estimator of $var (\hat{T}_{t})$ to be studied is based on the one proposed by Royall (1986): $s_{\hat{T}_{t}}^{2} = \sum_{h} n_{t h} (n_{t h} - 1)^{- 1} \sum_{S_{t h}} γ_{t h i}^{2} r_{t h i}^{2},$ (5) where $r_{t h i} = \hat{T}_{t h i} - (\sum_{S_{t h}} γ_{t h j} \hat{T}_{t h j} / M_{t h}) M_{t h i}$, and $γ_{t h i}$ is defined in Equation (2).
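Equations (3) and (5) can be evaluated with a few lines of Python. The sketch below is illustrative only: the function name `ht_total_and_variance`, the dictionary layout and the toy numbers are assumptions made for the example. It handles one time period and takes the within-psu sample means $\bar{y}_{t h i}$ as given.

```python
def ht_total_and_variance(strata):
    """Evaluate Equations (3) and (5) for one time period.

    strata: list of dicts (hypothetical layout) with keys
      'M_h'  : number of ssus in the stratum (M_th),
      'M_hi' : sizes M_thi of the sampled psus,
      'ybar' : sample means within the sampled psus.
    Returns (estimated total, estimated variance).
    """
    total, variance = 0.0, 0.0
    for s in strata:
        M_h, sizes, means = s["M_h"], s["M_hi"], s["ybar"]
        n_h = len(sizes)
        # gamma_thi = M_th / (n_th * M_thi), the coefficient in Equation (3)
        gamma = [M_h / (n_h * M_hi) for M_hi in sizes]
        # psu-level total estimates T_thi = M_thi * ybar_thi
        T_hi = [M_hi * yb for M_hi, yb in zip(sizes, means)]
        T_h = sum(g * T for g, T in zip(gamma, T_hi))
        # residuals r_thi from Equation (5)
        resid = [T - (T_h / M_h) * M_hi for T, M_hi in zip(T_hi, sizes)]
        total += T_h
        variance += n_h / (n_h - 1) * sum((g * r) ** 2
                                          for g, r in zip(gamma, resid))
    return total, variance


# Toy stratum: 100 ssus, two sampled psus of sizes 10 and 20.
toy = [{"M_h": 100.0, "M_hi": [10.0, 20.0], "ybar": [0.2, 0.6]}]
T_hat, s2 = ht_total_and_variance(toy)
relvar = s2 / T_hat ** 2
```

For the toy stratum the coefficients are 5 and 2.5, the psu totals are 2 and 12, and the residuals are -2 and 4, so the estimated total is 40 and the variance estimate is 400. Each stratum needs at least two sampled psus, otherwise the factor $n_{t h} (n_{t h} - 1)^{-1}$ is undefined.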

Let $k_{t h i} = [1 + (m_{t h i} - 1) ρ_{t h i}] / m_{t h i}$. Under the condition that $σ_{t h i}^{2} = σ_{t h}^{2} = α_{1 t h} μ_{t h} + α_{2 t h} μ_{t h}^{2}$, we can show that $\begin{aligned} relvar (\hat{T}_{t}) & \approx \sum_{h} π_{t h}^{2} α_{2 t h} M_{t h}^{- 2} \sum_{S_{t h}} γ_{t h i}^{2} k_{t h i} M_{t h i}^{2} \\ & \quad + [\sum_{h} π_{t h} α_{1 t h} M_{t h}^{- 2} \sum_{S_{t h}} γ_{t h i}^{2} k_{t h i} M_{t h i}^{2}] / E (\hat{T}_{t}) \\ & = a_{t} + b_{t} / E (\hat{T}_{t}), \end{aligned}$ (6) where $π_{t h} = E (\hat{T}_{t h}) / E (\hat{T}_{t})$.

3.2. The time effect model

In this section, we propose LGVFs by incorporating time effects. Let V be the number of variables used for GVF and LGVF calculation, and let τ be the number of time periods considered for the LGVF. Let $θ = (a, b)^{'}$ be the LGVF parameters we want to estimate. The V variables together with the τ time periods provide $V τ$ observations for estimating the regression parameters a and b. Define the time effect $e_{t} = M_{t} / \bar{M}$, where $\bar{M} = (M_{1} + M_{2} + \dots + M_{τ}) / τ$ is the average population total over the τ periods. Let $a_{t v} = - d_{t v} / m$ and $b_{t v} = \bar{M} d_{t v} / m$. By Equation (1), $\begin{aligned} \hat{var} (\hat{T}_{t v}) & = \frac{- d_{t v}}{m} \hat{T}_{t v}^{2} + \frac{M_{t} d_{t v}}{m} \hat{T}_{t v} \\ & = \frac{- d_{t v}}{m} \hat{T}_{t v}^{2} + \frac{M_{t}}{\bar{M}} \cdot \frac{\bar{M} d_{t v}}{m} \hat{T}_{t v} \\ & = a_{t v} \hat{T}_{t v}^{2} + e_{t} b_{t v} \hat{T}_{t v} . \end{aligned}$ As in GVF, we define a set of relative variances $υ_{t v}$ for $t = 1, 2, \dots, τ$ and $v = 1, 2, \dots, V$. We now have $υ_{t v} = a_{t v} + b_{t v} \cdot \frac{e_{t}}{\hat{T}_{t v}} .$ (7) Let $υ_{t} = (υ_{t 1}, υ_{t 2}, \dots, υ_{t V})^{'}$, $υ = (υ_{1}^{'}, \dots, υ_{τ}^{'})^{'}$, $ϵ_{t} = (ϵ_{t 1}, \dots, ϵ_{t V})^{'}$, and $ϵ = (ϵ_{1}^{'}, \dots, ϵ_{τ}^{'})^{'}$. Now define $X_{t}$ as the $V \times 2$ design matrix for time t, with first column all 1s and second column $(e_{t} / \hat{T}_{t 1}, e_{t} / \hat{T}_{t 2}, \dots, e_{t} / \hat{T}_{t V})^{'}$. Let $X$ be the stacked design matrix $X = (X_{1}^{'}, \dots, X_{τ}^{'})^{'}$.
Under the condition that $a_{t v} = a_{t} = a$ and $b_{t v} = b_{t} = b$ for $t = 1, 2, \dots, τ$, $v = 1, 2, \dots, V$, the time effect model (7) can be written in matrix form as $υ = X θ + ϵ .$ (8) The weighted least squares estimator of $θ$ is $\hat{θ} = (X^{'} W X)^{- 1} X^{'} W υ$, where $w_{t v}$ is the weight associated with variable v at time t, and $W$ is a $V τ \times V τ$ diagonal matrix with diagonal elements $w_{t v}$. The weight $w_{t v}$ is usually chosen as the reciprocal of the variance of $υ_{t v}$ when it is known. Otherwise, the weight can be approximated by the reciprocal of the squared $υ_{t v}$.
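A minimal Python sketch of this weighted fit follows. Because the model has one intercept and one slope, the matrix expression $(X^{'} W X)^{- 1} X^{'} W υ$ reduces to the familiar weighted simple-regression formulas, which is what the code uses. The function name `fit_lgvf`, the nested-list layout and all numbers are assumptions made for illustration; here `weights` multiply the squared residuals, as in the definition of $W$ above.

```python
def fit_lgvf(totals, relvars, e, weights=None):
    """Weighted least-squares fit of relvar_tv = a + b * e_t / T_tv.

    totals[t][v], relvars[t][v]: nested lists over time t and variable v.
    e[t]: time effect for period t.
    weights[t][v]: optional WLS weights (default: all 1, i.e. ordinary LS).
    Returns (a_hat, b_hat).
    """
    x, v, w = [], [], []
    for t, e_t in enumerate(e):
        for j, total in enumerate(totals[t]):
            x.append(e_t / total)                 # time-adjusted regressor
            v.append(relvars[t][j])
            w.append(1.0 if weights is None else weights[t][j])
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    v_bar = sum(wi * vi for wi, vi in zip(w, v)) / sw
    s1 = sum(wi * (xi - x_bar) * (vi - v_bar) for wi, xi, vi in zip(w, x, v))
    s2 = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b_hat = s1 / s2
    return v_bar - b_hat * x_bar, b_hat


# Hypothetical data: three periods, three variables, generated exactly
# from a = -0.002 and b = 0.5 so that the fit can be checked.
e = [1.0055, 1.0052, 0.9892]
totals = [[10.0, 50.0, 200.0], [12.0, 55.0, 210.0], [11.0, 48.0, 190.0]]
relvars = [[-0.002 + 0.5 * e_t / T for T in row] for e_t, row in zip(e, totals)]
weights = [[1.0 / r ** 2 for r in row] for row in relvars]   # w = 1 / relvar^2

a_ols, b_ols = fit_lgvf(totals, relvars, e)
a_wls, b_wls = fit_lgvf(totals, relvars, e, weights)
```

On noise-free data both fits return the same parameters; with real relvars the choice $w_{t v} = 1 / υ_{t v}^{2}$ downweights the large, noisy relvars of small totals.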

Consider data pairs $(υ_{t v}, \hat{T}_{t v})$ for $t = 1, 2, \dots, τ$, $v = 1, 2, \dots, V$. We can derive the following estimators for a and b: $\hat{b} = \frac{\sum_{t = 1}^{τ} \sum_{v = 1}^{V} υ_{t v} [e_{t} \hat{T}_{t v}^{- 1} - \bar{T}_{-}] / w_{t v}}{\sum_{t = 1}^{τ} \sum_{v = 1}^{V} [e_{t} \hat{T}_{t v}^{- 1} - \bar{T}_{-}]^{2} / w_{t v}} = \hat{S}_{1} / \hat{S}_{2}$ (9) and $\hat{a} = \bar{υ} - \hat{b} \bar{T}_{-},$ (10) where $\bar{T}_{-} = \sum_{t, v} (e_{t}^{- 1} \hat{T}_{t v} w_{t v})^{- 1} / \sum_{t, v} w_{t v}^{- 1}$, $\bar{υ} = \sum_{t, v} υ_{t v} w_{t v}^{- 1} / \sum_{t, v} w_{t v}^{- 1}$, $\hat{S}_{1} = \sum_{t = 1}^{τ} \sum_{v = 1}^{V} υ_{t v} [e_{t} \hat{T}_{t v}^{- 1} - \bar{T}_{-}] / w_{t v}$, and $\hat{S}_{2} = \sum_{t = 1}^{τ} \sum_{v = 1}^{V} [e_{t} \hat{T}_{t v}^{- 1} - \bar{T}_{-}]^{2} / w_{t v}$. The predicted relvariance of $\hat{T}_{t v}$ based on the estimated LGVF is $\hat{υ}_{t v} = \bar{υ} + \hat{b} [e_{t} \hat{T}_{t v}^{- 1} - \bar{T}_{-}] .$ (11) Note that model (8) only incorporates $e_{t}$; it does not specify the form of the time effect. The simplest choice is $e_{t} = M_{t} / \bar{M}$, where $M_{t}$ is the population total from the U.S. Census Bureau, used without any modelling. We can also incorporate a linear time effect, as suggested by Figure 1 and illustrated in the following example.

Example

Linear time effect LGVF.

Figure 1 shows that the U.S. population size increased dramatically with a linear trend during the years 1990–2010. We now fit a simple linear regression model for the growth of the population size $M_{t}$ over time t: $M_{t} = β_{0} + β_{1} t .$ (12) Since $\hat{β}_{0} = \bar{M} - \hat{β}_{1} \bar{t}$, we have $\hat{e}_{t} = \frac{\hat{M}_{t}}{\bar{M}} = \frac{\bar{M} + \hat{β}_{1} (t - \bar{t})}{\bar{M}} = 1 + \frac{\hat{β}_{1}}{\bar{M}} (t - \bar{t}) .$ Replacing $e_{t}$ in (9) and (10) by $1 + \hat{β}_{1} (t - \bar{t}) / \bar{M}$, we obtain the LGVF estimates for the linear time model (12).
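The linear time effect can be computed directly from the population totals. The sketch below is an illustration with a hypothetical function name (`linear_time_effect`) and made-up totals; it fits model (12) by least squares with $t = 1, \dots, τ$ and returns $\hat{e}_{t} = 1 + \hat{β}_{1} (t - \bar{t}) / \bar{M}$.

```python
def linear_time_effect(M):
    """Return e_hat_t = 1 + beta1_hat * (t - t_bar) / M_bar for t = 1..tau,
    where beta1_hat is the least-squares slope of M_t on t (model (12))."""
    tau = len(M)
    t = list(range(1, tau + 1))
    t_bar = sum(t) / tau
    M_bar = sum(M) / tau
    beta1 = (sum((ti - t_bar) * (Mi - M_bar) for ti, Mi in zip(t, M))
             / sum((ti - t_bar) ** 2 for ti in t))
    return [1.0 + beta1 * (ti - t_bar) / M_bar for ti in t]


# Made-up totals: exactly linear growth, then a non-linear series.
e_linear = linear_time_effect([100.0, 110.0, 120.0])
e_bumpy = linear_time_effect([100.0, 130.0, 120.0])
```

When the totals lie exactly on a line, $\hat{e}_{t}$ coincides with the unmodelled ratio $M_{t} / \bar{M}$. In general the fitted effects are smoother than the raw ratios, and they always average to 1 because the deviations $t - \bar{t}$ sum to zero.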

3.3. Properties of proposed estimators

In this section, we consider estimators for which $γ_{t h i}$ in Equation (2) has the structure $γ_{t h i} = g_{1 t h} g_{2 t h i}$. For example, $g_{1 t h} = M_{t h} / n_{t h}$ for $\hat{T}_{t, H T}$ in Equation (3). Under assumptions (4), given estimators with the structure $γ_{t h i} = g_{1 t h} g_{2 t h i}$, asymptotic properties of $\hat{T}_{t v}$, $s_{\hat{T}_{t v}}^{2}$, and $\hat{υ}_{v}$ can be derived when the number of psus in each stratum is large. Lemmas A.1–A.3 (see the Appendix) are extensions of work by Royall (1986). Under certain conditions, Theorem 3.1 shows that the ratio of the relative variance to the predicted relative variance from the proposed LGVFs converges in probability to 1 (see the Appendix for the proof). Asymptotic normality then follows immediately.

Theorem 3.1

Under model (4), assumptions (i) to (xiii), $μ_{4 t h i} = E [\hat{T}_{t h i} - E (\hat{T}_{t h i})]^{4} < \infty$, $a_{t v} = a_{t} = a$, and $b_{t v} = b_{t} = b$ for $t = 1, 2, \dots, τ$, $v = 1, 2, \dots, V$, as $N_{t h}, n_{t h} \to \infty$, $\frac{relvar (\hat{T}_{t v} - T_{t v})}{\hat{υ}_{v}} \overset{p}{\to} 1.$

Proof.

Proof is given in Appendix.

Theorem 3.2

Under model (4), assumptions (i) to (xiii), $μ_{4 t h i} = E [\hat{T}_{t h i} - E (\hat{T}_{t h i})]^{4} < \infty$, $a_{t v} = a_{t} = a$, and $b_{t v} = b_{t} = b$ for $t = 1, 2, \dots, τ$, $v = 1, 2, \dots, V$, as $N_{t h}, n_{t h} \to \infty$, $\frac{\hat{T}_{t v} - T_{t v}}{\hat{T}_{t v} (\hat{υ}_{v})^{1 / 2}} \overset{d}{\to} N (0, 1) .$

Proof.

The proof is a straightforward extension of work by Royall (1986).

4. Implementation with CPS

In this section, we first use CPS Annual Social and Economic Supplement (ASEC) data as a population to perform simulation studies. Next, we apply the proposed methods to analyse ASEC data in conjunction with the ASEC public use replicate weight file (ASECREP). For variance estimation in the data application, the corresponding ASECREP data are merged with ASEC by the link variables h_seq (household sequence number) and pppos (trailer portion of unique household ID). The ASECREP data contain 160 sets of replicate weights, which are used to calculate variances. Nineteen binary variables from the ‘Source of Income’ section are initially considered, such as self-employment or not, unemployment compensation or not, and so on. Specifically, they are finc_ws, finc_se, finc_fr, finc_uc, finc_wc, finc_ss, finc_ssi, finc_paw, finc_vet, finc_sur, finc_dis, finc_ret, finc_int, finc_div, finc_rnt, finc_ed, finc_csp, finc_fin, and finc_oi.

A person's value of a binary variable is 1 if the person had a particular characteristic, and 0 otherwise. In the 2009 ASEC data, the mean of the deffs of the 19 variables is 3.754811, and the deffs range from 1.593687 to 6.329467. We removed the two variables with the lowest deffs: finc_ss with a deff of 1.593687, and finc_sur with a deff of 1.809737. We also removed the three variables with the highest deffs: finc_int with a deff of 6.329467, finc_div with a deff of 6.073850, and finc_fin with a deff of 5.559943. The remaining 14 variables have relatively similar deffs, with a mean of 3.569623 and a narrower range from 2.059918 to 5.424058. These 14 binary variables are used to construct GVFs (using 2011 ASEC data) and LGVFs (using 2008 to 2010 ASEC data). In the simulation study, we removed the variable finc_ws due to its very low relative variance. We restricted our analysis of the ASEC and ASECREP data to the state of New Mexico.

4.1. Simulation studies

We treat the 2008 (2059 observations), 2009 (2188 observations), and 2010 (2108 observations) ASEC data for New Mexico as finite populations. Each household was associated with an ultimate sampling unit (USU) defined for the CPS. However, the USU information is not released to the public. To mimic the design, we sorted households by sequence number, from smallest to largest, within each year and combined four households into a USU in that order. This results in 205, 208, and 193 USUs, respectively. The simulation is performed with the following steps:

(a) Within each year, select n=40 (about 20% sampling rate) and n=100 (about 50% sampling rate) USUs with probabilities proportional to size (PPS), and select $m_{i} = 4$ individuals within each selected USU i with equal probability.
(b) Calculate estimates for the three samples (2008, 2009 and 2010). The total for variable v at time t is estimated by the Horvitz–Thompson estimator in Equation (3), denoted by $\hat{T}_{t v}$; the variance is estimated by Equation (5), denoted by $s_{\hat{T}_{t v}}^{2}$; and the relative variance (relvar) is calculated as $υ_{t v} = s_{\hat{T}_{t v}}^{2} / \hat{T}_{t v}^{2}$.
(c) Apply the time adjustment $e_{t}$ to the estimates from step (b) using $e_{t} / \hat{T}_{t v}$, where $e_{1} = M_{1} / \bar{M} = 1{,}978{,}390 / 1{,}967{,}487 = 1.0055$ (for year 2009); $e_{2} = M_{2} / \bar{M} = 1{,}977{,}807 / 1{,}967{,}487 = 1.0052$ (for year 2010); and $e_{3} = M_{3} / \bar{M} = 1{,}946{,}264 / 1{,}967{,}487 = 0.9892$ (for year 2008).
(d) Fit regression model (7) with the fitting methods LGVF1 (ordinary linear regression) and LGVF2 (weighted least squares with $w_{t v} = 1 / υ_{t v}^{2}$).
(e) Record the relvar calculated by using formulas (3) and (5); record the relvar calculated by using fitted values from LGVF1 and LGVF2 (LGVFs 1–2); and record the standard errors of the fitted relvar from LGVFs 1–2.
(f) Repeat (a)–(e) 2000 times. For each variable, record the average of the relvars calculated by Equations (3) and (5) (treated as the true relvar); the average of the relvars estimated by fitted values from LGVFs 1–2 (treated as the estimated relvar); the sampling variance of the relvars calculated by Equations (3) and (5) (treated as the true variance of relvar); and the average standard errors of the fitted relvar from LGVFs 1–2 (the estimated variance of relvar).
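The time adjustment in the third step can be reproduced from the three population totals quoted there. The following Python sketch recomputes $\bar{M}$ and the ratios $e_{t}$; the variable names and the helper `adjusted_regressor` are assumptions introduced for illustration, and the year labels follow the text.

```python
# Population totals quoted in the time-adjustment step, keyed by year.
M = {2009: 1_978_390, 2010: 1_977_807, 2008: 1_946_264}

# Average population total over the three periods.
M_bar = sum(M.values()) / len(M)

# Time effect e_t = M_t / M_bar for each year.
e = {year: M_t / M_bar for year, M_t in M.items()}


def adjusted_regressor(e_t, total_hat):
    """Regressor e_t / T_hat_tv used in model (7) (hypothetical helper)."""
    return e_t / total_hat


# Example: adjusted regressor for an estimated total of 211 in 2010.
x_2010 = adjusted_regressor(e[2010], 211.0)
```

The three ratios sum to the number of periods, so the adjustment rescales each year's regressor without changing the overall level of the fitted curve.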

The simulation results for the two cases, PPS with 40 USUs and PPS with 100 USUs, are very close to each other, so we report only the results for PPS with 100 USUs. This case performs slightly better than the other with respect to bias and variance, which is reasonable because Theorems 3.1 and 3.2 require large $N_{t h}$ and $n_{t h}$. Figure 2 plots the logs of the relvars from Equations (3) and (5) (solid line, treated as true values) and the estimates from LGVFs 1–2 (dashed and dotted lines) against the logs of the population totals. The plot shows that LGVF2 works very well in estimating the true relvars, whereas LGVF1 deviates noticeably from the true values when the population total of the variable is large.

Figure 2. Logs of estimates of relvar plotted versus logs of population totals.

To see how precise our LGVF estimators are, we also plot the ratios of the standard error estimates of relvars from LGVFs 1–2 to the standard error estimates of relvars calculated from the sampling variability in the simulations (see Figure 3). Ratios less than 1 indicate that an LGVF is more precise than the sampling variability from the simulations. LGVF2 estimates the variance of relvar very well, with none of the ratios greater than 1. Not surprisingly, LGVF1 has a large variance when the population total of the variable is large, but it still performs acceptably.

Figure 3. Ratio of the SEs (LGVF1–LGVF2/relvar) versus log (totals).

We also examined histograms for the binary variables to see how Theorem 3.2 works in practice. Asymptotic normality holds well for a high-proportion variable such as finc_se (with a total of 211). For a variable with a small total, such as finc_dis with a total of only 10, the histogram is highly skewed to the left because samples with small totals are frequently selected.

4.2. Data analysis: apply LGVF to the full 2008–2010 data

In this section, we apply our methods to the full data set, which we used as the population in the simulation studies. The 14 binary variables used to construct GVFs and LGVFs are from the ‘Source of Income’ section of ASEC, excluding the variables with the two lowest and three highest deffs, as discussed in Section 4. In the data analysis, the variance is calculated by using replicate weights and a formula provided by the ASECREP user's manual (U.S. Census Bureau, 2009): $\hat{var} (\hat{p}_{0}) = 4 \sum_{i = 1}^{160} (\hat{p}_{i} - \hat{p}_{0})^{2} / 160$, where $\hat{p}_{0}$ is calculated using the full-sample weights ‘PWWGT0’, and $\hat{p}_{i}, i = 1, 2, \dots, 160$ are calculated using the 160 replicate weights ‘PWWGT1’ to ‘PWWGT160’. This is essentially the empirical variance adjusted by a factor of 4. The estimated totals $\hat{T}_{t v}$ are calculated using the full-sample weights ‘PWWGT0’. We then regress the relative variances on the adjusted inverse estimated totals $e_{t} / \hat{T}_{t v}$ to derive the estimates of the regression parameters a and b.
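The replicate-weight variance formula can be sketched in Python as follows. The function names `weighted_proportion` and `replicate_variance` and the tiny inputs are made up for illustration; in the application described above, the weight vectors would be the full-sample weights and the 160 replicate weights from the ASECREP file.

```python
def weighted_proportion(y, w):
    """Weighted proportion of a 0/1 variable y under weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def replicate_variance(p_full, p_reps, factor=4.0):
    """factor * sum_i (p_i - p_0)^2 / R, with R = len(p_reps) replicates."""
    return factor * sum((p - p_full) ** 2 for p in p_reps) / len(p_reps)


# Toy illustration: 160 replicate estimates scattered around p_0 = 0.5.
p0 = 0.5
p_reps = [0.6] * 80 + [0.4] * 80
var_p = replicate_variance(p0, p_reps)
relvar_p = var_p / p0 ** 2          # relative variance used as regression response
```

In practice each $\hat{p}_{i}$ would be obtained by calling `weighted_proportion` with the i-th replicate weight column, and the resulting relvar is the response $υ_{t v}$ in the regression.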

Three regression fitting methods are applied: LGVF1, ordinary linear regression; LGVF2, weighted least squares with $w_{t v} = 1 / υ_{t v}^{2}$; and LGVF3, regression after log transformation of both y and x. Figure 4 plots the logs of the estimated relative variances and the estimates from LGVFs 1–3 against the logs of the population totals. The plot shows that LGVF2 (dotted line) mimics the relative variances most closely. LGVF1 (dashed line) and LGVF3 (dash-dotted line) also work well, with the tails departing slightly from the black line.

Figure 4. Logs of estimates of relvar plotted versus logs of population totals.

To see how precise the proposed LGVF estimators are, we also plot the ratios of the standard error estimates of relative variances from LGVFs 1–3 to the standard error estimates of relative variances obtained using replicate weights (Figure 5). Ratios less than 1 indicate that an LGVF is more precise than υ. LGVF1 and LGVF2 are both precise, with none of the ratios greater than 1. For some variables, the variance of relvar estimated using LGVF3 is less precise than the variance estimated using replicate weights, but LGVF3 still performs well.

Figure 5. Ratio of the SEs (LGVF1–LGVF3/relvar) versus log(totals).

Next, we use the LGVF models to predict the relative variances for the year 2011 data. These relative variances can also be calculated from the replicate weights as before, and the results are treated as the directly calculated relvars. Figure 6 shows that the predictions are quite good, with LGVF2 performing the best.

Figure 6. Logs of predicted relvar by using LGVF1–3 plotted versus logs of population totals (11 March).

4.3. Comparison of GVF with LGVF

In this section, we do a brief comparison study of the performance of GVF and LGVF. The GVF models are constructed by the year 2011 data, while the LGVF models are built by using three years (2008–2010) of data with time adjustment. The same regression fitting methods: ordinary least squares (Method 1), weighted least squares (Method 2), and log transformations (Method 3) are applied to GVF modelling. Accordingly, they are called GVF1, GVF2, and GVF3. Fourteen variables are used to construct GVFs and LGVFs, and the remaining five variables are used for predicting. Mean-squared prediction errors are calculated as the average of the sum of the squares of the difference between predicted values and observed values.
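The prediction error criterion can be written down in a few lines of Python. This is a sketch with hypothetical function names (`predict_relvar`, `mean_squared_prediction_error`) and made-up numbers; "observed" stands for the directly calculated relvars and "predicted" for the fitted values $\hat{a} + \hat{b} e_{t} / \hat{T}_{t v}$ (with $e_{t} = 1$ for a cross-sectional GVF).

```python
def predict_relvar(a_hat, b_hat, e_t, total_hat):
    """Predicted relative variance a + b * e_t / T_hat from a fitted (L)GVF."""
    return a_hat + b_hat * e_t / total_hat


def mean_squared_prediction_error(predicted, observed):
    """Average of squared differences between predictions and observations."""
    diffs = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sum(diffs) / len(diffs)


# Made-up example: two held-out variables with totals 100 and 200.
a_hat, b_hat, e_t = -0.001, 0.5, 1.0
predicted = [predict_relvar(a_hat, b_hat, e_t, T) for T in (100.0, 200.0)]
observed = [0.0045, 0.0010]
mspe = mean_squared_prediction_error(predicted, observed)
```

Computing this quantity separately for each fitted model on the same held-out variables gives one number per model, which is how the models can be ranked.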

Table 1 shows that LGVFs have smaller mean-squared prediction errors. When predicting the five remaining variables, GVFs and LGVFs do not differ much. However, the case of predicting 19 variables in 2011 stands out, with an SE of 0.003108 by LGVF2 compared to an SE of 0.01781 by GVF1. This is encouraging, since it is the most common case in which we want to apply the LGVF methods: using a few years of data to build the LGVF model and then predicting for future years. Since design effects do not change much over the years, combining the variables from 2009 with the same variables from 2008 and 2010 should give reasonable results. We also incorporated $e_{t}$ to adjust for the time effect in the longitudinal setting. The LGVF methods, particularly LGVF2, perform very well regarding mean-squared prediction errors.

Table 1. Comparison of GVF and LGVF.

5. Conclusions

In this research, we extended Generalised Variance Functions (GVFs) to Longitudinal Generalised Variance Functions (LGVFs), which reduce to GVFs when the data are cross-sectional. We incorporated a time effect into the modelling to adjust for changes over the years. We showed that the ratio of the relative variance to the predicted relative variance from the proposed LGVFs converges in probability to 1 under certain conditions. Based on the simulation studies, we suggest using LGVF2 (weighted least squares regression fitting) to predict relative variances, as it has smaller bias and variance than the other two methods. The data application to the ASEC supplement using replicate weights provided by ASECREP reveals similar findings. A comparison study between LGVF and GVF also shows that LGVF is efficient in reducing the mean-squared prediction errors. Future research may consider adopting mixed models and nonparametric smoothing methods for regression model fitting. In both the mixed model and the nonparametric application, prior design effect information can be added to the models. This may markedly improve the model, as suggested by Johnson and King (1987).

Acknowledgements

The authors thank the referees for helpful comments and constructive suggestions to improve the manuscript.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Guoyi Zhang

Dr Guoyi Zhang is an Associate Professor of the Department of Mathematics and Statistics at the University of New Mexico. His research areas are in nonparametric function estimation, statistical computing, and survey sampling.

Yang Cheng

Dr Yang Cheng is a Senior Mathematical Statistician at the Substance Abuse and Mental Health Administration. He works in the areas of sample design, weighting structure, statistical estimation and modelling.

Yan Lu

Dr Yan Lu is an Associate Professor of the Department of Mathematics and Statistics at the University of New Mexico. Her research areas are in survey sampling and mixed models.

References

Burdick, R. K., & Sielken Jr., R. L. (1979). Variance estimation based on a superpopulation model in two-stage sampling. Journal of the American Statistical Association, 74, 438–440.
Burt, V., & Cohen, S. B. (1984). A comparison of methods to approximate standard errors for complex survey data. Review of Public Data Use, 12, 159–168.
Cohen, S. B. (1979). An assessment of curve smoothing strategies which yield variance estimates from complex survey. Proceedings of the Survey Research Methods Section of the American Statistical Association, Washington, DC.
Cook, D. G., & Pocock, S. J. (1983). Multiple regression in geographical mortality studies with allowance for spatially correlated errors. Biometrics, 39, 361–371. doi: 10.2307/2531009
Johnson, E. G., & King, B. F. (1987). Generalized variance functions for a complex sample survey. Journal of Official Statistics, 3, 235–250.
Lohr, S. (2010). Sampling: Design and analysis (2nd ed.). Boston, MA: Cengage Learning.
Rao, J. N. K. (1988). Variance estimation in sample surveys. In P. R. Krishnaiah & C. R. Rao (Eds.), Handbook of statistics (Vol. 6, pp. 427–447). Amsterdam: Elsevier Science Publishers B.V.
Rao, J. N. K., & Wu, C. F. J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231–241. doi: 10.1080/01621459.1988.10478591
Royall, R. M. (1976). The linear least squares prediction approach to two-stage sampling. Journal of the American Statistical Association, 71, 657–664. doi: 10.1080/01621459.1976.10481542
Royall, R. M. (1986). The prediction approach to robust variance estimation in two-stage cluster sampling. Journal of the American Statistical Association, 81, 119–123. doi: 10.1080/01621459.1986.10478247
Scott, A. J., & Smith, T. M. F. (1969). Estimation in multi-stage surveys. Journal of the American Statistical Association, 64, 830–840. doi: 10.1080/01621459.1969.10501015
Shook-Sa, B., Heller, D., Williams, R., Couzens, G. L., & Berzofsky, M. (2013). Comparing generalized variance functions to direct variance estimation for the national crime victimization survey. 2013 research conference, Federal Committee on Statistical Methodology (FCSM), Washington, DC.
Thompson, M. E. (2015). Using longitudinal complex survey data. The Annual Review of Statistics and Its Application, 2, 305–320. doi: 10.1146/annurev-statistics-010814-020403
U.S. Census Bureau. (2006). Current population survey: Design and methodology (Technical Paper 66).
U.S. Census Bureau. (2009). Estimating ASEC variances with replicate weights. Part 1: Instructions for using the ASEC public use replicate weight file to create ASEC variance estimates.
Valliant, R. L. (1987). Generalized variance functions in stratified two-stage sampling. Journal of the American Statistical Association, 82, 499–508. doi: 10.1080/01621459.1987.10478454
Wolter, K. M. (2007). Introduction to variance estimation (2nd ed.). New York, NY: Springer-Verlag.

Appendix

For each time period $t$, $t = 1, 2, \dots, τ$, the following conditions apply as $n_{t h}, N_{t h} \to \infty$ for $h = 1, 2, \dots, H$.

(i) $n_{t h} / N_{t h} \to 0$, $m_{t h i} / γ_{t h i} \to 0$ for $i = 1, 2, \dots, N_{t h}$
(ii) $n_{t h} / n_{t} \to c_{1 t h}$
(iii) $N_{t h} / N_{t} \to c_{2 t h}$
(iv) $n g_{1 t h}^{2} n_{t h} / N_{t}^{2} \to c_{3 t h}$
(v) $n_{t h}^{- 1} \sum_{i \in S_{t h}} g_{2 t h i}^{2} D_{1 t h i} \to V_{1 t h}$
(vi) $n_{t h}^{- 1} \sum_{i \in S_{t h}} D_{2 t h i} \to V_{2 t h}$
(vii) $(N_{t h} - n_{t h})^{- 1} \sum_{i \in R_{t h}} D_{3 t h i} \to V_{3 t h}$
(viii) $n_{t h}^{- 1} \sum_{i \in S_{t h}} g_{2 t h i} D_{4 t h i} \to V_{4 t h}$
(ix) $n_{t h}^{- 1} \sum_{i \in S_{t h}} g_{2 t h i}^{2} M_{t h i}^{2 l} \to c_{4 t h}^{(l)}$, $l = 0, 1$
(x) $n_{t h}^{- 1} \sum_{i \in S_{t h}} (m_{t h i} / M_{t h i})^{2} D_{1 t h i} \to V_{5 t h}$

where

$D_{1 t h i} = M_{t h i}^{2} σ_{t h i}^{2} [1 + (m_{t h i} - 1) ρ_{t h i}] / m_{t h i}$
$D_{2 t h i} = (M_{t h i} - m_{t h i}) σ_{t h i}^{2} [1 + (M_{t h i} - m_{t h i} - 1) ρ_{t h i}]$
$D_{3 t h i} = M_{t h i} σ_{t h i}^{2} [1 + (M_{t h i} - 1) ρ_{t h i}]$
$D_{4 t h i} = M_{t h i} σ_{t h i}^{2} (M_{t h i} - m_{t h i}) ρ_{t h i}$

and $c_{1 t h}$ through $c_{3 t h}$, $c_{4 t h}^{(l)}$, and $V_{1 t h}$ through $V_{5 t h}$ are constants. By conditions (i)–(iii), we have $n_{t} / N_{t} \to 0$. The quantities $g_{1 t h}$ and $g_{2 t h i}$ have more specific forms related to the estimators. The above assumptions apply to each time period t. For all the time periods, the following conditions apply

(xi) $w_{t v} / d_{t v} \to ω_{v}$
(xii) $M_{t} / N_{t} \to \bar{M} / \bar{N}$
(xiii) $E ({\hat{T}}_{t v}) / \bar{N} \to e_{t} ψ_{v} \to (N_{t} / \bar{N}) ψ_{v}$

where ${\hat{T}}_{t v}$ is the estimator of the total for variable v at time t, and $ω_{v}$ and $ψ_{v}$ are constants.
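The quantities $D_{1 t h i}$ to $D_{4 t h i}$ defined above are simple functions of the cluster size, the within-cluster sample size, the unit variance and the intracluster correlation. A small sketch for evaluating them for a single cluster follows; the function name `design_terms` and its argument names are hypothetical, chosen here only for illustration.

```python
def design_terms(M, m, sigma2, rho):
    """Return (D1, D2, D3, D4) for one cluster, per the Appendix definitions.

    M: cluster size, m: number of sampled units in the cluster,
    sigma2: unit variance, rho: intracluster correlation.
    """
    d1 = M ** 2 * sigma2 * (1 + (m - 1) * rho) / m
    d2 = (M - m) * sigma2 * (1 + (M - m - 1) * rho)
    d3 = M * sigma2 * (1 + (M - 1) * rho)
    d4 = M * sigma2 * (M - m) * rho
    return d1, d2, d3, d4


# Example: a cluster of 10 units with 4 sampled, variance 2 and correlation 0.5.
print(design_terms(10, 4, 2.0, 0.5))  # (125.0, 42.0, 110.0, 60.0)
```

When the whole cluster is sampled ($m = M$), the terms $D_{2 t h i}$ and $D_{4 t h i}$ vanish, since both carry the factor $M_{t h i} - m_{t h i}$.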

Lemmas A.1–A.3 are extensions of work by Royall (1986).

Lemma A.1

Under model (4) and conditions (i) to (viii) in the Appendix, $var ({\hat{T}}_{t} - T_{t}) \approx \sum_{h} \sum_{S_{t h}} γ_{t h i}^{2} D_{1 t h i},$ where $γ_{t h i}$ is defined in Equation (2), $D_{1 t h i}$ is defined in the Appendix, and the symbol ≈ means ‘asymptotically equivalent to’.
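One way to read $D_{1 t h i}$ in Lemma A.1 is as a scaled within-cluster variance. As a sketch, assuming ${\bar{y}}_{t h i}$ denotes the mean of the $m_{t h i}$ sampled units in cluster i (a definition not restated in this Appendix), the covariance structure of model (4) gives

$\begin{aligned} var ({\bar{y}}_{t h i}) & = m_{t h i}^{- 2} [m_{t h i} σ_{t h i}^{2} + m_{t h i} (m_{t h i} - 1) ρ_{t h i} σ_{t h i}^{2}] \\ & = σ_{t h i}^{2} [1 + (m_{t h i} - 1) ρ_{t h i}] / m_{t h i}, \end{aligned}$

so that $M_{t h i}^{2} var ({\bar{y}}_{t h i}) = D_{1 t h i}$.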

Lemma A.2

Under model (4) and conditions (i) to (ix), with $u_{4 t h i} = E [{\hat{T}}_{t h i} - E ({\hat{T}}_{t h i})]^{4} < \infty$, $γ_{t h i} / M_{t h} = o (n_{t h})$, and $s_{{\hat{T}}_{t}}^{2}$ as defined in Equation (5), we have $var ({\hat{T}}_{t} - T_{t}) / s_{{\hat{T}}_{t}}^{2} \overset{p}{\to} 1.$

Lemma A.3

Under model (4) and conditions (i) to (viii) and (x), if $u_{4 t h i} < \infty$ and the random variables ${\bar{y}}_{t h i}$ $(h = 1, \dots, H, i = 1, \dots, n_{t h})$ are mutually independent at each time period t, then $\frac{{\hat{T}}_{t} - T_{t}}{s_{{\hat{T}}_{t}}} \overset{d}{\to} N (0, 1) .$

Proof of Theorem 3.1

The proof follows Valliant (1987). We first prove that $relvar ({\hat{T}}_{t v} - T_{t v})$ has the same limit as $υ_{v}$ for any time period t. Next, we prove that the estimated ${\hat{υ}}_{v}$ converges to the same limit.

By Equation (6), adding the subscript v and time t, we have

$relvar ({\hat{T}}_{t v} - T_{t v}) \approx a_{t v} + b_{t v} / E ({\hat{T}}_{t v}),$

where

$a_{t v} = \sum_{h} (π_{t h v} / M_{t h})^{2} α_{2 t h v} \sum_{S_{t h}} γ_{t h i}^{2} k_{t h i v} M_{t h i}^{2},$

and

$b_{t v} = \sum_{h} (π_{t h v} α_{1 t h v} / M_{t h}) \sum_{S_{t h}} γ_{t h i}^{2} k_{t h i v} M_{t h i}^{2} .$

By the definitions of $π_{t h v}$, $k_{t h i v}$, $D_{1 t h i v}$ and $γ_{t h i}$, and the assumption that $σ_{t h i v}^{2} = σ_{t h v}^{2}$, together with conditions (iv), (v), (xii), and (xiii),

$\begin{aligned} n_{t} a_{t v} & = \sum_{h} (n_{t} g_{1 t h}^{2} n_{t h} / N_{t}^{2}) α_{2 t h v} μ_{t h v}^{2} N_{t}^{2} E ({\hat{T}}_{t v})^{- 2} σ_{t h v}^{- 2} \\ & \quad \times (\sum_{S_{t h}} g_{2 t h i}^{2} D_{1 t h i v} / n_{t h}) \\ & \to \sum_{h} c_{3 t h} α_{2 t h v} μ_{t h v}^{2} ψ_{v}^{- 2} V_{1 t h v} σ_{t h v}^{- 2} = A_{t v} . \end{aligned}$

Similarly,

$(\frac{n_{t}}{N_{t}}) b_{t v} \to \sum_{h} c_{3 t h} α_{1 t h v} μ_{t h v} ψ_{v}^{- 1} σ_{t h v}^{- 2} V_{1 t h v} = B_{t v} .$

Let $a_{t v} = a_{t} = a$ for all t and v, so that $A_{t v} = A$ for some constant A, and let $b_{t v} = b_{t} = b$ for all t and v, so that $B_{t v} = B_{t} = B$ for some constant B. Therefore,

$n_{t} relvar ({\hat{T}}_{t v} - T_{t v}) \to A + \frac{B}{ψ_{v}} .$

Next we show that $n_{t} υ_{t v}$ has the same limit. Lemma A.3 shows

$\frac{{\hat{T}}_{t v} - E ({\hat{T}}_{t v})}{\bar{N}} \overset{p}{\to} 0.$

Therefore,

$\frac{{\hat{T}}_{t v}}{\bar{N}} \overset{p}{\to} e_{t} ψ_{v} .$
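The limit of $n_{t} relvar ({\hat{T}}_{t v} - T_{t v})$ stated earlier in this step combines the two component limits. A short sketch, assuming (as the second arrow in condition (xiii) suggests) that $N_{t} / E ({\hat{T}}_{t v}) \to ψ_{v}^{- 1}$:

$\begin{aligned} n_{t} relvar ({\hat{T}}_{t v} - T_{t v}) & \approx n_{t} a_{t v} + (\frac{n_{t}}{N_{t}}) b_{t v} \frac{N_{t}}{E ({\hat{T}}_{t v})} \\ & \to A + \frac{B}{ψ_{v}} . \end{aligned}$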

Together with Lemmas A.1 and A.2, we have

$\begin{aligned} n_{t} υ_{t v} & \to e_{t}^{- 2} ψ_{v}^{- 2} \frac{n_{t}}{N_{t}^{2}} e_{t}^{2} \sum_{h} n_{t h} g_{1 t h}^{2} V_{1 t h v} \\ & \to ψ_{v}^{- 2} \sum_{h} c_{3 t h} V_{1 t h v} . \end{aligned}$

Now multiplying and dividing within the summation by $σ_{t h v}^{2} = α_{1 t h v} μ_{t h v} + α_{2 t h v} μ_{t h v}^{2}$ gives

$\begin{aligned} n_{t} υ_{t v} & \overset{p}{\to} ψ_{v}^{- 2} \sum_{h} σ_{t h v}^{2} \frac{c_{3 t h} V_{1 t h v}}{σ_{t h v}^{2}} \\ & \overset{p}{\to} \sum_{h} \frac{c_{3 t h} V_{1 t h v} α_{2 t h v} μ_{t h v}^{2}}{σ_{t h v}^{2} ψ_{v}^{2}} + \sum_{h} \frac{c_{3 t h} V_{1 t h v} α_{1 t h v} μ_{t h v}}{σ_{t h v}^{2} ψ_{v}^{2}} \\ & \overset{p}{\to} A + \frac{B}{ψ_{v}} . \end{aligned}$ (A1)

Next, we want to show that $n_{t} {\hat{υ}}_{t v} \overset{p}{\to} A + B / ψ_{v}$ to complete the proof. Recall that $n_{t} = n$ for all time periods t. Consider ${\hat{S}}_{1}$ and ${\hat{S}}_{2}$ in Equation (9). By conditions (xi), (xii) and (xiii) and the result in (A1), we have

$n \bar{N} d_{t v} {\hat{S}}_{1} \overset{p}{\to} B [\sum_{t} \sum_{v} ψ_{v}^{- 2} ω_{v}^{- 2} - \sum_{t} \sum_{v} ψ_{v}^{- 1} ω_{v}^{- 1} {\bar{ψ}}_{-}]$ (A2)

and

${\bar{N}}^{2} d_{t v} {\hat{S}}_{2} \overset{p}{\to} \sum_{t} \sum_{v} ψ_{v}^{- 2} ω_{v}^{- 2} - \sum_{t} \sum_{v} ω_{v}^{- 1} ({\bar{ψ}}_{-})^{2},$ (A3)

where

${\bar{ψ}}_{-} = \sum_{t} \sum_{v} (ψ_{v}^{- 1} ω_{v}^{- 1}) / \sum_{t} \sum_{v} ω_{v}^{- 1} .$

By (A2) and (A3), we have

$\hat{b} = \frac{{\hat{S}}_{1}}{{\hat{S}}_{2}} \overset{p}{\to} \frac{B}{n / \bar{N}} \quad \text{or} \quad \frac{n}{\bar{N}} \hat{b} \overset{p}{\to} B .$

The convergence of $n_{t} υ_{t v}$ in (A1) implies that

$n {\bar{υ}}_{v} \overset{p}{\to} A + B ({\bar{ψ}}_{-}) .$

As a result,

$\begin{aligned} n {\hat{υ}}_{v} & = n {\bar{υ}}_{v} + (\frac{n}{\bar{N}}) \hat{b} (e_{t} \bar{N} {\hat{T}}_{t v}^{- 1} - \bar{N} {\bar{T}}_{-}) \\ & \overset{p}{\to} A + \frac{B}{ψ_{v}} . \end{aligned}$

Therefore, for all time periods $t = 1, 2, \dots, τ$,

$\frac{relvar ({\hat{T}}_{t v} - T_{t v})}{{\hat{υ}}_{v}} \overset{p}{\to} 1.$
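The last two limits can be unpacked as follows. This is a sketch under two readings that are assumptions here: that $\bar{N} {\bar{T}}_{-} \overset{p}{\to} {\bar{ψ}}_{-}$, and that $A + B / ψ_{v} \neq 0$. It also uses $e_{t} \bar{N} {\hat{T}}_{t v}^{- 1} \overset{p}{\to} ψ_{v}^{- 1}$, which follows from ${\hat{T}}_{t v} / \bar{N} \overset{p}{\to} e_{t} ψ_{v}$ when $e_{t} ψ_{v} \neq 0$.

$\begin{aligned} n {\hat{υ}}_{v} & = n {\bar{υ}}_{v} + (\frac{n}{\bar{N}}) \hat{b} (e_{t} \bar{N} {\hat{T}}_{t v}^{- 1} - \bar{N} {\bar{T}}_{-}) \\ & \overset{p}{\to} (A + B {\bar{ψ}}_{-}) + B (ψ_{v}^{- 1} - {\bar{ψ}}_{-}) \\ & = A + \frac{B}{ψ_{v}}, \end{aligned}$

and hence

$\frac{relvar ({\hat{T}}_{t v} - T_{t v})}{{\hat{υ}}_{v}} = \frac{n \, relvar ({\hat{T}}_{t v} - T_{t v})}{n {\hat{υ}}_{v}} \overset{p}{\to} \frac{A + B / ψ_{v}}{A + B / ψ_{v}} = 1.$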
