Optimal AK composite estimators in current population survey

Yang Cheng, Jun Shao & Zhou Yu
Pages 257-264 | Received 21 Mar 2017, Accepted 21 Jul 2017, Published online: 19 Sep 2017

ABSTRACT

The Current Population Survey (CPS) is a monthly household sample survey with a sample consisting of eight rotation groups. Sampled individuals of a rotation group are interviewed for four consecutive months and, after resting for eight consecutive months, for another four consecutive months. A composite-type estimator is adopted in the CPS for the estimation of the monthly population total, which combines sample information from the current month's survey and previous months using the fact that 75% of households have data for two consecutive months. There are two values, A and K, in the composite estimator that decide how to combine the available information, and thus this estimator is called the AK composite estimator. In this paper, we use a formula for the mean-squared error of the AK composite estimator and propose an easy-to-use, data-based method of choosing A and K, which involves the estimation of some population quantities using the method of moments and replication. Some numerical studies are conducted to illustrate the effectiveness of the proposed method.

1. Introduction

The Current Population Survey (CPS) is a household sample survey sponsored by the U.S. Bureau of Labor Statistics and conducted monthly by the U.S. Census Bureau to provide estimates of employment, unemployment and other characteristics of the non-institutionalised civilian population 16 years of age and older. The CPS adopts a 4–8–4 rotation sample design that consists of a sample of eight rotation groups, approximately equal in size, partitioned in such a manner that for any given month, one-eighth of the sample is interviewed for the first time, one-eighth for the second time, …, and one-eighth for the eighth time. Households in a rotation group are interviewed for four consecutive months, dropped for the next eight months and then returned to the sample for the following four months before they retire from the sample. The rotation paradigm ensures a 75% month-to-month overlap and a 50% year-to-year overlap, which makes it possible to increase the efficiency of the current month estimators using data from previous months as well as to efficiently estimate month-to-month changes. More details of the CPS can be found in U.S. Bureau of Labor Statistics (2006) and Cheng (2012).
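The overlap figures quoted above follow directly from the 4–8–4 schedule. The following is a minimal sketch that checks them; the function names and the convention of indexing a panel by its entry month are illustrative assumptions, not CPS terminology.

```python
def month_in_sample(entry_month, month):
    """Return the interview number (1..8) of a panel under the 4-8-4 design,
    or None if the panel is not interviewed in `month`.
    `entry_month` is the (hypothetical) month index of the first interview."""
    k = month - entry_month
    if 0 <= k <= 3:          # first four consecutive interviews
        return k + 1
    if 12 <= k <= 15:        # after eight months of rest: interviews 5..8
        return k - 7
    return None


def panels_in_sample(month):
    """Entry months of the eight panels interviewed in `month`."""
    return {e for e in range(month - 15, month + 1)
            if month_in_sample(e, month) is not None}


def overlap(month, lag):
    """Share of the panels interviewed in `month` that are interviewed again
    `lag` months later."""
    now = panels_in_sample(month)
    later = panels_in_sample(month + lag)
    return len(now & later) / len(now)


if __name__ == "__main__":
    print("month-to-month overlap:", overlap(100, 1))
    print("year-to-year overlap:", overlap(100, 12))
```

Six of the eight panels return one month later and four return one year later, which gives the 75% and 50% overlaps.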

Let $Y_t$ be the unknown population total of a variable of interest (e.g. total unemployed) and $N_t$ be the total number of population units at month $t$. The unknown population mean is $Y_t/N_t$ (e.g. population proportion unemployed). The current estimation procedure in the CPS can be described as follows. Based on the data in rotation group $i$ and month $t$, let $\hat{Y}_{t,i}$ be a ratio, regression or calibration estimator of $Y_t$ using some covariates such as age, sex, race, ethnicity and other household characteristics. A simple estimator of $Y_t$ is the average of $\hat{Y}_{t,i}$ over the eight rotation panels, i.e.

$$\hat{Y}_t = \frac{1}{8}\sum_{i=1}^{8}\hat{Y}_{t,i}. \tag{1}$$

Using data from the 75% of sampled units in month $t$ having data for the two consecutive months $t$ and $t-1$, we can estimate the month-to-month change $\Delta_t = Y_t - Y_{t-1}$ by

$$\hat{\Delta}_t = \frac{1}{6}\sum_{i \in s}\left(\hat{Y}_{t,i} - \hat{Y}_{t-1,i-1}\right),$$

where $s = \{2, 3, 4, 6, 7, 8\}$. Note that units in group $i = 1$ or $5$ do not have data for month $t-1$. This estimator of change, together with the estimated monthly total for month $t-1$, denoted here by $\hat{Y}^{C}_{t-1}$, provides an alternative estimator of $Y_t$:

$$\hat{Y}^{C}_{t-1} + \hat{\Delta}_t. \tag{2}$$

While the simple estimator in (1) is based on the data collected in month $t$ only, the alternative estimator in (2) might be more efficient since it makes use of the data from month $t-1$ as well as data from month $t$ in the overlapping rotation groups in $s$. On the other hand, the estimator in (2) does not use the month $t$ data from rotation groups 1 and 5, which are not in $s$. Thus, to combine the advantages of the estimators in (1) and (2), the following first-generation composite estimator was used prior to 1985:

$$\hat{Y}^{C}_t = (1-K)\,\hat{Y}_t + K\left(\hat{Y}^{C}_{t-1} + \hat{\Delta}_t\right), \tag{3}$$

which is a convex combination of the two estimators defined in (1) and (2) with a value $K$ between 0 and 1.

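After a series of pioneering research studies (e.g. Gurney & Daly, 1965; Huang & Ernst, 1981), in 1985, a different composite estimator was introduced by adding another term to the composite estimator in (3), which is the estimator of the net difference between the incoming and continuing parts of the current month's sample:

$$\hat{\beta}_t = \frac{1}{8}\left(\sum_{i \notin s}\hat{Y}_{t,i} - \frac{1}{3}\sum_{i \in s}\hat{Y}_{t,i}\right),$$

where $\hat{Y}_{t,i}$ is the estimator of $Y_t$ based on rotation group $i$ in month $t$ and $s = \{2, 3, 4, 6, 7, 8\}$. The resulting second-generation composite estimator is called the AK composite estimator:

$$\hat{Y}^{AK}_t = (1-K)\,\hat{Y}_t + K\left(\hat{Y}^{AK}_{t-1} + \hat{\Delta}_t\right) + A\,\hat{\beta}_t, \tag{4}$$

where $\hat{Y}_t$ is the simple estimator in (1), $\hat{\Delta}_t$ is the estimator of the month-to-month change defined above, and $A$ and $K$ are two values to be determined based on some criterion. By assigning more weight to rotation groups that are in the sample for the first and fifth time, the additional term might reduce both the bias and variance of the composite estimators (Gurney & Daly, 1965).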
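The recursion behind the AK composite estimator can be sketched in a few lines. This is only an illustration under stated assumptions: the rotation group estimates are held in a `(T, 8)` array, the change and net-difference terms take the standard CPS form (averages over the six continuing groups and the two incoming groups), and the recursion is started from the simple average in the first month, which is a start-up choice made here for convenience. The function name `ak_composite` is hypothetical.

```python
import numpy as np

CONTINUING = [1, 2, 3, 5, 6, 7]   # 0-based positions of rotation groups 2,3,4,6,7,8
INCOMING = [0, 4]                 # 0-based positions of rotation groups 1 and 5


def ak_composite(y, A, K):
    """AK composite series from rotation group estimates.

    y : array of shape (T, 8); y[t, i] estimates the month-t total from
        rotation group i + 1.
    Returns an array of length T.
    """
    y = np.asarray(y, dtype=float)
    simple = y.mean(axis=1)                  # simple average over the 8 groups
    previous_groups = [i - 1 for i in CONTINUING]
    out = np.empty(y.shape[0])
    out[0] = simple[0]                       # assumed start-up value
    for t in range(1, y.shape[0]):
        # group i in month t contains the units of group i - 1 in month t - 1
        delta = np.mean(y[t, CONTINUING] - y[t - 1, previous_groups])
        # net difference between incoming and continuing groups
        beta = (y[t, INCOMING].sum() - y[t, CONTINUING].sum() / 3.0) / 8.0
        out[t] = (1 - K) * simple[t] + K * (out[t - 1] + delta) + A * beta
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = 1000.0 + rng.normal(0.0, 20.0, size=(24, 8))
    print(ak_composite(y, A=0.3, K=0.4)[-3:])
```

Setting `A = 0` and `K = 0` returns the simple average, and setting `A = 0` alone returns the first-generation composite estimator.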

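The population mean $Y_t/N_t$ is estimated by $\hat{Y}^{AK}_t/\hat{N}^{AK}_t$, where $\hat{Y}^{AK}_t$ denotes the AK composite estimator in (4) and $\hat{N}^{AK}_t$ is the same as $\hat{Y}^{AK}_t$ with all observed values being set to 1.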

Kumar and Lee (1983) studied the mean-squared error (MSE) of the AK composite estimator with a simple six-rotation-group design for the Canadian Labor Force Survey and discussed how to choose the optimal A and K that minimise the MSE. For the 4–8–4 rotation design under consideration in this paper, building on the results in Lent and Cantwell (1994) and Lent, Miller, and Cantwell (1994), Lent et al. (1998, 1999) proposed to choose A and K by minimising the variance of the AK composite estimator on a grid, and to estimate unknown population parameters in the optimal values A and K by replication as developed in Lent (1991) and Adam and Fuller (1992). Based on some empirical results, they suggested that (A, K) should be chosen as (0.4, 0.7) for estimating the total employed or as (0.3, 0.4) for estimating the total unemployed, and these values were adopted by the CPS. However, the rotation group bias issue in the AK composite estimator (Bailar, 1975; Huang & Ernst, 1981) was not addressed in Lent et al. (1999).

Similar to the development in Kumar and Lee (1983), in this article, we re-express the variance formulae (Cantwell, 1990; Lent et al., 1999) of the AK composite estimator for the 4–8–4 rotation design as a quadratic function of A for each fixed K. After taking the rotation group bias into consideration, we find that the MSE of the AK composite estimator is still a quadratic function of A for each fixed K. Based on this MSE formula, the optimal values A and K that minimise the MSE can be obtained in terms of some unknown population parameters including the rotation group biases. To estimate these unknown parameters, we propose a method based on the method of moments and replication. By substituting unknown parameters with appropriate sample estimators, the resulting AK estimator is approximately optimal in terms of the MSE. The proposed method is examined via some simulation studies.

2. The optimal AK composite estimator

The biases of the composite estimator in (3) and the AK composite estimator in (4) were studied in Bailar (1975) and Huang and Ernst (1981), respectively, under the following condition:

(C1)

$E(\hat{Y}_{t,i}) = Y_t + a_i$ for any month $t$, $i = 1, \ldots, 8$, where $\hat{Y}_{t,i}$ denotes the estimator of $Y_t$ based on rotation group $i$ in month $t$.

The biases $a_i$'s in (C1) are mainly caused by differences in data collection among rotation groups and are assumed to be independent of time $t$. Hence, they are called rotation group biases. Some empirical results showing this type of bias can be found in Krueger, Mas, and Niu (2017).
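To see how time-constant rotation group biases propagate through a recursive composite estimator, the sketch below takes expectations of an AK-type recursion under (C1). It assumes the standard CPS form of the estimator (simple average, change estimated from the six continuing groups, net difference between incoming and continuing groups); the function names are hypothetical and the closed form is the fixed point of the expected-value recursion, not a formula quoted from the literature.

```python
import numpy as np

CONTINUING = [1, 2, 3, 5, 6, 7]   # 0-based positions of groups 2,3,4,6,7,8
INCOMING = [0, 4]                 # 0-based positions of groups 1 and 5


def bias_components(a):
    """Biases of the simple average, change term and net-difference term."""
    a = np.asarray(a, dtype=float)
    mean_bias = a.mean()
    delta_bias = np.mean(a[CONTINUING] - a[[i - 1 for i in CONTINUING]])
    beta_bias = (a[INCOMING].sum() - a[CONTINUING].sum() / 3.0) / 8.0
    return mean_bias, delta_bias, beta_bias


def steady_state_bias(a, A, K):
    """Fixed point b of b = (1-K)*m + K*(b + d) + A*e, for 0 <= K < 1."""
    m, d, e = bias_components(a)
    return m + (K * d + A * e) / (1.0 - K)


def iterate_bias(a, A, K, n_iter=500):
    """Run the expected-value recursion from zero bias for n_iter months."""
    m, d, e = bias_components(a)
    b = 0.0
    for _ in range(n_iter):
        b = (1.0 - K) * m + K * (b + d) + A * e
    return b


if __name__ == "__main__":
    a = [3.0, 1.0, 0.0, -1.0, 2.0, -1.0, -2.0, -2.0]   # made-up biases
    print(steady_state_bias(a, 0.3, 0.4), iterate_bias(a, 0.3, 0.4))
```

For fixed K the long-run bias is linear in A, so the squared bias is a quadratic function of A.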

Using condition (C1) and the results in Huang and Ernst (1981), the bias of the AK composite estimator can be expressed as a linear function of $A$ whose coefficients depend on $K$ and the $a_i$'s; we refer to this expression as (5).

Next, we focus on the variance of the AK composite estimator. Huang and Ernst (1981) first gave an approximate variance formula for the AK composite estimator. We re-express the variance formula in Cantwell (1990) as a quadratic function of A for each fixed K. We assume the same conditions as in Huang and Ernst (1981) and Cantwell (1990), which are commonly used in studying the variance of composite estimators under a longitudinal multi-level rotation plan.

(C2)

$\mathrm{Var}(\hat{Y}_{t,i}) = \sigma^2$ for all $t$ and $i$, and $\hat{Y}_{t,i}$ and $\hat{Y}_{s,j}$ are uncorrelated whenever they are based on groups with different sampled units;

(C3)

Based on the structure of the rotation sample design, the following covariances are possibly not zero and we write them in terms of the unknown $\sigma^2$ and correlation coefficients $\rho_i$'s:

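Following Theorem 1 of Cantwell (1990), the variance of the AK composite estimator can be written in terms of an 8 × 8 matrix $V$ determined by the covariances in (C2) and (C3). By arranging terms as a quadratic form of $A$, we obtain that

$$\mathrm{Var}\left(\hat{Y}^{AK}_t\right) = a_v A^2 + b_v A + c_v, \tag{6}$$

where $\hat{Y}^{AK}_t$ denotes the AK composite estimator and the coefficients $a_v$, $b_v$ and $c_v$ depend on $K$, $\sigma^2$ and the $\rho_i$'s under conditions (C2) and (C3).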

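Combining results (5) and (6), we obtain the following formula for the MSE of the AK estimator as a quadratic form of $A$ for each fixed $K$. Under conditions (C1)–(C3), the MSE of the AK composite estimator is

$$(a_v + a_b)A^2 + (b_v + b_b)A + (c_v + c_b), \tag{7}$$

where $a_v A^2 + b_v A + c_v$ is the variance in (6) and $a_b A^2 + b_b A + c_b$ is the square of the bias in (5).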

Note that $a_v$ in (6) should be positive; otherwise the variance in (6) may be negative for some $A$. Hence, for each fixed $K \in [0, 1)$, the MSE in (7) is a quadratic form of $A$ having a minimiser at $-(b_v + b_b)/\{2(a_v + a_b)\}$. Also, for each fixed $K$, the minimum of the MSE over $A$ is $(c_v + c_b) - (b_v + b_b)^2/\{4(a_v + a_b)\}$. Thus, if all population quantities in (C1)–(C3) are known, then the optimal $K$ and $A$ that minimise the MSE in (7) can be determined through the following algorithm.

Step 1. Find the optimal $K \in [0, 1)$ that minimises $(c_v + c_b) - (b_v + b_b)^2/\{4(a_v + a_b)\}$ using some method; for example, a grid search.

Step 2. The optimal $A$ is then chosen as $-(b_v + b_b)/\{2(a_v + a_b)\}$ with the $K$ obtained in step 1.
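The two steps can be sketched as a profile grid search. In the sketch below, `coef` is a hypothetical user-supplied function returning the total quadratic coefficients $(a_v + a_b,\ b_v + b_b,\ c_v + c_b)$ for a given K; the grid, the function name `optimal_ak` and the toy coefficients in the usage example are illustrative assumptions only.

```python
import numpy as np


def optimal_ak(coef, k_grid=None):
    """Grid search over K of the MSE profiled over A.

    coef : callable K -> (a, b, c), so that MSE(A, K) = a*A**2 + b*A + c.
    Returns (A_opt, K_opt, minimal MSE).
    """
    if k_grid is None:
        k_grid = np.linspace(0.0, 0.99, 100)   # K restricted to [0, 1)
    best = None
    for K in k_grid:
        a, b, c = coef(K)
        if a <= 0:
            continue                           # no interior minimiser in A
        profiled = c - b ** 2 / (4.0 * a)      # Step 1: MSE minimised over A
        if best is None or profiled < best[0]:
            best = (profiled, float(K), -b / (2.0 * a))   # Step 2: optimal A
    if best is None:
        raise ValueError("no K on the grid gives a positive leading coefficient")
    mse, K_opt, A_opt = best
    return A_opt, K_opt, mse


if __name__ == "__main__":
    # Toy coefficients chosen so that the profiled MSE is (K - 0.5)**2 + 1.
    def toy(K):
        return 1.0, -2.0 * K, K ** 2 + (K - 0.5) ** 2 + 1.0

    print(optimal_ak(toy))
```

Because the minimisation over A is available in closed form, only a one-dimensional search over K is needed.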

3. Parameter estimation

The results presented in the previous section are useful only when the $a_i$'s in (C1) and $\sigma^2$ and the $\rho_j$'s in (C2) and (C3) are all known. In practice, however, these values are usually unknown. In this section, we consider parameter estimation based on the method of moments and replication.

Consider first the estimation of the biases $a_i$, $i = 1, \ldots, 8$. The biases are actually not estimable unless some conditions or constraints are imposed on them. The following assumption was considered in Bailar (1975) and Krueger et al. (2017):

(C4)

$\sum_{i=1}^{8} a_i = 0$.

Condition (C4) means that the average of the eight rotation group estimators, $\hat{Y}_t = \frac{1}{8}\sum_{i=1}^{8}\hat{Y}_{t,i}$, is an unbiased estimator of $Y_t$. More generally, one may assume that $\sum_{i=1}^{8}\lambda_i a_i = 0$ for some specified $\lambda_i$'s and then estimate $a_i$ unbiasedly using the correspondingly weighted average of the rotation group estimators. How to specify the $\lambda_i$'s is a difficult problem and deserves some further research. Our discussion hereafter is based on Assumption (C4).

Under condition (C4), $E(\hat{Y}_{t,i} - \hat{Y}_t) = a_i$. Then, for a total of $T$ months, $a_i$ can be estimated unbiasedly by

$$\hat{a}_i = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{Y}_{t,i} - \hat{Y}_t\right).$$
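A minimal sketch of the bias estimation step under (C4) follows: each rotation group estimate is compared with the same-month average over the eight groups, and the deviations are averaged over the T months. The array layout `(T, 8)` and the function name `estimate_rotation_bias` are assumptions made for this illustration.

```python
import numpy as np


def estimate_rotation_bias(y):
    """Moment estimates of the eight rotation group biases under (C4).

    y : array of shape (T, 8); y[t, i] is the month-t estimate of the total
        from rotation group i + 1.
    """
    y = np.asarray(y, dtype=float)
    monthly_average = y.mean(axis=1, keepdims=True)   # unbiased for Y_t under (C4)
    return (y - monthly_average).mean(axis=0)         # average deviation per group


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    true_bias = np.array([4.0, 1.0, 0.0, -1.0, 2.0, -1.0, -2.0, -3.0])  # sums to 0
    totals = 1000.0 + rng.normal(0.0, 30.0, size=(120, 1))
    y = totals + true_bias + rng.normal(0.0, 10.0, size=(120, 8))
    print(np.round(estimate_rotation_bias(y), 2))
```

By construction the estimated biases sum to zero, mirroring the constraint in (C4).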

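Now we turn to the estimation of $\sigma^2$ and the $\rho_i$'s in (C2) and (C3). Under conditions (C1) and (C2), we obtain that, for any $i \neq j$ and $t$, $E\{(\hat{Y}_{t,i} - a_i) - (\hat{Y}_{t,j} - a_j)\}^2 = 2\sigma^2$. Hence, $\sigma^2$ can be expressed as a moment of bias-corrected differences between rotation group estimators. Similarly, moment formulas for the $\rho_i$'s can be derived under (C2) and (C3).

With the available unbiased estimator for the rotation group bias under (C4), approximately unbiased estimators of $\sigma^2$ and the $\rho_i$'s based on data over $T$ months can be constructed by replacing the expectations in these formulas with averages over months and the $a_i$'s with their estimators.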

Theorem 3.1:

Assume conditions (C1)–(C4). Then, as T → ∞, where Nt is the population size at time t and OP(1) denotes a quantity bounded in probability.

The proof of this result is given in Appendix.
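As an illustration of the moment approach, the sketch below estimates $\sigma^2$ from bias-corrected differences between rotation groups in the same month, using the fact that under (C1) and (C2) such a difference has expected square $2\sigma^2$. This is one moment estimator consistent with those conditions, written under the assumption that the data sit in a `(T, 8)` array; it is not claimed to coincide with the estimator analysed in Theorem 3.1, and the name `moment_sigma2` is hypothetical.

```python
from itertools import combinations

import numpy as np


def moment_sigma2(y):
    """Moment estimate of sigma^2 from same-month pairs of rotation groups.

    y : array of shape (T, 8) of rotation group estimates.
    """
    y = np.asarray(y, dtype=float)
    # bias estimates under (C4): average deviation from the monthly mean
    a_hat = (y - y.mean(axis=1, keepdims=True)).mean(axis=0)
    centred = y - a_hat
    # each bias-corrected squared difference has expectation close to 2*sigma^2
    halves = [np.mean((centred[:, i] - centred[:, j]) ** 2) / 2.0
              for i, j in combinations(range(y.shape[1]), 2)]
    return float(np.mean(halves))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    bias = np.array([4.0, 1.0, 0.0, -1.0, 2.0, -1.0, -2.0, -3.0])
    totals = 1000.0 + rng.normal(0.0, 30.0, size=(2000, 1))
    y = totals + bias + rng.normal(0.0, 2.0, size=(2000, 8))
    print(moment_sigma2(y))   # should be near the simulated variance of 4
```

The simulated noise in the usage example is independent over time, which is simpler than the correlation structure in (C3); the identity used for $\sigma^2$ only needs groups in the same month to be uncorrelated.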

The derived estimators are moment estimators. Theorem 3.1 indicates that they are $\sqrt{T}$-consistent under (C1)–(C4). To improve the efficiency by repeated computations, we consider a replication method such as the balanced half sample or balanced repeated replication (e.g. Adam & Fuller, 1992; Lent, 1991; Lent et al., 1999). Suppose that a balanced half sample set of size $B$ has been created. Define $\hat{Y}^{(k)}_{t,i}$ as the $k$th replicate estimate for $\hat{Y}_{t,i}$, $k = 1, \ldots, B$, with replicate weights 0.5 and 1.5 in each half sample. We apply the moment method to each replicate and then average over the replicates, which leads to estimators for $\sigma^2$ and the $\rho_j$'s that will be called replication-moment estimators. Note that this method differs from the replication method in Lent et al. (1999) in two respects. The first is that we include bias estimators. The second is that we use past data from $T$ months to improve efficiency under (C2) and (C3).
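The replication-moment idea (apply a moment estimator to each replicate, then average over the replicates) can be sketched generically. The sketch assumes the B replicate estimates have already been computed and stored in a `(B, T, 8)` array; how replicates are formed from half samples is not shown. The names `replication_moment` and `within_month_variance` are hypothetical, and the latter is only a stand-in moment function for the usage example.

```python
import numpy as np


def replication_moment(replicates, moment_fn):
    """Average a moment estimator over replicate data sets.

    replicates : array of shape (B, T, 8); replicates[k] holds the k-th
                 replicate estimates for all months and rotation groups.
    moment_fn  : callable mapping a (T, 8) array to a scalar estimate.
    """
    replicates = np.asarray(replicates, dtype=float)
    return float(np.mean([moment_fn(rep) for rep in replicates]))


def within_month_variance(y):
    """Stand-in moment function: average across-group sample variance."""
    return float(np.var(y, axis=1, ddof=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    reps = 1000.0 + rng.normal(0.0, 2.0, size=(48, 60, 8))   # B = 48, T = 60
    print(replication_moment(reps, within_month_variance))
```

Any scalar moment estimator can be passed as `moment_fn`, so the same wrapper would serve for $\sigma^2$ and for each correlation parameter.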

The performance of the different estimation procedures in obtaining the optimal A and K was compared through a numerical study.

4. Numerical study

We generated a finite population of size N = 15,000, in which the employment status variables were generated such that the unemployment proportion is the same as the unemployment rate estimated by the CPS from 2004 to 2014. We considered the estimation of $Y_t$, the population-level unemployment total at month $t$, and rotation groups of the same size n = 50, 100 and 200. For each month, we selected a simple random sample of size n from the population and followed it over the next 15 months. Such a sampling scheme ensures that condition (C4) holds approximately. Based on the simulated sample, we estimated the optimal coefficients A and K, and obtained the optimal AK estimators for all months in 2014. The proposed estimators based on moment estimation alone and on replication-moment estimation are referred to below as the moment estimator and the replication-moment estimator, respectively; the total number of replications is B = 48. We compared them with two other AK composite estimators: the AK composite estimator with fixed A = 0.3 and K = 0.4, and the AK composite estimator with A and K determined by the replication method proposed in Lent et al. (1999). One of the two comparison estimators is based on all data from January 2004 to December 2013. For the other three estimators, we tried two different T's, T = 120 (January 2004 to December 2013) and T = 60 (January 2009 to December 2013).

Based on 500 simulation repetitions, we approximated the MSE for each estimator and listed them in Table 1 for t ranging from January to December 2014. The estimation bias of each estimator is omitted, because the bias of each estimator is negligible compared with the standard deviation. We have the following observations from Table 1.

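1. The estimator of Lent et al. (1999), the moment estimator and the replication-moment estimator are much better than the AK composite estimator with fixed A and K, indicating that updating A and K is necessary at least periodically.

2. The replication-moment estimator is better than the moment estimator in most cases, indicating that the additional computation by replication is worthwhile.

3. In 63 of the total of 72 cases, the replication-moment estimator is better than the estimator of Lent et al. (1999), and some of the improvements are substantial. The estimator based on moment estimation alone is better than the estimator of Lent et al. (1999) in 28 of the 72 cases.

Table 1. Simulation MSE of four estimators in 2014 (all results are in %).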

Because the bias is negligible in this simulation study, possible reasons for the improvement of our method over the one in Lent et al. (1999) are (i) our method adopts a more precise method to determine the optimal coefficient A, which is supported by the fact that the moment estimator improves on the estimator of Lent et al. (1999) in some cases, although the moment estimation of $\sigma^2$ and the $\rho_j$'s without replication may not be efficient; and (ii) our method uses more past observations in estimating $\sigma^2$ and the $\rho_j$'s under (C2) and (C3). Note that the method in Lent et al. (1999) also uses (C2) and (C3) in the derivation of the variance formula of the AK composite estimator, although not in the estimation of $\sigma^2$ and the $\rho_j$'s. Furthermore, in applications, the biases should be checked empirically before applying the method in Lent et al. (1999).

5. Discussions

Selecting A and K in the AK composite estimators has been a long-standing issue in the CPS. We revisit the MSE formula of the AK composite estimator and express it as a quadratic form of A for each fixed K so that the optimal A and K can be easily obtained. Based on our simulation results, we recommend the proposed replication-moment method for estimating the unknown parameters in the optimal values of A and K. Unlike existing methods for choosing A and K in the CPS (Lent et al., 1999), our approach selects A in a more precise manner, uses past data and takes the rotation group bias into consideration when estimating the unknown parameters in the optimal values of A and K.

Both our approach and the one in Lent et al. (1999) are based on (C2) and (C3), which are covariance stationarity conditions when monthly data are viewed as a time series. Without these covariance stationarity conditions, the variance or MSE of the AK composite estimator does not have any explicit form, so that selecting A and K becomes very difficult. Some further research along the following lines is desired.

1. For a long period of time (a large T), the covariance stationarity conditions are likely to be violated. However, it may be reasonable to assume these conditions over a moderate time period, and our method can be applied periodically.

2. Some ideas for relaxation of the stationarity conditions are discussed in Cantwell (1990). For example, we may assume a particular form for the covariance of two estimators that are from the same panel but $|t - s|$ months apart. Our method may be extended to this situation, where data are non-stationary but a certain structure is imposed on the covariances.

3. Without any assumption other than the existence of the second-order moments, the MSE of the AK composite estimator does not have any explicit form. If an accurate MSE estimator can be derived (e.g. by replication or bootstrapping), we may select A and K by minimising the MSE estimator over a grid.

Our result also relies on condition (C4) for the rotation biases. Without any condition, the rotation biases are not estimable. Previous researchers either assumed (C4) or simply ignored the rotation biases (Bailar, 1975; Cantwell, 1990; Krueger et al., 2017; Lent et al., 1999). A further investigation of the rotation biases when they are not ignorable is needed.

Combined with Theorem 2 in Cantwell (1990), our method can be applied to the selection of the optimal A and K for estimating the month-to-month change $\Delta_t$ by the difference between the AK composite estimators for months $t$ and $t-1$, because the MSE of this difference is also a quadratic form of A for any fixed K. Note that this difference is always unbiased for $\Delta_t$ and hence its MSE is its variance. The proof of the following result is given in the Appendix.

Theorem 5.1:

Let L be the 8 × 8 matrix with 1's on the subdiagonal and 0's elsewhere. Under (C1)–(C3),

for K = 0,

for 0 < K < 1,

where

If we consider the estimation of the total and change together, then we may minimise a compromise loss function, a combination of the two MSEs weighted by a fixed λ, 0 ≤ λ ≤ 1, to determine the optimal A and K.

Disclosure statement

Any views expressed are those of the authors and not necessarily those of the U.S. Census Bureau.

Additional information

Funding

The research of the first two authors was supported by the U.S. Census Bureau Prime Contract No: YA1323-09-CQ-0054. The research of the second author was also partially supported by NSF grants DMS-1305474 and DMS-1612873.

Notes on contributors

Yang Cheng

Yang Cheng is a lead scientist at the U.S. Census Bureau.

Jun Shao

Jun Shao is a professor at the University of Wisconsin-Madison.

Zhou Yu

Zhou Yu is a professor at East China Normal University.

References

  • Adam, A., & Fuller, W. (1992). Covariance of estimators for the current population survey. In Proceedings of the Section on Survey Research Methods (pp. 586–591). Washington, DC: American Statistical Association.
  • Bailar, B. A. (1975). The effects of rotation group bias on estimates from panel surveys. Journal of the American Statistical Association, 70, 23–30.
  • Cantwell, P. J. (1990). Variance formulae for composite estimators in one- and multi-level rotation designs. Survey Methodology, 16(1), 153–163.
  • Cheng, Y. (2012). Overview of current population survey methodology. In Proceedings of the Survey Research Methods Section (pp. 3963–3979). Washington, DC: American Statistical Association.
  • DasGupta, A. (2008). Asymptotic theory of statistics and probability. New York, NY: Springer Verlag.
  • Gurney, M., & Daly, J. F. (1965). A multivariate approach to estimation in periodic sample surveys. In Proceedings of the Social Statistics Section (pp. 242–257). Washington, DC: American Statistical Association.
  • Huang, E. T., & Ernst, L. R. (1981). Comparison of an alternative estimator to the current composite estimator in CPS. In Proceedings of the Section on Survey Research Methods (pp. 303–308). Washington, DC: American Statistical Association.
  • Krueger, A. B., Mas, A., & Niu, X. (2017). The Evolution of Rotation Group Bias: Will the Real Unemployment Rate Please Stand Up? The Review of Economics and Statistics, 99, 258–264.
  • Kumar, S., & Lee, H. (1983). Evaluation of composite estimation for the Canadian labour force survey. Survey Methodology, 9, 178–201.
  • Lent, J. (1991). Variance estimation for current population survey small area labor force estimates. In Proceedings of the Survey Research Methods Section (pp. 11–20). Washington, DC: American Statistical Association.
  • Lent, J., & Cantwell, S. (1994). Effect of composite weights on some estimates from the current population survey. In Proceedings of the Section on Survey Research Methods (pp. 130–139). Washington, DC: American Statistical Association.
  • Lent, J., Miller, S., & Cantwell, P. (1994). Composite weights for the current population survey. In Proceedings of the Section on Survey Research Methods (pp. 867–872). Washington, DC: American Statistical Association.
  • Lent, J., Miller, S. M., Duff, M., & Cantwell, P. J. (1998). Comparing current population survey estimates computed using different composite estimators. In Proceedings of the Section on Survey Research Methods (pp. 564–569). Washington, DC: American Statistical Association.
  • Lent, J., Miller, S. M., Cantwell, P. J., & Duff, M. (1999). Effects of composite weights on some estimates from the current population survey. Journal of Official Statistics, 15(1), 431–448.

Appendix

Proof of Theorem 3.1:

We only prove the results for the bias estimators and the estimator of $\sigma^2$ under (C1)–(C4). We first deal with the bias estimators. Let . Then $\{U_{1,i}, \ldots, U_{T,i}\}$ is a stationary 16-dependent sequence. By Theorem 9.1 in DasGupta (2008), we have as $T$ goes to infinity, where and $i = 1, \ldots, 8$.

Now we deal with the estimator of $\sigma^2$. Let . Then $S_t$ is also a stationary 16-dependent sequence. Again by Theorem 9.1 in DasGupta (2008), we can see that as $T$ goes to infinity, where . Then since . The proof is completed.

Proof of Theorem 5.1:

Recall that . By Theorem 2 in Cantwell (1990), we know that for K = 0, and for 0 < K < 1, The proof is completed by noting that the estimator of change is an unbiased estimator for $\Delta_t$.
