Abstract
A crucial input into causal inference is the imputed counterfactual outcome. Imputation error can arise from sampling uncertainty in estimating the prediction model on the untreated observations, or from out-of-sample information not captured by the model. The literature has focused on sampling uncertainty, which vanishes as the sample size grows. Often overlooked is the possibility that the out-of-sample error is informative about the missing counterfactual outcome when it is mutually or serially correlated. Motivated by the best linear unbiased predictor (BLUP) of Goldberger in a time series setting, we propose an improved predictor of the potential outcome when the errors are correlated. The proposed PUP is practical: it is not restricted to linear models, can be used with consistent estimators already developed, and improves mean-squared error for a large class of strong mixing error processes. Ignoring predictability in the errors can distort conditional inference. However, the precise impact depends on the choice of estimator as well as the realized values of the residuals.
Disclosure Statement
The authors report that there are no competing interests to declare.
Notes
1 We thank Bruce Hansen for this suggestion.
2 Regularization is not necessary to consistently estimate the missing values, but could give a lower rank common component than the one in Bai and Ng (2021).
3 Robinson (1991) provides a survey of its many derivations, including a Kalman filter interpretation; see also Spall (1991). Taub (1979) and Baltagi (2008, 2013) use it in variance components analysis of panel data.
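4 Cochrane-Orcutt performs least squares regression of $y_t - \rho y_{t-1}$ on $x_t - \rho x_{t-1}$ for given $\rho$ using data from $t = 2$ onward, and then estimates $\rho$ from an autoregression in the residuals $\hat{e}_t$, iterating until convergence. The Prais-Winsten estimator additionally exploits information in $t = 1$. It is also possible to estimate $\beta$ and $\rho$ directly from the Durbin equation $y_t = \rho y_{t-1} + x_t'\beta - \rho x_{t-1}'\beta +$ error.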
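The iteration described in note 4 can be illustrated with a minimal sketch. It assumes a linear model with AR(1) errors; the function name `cochrane_orcutt`, the tolerance, and the simulated data are illustrative choices and are not taken from the paper.

```python
import numpy as np

def cochrane_orcutt(y, X, tol=1e-8, max_iter=200):
    """Alternate between beta (given rho) and rho (given residuals).

    Hypothetical helper for illustration: assumes y_t = x_t' beta + e_t
    with e_t = rho * e_{t-1} + u_t.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    rho = 0.0
    for _ in range(max_iter):
        # Quasi-difference the data; the first observation is dropped.
        y_star = y[1:] - rho * y[:-1]
        X_star = X[1:] - rho * X[:-1]
        beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        # Residuals from the levels equation, then an AR(1) fit without intercept.
        e = y - X @ beta
        rho_new = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho

# Simulated example (assumed design: intercept 1.0, slope 2.0, rho 0.6).
rng = np.random.default_rng(0)
T = 2000
x = rng.normal(size=T)
u = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.6 * e[t - 1] + u[t]
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(T), x])

beta_hat, rho_hat = cochrane_orcutt(y, X)
print(beta_hat, rho_hat)
```

Because the intercept column is quasi-differenced along with the other regressors, `beta_hat[0]` is the intercept of the levels equation. A Prais-Winsten variant would additionally rescale and keep the first observation instead of dropping it.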
5 Brodersen et al. (2015) consider state space estimation of the counterfactual outcomes in the presence of trends, but serial correlation in the idiosyncratic shocks and/or the factors is not allowed. Carvalho, Masini, and Medeiros (2018) and Masini and Medeiros (2021, 2022) consider causal inference in a high-dimensional setting when the data are persistent and possibly non-stationary.
6 This follows because . Thus,
,
.
7 For large T1, , which equals