ABSTRACT
We propose an optimal-transport-based matching method to nonparametrically estimate linear models with independent latent variables. The method consists in generating pseudo-observations from the latent variables, so that the Euclidean distance between the model’s predictions and their matched counterparts in the data is minimized. We show that our nonparametric estimator is consistent, and we document that it performs well in simulated data. We apply this method to study the cyclicality of permanent and transitory income shocks in the Panel Study of Income Dynamics. We find that the dispersion of income shocks is approximately acyclical, whereas the skewness of permanent shocks is procyclical. By comparison, we find that the dispersion and skewness of shocks to hourly wages vary little with the business cycle. Supplementary materials for this article are available online.
Supplementary material
In addition to these extensions, in the online appendix we provide the proofs and present some additional simulations.
Acknowledgments
We thank Colin Mallows, Tincho Almuzara, Alfred Galichon, Jiaying Gu, Kei Hirano, Pierre Jacob, Roger Koenker, Thibaut Lamadon, Guillaume Pouliot, Azeem Shaikh, Tim Vogelsang, Daniel Wilhelm, and audiences at various places for comments. Tincho Almuzara and Beatriz Zamorra provided excellent research assistance.
Notes
1 Code to implement the estimator is available on the second author’s webpage.
2 When is nonsingular and A is known,
recovers X exactly. We are interested in situations, such as deconvolution and filtering, where exact recovery of the latent variables is not possible.
3 An interesting possibility would be to jointly estimate A and the distribution of X. Although we do not study it formally, we comment on this possibility in Section 4.2.
4 Other applications in economics include the estimation of the heterogeneous effects of an exogenous binary treatment under the assumption that the potential outcome in the absence of treatment is independent of the gains from treatment (Heckman, Smith, and Clements 1997), and the estimation of the distribution of time-invariant random coefficients of binary treatments in panel data models (Arellano and Bonhomme 2012).
5 See also Botosaru and Sasaki (2015). Our approach may be used to estimate linear autoregressive specifications of the form , where we estimate
—that is, the matrix A—in a first step. An important application of error components models is to relax independence in repeated measurements models such as Equation (1). This can be done provided T is large enough. Modeling
in Equation (1) as a finite-order moving average or autoregressive process with independent innovations preserves the linear independent factor structure of the model (Arellano and Bonhomme 2012; see also Hu, Moffitt, and Sasaki 2019). In addition, Schennach (2013b) pointed out that, in model (1), full independence between the factors is not necessary, and that sub-independence suffices to establish identification.
6 Ben Moshe (2017) showed how to allow for arbitrary subsets of dependent factors, and proposed characteristic-function-based estimators.
7 The sample sizes being the same for Y and X2 is not essential and can easily be relaxed. In a setting where the cdf is known, one can draw a sample from it, or alternatively work with an integral counterpart to our estimator.
8 Specifically, one could compute , with
being R independent permutations. In that case, π would be a generalized permutation, mapping
to
.
9 It is common in applications to assume that some of the Xk’s have zero mean while leaving the remaining means unrestricted. For example, in the repeated measurements model, assuming that suffices for identification. Our algorithm can easily be adapted to such cases.
10 If is unknown and
is a consistent estimate of it, then we replace atk by
in Equation (4). We proceed similarly in the algorithm we propose in the next section. Alternatively, one could jointly minimize the objective function on the right-hand side of Equation (4) with respect to both X and
. Here, we do not study the formal properties of such a joint estimation method.
11 Notice that, since π is a permutation, does not depend on π.
12 See, for example, Galichon (2016, chap. 3) on discrete Monge–Kantorovich problems.
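As a small illustration of a discrete matching problem of this kind, the following Python sketch treats the scalar case with equal sample sizes, where pairing the sorted model predictions with the sorted observations minimizes the sum of squared distances over all permutations. This is not the authors' code: the function names and the toy numbers are hypothetical, and the brute-force check is feasible only for very small samples.

```python
from itertools import permutations


def match_by_sorting(predictions, observations):
    """Return pi such that predictions[i] is matched to observations[pi[i]].

    For scalar data and squared-distance cost, pairing the k-th smallest
    prediction with the k-th smallest observation is an optimal assignment.
    """
    pred_order = sorted(range(len(predictions)), key=lambda i: predictions[i])
    obs_order = sorted(range(len(observations)), key=lambda j: observations[j])
    pi = [0] * len(predictions)
    for i, j in zip(pred_order, obs_order):
        pi[i] = j
    return pi


def matching_cost(predictions, observations, pi):
    """Sum of squared distances between predictions and matched observations."""
    return sum((observations[pi[i]] - predictions[i]) ** 2
               for i in range(len(predictions)))


def brute_force_cost(predictions, observations):
    """Minimum matching cost over all permutations (tiny samples only)."""
    n = len(predictions)
    return min(matching_cost(predictions, observations, p)
               for p in permutations(range(n)))


# Hypothetical toy data: four model predictions and four observations.
predictions = [0.3, -1.2, 2.0, 0.9]
observations = [1.1, -0.8, 0.2, 2.5]
pi = match_by_sorting(predictions, observations)
sorted_cost = matching_cost(predictions, observations, pi)
optimal_cost = brute_force_cost(predictions, observations)
```

With multivariate outcomes, sorting no longer applies and the matching becomes a general linear assignment problem, which is the setting the reference above covers.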
13 An entropic-regularized counterpart to Equation (4) is, for
14 Strictly speaking, Mallows (2007) redefined for all
at the end of step s, and then applied the random permutation
to the new
values. This difference with the algorithm outlined here turns out to be immaterial, since the composition of
and
is also a random permutation of
.
15 It is not necessary for the estimator to be an exact minimizer of Equation (4). As we show in the proof, it suffices that the value of the objective function at the estimator be in an ϵN-neighborhood of the global minimum, for ϵN tending to zero as N tends to infinity.
16 Part (i) ensures that belongs to an
-ball, which is compact under
(Gallant and Nychka 1987). Compactness can be preserved when norms are replaced by weighted norms (e.g., using polynomial or exponential weights); see, for example, Freyberger and Masten (2019, Theorem 7) and the analysis in Newey and Powell (2003).
17 The alternative density estimator can be shown to be uniformly consistent for
as N tends to infinity under the same conditions.
18 A simple recommendation for practice is based on a truncated normal distribution. Let denote a consistent estimate of the standard deviation of Xk, for example, obtained by covariance-based minimum distance, and let c > 0 be a tuning parameter. Possible penalization constants are:
(upper bound on quantile values),
and
(lower and upper bounds for first derivatives), and
(upper bound on the second derivatives). When c = 1, these constants are binding when Xk follows a normal truncated at the 99th percentiles. As a default choice one may take c = 2.
19 When implementing the Fourier estimator, we enforce the non-negativity and integral constraints ex post. To select the tuning parameter, we minimize the Monte Carlo MISE of the estimator on a grid of values.
20 Indeed, we have
21 For example, Storesletten, Telmer, and Yaron (2004) estimated an AR(1) process for the persistent component, with a baseline autoregressive coefficient of 0.96. While they estimated the model in levels, our motivation for estimating (11) in first differences is that differences are robust to heterogeneity between cohorts.
22 We compute the Newey-West formula with one lag. Using two or three lags instead has little impact. In this calculation we do not account for the fact that the quantiles are estimated, our rationale being that the cross-sectional sizes are large relative to the length of the time series.
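For readers unfamiliar with the formula, the following Python sketch shows a Newey-West long-run variance with Bartlett weights for the simplest case, the sample mean of a scalar time series. It illustrates the weighting scheme only and is not the calculation used in the article, which is applied in the article's own estimation setting; the function names are hypothetical. With one lag, the first autocovariance receives weight 1/2.

```python
def newey_west_long_run_variance(series, lags=1):
    """Bartlett-weighted long-run variance of a scalar time series.

    Computes gamma_0 + 2 * sum_{l=1}^{lags} (1 - l / (lags + 1)) * gamma_l,
    where gamma_l is the lag-l sample autocovariance (divisor: series length).
    """
    n = len(series)
    mean = sum(series) / n
    u = [x - mean for x in series]  # demeaned series

    def autocov(lag):
        return sum(u[t] * u[t - lag] for t in range(lag, n)) / n

    lrv = autocov(0)
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1)  # Bartlett kernel weight
        lrv += 2.0 * weight * autocov(lag)
    return lrv


def newey_west_se_of_mean(series, lags=1):
    """Standard error of the sample mean based on the long-run variance."""
    return (newey_west_long_run_variance(series, lags) / len(series)) ** 0.5


# Hypothetical toy series with strong negative first-order autocorrelation.
toy = [1.0, -1.0, 1.0, -1.0]
lrv_one_lag = newey_west_long_run_variance(toy, lags=1)
se_one_lag = newey_west_se_of_mean(toy, lags=1)
```

Passing `lags=2` or `lags=3` changes only the Bartlett weights and the number of autocovariances included, which is the comparison the note describes.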
23 Indeed, in the univariate case for all i is equivalent to
for all
and length-m cycle
.
24 Note that (14) is linear in ’s. However, it may be impractical to enforce all restrictions in (14) in the update step. In applications, a possibility is to select SN restrictions at random, where SN depends on the sample size.