Abstract
We consider inference in linear regression models that is robust to heteroscedasticity and the presence of many control variables. When the number of control variables increases at the same rate as the sample size the usual heteroscedasticity-robust estimators of the covariance matrix are inconsistent. Hence, tests based on these estimators are size distorted even in large samples. An alternative covariance-matrix estimator for such a setting is presented that complements recent work by Cattaneo, Jansson, and Newey. We provide high-level conditions for our approach to deliver (asymptotically) size-correct inference as well as more primitive conditions for three special cases. Simulation results and an empirical illustration to inference on the union premium are also provided. Supplementary materials for this article are available online.
1 Introduction
When performing inference in linear regression models it is common practice to safeguard against (conditional) heteroscedasticity of unknown form. The estimator of the covariance matrix of the least-squares estimator proposed by Eicker (1963, 1967) and White (1980) is known to be biased. The bias can be severe if the regressor design contains observations with high leverage (Chesher and Jewitt 1987; see note 1). A necessary condition for the least-squares estimator to be asymptotically normal is that maximal leverage vanishes as the sample size grows large (Huber 1973). This condition, then, also implies consistency of the robust covariance-matrix estimator (under regularity conditions).
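As a concrete illustration of the leverage concept used throughout (simulated data; not any dataset from this article), the leverage of observation i is the ith diagonal entry of the hat matrix and satisfies the bounds discussed in note 1:

```python
import numpy as np

# Leverage of observation i: the ith diagonal entry of the "hat" matrix
# H = X (X'X)^{-1} X', which maps observed outcomes into fitted values.
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.standard_normal((n, k))

H = X @ np.linalg.solve(X.T @ X, X.T)
leverage = np.diag(H)

# Each leverage lies in [0, 1]; together they sum to the number of regressors,
# so average leverage is k/n and cannot vanish unless k/n does.
assert np.all((leverage >= 0) & (leverage <= 1))
assert np.isclose(leverage.sum(), k)
```

Because the leverages sum to the number of estimated coefficients, a regression with many controls necessarily has non-negligible average leverage, which is the source of the problems discussed below.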
The requirement that maximal leverage vanishes is problematic when the regressors include a large set of control variables. Under asymptotics where their number, qn, grows with the sample size, n, the robust covariance-matrix estimator will be inconsistent unless qn/n → 0, as demonstrated by Cattaneo, Jansson, and Newey (2018). They obtained the same result for the other members of the so-called HC-class of covariance-matrix estimators (see, e.g., Long and Ervin 2000; MacKinnon 2012 for reviews) and showed that the jackknife variance estimator of MacKinnon and White (1985), although inconsistent, can be used to perform asymptotically conservative inference under asymptotics where qn/n does not vanish. On the other hand, a bias-corrected covariance-matrix estimator in the spirit of Hartley, Rao, and Kiefer (1969) and Bera, Suprayitno, and Premaratne (2002) is shown to be consistent under conditions that bound maximal leverage below one-half. One implication of these conditions is that qn/n cannot exceed one-half. Although not guaranteed to be positive semidefinite (see Bera, Suprayitno, and Premaratne 2002 for a discussion), this estimator is attractive as it is asymptotically equivalent to a minimum-norm unbiased estimator in the sense of Rao (1970).
In this article, we discuss an alternative estimator of the covariance matrix that can deal with designs where maximal leverage is bounded away from one. As such, it remains consistent when qn/n exceeds one-half. To show this, we need to impose additional conditions relative to Cattaneo, Jansson, and Newey (2018). A consistency result is first provided under high-level conditions. Primitive conditions are then given for three special cases: the partially linear regression model, the one-way model for short panel data, and the generic linear model with increasing dimension. Again, our covariance-matrix estimator need not be positive semidefinite in small samples. It is further not invariant to changes in the scale of the regression slopes. Achieving such invariance under asymptotics where qn/n does not vanish appears difficult, as we discuss below.
The idea underlying our variance estimator can be traced back to work by Kline, Saggio, and Sølvsten (2019, Remark 4 and Lemma 5); see below. The chief difference lies in the conditions under which the consistency result is obtained. They considered settings where the observations are independent and the regressors are fixed, and entertained models that are correctly specified with regression functions that are uniformly bounded. This is reasonable in analysis-of-variance problems, which are the main focus of application in their work. Here, we maintain the framework of Cattaneo, Jansson, and Newey (2018). Some dependence between observations is allowed, regressors are stochastic, and the model can feature (vanishing) misspecification bias. Our consistency result can be understood to be an extension of Kline, Saggio, and Sølvsten (2019) to such settings. It allows for regressors to have unbounded support and for observations to depend on a growing number of parameters. Our conditions do rule out dynamic models, for example. Verdier (2020) provides estimation and inference results for two-way models estimated from short panel data that allow for dynamics over time.
In Section 2, we introduce the framework, present our covariance-matrix estimator, and provide a consistency result under a set of high-level conditions. In Section 3, we connect our work to the literature and, notably, to Cattaneo, Jansson, and Newey (2018) and Kline, Saggio, and Sølvsten (2019). In Section 4, we provide primitive conditions for three special cases of our setup. In Section 5, we present and discuss the results from Monte Carlo experiments and apply our variance estimator to perform inference on the union wage premium. A short conclusion ends the article. The supplemental appendix contains technical details and additional simulation results.
2 Inference With Many Regressors
2.1 Framework
Consider the linear model
(1) y_i = x_i′β + w_i′γ_n + u_i,
where y_i is a scalar outcome, x_i is a vector of regressors of fixed dimension r, w_i is a vector of covariates whose dimension, qn, may grow with n, and u_i is an unobserved error term. Our aim is to perform asymptotically valid inference on β that is robust to heteroscedasticity when w_i is high-dimensional, in the sense that qn is not a vanishing fraction of the sample size. In such a case, the nuisance parameter γ_n is not consistently estimable, in general.
The (ordinary) least-squares estimator of β is obtained from the regression of the outcome on the regressors of interest and the covariates. We will provide an inference result based on its limit distribution. We begin by stating a set of high-level conditions that guarantee this distribution to be Gaussian and that cover the case where qn/n does not vanish as n → ∞. Our point of departure for this is Cattaneo, Jansson, and Newey (2018, Theorem 1) and we closely follow their notation.
Let and let denote a collection of random variables such that . We introducewhere , to state our first assumption. We use to denote the cardinality of a set.
Assumption 1
(Sampling). The errors are uncorrelated across i conditional on and , and the collections are independent across g conditional on , where represents a partition of into Gn sets such that .
This assumption covers standard randomly sampled data but also repeated-measurement data (such as short panel data) where strata are independent but dependence between observations within the strata is allowed, for example.
The second assumption contains regularity conditions. We let ‖·‖ and ‖·‖F denote the Euclidean and Frobenius norms, respectively, and write λmin(·) for the minimum eigenvalue of its argument.
Assumption 2
(Design). With probability approaching one the design matrix has full rank, conventional moment conditions on the data hold, and lim sup n→∞ qn/n < 1.
The rank condition on the design matrix is standard. Furthermore, given that the slope coefficients on the covariates are not of direct interest to us, dropping any covariates that are (perfectly) collinear is not an issue. The second condition contains conventional moment conditions. The third condition, finally, allows qn to grow at the same rate as the sample size.
Our setting covers situations where the regression in (1) is a linear-in-parameters mean-square approximation to the conditional expectation of the outcome given the regressors, in the sense that the approximation error need not be zero. The third assumption contains conditions on how fast such an approximation should improve. They are expressed in terms of two constants.
The assumption also contains a similar restriction on how well can be approximated bythe deviation of from its population linear projection. This is expressed using the constant where
Assumption 3
(Approximations). , and .
The last part of this assumption is a high-level negligibility condition on the residuals from the auxiliary regression of the regressors of interest on the covariates. Given that maximal leverage should vanish in large samples for the least-squares estimator to be asymptotically normal (Huber 1973), this requirement appears close to minimal.
Assumptions 1–3 coincide with Assumptions 1–3 in Cattaneo, Jansson, and Newey (2018). Consequently, by their Theorem 1, the suitably studentized least-squares estimator converges in distribution,
(2) to a standard normal random vector,
as n → ∞, where I_r denotes the r × r identity matrix.
2.2 Variance Estimation
Constructing confidence intervals and test statistics based on (2) requires an estimator of the asymptotic covariance matrix and, thus, of the individual error variances.
When qn/n does not vanish and the errors are permitted to be heteroscedastic, the construction of a consistent estimator is nontrivial. To appreciate the problem, consider the estimator of Eicker (1963, 1967) and White (1980), which uses the squared least-squares residuals as estimators of the individual error variances. This estimator is well known to be (conditionally) biased. The bias arises from the sampling noise in the least-squares estimator and can be severe (Chesher and Jewitt 1987). Unless qn/n → 0 as n → ∞, some observations will remain influential, in the sense that maximal leverage does not vanish. This causes the bias to persist in large samples, implying that the estimator is inconsistent.
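A minimal numerical sketch of this sandwich construction (the design below is an illustrative assumption, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2
X = rng.standard_normal((n, k))
beta = np.array([1.0, -0.5])
# Heteroscedastic errors: the variance depends on the first regressor.
y = X @ beta + rng.standard_normal(n) * (1 + 0.5 * np.abs(X[:, 0]))

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
resid = y - X @ beta_hat

# HC0: plug the squared residuals into the "meat" of the sandwich.
meat = (X * resid[:, None] ** 2).T @ X     # sum of e_i^2 x_i x_i'
V_hc0 = XtX_inv @ meat @ XtX_inv

assert V_hc0.shape == (k, k)
assert np.all(np.linalg.eigvalsh(V_hc0) >= -1e-12)   # positive semidefinite
```

The downward bias of this plug-in arises because residuals are systematically smaller than the errors they estimate, and the shrinkage is governed by the leverages.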
The alternative that we consider in this article instead uses the product of the outcome and the least-squares residual, suitably rescaled by the leverage, as the estimator of the ith error variance. As stated, this estimator is well defined provided that no observation has leverage equal to one. Notice that leverage equal to one means that the model reserves a parameter for this observation. It implies that the auxiliary regression of the regressors of interest on the other covariates yields a perfect prediction for this observation. Consequently, such an observation does not carry information on the coefficients of interest and can be dropped. It does not affect the least-squares estimator and does not contribute to its covariance matrix. This is important as perfect prediction of this form arises frequently in empirical work when many dummy variables are included.
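As a stylized sketch of the idea only (not the paper's exact formulas, which are not reproduced here): the squared residual in the sandwich is replaced by the proxy y_i·ε̂_i/(1 − h_ii) discussed in Section 3.3, which is unbiased for the ith error variance under correct specification but can be negative in finite samples; the leverage measure and design below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, -0.5]) + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
resid = y - X @ (XtX_inv @ (X.T @ y))
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # leverages h_ii

# Variance proxy: outcome level times residual, inflated by 1/(1 - h_ii).
sigma2_hat = y * resid / (1 - h)                 # may be negative pointwise
meat = (X * sigma2_hat[:, None]).T @ X
V_alt = XtX_inv @ meat @ XtX_inv

assert V_alt.shape == (k, k)
assert np.all(np.isfinite(V_alt))
```

Note the proxy requires h_ii < 1, matching the well-definedness condition above, and its dependence on the level of y is what breaks invariance to the scale of the slopes.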
Additional conditions are needed to show that this estimator is consistent. We collect one such set of conditions in the following assumption.
Assumption 4
(Variance estimation). ,and .
The first part of Assumption 4 is a small-bias condition; it is relevant only when (1) is misspecified. In that case it is a strengthening of Assumption 3. The conditions on the diagonal entries of the projection matrix are very weak. Providing primitive conditions for them in great generality appears to be difficult. However, when the covariates are multivariate normal they follow, under Assumption 2, in the same way as in Cattaneo, Jansson, and Newey (2018). In the one-way panel model they hold automatically, while Verdier (2020) gives sufficient conditions for them to be satisfied in the two-way model. To understand why the last part of Assumption 4 is needed, note that the (conditional) variance of the variance estimator depends on the regression function. The stated requirement allows us to control this variance; weak moment requirements typically suffice for it to be satisfied. The condition on the leverages is used in concordance with this moment condition; one simple sufficient condition for it is that maximal leverage is bounded away from one. Primitive conditions for Assumption 4 in three special cases are given below.
We can now state our consistency result.
Theorem 1
(Inference). Let Assumptions 1–4 hold. Then the proposed covariance-matrix estimator is consistent as n → ∞.
Theorem 1, combined with the limit result in (2), implies that the least-squares estimator, studentized by the proposed variance estimator, is asymptotically standard normal as n → ∞.
This result permits the construction of test statistics that (in large samples) will have correct size and of confidence regions that will exhibit correct coverage.
3 Connections to the Literature
3.1 HC-Class Estimators
The bias in the Eicker (1963, 1967) and White (1980) estimator has led to a variety of modifications being proposed that, following MacKinnon and White (1985), are often referred to as the HC-class of covariance-matrix estimators. These estimators are reviewed in Long and Ervin (2000) and MacKinnon (2012). Unfortunately, as shown by Cattaneo, Jansson, and Newey (2018, Theorem 3), none of these alternatives is consistent, in general, when qn/n does not vanish. We briefly review their main findings on these estimators here.
The first variance estimator, HC1, differs from the conventional estimator, HC0, in that it performs a degrees-of-freedom correction (Hinkley 1977, eq. (2.11)). This estimator will be consistent in the special case where errors are homoscedastic and the covariate design is balanced, that is, when all leverages are equal.
The second variance estimator, HC2, uses ε̂_i²/(1 − h_ii), where h_ii denotes the leverage of observation i, as an estimator of the ith error variance (Horn, Horn, and Duncan 1975; see also note 2). This estimator will be consistent under homoscedasticity. Because y_i ε̂_i = ε̂_i² + ŷ_i ε̂_i, where ŷ_i are fitted values, our estimator can be interpreted as a bias-corrected version of the HC2 estimator.
The third variance estimator, HC3, is constructed with ε̂_i²/(1 − h_ii)² as the estimator of the ith error variance (MacKinnon and White 1985). While this estimator is inconsistent, its probability limit exceeds the true covariance matrix (in the matrix sense). It follows that (under Assumptions 1–3) test procedures based on HC3 will be asymptotically conservative when qn/n does not vanish, both under homoscedasticity and heteroscedasticity.
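The HC0 through HC3 variants differ only in how the squared residual is rescaled; the following sketch (simulated design, an illustrative assumption) makes the resulting matrix ordering concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, -0.5]) + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
resid = y - X @ (XtX_inv @ (X.T @ y))
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # leverages

def sandwich(s2):
    """Sandwich estimator with per-observation variance estimates s2."""
    return XtX_inv @ (X * s2[:, None]).T @ X @ XtX_inv

V_hc0 = sandwich(resid**2)
V_hc1 = (n / (n - k)) * V_hc0                    # degrees-of-freedom correction
V_hc2 = sandwich(resid**2 / (1 - h))             # unbiased under homoscedasticity
V_hc3 = sandwich(resid**2 / (1 - h)**2)          # jackknife-type, conservative

# Since 1/(1-h)^2 >= 1/(1-h) >= 1, the estimators are ordered in the matrix sense.
assert np.all(np.linalg.eigvalsh(V_hc3 - V_hc2) >= -1e-12)
assert np.all(np.linalg.eigvalsh(V_hc2 - V_hc0) >= -1e-12)
```

The ordering explains the conservativeness of HC3: its rescaling dominates the others observation by observation whenever leverage is nonzero.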
3.2 Bias-Corrected Estimation
Cattaneo, Jansson, and Newey (2018) also considered a bias-corrected estimator that recovers the individual error variances jointly from the squared residuals by solving a linear system involving the elementwise (Hadamard) square of the annihilator matrix. This estimator has its origins in work by Hartley, Rao, and Kiefer (1969) and Rao (1970) and can be motivated through an (asymptotic) bias calculation; see also Bera, Suprayitno, and Premaratne (2002), and Anatolyev (2018) for a refinement under homoscedasticity. For the estimator to be well defined, this Hadamard square needs to be nonsingular. Necessary and sufficient conditions for this to be the case are stated in Mallela (1972) but these are neither simple nor intuitive (Horn, Horn, and Duncan 1975). As noted by Horn and Horn (1975), a simple sufficient condition is that maximal leverage is below one-half. Depending on the problem at hand this condition may also be necessary; an example is the one-way panel model.
Cattaneo, Jansson, and Newey (2018, Theorem 4) show that this estimator is consistent if the condition
(3) that maximal leverage is (asymptotically) bounded below one-half
is added to Assumptions 1–3. Because the leverages sum to the number of estimated coefficients, qn/n being (asymptotically) below one-half is required for (3) to be satisfied. This, in turn, is a strengthening of the condition on qn/n in Assumption 2.
Stock and Watson (2008) proposed a covariance-matrix estimator for linear fixed-effect models that is applicable to short panel data. It is based on an explicit calculation of the probability limit of the bias, adjusting the inconsistent conventional estimator by subtracting from it a plug-in estimator of this limit quantity. As discussed in Cattaneo, Jansson, and Newey (2018), their bias-corrected estimator can be understood to be a generalization of this approach to the generic setting where qn/n does not vanish.
Like ours, this estimator need not be positive semidefinite in small samples. Being based on estimators of the individual error variances that are linear combinations of the squared residuals it does, however, retain the invariance to changes in the regression slopes that is inherent in the HC-class estimators. This is a desirable feature and explains why a restriction on the magnitude of the slopes is not needed for this estimator to be consistent. Furthermore, jointly estimating the error variances in this way is attractive from an efficiency point of view, as it is asymptotically equivalent to a minimum-norm unbiased estimator (see Hartley, Rao, and Kiefer 1969; Rao 1970 for details).
3.3 Inference on Variance Components
Our variance estimator is closely related to the work of Kline, Saggio, and Sølvsten (2019). In our context, their proposal is to estimate each error variance by the cross-fit (Newey and Robins 2018) estimator y_i ε̂_{i,-i}, where ε̂_{i,-i} is the residual for observation i when the slope coefficients are estimated from the sample from which the ith observation has been omitted. When the regression model is correctly specified, this estimator is (conditionally) unbiased, provided that the leave-one-out regression slopes are well defined. This will be the case if maximal leverage is bounded away from one (see note 3). In this case, ε̂_{i,-i} = ε̂_i/(1 − h_ii) from the Sherman–Morrison formula (as in, e.g., Miller 1974), under our assumptions, connecting the cross-fit estimator of Kline, Saggio, and Sølvsten (2019) to our approach.
Kline, Saggio, and Sølvsten (2019) used this device to construct unbiased estimators of quadratic forms and present several applications. One of these (Kline, Saggio, and Sølvsten 2019, Remark 4 and Lemma 5) is a consistency result for the implied covariance-matrix estimator in designs with fixed regressors where errors are independent and maximal leverage is bounded away from unity. This result is established under the assumption that the regression function is uniformly bounded, together with standard regularity conditions like those in Assumption 2. It follows that our variance estimator can be seen as a modification of theirs that is targeted to a setting with many control variables. Theorem 1 may then be understood to be an extension of their Lemma 5 to settings with stochastic regressors and (vanishing) misspecification bias, where the regressors can have unbounded support and observations may depend on a growing number of parameters.
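The Sherman–Morrison identity invoked above is easy to verify numerically: the leave-one-out prediction error equals the full-sample residual inflated by 1/(1 − h_ii). A minimal check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 60, 3
X = rng.standard_normal((n, k))
y = X @ rng.standard_normal(k) + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
resid = y - X @ (XtX_inv @ (X.T @ y))
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # leverages h_ii

# Refit without observation 0 and compute its out-of-sample residual.
i = 0
mask = np.arange(n) != i
beta_loo = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
loo_resid = y[i] - X[i] @ beta_loo

# Sherman-Morrison: leave-one-out residual = e_i / (1 - h_ii).
assert np.isclose(loo_resid, resid[i] / (1 - h[i]))
```

This is why no explicit leave-one-out refitting is needed in practice: the cross-fit variance proxy can be computed from the full-sample residuals and leverages alone.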
The interpretation of our estimator as an (approximate) cross-fit estimator is also useful to highlight an apparent tension between invariance of a covariance-matrix estimator to changes in the regression slopes and the possibility for it to be consistent when qn/n exceeds one-half. A cross-fit estimator of the ith error variance that is invariant is necessarily the product of two least-squares residuals, each one obtained from a (conditionally) independent subsample that excludes the ith observation. Because these auxiliary regressions can jointly be based on at most n − 1 observations, such an approach cannot accommodate situations where qn/n exceeds one-half. The alternative estimators of Kline, Saggio, and Sølvsten (2019) and of this article circumvent the need for two independent estimators by using the level of the outcome variable, y_i, as a proxy for one of the residuals. This allows them to deal with cases where qn/n exceeds one-half but makes the variance estimator sensitive to the scale of the regression slopes.
4 Examples
We next provide more primitive conditions in three special cases that fit our general setup. We focus on sufficient conditions for Assumption 4. Cattaneo, Jansson, and Newey (2018) already gave such conditions for the other assumptions, notably Assumption 3, to hold.
4.1 Partially Linear Model
Suppose that the observations are independent and identically distributed. The partially linear regression model specifies the outcome as a linear function of the regressors of interest plus an unknown function of a set of explanatory variables. A series approximation of order κn to this function takes the form of a linear combination of κn basis functions, such as orthogonal polynomials. Our estimator of β, then, is the least-squares estimator from the regression of the outcome on the regressors of interest and the series terms.
Note that such a series approximation is not exact, in general. Consistency of the least-squares estimator requires that the approximation error vanishes (and, thus, that κn grows) as n → ∞.
In this example, the number of covariates can be large when the dimension of the explanatory variables is large, so that many terms are included in the approximation even for small κn, or when the underlying function is not (assumed to be) very smooth, so that a large κn needs to be used to control bias.
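A toy numerical sketch of this series strategy (the function, series order, and data law below are illustrative assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(7)
n, kappa = 500, 6
w = rng.uniform(-1, 1, n)
x = w + rng.standard_normal(n)          # regressor correlated with w
g = np.sin(2 * w)                       # unknown smooth function (illustrative)
y = 1.0 * x + g + 0.5 * rng.standard_normal(n)

# Polynomial series of order kappa: 1, w, ..., w^kappa.
P = np.vander(w, kappa + 1, increasing=True)
Z = np.column_stack([x, P])
beta_hat = np.linalg.lstsq(Z, y, rcond=None)[0][0]

# With enough series terms, the omitted-function bias is negligible.
assert abs(beta_hat - 1.0) < 0.15
```

Omitting the series terms here would bias the slope on x upward, since x and w are correlated; the basis expansion is what restores (approximate) unbiasedness.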
Standard smoothness conditions imply that the approximation error shrinks sufficiently fast (Newey 1997), yielding the first condition of Assumption 4. The fourth condition can be validated in the same way, by imposing sufficient smoothness. This is different from (and stronger than) the first set of primitive conditions discussed by Cattaneo, Jansson, and Newey (2018) to validate Assumption 3, where more smoothness in one function can be used to compensate for less smoothness in the other. Alternatively, if a partitioning estimator (Cattaneo and Farrell 2013) is used to approximate the unknown function, the relevant projection matrix is a band matrix. This is also sufficient to reach the desired result. Moreover, in this case, the first and fourth conditions of Assumption 4 are implied by the rate requirements in Assumption 3 and by the second set of restrictions for this example given in Cattaneo, Jansson, and Newey (2018). Next, simple sufficient conditions for the remaining requirement are mild moment conditions on the data. While these conditions are not imposed in Cattaneo, Jansson, and Newey (2018), they would not appear to be overly strong.
4.2 One-Way Model for Panel Data
For double-indexed data, the fixed-effect model specifies the outcome for member m of group g as a linear function of the regressors plus a group-specific intercept, αg, and an error. The regressors are assumed independent between groups but may be dependent within each group. The errors are independent between groups and (conditionally) uncorrelated within groups. The usual asymptotic embedding here has the number of groups, Gn, grow large while the number of observations per group, M, is fixed. The number of fixed effects grows at the same rate as the sample size; we have n = Gn M and qn = Gn, so that qn/n = 1/M, which does not vanish. These conditions fit Assumption 1.
The fixed-effect estimator is the least-squares estimator from the regression of the outcome on the regressors and Gn dummy variables that indicate group membership. The estimated coefficients on these dummies are computed from M observations each and are not consistent under our asymptotic approximation. In this example the relevant projection is the within-group transformation, that is, the M × M matrix that transforms observations into deviations from their within-group mean, and the variance estimator can be written in terms of within-group transformed outcomes, regressors, and errors.
The first term constitutes an unbiased estimator of the error variance. The second term on the right-hand side is mean zero because the errors are mean-independent of the regressors. Its variance, however, depends on the fixed effects.
In a two-wave panel, the within-group deviations are proportional to first differences, where Δ denotes the first-difference operator; for example, Δyg = yg2 − yg1. In this case, the variance estimator does not depend on the fixed effects. The scale factor that appears because of the de-meaning is inconsequential. The implied estimator of the covariance matrix is the sandwich estimator computed from the differenced data.
The same estimator would be obtained if our procedure were applied directly to the first-differenced model. The standard estimator of Eicker (1963, 1967) and White (1980) applied to this model is known to be consistent here.
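The equivalence between within-group de-meaning and first-differencing in a two-wave panel can be checked directly; the design below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
G = 100                                   # groups, two observations each
alpha = rng.standard_normal(G)            # group fixed effects
x = rng.standard_normal((G, 2))
beta = 1.5
y = beta * x + alpha[:, None] + rng.standard_normal((G, 2))

# Within estimator: demean by group, then pool the deviations.
xw = (x - x.mean(axis=1, keepdims=True)).ravel()
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
beta_within = (xw @ yw) / (xw @ xw)

# First-difference estimator on the same data.
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
beta_fd = (dx @ dy) / (dx @ dx)

# With M = 2 the two estimators coincide exactly.
assert np.isclose(beta_within, beta_fd)
```

With two waves the demeaned values are just ±Δ/2, so both numerator and denominator of the within estimator are proportional to their first-difference counterparts by the same factor.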
In the one-way panel model our Assumption 4 holds provided that the fixed effects and the errors have a sufficient number of finite moments. Given that Cattaneo, Jansson, and Newey (2018) imposed moment conditions of this kind to validate Assumption 2, the condition on the fixed effects is the only additional requirement needed for our variance estimator to be consistent in this model.
4.3 Linear Model With Increasing Dimension
Finally, consider the regression model that takes (1) as the data-generating process for independent and identically distributed observations, as in Cattaneo, Jansson, and Newey (2018). The generic nature of this model makes it difficult to specify a single set of simple primitive conditions for our Assumption 4 to hold. First consider the small-bias requirement. A sufficient condition here is that the regression model is correctly specified.
This will be the case, for example, when the covariates are discrete and a saturated regression model is used. It will also hold under smoothness conditions on the conditional-mean function when the covariates are approximating functions, as discussed above. Alternatively, a sparsity condition on the projection matrix can be used, together with a moment condition on the errors. Cattaneo, Jansson, and Newey (2018) showed that a weaker sparsity rate can be used to support Assumption 3 under the same moment condition; such a rate would appear difficult to obtain here. Finally, for the last requirement to hold we again impose a moment condition. When the covariates are approximating functions, a similar moment condition on the function that is being approximated will again suffice. In the case where the covariates are just many control variables included in the regression we can again use a sparsity condition. Let sn denote the number of nuisance parameters on which an individual observation depends. Then one alternative sufficient condition is a bound on the growth of sn, together with the assumption that the entries of the covariates have a sufficient number of moments. A condition on sn is different from a condition on the projection matrix, as it only pertains to the regression of the outcome on the regressors and covariates and does not restrict the auxiliary regression of the regressors of interest on the covariates. When the covariates are normally distributed or have bounded support, for example, we can allow for faster growth of sn.
5 Numerical Illustrations
5.1 Simulations
We present numerical results for a setup taken from Cattaneo, Jansson, and Newey (2018). Data are generated from a linear model in which the controls contain a constant term and a collection of zero/one dummy variables. The dummy variables are drawn independently with success probability π. The sample size was fixed at n = 700 throughout and we considered a grid of values for the ratio qn/n. All statistics reported below were computed over 10,000 Monte Carlo replications and all the variables were redrawn in each replication.
We consider three designs that vary in β and π. Design A is the design of Cattaneo, Jansson, and Newey (2018). It has β = 1 and π = 0.02 (see note 4). Each dummy variable takes on the value one for about 14 out of 700 observations, on average. Design B is a sparser design where β = 1 is maintained and π is reduced to 0.01, leading to each dummy switching on for only 7 observations in each replication, on average. Design C, in turn, sets β = 2 and maintains π = 0.02. This increases the contribution of the regressor of interest to the conditional variance of the outcome by a factor of four.
The results for the three design variations are presented in the tables. Empirical size of the two-sided t-test of the null that β = 1 and the average width of the corresponding confidence interval are given for all variance estimators discussed: HC0; its modifications HC1, HC2, and HC3; the bias-corrected estimator of Cattaneo, Jansson, and Newey (2018) (HCK); and the estimator presented here (HCA).
As the setup features homoscedastic errors, inference based on HC0 will be liberal for large n when qn/n is not small (Chesher and Jewitt 1987; Cattaneo, Jansson, and Newey 2018). This is apparent from inspection of the tables. The degrees-of-freedom correction performed by HC1 alleviates most of this concern here. HC2, which is consistent under homoscedasticity, performs quite similarly to HC1. Both corrections do come with (on average) wider confidence intervals. The simulations also illustrate that HC3 yields conservative inference. The rejection frequency (under the null) approaches zero as qn/n increases. The confidence intervals are also wide. This implies that tests based on HC3 will suffer from low power. The relative inefficiency compared to HC1 and HC2 grows as qn/n increases.
In Design A, HCK gives close to correct size and confidence intervals of a length comparable to HC1 and HC2 for most of the values of qn/n. As qn/n increases, this variance estimator does not always exist in each replication. When this happens in our Monte Carlo, HCK defaults to HC0. Consequently, for the larger values of qn/n, the size of the HCK method rises above its nominal level and the average length of the confidence interval equally shrinks relative to HC1 and HC2. In the sparser Design B, the nonexistence of the HCK estimator is more frequent and also arises for smaller values of qn/n. This explains the large overrejection rates observed there. The results for Design C are similar to those for Design A, as the performance of HCK is invariant to the scale of the regression slopes.
The simulation results also confirm the theory behind our variance estimator. It yields close to correct inference in all three designs and for all values of qn/n considered, although some more overrejection is observed for the top-end values of this ratio. The average length of its confidence interval is comparable to those obtained for HC1, HC2, and HCK and is substantially smaller than that of HC3 for large values of qn/n. These results show that this estimator is useful in problems with many covariates when HCK is not available, for example, due to sparseness of the regressor design or because of the presence of high-leverage observations more generally.
The supplemental appendix contains additional simulation results for a partially linear regression model and for a one-way panel model.
5.2 Empirical Example
We next use the different variance estimators available to infer the union membership premium. The data are a balanced panel on 545 working individuals and span 8 years (1980–1987), giving a total of 4360 observations. They are taken from Vella and Verbeek (Citation1998) and are available from the data archive of the Journal of Applied Econometrics (http://qed.econ.queensu.ca/jae/1998-v13.2/vella-verbeek/).
We estimate the union premium as the coefficient from a least-squares regression of log wages on a dummy for union membership, after partialling out a set of control variables. This set contains average hours worked per week, dummies for marital status and for poor health, a quadratic in years of experience, and a large collection of dummies, as follows. First, as the data are a panel, both individual fixed effects and year fixed effects are included. Second, as the data contain information on the type of occupation (out of 9 categories) and the industry (out of a total of 12) in which the job is located, we also control for these by including occupation and industry dummies. Third, we allow for interaction effects between these categorical variables by including occupation-by-year, industry-by-year, and occupation-by-industry dummies as well as occupation-by-industry-by-year dummies. Baseline categories for the year, industry, and occupation dummies are chosen and their corresponding dummies are dropped so as to avoid a dummy-variable trap. Certain interactions of the occupation and industry dummies never take on the value one and so are equally removed from the analysis. This gives a total of 1086 control variables relative to 4360 observations. Our point estimate of the union premium is 7.43%. This is in line with the literature (see, e.g., Jakubson 1991).
The standard errors on this point estimate, obtained by the various methods discussed, together with the implied 95% confidence intervals for the union premium, are collected in the table. A total of 330 observations have leverage that exceeds the one-half threshold. Of these, 36 observations can be perfectly explained by the control variables because of the inclusion of the occupation-by-industry and the occupation-by-industry-by-year dummies and so contain no information on the union premium. The HCK variance estimator could not be computed here. (We note that the invertibility condition fails also after dropping the 36 noninformative observations.) This explains why an entry for this estimator is not available. HC0 yields the smallest standard error on our point estimate (1.72%); HC3 yields the largest (2.32%). The standard error of HCA (1.93%) lies roughly in the middle of these two bounds. HC1 and HC2 give similar, slightly larger, standard errors than HCA (1.99% and 1.98%).
6 Conclusion
This article has presented a heteroscedasticity-robust covariance-matrix estimator for linear regression models that is consistent under an asymptotic scheme where the number of control variables, qn, grows at the same rate as the sample size, n. The estimator is similar to the proposal of Kline, Saggio, and Sølvsten (2019) but our consistency result covers more general settings. The estimator complements work by Cattaneo, Jansson, and Newey (2018), who derived inconsistency results for members of the HC-class of variance estimators, proved asymptotic conservativeness of the HC3 estimator, and presented an alternative variance estimator (based on Hartley, Rao, and Kiefer 1969) that remains consistent when maximal leverage is bounded below one-half. Under a set of additional high-level conditions, our estimator allows this restriction to be weakened to maximal leverage bounded away from one. Primitive conditions for these results were given for partially linear models, fixed-effect panel data models, and generic regression models with increasing dimension. Simulations verify the theoretical properties. An empirical application to estimation of the union premium from panel data was also presented.
The idea underlying our variance estimator can be useful as a device to correct for bias more generally. Kline, Saggio, and Sølvsten (2019) utilized it to bias-correct quadratic forms in fixed-effect estimators. Chernozhukov et al. (2018) and Newey and Robins (2018) utilized related cross-fitting techniques to reduce bias in high-dimensional estimation problems that feature machine-learning estimators. Cattaneo, Jansson, and Ma (2019) characterized the bias in (nonlinear) two-step estimators when the first step features a high-dimensional linear regression.
Acknowledgments
An associate editor and three referees provided very constructive feedback on earlier versions of this article and pointed out a non sequitur in the derivation of a primitive condition. Matias Cattaneo generously shared and discussed his replication material. I am most grateful to each of them for their help.
Supplementary Materials
The supplementary material contains proofs, technical details, and additional simulation results.
Notes
1 The leverage of an observation i is defined as the ith diagonal element of the hat matrix, that is, the matrix that transforms the observed outcomes into fitted values. It is bounded between zero and one and measures the influence of the observation on its own fitted value; a larger value reflects a higher influence (see, e.g., Hoaglin and Welsch 1978).
2 We follow Cattaneo, Jansson, and Newey (Citation2018) and construct the HC-class variance estimators using the projection matrix . The original proposals were made in a context where qn is treated as fixed and did not differentiate between and ; they used the annihilator matrix that projects out both sets of variables. The difference is asymptotically negligible under our assumptions.
3 This is a slight strengthening of our requirement in Assumption 4 as we project out only to obtain while Kline, Saggio, and Sølvsten (Citation2019) projected out both the regressors and the covariates to obtain .
4 The description of the simulation design in Cattaneo, Jansson, and Newey (Citation2018, p. 1358) contains a typo that would imply that .
References
- Anatolyev, S. (2018), “Almost Unbiased Variance Estimation in Linear Regressions With Many Covariates,” Economics Letters , 169, 20–23. DOI: https://doi.org/10.1016/j.econlet.2018.05.003.
- Bera, A. K. , Suprayitno, T. , and Premaratne, G. (2002), “On Some Heteroskedasticity-Robust Estimators of Variance-Covariance Matrix of the Least-Squares Estimators,” Journal of Statistical Planning and Inference , 108, 121–136. DOI: https://doi.org/10.1016/S0378-3758(02)00274-4.
- Cattaneo, M. D. , and Farrell, M. H. (2013), “Optimal Convergence Rates, Bahadur Representation, and Asymptotic Normality of Partitioning Estimators,” Journal of Econometrics , 174, 127–143. DOI: https://doi.org/10.1016/j.jeconom.2013.02.002.
- Cattaneo, M. D. , Jansson, M. , and Ma, X. (2019), “Two-Step Estimation and Inference With Possibly Many Included Covariates,” Review of Economic Studies , 86, 1095–1122. DOI: https://doi.org/10.1093/restud/rdy053.
- Cattaneo, M. D. , Jansson, M. , and Newey, W. K. (2018), “Inference in Linear Regression Models With Many Covariates and Heteroskedasticity,” Journal of the American Statistical Association , 113, 1350–1361. DOI: https://doi.org/10.1080/01621459.2017.1328360.
- Chernozhukov, V. , Chetverikov, D. , Demirer, M. , Duflo, E. , Hansen, C. , Newey, W. , and Robins, J. (2018), “Double/Debiased Machine Learning for Treatment and Structural Parameters,” Econometrics Journal , 21, C1–C68. DOI: https://doi.org/10.1111/ectj.12097.
- Chesher, A. , and Jewitt, I. (1987), “The Bias of a Heteroskedasticity Consistent Covariance Matrix Estimator,” Econometrica , 55, 1217–1222. DOI: https://doi.org/10.2307/1911269.
- Eicker, F. (1963), “Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions,” The Annals of Mathematical Statistics , 34, 447–456. DOI: https://doi.org/10.1214/aoms/1177704156.
- Eicker, F. (1967), “Limit Theorems for Regressions With Unequal and Dependent Errors,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1), eds. L. Le Cam and J. Neyman , pp. 59–82.
- Hartley, H. O. , Rao, J. N. K. , and Kiefer, G. (1969), “Variance Estimation With One Unit per Stratum,” Journal of the American Statistical Association , 64, 841–851. DOI: https://doi.org/10.1080/01621459.1969.10501016.
- Hinkley, D. V. (1977), “Jackknifing in Unbalanced Situations,” Technometrics , 19, 285–292. DOI: https://doi.org/10.1080/00401706.1977.10489550.
- Hoaglin, D. C. , and Welsch, R. E. (1978), “The Hat Matrix in Regression and ANOVA,” The American Statistician , 32, 17–22.
- Horn, S. D. , and Horn, R. A. (1975), “Comparison of Estimators of Heteroscedastic Variances in Linear Models,” Journal of the American Statistical Association , 70, 872–875. DOI: https://doi.org/10.1080/01621459.1975.10480316.
- Horn, S. D. , Horn, R. A. , and Duncan, D. B. (1975), “Estimating Heteroskedastic Variances in Linear Models,” Journal of the American Statistical Association , 70, 380–385. DOI: https://doi.org/10.1080/01621459.1975.10479877.
- Huber, P. J. (1973), “Robust Regression: Asymptotics, Conjectures, and Monte Carlo,” The Annals of Statistics , 1, 799–821. DOI: https://doi.org/10.1214/aos/1176342503.
- Jakubson, G. (1991), “Estimation and Testing of the Union Wage Effect Using Panel Data,” Review of Economic Studies , 58, 971–991. DOI: https://doi.org/10.2307/2297947.
- Kline, P. , Saggio, R. , and Sølvsten, M. (2019), “Leave-Out Estimation of Variance Components,” arXiv no. 1806.01494.
- Long, J. S. , and Ervin, L. H. (2000), “Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model,” The American Statistician , 54, 217–224.
- MacKinnon, J. G. (2012), “Thirty Years of Heteroscedasticity-Robust Inference,” in Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis , eds. X. Chen and N. R. Swanson , New York: Springer, pp. 437–461.
- MacKinnon, J. G. , and White, H. (1985), “Some Heteroskedasticity-Consistent Covariance Matrix Estimators With Improved Finite Sample Properties,” Journal of Econometrics , 29, 305–325. DOI: https://doi.org/10.1016/0304-4076(85)90158-7.
- Mallela, P. (1972), “Necessary and Sufficient Conditions for MINQU-Estimation of Heteroskedastic Variance in Linear Models,” Journal of the American Statistical Association , 67, 486–487. DOI: https://doi.org/10.1080/01621459.1972.10482416.
- Miller, R. G. (1974), “An Unbalanced Jackknife,” The Annals of Statistics , 2, 880–891. DOI: https://doi.org/10.1214/aos/1176342811.
- Newey, W. K. (1997), “Convergence Rates and Asymptotic Normality for Series Estimators,” Journal of Econometrics , 79, 147–168. DOI: https://doi.org/10.1016/S0304-4076(97)00011-0.
- Newey, W. K. , and Robins, J. M. (2018), “Cross-Fitting and Fast Remainder Rates for Semiparametric Estimation,” arXiv no. 1801.09138.
- Rao, C. R. (1970), “Estimation of Heteroscedastic Variances in Linear Models,” Journal of the American Statistical Association , 65, 161–172. DOI: https://doi.org/10.1080/01621459.1970.10481070.
- Stock, J. H. , and Watson, M. W. (2008), “Heteroskedasticity-Robust Standard Errors for Fixed Effects Panel Data Regression,” Econometrica , 76, 155–174. DOI: https://doi.org/10.1111/j.0012-9682.2008.00821.x.
- Vella, F. , and Verbeek, M. (1998), “Whose Wages Do Unions Raise? A Dynamic Model of Unionism and Wage Rate Determination for Young Men,” Journal of Applied Econometrics , 13, 163–183. DOI: https://doi.org/10.1002/(SICI)1099-1255(199803/04)13:2<163::AID-JAE460>3.0.CO;2-Y.
- Verdier, V. (2020), “Estimation and Inference for Linear Models With Two-Way Fixed Effects and Sparsely Matched Data,” Review of Economics and Statistics , 102, 1–16. DOI: https://doi.org/10.1162/rest_a_00807.
- White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity,” Econometrica , 48, 817–838. DOI: https://doi.org/10.2307/1912934.