Theory and Methods

On the Use of the Lasso for Instrumental Variables Estimation with Some Invalid Instruments

Pages 1339-1350 | Received 01 Jun 2016, Published online: 13 Nov 2018

ABSTRACT

We investigate the behavior of the Lasso for selecting invalid instruments in linear instrumental variables models for estimating causal effects of exposures on outcomes, as proposed recently by Kang et al. Invalid instruments fail the exclusion restriction and enter the model as explanatory variables. We show that for this setup, the Lasso may not consistently select the invalid instruments if these are relatively strong. We propose a median estimator that is consistent when less than 50% of the instruments are invalid, and its consistency does not depend on the relative strength of the instruments, or their correlation structure. We show that this estimator can be used for adaptive Lasso estimation, with the resulting estimator having oracle properties. The methods are applied to a Mendelian randomization study to estimate the causal effect of body mass index (BMI) on diastolic blood pressure, using data on individuals from the UK Biobank, with 96 single nucleotide polymorphisms as potential instruments for BMI. Supplementary materials for this article are available online.

1. Introduction

Instrumental variables estimation is a procedure for the identification and estimation of causal effects of exposures on outcomes where the observed relationships are confounded by nonrandom selection of exposure. This problem is likely to occur in observational studies, but also in randomized clinical trials if there is selective participant noncompliance. An instrumental variable (IV) can be used to solve the problem of nonignorable selection. To do this, an IV needs to be associated with the exposure, but only associated with the outcome indirectly through its association with the exposure. The former condition is referred to as the "relevance" and the latter as the "exclusion" condition. Examples of instrumental variables are quarter-of-birth for educational achievement to determine its effect on wages, see Angrist and Krueger (1991); randomization of patients to treatment as an instrument for actual treatment when there is noncompliance, see, for example, Greenland (2000); and genetic markers in Mendelian randomization studies, see, for example, Lawlor et al. (2008). For recent reviews and further examples see, for example, Clarke and Windmeijer (2012), Imbens (2014), Burgess, Small, and Thompson (2017), and Kang et al. (2016).

Whether instruments are relevant can be tested from the observed association between exposure and instruments. The effects on the standard linear IV estimator of "weak instruments," that is, the case where instruments are only weakly associated with the exposure of interest, have been derived for the linear model using weak instrument asymptotics by Staiger and Stock (1997). This has led to the derivation of critical values for the simple F-test statistic for testing the null of weak instruments by Stock and Yogo (2005). Another strand of the literature focuses on instrument selection in potentially high-dimensional settings, see, for example, Belloni et al. (2012), Belloni et al. (2014), Chernozhukov et al. (2015), and Lin et al. (2015), where the focus is on identifying important covariate effects and selecting optimal instruments from a (large) set of a priori valid instruments, where optimality is with respect to the variance of the IV estimator.

In this article, we consider violations of the exclusion condition of the instruments, following closely the setup by Kang et al. (2016) for the linear IV model where some of the available instruments can be invalid in the sense that they can have a direct effect on the outcomes or are associated with unobserved confounders. Kang et al. (2016) proposed a Lasso-type procedure to identify and select the set of invalid instruments. Liao (2013) and Cheng and Liao (2015) also considered shrinkage estimation for identification of invalid instruments, but in their setup there is a subset of instruments that is known to be valid and that contains sufficient information for identification and estimation of the causal effects. In contrast, Kang et al. (2016) did not assume any prior knowledge about which instruments are potentially valid or invalid. This is a similar setup as in Andrews (1999), who proposed a selection procedure using information criteria based on the so-called J-test of over-identifying restrictions, as developed by Sargan (1958) and Hansen (1982). The Andrews (1999) setup is more general than that of Kang et al. (2016) and requires a large number of model evaluations, which has a negative impact on the performance of the selection procedure.

This article assesses the performance of the Kang et al. (2016) Lasso-type selection and estimation procedure in their setting of a fixed number of potential instruments. If the set of invalid instruments were known, the oracle two-stage least squares (2SLS) estimator would be the estimator of choice in their setting. As the focus is estimation of and inference on the causal effect parameter, denoted by β, and as the standard Lasso approach does not have oracle properties, see, for example, Zou (2006), we show how the adaptive Lasso procedure by Zou (2006) can be used to obtain an estimator with oracle properties. To do so, we propose an initial estimator of the parameters that is consistent also when the irrepresentable condition for consistent Lasso selection of Zhao and Yu (2006) and Zou (2006) fails. The oracle property holds in this setup when an estimator for β has the same limiting distribution as the oracle 2SLS estimator.

Applying the irrepresentable condition to this IV setup, we derive conditions under which the Lasso method does not consistently select the invalid instruments. As is well known from Zhao and Yu (2006), Zou (2006), Meinshausen and Bühlmann (2006), and Wainwright (2009), certain correlation structures of the variables prevent consistent selection. New in our results are the conditions on the strength of the invalid instruments relative to that of the valid ones that result in violations of the irrepresentable condition, where the strength of an instrument is its standardized effect on the exposure. From this we can show that consistent selection of the invalid instruments may not be possible if these are relatively strong, even when less than 50% of the instruments are invalid, which is a sufficient condition for the identification of the parameters.

We show that under the condition that less than 50% of the instruments are invalid, a simple median-type estimator is a consistent estimator for the parameters in the model, independent of the strength of the invalid instruments relative to that of the valid instruments, or their correlation structure. It can therefore be considered for use in the adaptive Lasso procedure as proposed by Zou (2006). With $n$ the sample size, we show that the median estimator converges at the $\sqrt{n}$ rate, but with an asymptotic bias, as the limiting distribution is that of an order statistic. It does, however, satisfy the conditions for the adaptive Lasso procedure to enjoy oracle properties.

Because of this oracle property, and as in practice instrument strength is very likely to vary by instruments and invalid instruments could be relatively strong, it will be important to consider our adaptive Lasso approach for assessing instrument validity and estimating causal effects. In Mendelian randomization studies it is clear from examining the results from genome-wide association studies that genetic markers have differential impacts on exposures, and one cannot rule out ex ante that invalid instruments with a direct effect are also stronger predictors for the exposure. (Bowden et al. (2015) and Kolesar et al. (2015) allowed for all instruments to be invalid and showed that the causal effect can be consistently estimated if the number of instruments increases with the sample size under the assumption of uncorrelatedness of the instrument strength and their direct effects on the outcome variable.)

The next section, Section 2, introduces the model and the Lasso estimator as proposed by Kang et al. (2016). In Section 3, we derive the irrepresentable condition for this particular Lasso selection problem and present the result on the relationship between the relative strengths of the instruments and consistent selection. Section 4 presents the median estimator, establishes its consistency, and shows that its asymptotic properties are such that the adaptive Lasso estimator enjoys oracle properties. Section 5 presents some Monte Carlo simulation results. In Section 5.2, we link the Andrews (1999) method to the Lasso selection problem and show how the test of overidentifying restrictions can be used as a stopping rule. Section 5.3 investigates how close the behavior of the adaptive Lasso estimator is to that of the oracle 2SLS estimator in the Monte Carlo simulations, by comparing the performances of the Wald tests on the causal parameter under the null for different sample sizes. Further analyses and simulation results investigating the effects of varying the information content by varying the strength of the instruments and the size of the direct effects of the invalid instruments on the outcome are presented in Section B in the supplementary materials. In Section 6, the methods are applied to a Mendelian randomization study to estimate the causal effect of body mass index (BMI) on diastolic blood pressure using data on individuals from the UK Biobank, with 96 single nucleotide polymorphisms as potential instruments for BMI. Section 7 concludes.

The following notation is used in the remainder of the article. For a full column rank matrix $X$ with $n$ rows, $M_X = I_n - P_X$, where $P_X = X(X'X)^{-1}X'$ is the projection onto the column space of $X$, and $I_n$ is the $n$-dimensional identity matrix. A $k$-vector of ones is denoted $\iota_k$. The $l_p$-norm is denoted by $\|\cdot\|_p$, and the $l_0$-norm, $\|\cdot\|_0$, denotes the number of nonzero components of a vector. We use $\|\cdot\|_\infty$ to denote the maximal element of a vector in absolute value.
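The projection notation can be illustrated numerically; this is a minimal sketch with arbitrary illustrative data, not part of the paper's procedure.

```python
import numpy as np

# P_X projects onto the column space of X; M_X = I_n - P_X is the
# corresponding annihilator ("residual maker"). X here is arbitrary data.
rng = np.random.default_rng(0)
n, k = 8, 2
X = rng.standard_normal((n, k))
P_X = X @ np.linalg.solve(X.T @ X, X.T)
M_X = np.eye(n) - P_X
```

Both matrices are symmetric and idempotent, and $M_X X = 0$ by construction.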

2. Model and Lasso Estimator

We follow Kang et al. (2016; KZCS from now on), who considered the following potential outcomes model. For $i = 1, \ldots, n$, let $Y_i^{(d,z)}$ be the potential outcome if individual $i$ were to have exposure $d$ and instrument values $z$. The observed outcome for individual $i$ is denoted by the scalar $Y_i$, the treatment by the scalar $D_i$, and the vector of $L$ potential instruments by $Z_{i.}$. The instruments may not all be valid and can have a direct or indirect effect. For two possible values of the exposure $d^*, d$ and instruments $z^*, z$, assume the following potential outcomes model:
$$Y_i^{(d^*,z^*)} - Y_i^{(d,z)} = (z^* - z)'\phi + (d^* - d)\beta, \quad (1)$$
$$E[Y_i^{(0,0)} \mid Z_{i.}] = Z_{i.}'\psi, \quad (2)$$
where $\phi$ measures the direct effect of $z$ on $Y$, and $\psi$ represents the presence of unmeasured confounders that affect both the instruments and the outcome.

We have a random sample $\{Y_i, D_i, Z_{i.}\}_{i=1}^n$. Combining (1) and (2), the observed data model for the random sample is given by
$$Y_i = D_i\beta + Z_{i.}'\alpha + \varepsilon_i, \quad (3)$$
where $\alpha = \phi + \psi$ and $\varepsilon_i = Y_i^{(0,0)} - E[Y_i^{(0,0)} \mid Z_{i.}]$, and hence $E[\varepsilon_i \mid Z_{i.}] = 0$. For ease of exposition, we further assume that $E[\varepsilon_i^2 \mid Z_{i.}] = \sigma_\varepsilon^2$.

The KZCS definition of a valid instrument is then linked to the exclusion restriction and given as follows: Instrument j, j ∈ {1, …, L}, is valid if αj = 0 and it is invalid if αj ≠ 0. As in the KZCS setting, we are interested in the identification and estimation of the scalar treatment effect β in large samples with a fixed number L of potential instruments.

Let $y$ and $d$ be the $n$-vectors of observations on $\{Y_i\}$ and $\{D_i\}$, respectively, and let $Z$ be the $n \times L$ matrix of potential instruments. As an intercept is implicitly present in the model, $y$, $d$, and the columns of $Z$ have all been taken in deviation from their sample means. Following the notation of Zou (2006), let $Z_A$ be the set of invalid instruments, $A = \{j : \alpha_j \neq 0\}$, and $\alpha_A$ the associated coefficient vector. The oracle instrumental variables or two-stage least squares (2SLS) estimator is obtained when the set $Z_A$ is known. Let $R_A = [d \;\; Z_A]$; the oracle 2SLS estimator is then given by
$$\hat\theta_{or} = \begin{pmatrix} \hat\beta_{or} \\ \hat\alpha_A \end{pmatrix} = (R_A' P_Z R_A)^{-1} R_A' P_Z y. \quad (4)$$
Let $\hat d = P_Z d$, with individual elements $\hat D_i$; then $\hat\theta_{or}$ is the OLS estimator in the model $Y_i = \hat D_i\beta + Z_{A,i.}'\alpha_A + \xi_i$, where $\xi_i$ is defined implicitly, and hence
$$\hat\alpha_A = (Z_A' M_{\hat d} Z_A)^{-1} Z_A' M_{\hat d} y = (Z_A' M_{\hat d} Z_A)^{-1} Z_A' M_{\hat d} P_Z y. \quad (5)$$
The oracle 2SLS estimator for $\beta$ is given by $\hat\beta_{or} = (\hat d' M_{Z_A} \hat d)^{-1} \hat d' M_{Z_A} y$. Under standard assumptions, as defined below,
$$\sqrt{n}(\hat\beta_{or} - \beta) \overset{d}{\to} N(0, \sigma^2_{\beta_{or}}), \quad (6)$$
where
$$\sigma^2_{\beta_{or}} = \sigma_\varepsilon^2\left(E[Z_{i.}D_i]'E[Z_{i.}Z_{i.}']^{-1}E[Z_{i.}D_i] - E[Z_{A,i.}D_i]'E[Z_{A,i.}Z_{A,i.}']^{-1}E[Z_{A,i.}D_i]\right)^{-1}. \quad (7)$$
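As a numerical sanity check, the 2SLS expression (4) and the partitioned (Frisch–Waugh–Lovell) expression for $\hat\beta_{or}$ agree exactly. The sketch below uses illustrative data-generating values (three invalid instruments out of ten, $\beta = 0.5$), not the paper's simulation design.

```python
import numpy as np

# Illustrative design: L = 10 instruments, first s = 3 invalid (alpha_j = 0.2).
rng = np.random.default_rng(1)
n, L, s, beta = 2000, 10, 3, 0.5
gamma = np.full(L, 0.2)
alpha = np.r_[np.full(s, 0.2), np.zeros(L - s)]
Z = rng.standard_normal((n, L))
u = rng.standard_normal((n, 2)) @ np.linalg.cholesky(np.array([[1.0, 0.25], [0.25, 1.0]])).T
d = Z @ gamma + u[:, 1]
y = d * beta + Z @ alpha + u[:, 0]

Z_A = Z[:, :s]                                    # oracle knowledge of A
R_A = np.column_stack([d, Z_A])
PzR = Z @ np.linalg.solve(Z.T @ Z, Z.T @ R_A)     # P_Z R_A without forming P_Z
theta_or = np.linalg.solve(PzR.T @ PzR, PzR.T @ y)   # equation (4)

# Equivalent form: beta_or = (d_hat' M_{Z_A} d_hat)^{-1} d_hat' M_{Z_A} y
d_hat = PzR[:, 0]                                 # P_Z d
MA_dhat = d_hat - Z_A @ np.linalg.solve(Z_A.T @ Z_A, Z_A.T @ d_hat)
beta_or = (MA_dhat @ y) / (MA_dhat @ d_hat)
```

The equality of the two expressions is algebraic, so it holds to machine precision regardless of the simulated draw.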

The vector $\hat d$ is the linear projection of $d$ on $Z$. If we define $\hat\gamma = (Z'Z)^{-1}Z'd$, then $\hat d = Z\hat\gamma$, or $\hat D_i = Z_{i.}'\hat\gamma$. We specify
$$D_i = Z_{i.}'\gamma + v_i, \quad (8)$$
where $\gamma = E[Z_{i.}Z_{i.}']^{-1}E[Z_{i.}D_i]$, and hence $E[Z_{i.}v_i] = 0$. Further, as in KZCS, let $\Gamma = E[Z_{i.}Z_{i.}']^{-1}E[Z_{i.}Y_i] = \gamma\beta + \alpha$. Then define $\pi_j$ as
$$\pi_j \equiv \frac{\Gamma_j}{\gamma_j} = \beta + \frac{\alpha_j}{\gamma_j}, \quad (9)$$
for $j = 1, \ldots, L$. Theorem 1 in KZCS states the conditions under which, given knowledge of $\gamma$ and $\Gamma$, a unique solution exists for the values of $\beta$ and the $\alpha_j$. A necessary and sufficient condition to identify $\beta$ and the $\alpha_j$ is that the valid instruments form the largest group, where instruments form a group if they have the same value of $\pi$. Corollary 1 in KZCS then states a sufficient condition for identification. Let $s = \|\alpha\|_0$ be the number of invalid instruments. A sufficient condition is that $s < L/2$, as then clearly the largest group is formed by the valid instruments.
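The grouping argument behind (9) can be sketched at the population level: valid instruments all share $\pi_j = \beta$, so when $s < L/2$ the modal value of $\pi$ identifies $\beta$. The parameter values below are illustrative.

```python
import numpy as np

beta, L, s = 0.5, 10, 3
gamma = np.full(L, 0.2)
alpha = np.r_[np.full(s, 0.2), np.zeros(L - s)]
Gamma = gamma * beta + alpha          # population reduced-form coefficients
pi = Gamma / gamma                    # pi_j = beta + alpha_j / gamma_j, eq. (9)

# Valid instruments (alpha_j = 0) all have pi_j = beta; with s < L/2 they
# form the largest group, so beta is the modal value of pi.
values, counts = np.unique(np.round(pi, 10), return_counts=True)
beta_identified = values[np.argmax(counts)]
```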

In model (3), some elements of $\alpha$ are assumed to be zero, but it is not known ex ante which ones they are, and the selection problem therefore consists of correctly identifying those instruments with nonzero $\alpha_j$. KZCS proposed to estimate the parameters $\alpha$ and $\beta$ by using $l_1$ penalization on $\alpha$ and to minimize
$$(\hat\alpha^{(n)}, \hat\beta^{(n)}) = \arg\min_{\alpha,\beta} \tfrac{1}{2}\|P_Z(y - d\beta - Z\alpha)\|_2^2 + \lambda_n\|\alpha\|_1, \quad (10)$$
where $\|\alpha\|_1 = \sum_j |\alpha_j|$. This method is closely related to the Lasso, and the regularization parameter $\lambda_n$ determines the sparsity of the vector $\hat\alpha^{(n)}$. From (5), a fast two-step algorithm is proposed as follows. For a given $\lambda_n$, solve
$$\hat\alpha^{(n)} = \arg\min_\alpha \tfrac{1}{2}\|M_{\hat d}P_Z y - M_{\hat d}Z\alpha\|_2^2 + \lambda_n\|\alpha\|_1 \quad (11)$$
and obtain $\hat\beta^{(n)}$ by
$$\hat\beta^{(n)} = \frac{\hat d'(y - Z\hat\alpha^{(n)})}{\hat d'\hat d}. \quad (12)$$
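The two-step procedure in (11)–(12) can be sketched as follows. This is a simplified illustration: a plain cyclic coordinate-descent Lasso stands in for the LARS algorithm used by sisVIVE, the fixed penalty level is an illustrative choice rather than the cross-validated one, and the data-generating design (equal-strength, uncorrelated instruments) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, L, s, beta = 2000, 10, 3, 0.0
gamma = np.full(L, 0.2)
alpha = np.r_[np.full(s, 0.2), np.zeros(L - s)]
Z = rng.standard_normal((n, L))
u = rng.standard_normal((n, 2)) @ np.linalg.cholesky(np.array([[1.0, 0.25], [0.25, 1.0]])).T
d = Z @ gamma + u[:, 1]
y = d * beta + Z @ alpha + u[:, 0]

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1."""
    w = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)
    r = y.copy()
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * w[j]
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            r -= X[:, j] * w[j]
    return w

# Step 1: Lasso of M_dhat P_Z y on M_dhat Z, as in (11)
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ d)
d_hat = Z @ gamma_hat
Pz_y = Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
Z_tilde = Z - np.outer(d_hat, d_hat @ Z) / (d_hat @ d_hat)
y_tilde = Pz_y - d_hat * (d_hat @ Pz_y) / (d_hat @ d_hat)
alpha_hat = lasso_cd(Z_tilde, y_tilde, lam=0.02 * n)   # illustrative lambda_n

# Step 2: beta from (12)
beta_hat = d_hat @ (y - Z @ alpha_hat) / (d_hat @ d_hat)
selected = set(np.flatnonzero(np.abs(alpha_hat) > 1e-8).tolist())
```

In this equal-strength design the irrepresentable condition holds, so the three invalid instruments are selected; consistent with the cross-validation discussion later in the article, a small penalty also lets some valid instruments into the selected set.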

To find $\hat\alpha^{(n)}$ in (11), the Lasso modification of the LARS algorithm of Efron et al. (2004) can be used, and KZCS developed an R routine for this purpose, called sisVIVE (some invalid and some valid IV estimator), in which the regularization parameter $\lambda_n$ is obtained by cross-validation.

For the random variables and iid sample $\{Y_i, D_i, Z_{i.}\}_{i=1}^n$, and models (3) and (8), we assume throughout that the following conditions hold:

Assumption 1.

$E[Z_{i.}Z_{i.}'] = Q$, with $Q$ a finite and full-rank matrix.

Assumption 2.

Let $u_i = (\varepsilon_i \;\; v_i)'$. Then $E[u_i] = 0$ and
$$E[u_i u_i'] = \begin{pmatrix} \sigma_\varepsilon^2 & \sigma_{\varepsilon v} \\ \sigma_{\varepsilon v} & \sigma_v^2 \end{pmatrix} = \Sigma.$$
The elements of $\Sigma$ are finite.

Assumption 3.

$\text{plim}(n^{-1}Z'Z) = E[Z_{i.}Z_{i.}']$; $\text{plim}(n^{-1}Z'd) = E[Z_{i.}D_i]$; $\text{plim}(n^{-1}Z'\varepsilon) = E[Z_{i.}\varepsilon_i] = 0$; $\text{plim}(n^{-1}Z'v) = E[Z_{i.}v_i] = 0$; $\text{plim}(n^{-1}\sum_{i=1}^n u_i) = 0$; $\text{plim}(n^{-1}\sum_{i=1}^n u_i u_i') = \Sigma$.

Assumption 4.

$\gamma = (E[Z_{i.}Z_{i.}'])^{-1}E[Z_{i.}D_i]$, with $\gamma_j \neq 0$, $j = 1, \ldots, L$.

The setting is thus a relatively straightforward one with fixed parameters $\beta$, $\alpha$, and $\gamma$, and a fixed number $L$ of potential instruments. This is the setting under which the oracle 2SLS estimator has the limiting distribution (6), and is a setting of interest in many applications. To identify in this simple setting an ex ante unknown subset of invalid instruments using the Lasso is challenging, as highlighted in the next section where we investigate the irrepresentable condition for this setting.

For the case of many weak instruments, even the oracle 2SLS estimator would not be the estimator of choice, due to its poor asymptotic performance, and the median estimator may not be consistent. Oracle estimators with better asymptotic properties in this setting are the limited information maximum likelihood (LIML) estimator, see Bekker (1994) and Hansen, Hausman, and Newey (2008), or the continuous updating estimator (CUE), see Newey and Windmeijer (2009). Selection of invalid instruments in this setting is outside the scope of this article.

3. Irrepresentable Condition

As $Z'M_{\hat d}M_{\hat d}P_Z y = Z'M_{\hat d}P_Z y = Z'M_{\hat d}y$, it follows that
$$\|M_{\hat d}P_Z y - M_{\hat d}Z\alpha\|_2^2 = y'P_Z M_{\hat d}P_Z y - 2y'M_{\hat d}Z\alpha + \alpha'Z'M_{\hat d}Z\alpha = y'P_Z M_{\hat d}P_Z y - 2y'\tilde Z\alpha + \alpha'\tilde Z'\tilde Z\alpha,$$
where $\tilde Z = M_{\hat d}Z$. As $\|y - \tilde Z\alpha\|_2^2 = y'y - 2y'\tilde Z\alpha + \alpha'\tilde Z'\tilde Z\alpha$, it follows that the Lasso estimator $\hat\alpha^{(n)}$ as defined in (11) can equivalently be obtained as
$$\hat\alpha^{(n)} = \arg\min_\alpha \tfrac{1}{2}\|y - \tilde Z\alpha\|_2^2 + \lambda_n\|\alpha\|_1. \quad (13)$$
This minimization problem looks very much like a standard Lasso problem with $\tilde Z$ as explanatory variables. However, an important difference is that $\tilde Z$ does not have full column rank: its rank is equal to $L - 1$. This is related to the standard Lasso case of an overcomplete dictionary, implying that the OLS solution is not feasible. Intuitively, we cannot set $\lambda_n = 0$ in (13), as we have to shrink at least one element of $\alpha$ to zero to identify the parameter $\beta$. All just-identified models with $L - 1$ instruments included as invalid result in a residual correlation of 0, and hence setting $\lambda_n = 0$ does not lead to a unique 2SLS estimator.
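The rank deficiency of $\tilde Z$ can be verified directly: since $Z\hat\gamma = \hat d$ and $M_{\hat d}\hat d = 0$, the vector $\hat\gamma$ lies in the null space of $\tilde Z$. A minimal sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, L = 300, 10
Z = rng.standard_normal((n, L))
d = Z @ np.full(L, 0.2) + rng.standard_normal(n)

gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ d)
d_hat = Z @ gamma_hat
M_dhat = np.eye(n) - np.outer(d_hat, d_hat) / (d_hat @ d_hat)
Z_tilde = M_dhat @ Z

# Z_tilde @ gamma_hat = M_dhat d_hat = 0, so rank(Z_tilde) = L - 1
rank = np.linalg.matrix_rank(Z_tilde)
```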

We assume throughout that $E[\tilde Z_{i.}\tilde Z_{i.}']$ is finite. Let $C = \text{plim}(n^{-1}\tilde Z'\tilde Z)$; then it follows from Assumptions 1, 3, and 4 that $C = Q - Q\gamma(\gamma'Q\gamma)^{-1}\gamma'Q$ is finite.

We follow Zhao and Yu (2006) and Zou (2006), who developed the irrepresentable conditions for consistent Lasso variable selection. As before, let $A = \{j : \alpha_j \neq 0\}$ and assume wlog that $A = \{1, 2, \ldots, s\}$, $s < L$. (We will use subscripts $A$ and 1 interchangeably from here onward, and subscript 2 for associations with the set $A^c = \{j : \alpha_j = 0\}$.) Let
$$C = \begin{pmatrix} C_{11} & C_{21}' \\ C_{21} & C_{22} \end{pmatrix}, \quad (14)$$
where $C_{11}$ is an $s \times s$ matrix. Further, define $\hat A_n = \{j : \hat\alpha_j^{(n)} \neq 0\}$. Let $s(\alpha_1)$ denote the vector $\text{sgn}(\alpha_1)$, where $\alpha_1 = \alpha_A = (\alpha_1, \ldots, \alpha_s)'$, $\text{sgn}(a) = 1$ if $a > 0$, $\text{sgn}(a) = -1$ if $a < 0$, and $\text{sgn}(a) = 0$ if $a = 0$. The irrepresentable condition
$$\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty < 1 \quad (15)$$
is an (almost) necessary and sufficient condition for consistent Lasso variable selection. While (15) refers to the formulation of the weak irrepresentable condition of Zhao and Yu (2006), they showed that in this setting of a random design with fixed $L$ and constant parameters $\alpha$, their strong and weak irrepresentable conditions are equivalent to (15) almost surely (Zhao and Yu 2006, p. 2544).

If (15) is satisfied, and if $\lambda_n$ satisfies $\lambda_n/n \to 0$ and $\lambda_n/n^{(1+c)/2} \to \infty$ for some $0 \leq c < 1$, then $\lim_{n\to\infty} P(\hat A_n = A) = 1$; see Theorem 1 in Zhao and Yu (2006). Necessity means that consistent model selection implies the irrepresentable condition. As Zou (2006) showed, if $\lim_{n\to\infty} P(\hat A_n = A) = 1$ under the same conditions on $\lambda_n$, then the following condition must hold:
$$\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty \leq 1. \quad (16)$$
While in the standard linear model setup $\lambda_n/n \to 0$ guarantees estimation consistency, see Lemma 1 in Zou (2006), this is not the case in the IV setup here because of the rank deficiency of $\tilde Z$. Choosing $\lambda_n = 0$ in the standard setup would simply result in consistent OLS estimation of a model that includes all variables, which is not possible here, as discussed above. Therefore, if the necessary irrepresentable condition (16) does not hold, consistent Lasso selection is not possible, and even $\lambda_n/n \to 0$ does not guarantee estimation consistency in this rank-deficient IV case.

We now analyze under what conditions the irrepresentable condition does or does not hold in the IV setup, focusing particularly on the relative strengths γ1 and γ2 of the invalid and valid instruments.

Partition $Q = \text{plim}(n^{-1}Z'Z)$ and $\gamma$ commensurate with the partitioning of $C$ as
$$Q = \begin{pmatrix} Q_{11} & Q_{21}' \\ Q_{21} & Q_{22} \end{pmatrix}, \quad \gamma = \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix}, \quad (17)$$
where the instruments have been standardized such that the diagonal elements of $Q$ are equal to 1. In contrast to $C$, $Q$ is not rank deficient. Then for the Lasso specification (13), we have the following result.

Proposition 1.

Consider the observational models (3) and (8) under Assumptions 1, 3, and 4. Let $C = \text{plim}(n^{-1}\tilde Z'\tilde Z)$; $Q = \text{plim}(n^{-1}Z'Z)$; and $C_{11}$, $C_{21}$, $Q_{11}$, $Q_{21}$, $Q_{22}$, $\gamma_1$, and $\gamma_2$ as specified in (14) and (17). Then $C_{21}C_{11}^{-1}$ is given by
$$C_{21}C_{11}^{-1} = Q_{21}Q_{11}^{-1} - \frac{\tilde Q_{22}\gamma_2\left(\gamma_1' + \gamma_2'Q_{21}Q_{11}^{-1}\right)}{\gamma_2'\tilde Q_{22}\gamma_2}, \quad (18)$$
where $\tilde Q_{22} = Q_{22} - Q_{21}Q_{11}^{-1}Q_{21}' = \text{plim}\, n^{-1}Z_2'M_{Z_1}Z_2$.

Proof.

See Section A.1 in the supplementary materials.
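As a numerical sanity check of Proposition 1, the closed-form expression for $C_{21}C_{11}^{-1}$ can be compared with the value computed directly from $C = Q - Q\gamma(\gamma'Q\gamma)^{-1}\gamma'Q$ for an arbitrary correlated design; the dimensions and instrument strengths below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
L, s = 6, 2
# Random positive-definite Q normalized to unit diagonal, arbitrary strengths
A = rng.standard_normal((L, L))
Q = A @ A.T
Dhalf = np.diag(1 / np.sqrt(np.diag(Q)))
Q = Dhalf @ Q @ Dhalf
gamma = rng.uniform(0.1, 0.5, L)

# C = Q - Q gamma gamma' Q / (gamma' Q gamma), the plim of n^{-1} Z~'Z~
q = Q @ gamma
C = Q - np.outer(q, q) / (gamma @ q)

C11, C21 = C[:s, :s], C[s:, :s]
Q11, Q21, Q22 = Q[:s, :s], Q[s:, :s], Q[s:, s:]
g1, g2 = gamma[:s], gamma[s:]

lhs = C21 @ np.linalg.inv(C11)
Q22t = Q22 - Q21 @ np.linalg.inv(Q11) @ Q21.T
rhs = Q21 @ np.linalg.inv(Q11) - np.outer(
    Q22t @ g2, g1 + np.linalg.inv(Q11) @ (Q21.T @ g2)
) / (g2 @ Q22t @ g2)
```

The identity $C\gamma = 0$ also implies $\gamma_2' C_{21}C_{11}^{-1} = -\gamma_1'$, the key step in the proof of Proposition 2.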

Proposition 1 shows that consistent selection of the instruments is affected not only by the correlation structure of the instruments, but also by the values of $\gamma_1$ and $\gamma_2$. The next proposition derives conditions on $\gamma_1$ and $\gamma_2$ under which the necessary condition for consistent variable selection (16) does not hold.

Proposition 2.

Under the assumptions of Proposition 1, if $|\gamma_1's(\alpha_1)| > \|\gamma_2\|_1$, then $\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty > 1$.

Proof.

It follows from (18) that $\gamma_2'C_{21}C_{11}^{-1}s(\alpha_1) = -\gamma_1's(\alpha_1)$. Therefore,
$$\|\gamma_2\|_1\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty \geq |\gamma_2'C_{21}C_{11}^{-1}s(\alpha_1)| = |\gamma_1's(\alpha_1)|,$$
and hence $\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty \geq |\gamma_1's(\alpha_1)|/\|\gamma_2\|_1 > 1$ if $|\gamma_1's(\alpha_1)| > \|\gamma_2\|_1$.

Remark 1.

If $s(\alpha_1) = s(\gamma_1)$, then $|\gamma_1's(\alpha_1)| = \|\gamma_1\|_1$, its maximum. Regardless of the correlation structure of the instruments, $\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty > 1$, and hence the necessary condition for consistent Lasso variable selection does not hold in that case if $\|\gamma_1\|_1 > \|\gamma_2\|_1$, that is, when the invalid instruments are stronger (in $l_1$-norm) than the valid ones.

From Proposition 1, we can investigate consistent selection for various cases of interest. Related to the Monte Carlo simulations in KZCS and in Section 5, Corollary 1 considers the case with $\gamma_1 = \tilde\gamma_1\iota_s$ and $\gamma_2 = \tilde\gamma_2\iota_{L-s}$.

Corollary 1.

If $\gamma_1 = \tilde\gamma_1\iota_s$ and $\gamma_2 = \tilde\gamma_2\iota_{L-s}$, then $|\gamma_1's(\alpha_1)| > \|\gamma_2\|_1$ if $|\tilde\gamma_1/\tilde\gamma_2||\iota_s's(\alpha_1)| > L - s$. Let $g = |\iota_s's(\alpha_1)|$; then it follows that $\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty > 1$ if $|\tilde\gamma_1/\tilde\gamma_2|g > L - s$. Hence, if $g = s$, $\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty > 1$ if $s > L/(1 + |\tilde\gamma_1/\tilde\gamma_2|)$.

When instruments are uncorrelated, such that $Q = I_L$, it follows that $\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty < 1$ if $s < L - |\tilde\gamma_1/\tilde\gamma_2|g$. Hence, if $g = s$, $\|C_{21}C_{11}^{-1}s(\alpha_1)\|_\infty < 1$ if $s < L/(1 + |\tilde\gamma_1/\tilde\gamma_2|)$.

Remark 2.

For equal strength instruments, $\tilde\gamma_1 = \tilde\gamma_2$, the result of Corollary 1 shows that the necessary condition (16) does not hold for all possible configurations of $\alpha_1$ if $s > L/2$. For uncorrelated equal strength instruments, the irrepresentable condition (15) holds for all possible configurations of $\alpha_1$ if $s < L/2$.
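The threshold in Corollary 1 for the uncorrelated case can be checked numerically by sweeping $s$ and computing the irrepresentable quantity directly from $C$; the strength ratio $r = \tilde\gamma_1/\tilde\gamma_2 = 2$ is an illustrative choice, and $g = s$ (all invalid coefficients of the same sign).

```python
import numpy as np

L, r = 10, 2.0                  # r = gamma1_tilde / gamma2_tilde
checks = []
for s in range(1, L):
    gamma = np.r_[np.full(s, 0.2 * r), np.full(L - s, 0.2)]
    Q = np.eye(L)               # uncorrelated instruments
    q = Q @ gamma
    C = Q - np.outer(q, q) / (gamma @ q)
    # max |C21 C11^{-1} s(alpha_1)| with s(alpha_1) = iota_s (g = s)
    irrep = np.max(np.abs(C[s:, :s] @ np.linalg.solve(C[:s, :s], np.ones(s))))
    # Corollary 1: the condition holds iff s < L / (1 + r)
    checks.append((irrep < 1) == (s < L / (1 + r)))
```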

4. A Consistent Estimator when s < L/2 and Adaptive Lasso

As the results above highlight, the Lasso path may not include the correct model, leading to an inconsistent estimator of $\beta$. This is the case even if less than 50% of the instruments are invalid, because of differential instrument strength and/or correlation patterns of the instruments. Indeed, we find in the simulation exercise of Section 5.1 that the Lasso selects the valid instruments as invalid if these are relatively weak, $\|\gamma_2\|_1 < \|\gamma_1\|_1$, for a design with $s(\alpha_1) = s(\gamma_1)$. In this section, we present an estimation method that consistently selects the invalid instruments when less than 50% of the potential instruments are invalid. This is the same condition as that for the Lasso selection problem to satisfy the irrepresentable condition for equal strength uncorrelated instruments, but the proposed estimator below is consistent when the instruments have differential strength and/or have a general correlation structure.

We consider the adaptive Lasso approach of Zou (2006), which uses an initial consistent estimator of the parameters. In the standard linear case, the OLS estimator in the model with all explanatory variables included is consistent. As explained in Section 3, in the instrumental variables model this option is not available. We build on the result of Han (2008), who showed that the median of the $L$ IV estimates of $\beta$, using one instrument at a time, is a consistent estimator of $\beta$ in a model with invalid instruments, but where the instruments cannot have direct effects on the outcome, unless the instruments are uncorrelated.

Let $\hat\Gamma = (Z'Z)^{-1}Z'y$ and $\hat\gamma = (Z'Z)^{-1}Z'd$, and let $\hat\pi$ be the $L$-vector with $j$th element
$$\hat\pi_j = \frac{\hat\Gamma_j}{\hat\gamma_j}. \quad (19)$$
Under the standard assumptions, Theorem 1 shows that the median of the $\hat\pi_j$, denoted $\hat\beta_m$, is a consistent estimator for $\beta$ when $s < L/2$, without any further restrictions on the relative strengths or correlations of the instruments. Theorem 1 also shows that $\sqrt{n}(\hat\beta_m - \beta)$ converges in distribution to that of an order statistic. From these results, it follows that the consistent estimator $\hat\alpha_m = \hat\Gamma - \hat\gamma\hat\beta_m$ can be used for the adaptive Lasso approach of Zou (2006), resulting in oracle properties of the resulting estimator of $\beta$.

Theorem 1.

Under model specifications (3) and (8) with Assumptions 1–4, let $\hat\pi$ be the $L$-vector with elements as defined in (19). If $s < L/2$, then the estimator $\hat\beta_m$ defined as $\hat\beta_m = \text{median}(\hat\pi)$ is a consistent estimator for $\beta$: $\text{plim}\,\hat\beta_m = \beta$. Let $\hat\pi_2$ be the $(L - s)$-vector with elements $\hat\pi_j$, $j = s+1, \ldots, L$. The limiting distribution of $\hat\beta_m$ is given by
$$\sqrt{n}(\hat\beta_m - \beta) \overset{d}{\to} q_{[l],L-s},$$
where, for $L$ odd, $q_{[l],L-s}$ is the $l$th-order statistic of the limiting normal distribution of $\sqrt{n}(\hat\pi_2 - \beta\iota_{L-s})$, and $l$ is determined by $L$, $s$, and the signs of $\delta_j = \alpha_j/\gamma_j$, $j = 1, \ldots, s$. For $L$ even, $q_{[l],L-s}$ is defined as the average of either the $[l]$ and $[l-1]$-order statistics, or the $[l]$ and $[l+1]$-order statistics.

Proof.

See Section A.2 in the supplementary materials.
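The median estimator is straightforward to compute. The sketch below uses an illustrative design in which the invalid instruments are three times as strong as the valid ones, so that naive 2SLS is badly biased while the median of the instrument-by-instrument ratios remains close to $\beta = 0$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, L, s, beta = 20000, 10, 3, 0.0
gamma = np.r_[np.full(s, 0.6), np.full(L - s, 0.2)]   # invalid ones stronger
alpha = np.r_[np.full(s, 0.2), np.zeros(L - s)]
Z = rng.standard_normal((n, L))
u = rng.standard_normal((n, 2)) @ np.linalg.cholesky(np.array([[1.0, 0.25], [0.25, 1.0]])).T
d = Z @ gamma + u[:, 1]
y = d * beta + Z @ alpha + u[:, 0]

Gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ d)
pi_hat = Gamma_hat / gamma_hat        # equation (19)
beta_m = np.median(pi_hat)            # consistent when s < L/2

# Naive 2SLS treating all instruments as valid, for contrast
d_hat = Z @ gamma_hat
beta_naive = (d_hat @ y) / (d_hat @ d)
```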

Given the consistent estimator $\hat\beta_m$, we obtain a consistent estimator for $\alpha$ as
$$\hat\alpha_m = (Z'Z)^{-1}Z'(y - d\hat\beta_m) = \hat\Gamma - \hat\gamma\hat\beta_m,$$
which can then be used in the adaptive Lasso specification of (13), as proposed by Zou (2006). The adaptive Lasso estimator for $\alpha$ is defined as
$$\hat\alpha_{ad}^{(n)} = \arg\min_\alpha \tfrac{1}{2}\|y - \tilde Z\alpha\|_2^2 + \lambda_n\sum_{l=1}^L \frac{|\alpha_l|}{|\hat\alpha_{m,l}|^\nu}, \quad (20)$$
and, for given values of $\nu$, can be computed straightforwardly using the LARS algorithm; see Zou (2006). The resulting adaptive Lasso estimator for $\beta$ is obtained as
$$\hat\beta_{ad}^{(n)} = \frac{\hat d'(y - Z\hat\alpha_{ad}^{(n)})}{\hat d'\hat d}.$$
As the result for the limiting distribution of the median estimator shows, $\hat\beta_m$, although converging at the $\sqrt{n}$ rate, has an asymptotic bias. This clearly also results in an asymptotic bias of $\hat\alpha_m$. As $\sqrt{n}(\hat\alpha_m - \alpha) = O_p(1)$, Theorem 2 together with Remark 1 in Zou (2006) gives the following properties of the adaptive Lasso estimator $\hat\alpha_{ad}^{(n)}$, where $\hat A_{ad,n} = \{j : \hat\alpha_{ad,j}^{(n)} \neq 0\}$.

Proposition 3.

Suppose that $\lambda_n = o(\sqrt{n})$ and $(\sqrt{n})^{\nu-1}\lambda_n \to \infty$; then the adaptive Lasso estimator $\hat\alpha_{ad}^{(n)}$ satisfies

1.

Consistency in variable selection: $\lim_{n\to\infty} P(\hat A_{ad,n} = A) = 1$.

2.

Asymptotic normality: $\sqrt{n}(\hat\alpha_{ad,A}^{(n)} - \alpha_A) \overset{d}{\to} N(0, \sigma_\varepsilon^2 C_{11}^{-1})$.

Proof.

See Zou (2006), Theorem 2 and Remark 1.

From the results of Proposition 3, it follows that the limiting distribution of β^ad(n) is that of the oracle 2SLS estimator of β, as stated in the next Corollary.

Corollary 2.

Under the conditions of Proposition 3, the limiting distribution of the adaptive Lasso estimator $\hat\beta_{ad}^{(n)}$ is given by
$$\sqrt{n}(\hat\beta_{ad}^{(n)} - \beta) \overset{d}{\to} N(0, \sigma^2_{\beta_{or}}), \quad (21)$$
with $\sigma^2_{\beta_{or}}$ as defined in (7).
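The full pipeline of Section 4 can be sketched end to end: median pre-estimator, then the adaptive Lasso (20) with $\nu = 1$ implemented via the standard column-rescaling trick (rescale column $l$ of $\tilde Z$ by $|\hat\alpha_{m,l}|$, run a plain Lasso, undo the rescaling). The design (strong invalid instruments, a case where plain Lasso selection fails), the coordinate-descent solver standing in for LARS, and the penalty level $\lambda_n = n^{0.4}$ (which satisfies $\lambda_n = o(\sqrt{n})$, $\lambda_n \to \infty$) are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n, L, s, beta = 20000, 10, 3, 0.0
gamma = np.r_[np.full(s, 0.6), np.full(L - s, 0.2)]   # invalid 3x stronger
alpha = np.r_[np.full(s, 0.2), np.zeros(L - s)]
Z = rng.standard_normal((n, L))
u = rng.standard_normal((n, 2)) @ np.linalg.cholesky(np.array([[1.0, 0.25], [0.25, 1.0]])).T
d = Z @ gamma + u[:, 1]
y = d * beta + Z @ alpha + u[:, 0]

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1."""
    w = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)
    r = y.copy()
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * w[j]
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            r -= X[:, j] * w[j]
    return w

# Median pre-estimator and alpha_m = Gamma_hat - gamma_hat * beta_m
Gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ d)
beta_m = np.median(Gamma_hat / gamma_hat)
alpha_m = Gamma_hat - gamma_hat * beta_m

# Adaptive Lasso (20) with nu = 1 via column rescaling
d_hat = Z @ gamma_hat
Z_tilde = Z - np.outer(d_hat, d_hat @ Z) / (d_hat @ d_hat)
Pz_y = Z @ Gamma_hat
y_tilde = Pz_y - d_hat * (d_hat @ Pz_y) / (d_hat @ d_hat)
weights = np.abs(alpha_m)
lam = n ** 0.4                         # illustrative: o(sqrt(n)), -> infinity
w_hat = lasso_cd(Z_tilde * weights, y_tilde, lam)
alpha_ad = w_hat * weights
beta_ad = d_hat @ (y - Z @ alpha_ad) / (d_hat @ d_hat)
selected = set(np.flatnonzero(np.abs(alpha_ad) > 1e-8).tolist())
```

Valid instruments get near-zero $|\hat\alpha_{m,l}|$ and hence very large penalties, so they are excluded even though they are the weaker instruments, which is exactly where plain Lasso selection breaks down.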

5. Simulation Results

5.1. Relative Strength of Instruments

We start by presenting some estimation results from a Monte Carlo exercise similar to that in KZCS. The data are generated from
$$Y_i = D_i\beta + Z_{i.}'\alpha + \varepsilon_i, \qquad D_i = Z_{i.}'\gamma + v_i,$$
where
$$\begin{pmatrix} \varepsilon_i \\ v_i \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right); \qquad Z_{i.} \sim N(0, I_L);$$
and we set $\beta = 0$; $L = 10$; $\rho = 0.25$; $s = 3$; and the first $s$ elements of $\alpha$ equal to $a = 0.2$. Further, $\gamma_1 = \tilde\gamma_1\iota_s$ and $\gamma_2 = \tilde\gamma_2\iota_{L-s}$. Note that none of the estimation results presented here and below depend on the value of $\beta$. Table 1 presents estimation results for estimators of $\beta$ in terms of bias, standard deviation, root mean squared error (rmse), and median absolute deviation (mad) for 1000 replications for sample sizes of $n = 500$, $n = 2000$, and $n = 10{,}000$ for an equal strength design, with $\tilde\gamma_1 = \tilde\gamma_2 = 0.2$.

Table 1. Estimation results for 2SLS and Lasso estimators for $\beta$; $L = 10$, $s = 3$, $\tilde\gamma_1 = \tilde\gamma_2$.

The information content for IV estimation can be summarized by the concentration parameter; see Rothenberg (1984). For the oracle estimation of $\beta$ by 2SLS, the concentration parameter is given by $\mu_n^2 = \gamma_2'Z_2'M_{Z_1}Z_2\gamma_2/\sigma_v^2$. For this data-generating process with independent instruments, the concentration parameter is therefore approximately $n(L-s)(0.2^2)$ and hence equal to 140, 560, and 2800 for the three sample sizes. $\mu_n^2$ can be seen as a population Wald statistic for testing $H_0: \gamma_2 = 0$. The corresponding population F-statistics are equal to $n(0.2^2)$, or 20, 80, and 400 for the sample sizes 500, 2000, and 10,000, respectively.

A summary measure of the information content for Lasso selection is the (squared) signal-to-noise ratio (SNR), denoted $\eta^2$ and defined as
$$\eta^2 = \frac{\alpha_1'C_{11}\alpha_1}{\sigma_\varepsilon^2};$$
see, for example, Bühlmann and van de Geer (2011, p. 25). Analogously to the concentration parameter, $n\eta^2$ can be interpreted as a population Wald statistic for testing $H_0: \alpha_1 = 0$. We analyze the effects of varying $\mu_n^2$ and $\eta^2$ more extensively in Section B.2 in the supplementary materials, where we derive that, for this design,
$$\eta^2 = \frac{(L-s)a^2}{\sigma_\varepsilon^2\left((\tilde\gamma_1/\tilde\gamma_2)^2 + (L-s)/s\right)}, \quad (22)$$
resulting in $\eta^2 = 0.084$ for the parameter values considered in Table 1.
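The value $\eta^2 = 0.084$ can be checked against the definition directly, computing $C_{11}$ from $Q = I_L$ and the design's strengths ($\sigma_\varepsilon^2 = 1$ here); the script is a verification sketch of the design in Table 1.

```python
import numpy as np

L, s, a = 10, 3, 0.2
g1t = g2t = 0.2                       # equal strength design of Table 1
gamma = np.r_[np.full(s, g1t), np.full(L - s, g2t)]
alpha1 = np.full(s, a)

Q = np.eye(L)
q = Q @ gamma
C = Q - np.outer(q, q) / (gamma @ q)  # plim of n^{-1} Z~'Z~
eta2_direct = alpha1 @ C[:s, :s] @ alpha1        # sigma_eps^2 = 1
eta2_design = (L - s) * a ** 2 / ((g1t / g2t) ** 2 + (L - s) / s)
```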

The "2SLS" results are for the naive 2SLS estimator of $\beta$ that treats all instruments as valid. The probability limit of this estimator is given by
$$\text{plim}\,\hat\beta_{naive} = \beta + \frac{\gamma'Q\alpha}{\gamma'Q\gamma} = \beta + \frac{\gamma_1'Q_{11}\alpha_1 + \gamma_2'Q_{21}\alpha_1}{\gamma_1'Q_{11}\gamma_1 + 2\gamma_2'Q_{21}\gamma_1 + \gamma_2'Q_{22}\gamma_2}. \quad (23)$$
Therefore, in the design specified here (where $a = \tilde\gamma_1 = \tilde\gamma_2$), we have $\text{plim}(\hat\beta_{naive}) = s/L = 0.3$.
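Equation (23) can be verified by simulation for the Table 1 design ($Q = I_L$, so the plim reduces to $\beta + \gamma'\alpha/\gamma'\gamma = 0.3$); a single large-sample draw suffices as a sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
n, L, s, beta = 50000, 10, 3, 0.0
gamma = np.full(L, 0.2)
alpha = np.r_[np.full(s, 0.2), np.zeros(L - s)]
Z = rng.standard_normal((n, L))
u = rng.standard_normal((n, 2)) @ np.linalg.cholesky(np.array([[1.0, 0.25], [0.25, 1.0]])).T
d = Z @ gamma + u[:, 1]
y = d * beta + Z @ alpha + u[:, 0]

d_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ d)
beta_naive = (d_hat @ y) / (d_hat @ d)                   # naive 2SLS
plim_naive = beta + (gamma @ alpha) / (gamma @ gamma)    # eq. (23) with Q = I
```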

The "2SLS or" is the oracle 2SLS estimator that correctly includes the three invalid instruments in the model as explanatory variables. For the Lasso estimates, the value for $\lambda_n$ has been obtained by 10-fold cross-validation, using the one-standard-error rule, as in KZCS. This estimator is denoted "Lassocvse" and is the one produced by the sisVIVE routine. We also present results for the cross-validated estimator that does not use the one-standard-error rule, denoted "Lassocv." For the Lasso estimation procedure, we standardize throughout such that the diagonal elements of $\tilde Z'\tilde Z/n$ are equal to 1.

We further present results for the so-called post-Lasso estimator, see, for example, Belloni et al. (2012), which is called the LARS-OLS hybrid by Efron et al. (2004). This is here simply the 2SLS estimator in the model that includes $Z_{\hat A_n}$, the set of instruments with nonzero estimated Lasso coefficients. Clearly, when $\hat A_n = A$, the post-Lasso 2SLS estimator is equal to the oracle 2SLS estimator. The post-Lasso 2SLS estimator is expected to have a smaller bias, as it avoids the bias in the Lasso estimate of $\beta$ due to the shrinkage of the Lasso estimate of $\alpha$ toward 0; see also Hastie, Tibshirani, and Friedman (2009, p. 91). This shrinkage bias effect on $\hat\beta^{(n)}$ for models where $A \subseteq \hat A_n$ is in the direction of the bias of $\hat\beta_{naive}$, where $\alpha$ is assumed to be 0. (In an OLS setting, Belloni and Chernozhukov (2013) showed that the post-Lasso estimator can perform at least as well as Lasso in terms of rate of convergence, and is less biased even if the Lasso-based model selection misses some components of the true model.)

Further entries in Table 1 are the average number of instruments selected as invalid, that is, the average number of instruments in $\hat A_n = \{j : \hat\alpha_j^{(n)} \neq 0\}$, together with the minimum and maximum number of selected instruments, and the proportion of times the instruments selected as invalid include all three invalid instruments.

The results in Table 1 reveal some interesting patterns. First of all, the Lassocv estimator outperforms the Lassocvse estimator in terms of bias, rmse, and mad for all sample sizes, but this is reversed for the post-Lasso estimators, that is, the post-Lassocvse outperforms the post-Lassocv. The Lassocv estimator selects on average around 6.5 instruments as invalid, which is virtually independent of the sample size. The Lassocvse estimator selects on average around 3.8 instruments as invalid for $n = 2000$ and $n = 10{,}000$, but fewer, 3.16, for $n = 500$. Although the three invalid instruments are always jointly selected as invalid for the larger sample sizes, the Lassocvse is substantially biased, the biases being larger than twice the standard deviations. The post-Lassocvse estimator performs best, but is still outperformed by the oracle 2SLS estimator at $n = 10{,}000$. Although the post-Lassocvse estimator has a larger standard deviation than the Lassocvse estimator, it has a smaller bias, rmse, and mad for all sample sizes.

We focus below on the performance of the median and adaptive Lasso estimators for a design with invalid instruments that are stronger than the valid ones, but for comparison we present results for these estimators for this equal strength instruments design in Section B.1 in the supplementary materials, which also includes a more detailed analysis of the differences in performances of the Lasso and post-Lasso estimators in this design.

Table 2 presents estimation results for the same Monte Carlo design as before, but now with stronger invalid than valid instruments, with γ̃₂ = 0.2 and γ̃₁ = 3γ̃₂. At these relative values, the necessary condition (16) is not satisfied, and the Lasso selection will here select the valid instruments as invalid. Note that the behavior of the oracle 2SLS estimator is the same as before. In this case, β + a/γ̃₁ = 0 + 0.2/0.6 = 0.33, which is the parameter value estimated by the invalid instruments. From (22), it follows that the SNR is smaller here, with η² = 0.0247. The estimation results for the adaptive Lasso are based on setting υ = 1; the resulting estimators are denoted "ALasso." As L is even here, the median is defined as β̂_m = (π̂_[5] + π̂_[6])/2, where π̂_[j] is the jth order statistic.
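As a small illustration of the median estimator, the following numpy sketch computes β̂_m from a vector of per-instrument ratio estimates π̂_j. The numbers are hypothetical, chosen so that three of the ten estimates cluster near the value estimated by the invalid instruments; the function name is ours.

```python
import numpy as np

def median_iv_estimate(pi_hat):
    """Median of the per-instrument IV (ratio) estimates of beta.

    pi_hat[j] is the just-identified IV estimate of beta using
    instrument j alone.  For even L, numpy's median averages the two
    middle order statistics, matching beta_m = (pi_[L/2] + pi_[L/2+1])/2.
    """
    return float(np.median(np.asarray(pi_hat, dtype=float)))

# Hypothetical ratio estimates: three invalid instruments estimate
# roughly 0.33, while seven valid ones scatter around the true beta = 0.
pi_hat = [0.33, 0.32, 0.31, -0.10, 0.00, 0.10, -0.20, 0.20, -0.30, 0.05]
beta_m = median_iv_estimate(pi_hat)  # average of the two middle estimates
```

With a majority of valid instruments, the median is unaffected by the cluster of invalid-instrument estimates, which is the robustness property used above.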

Table 2. Estimation results for estimators of β; L = 10, s = 3, γ̃₁ = 3γ̃₂.

The results in Table 2 confirm that, for large sample sizes, the Lasso selects the valid instruments as invalid because of the relative strength of the invalid instruments. The post-ALassocvse estimator does not perform well for n = 500, but does for the sample sizes of n = 2000 and n = 10,000, with results for the latter very similar to the oracle 2SLS results. The post-ALassocv estimator performs better at n = 500, as it selects more instruments as invalid with a larger proportion correctly selecting all invalid instruments, although it is outperformed there by the simple median estimator β̂_m.

5.2. Alternative Stopping Rule

The results for the Lasso estimator above show that the 10-fold cross-validation method tends to select too many valid instruments as invalid, over and above the invalid ones, and that the ad hoc one-standard-error rule does improve the selection. The fact that the cross-validation method selects too many variables is well known, see, for example, Bühlmann and van de Geer (2011), who argued that the cross-validation method is appropriate for prediction purposes, but that the penalty parameter needs to be larger for variable selection, as achieved by the one-standard-error rule. Selecting valid instruments as invalid in addition to correctly selecting the invalid instruments clearly does not lead to an asymptotic bias, but it results in a less efficient estimator compared to the oracle estimator.

We propose a stopping rule for the LARS/Lasso algorithm based on the approach of Andrews (1999) for moment selection, which is particularly well suited to the IV selection problem. We can use this approach because the number of instruments L is small relative to n. This stopping rule is also computationally less expensive than cross-validation.

Consider again the oracle model
(24) y = dβ + Z_A α_A + ε = R_A θ_A + ε.
Let g_n(θ_A) = n⁻¹Z′(y − R_A θ_A), and let W_n be a k_z × k_z weight matrix. The oracle generalized method of moments (GMM) estimator is then defined as
θ̂_A,gmm = arg min_{θ_A} g_n(θ_A)′ W_n⁻¹ g_n(θ_A),
see Hansen (1982). 2SLS is a one-step GMM estimator, obtained by setting W_n = n⁻¹Z′Z. Given the moment conditions E[Z_i.ε_i] = 0, 2SLS is efficient under conditional homoscedasticity, E(ε_i²|Z_i.) = σ_ε². Under general forms of conditional heteroscedasticity, an efficient two-step oracle GMM estimator is obtained by setting
W_n = W_n(θ̂_A,1) = n⁻¹ Σ_{i=1}^{n} (y_i − R_A,i.′θ̂_A,1)² Z_i.Z_i.′,
where θ̂_A,1 is an initial consistent estimator, a natural choice being the 2SLS estimator. Then, under the null that the moment conditions are correct, E[Z_i.ε_i] = 0, the Hansen (1982) J-test statistic and its limiting distribution are given by
J_n(θ̂_A,gmm) = n g_n(θ̂_A,gmm)′ W_n(θ̂_A,1)⁻¹ g_n(θ̂_A,gmm) →_d χ²_{L − dim(R_A)}.
For any set A⁺ such that A ⊆ A⁺, we have that J_n(θ̂_A⁺,gmm) →_d χ²_{L − dim(R_A⁺)}, whereas for any set A⁻ such that A ⊄ A⁻, J_n(θ̂_A⁻,gmm) = O_p(n).
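For concreteness, here is a minimal numpy sketch of the one-step (2SLS) and two-step GMM estimators and the resulting J statistic described above. The function name and interface are ours, and no claim is made that this matches the authors' implementation.

```python
import numpy as np

def j_test_gmm(y, R, Z):
    """Two-step GMM estimate of theta in y = R theta + eps and the Hansen
    J statistic, with instruments Z (n x L) and regressors R (n x p).

    Step 1 is 2SLS (weight matrix Z'Z/n); step 2 uses the
    heteroscedasticity-robust weight matrix built from step-1 residuals.
    Returns (theta_hat, J, degrees of freedom L - p).
    """
    n, L = Z.shape
    W1 = Z.T @ Z / n                     # one-step weight matrix
    ZR, Zy = Z.T @ R / n, Z.T @ y / n
    A1 = ZR.T @ np.linalg.solve(W1, ZR)
    theta1 = np.linalg.solve(A1, ZR.T @ np.linalg.solve(W1, Zy))  # 2SLS
    e1 = y - R @ theta1                  # first-step residuals
    W2 = (Z * e1[:, None] ** 2).T @ Z / n  # robust two-step weight matrix
    A2 = ZR.T @ np.linalg.solve(W2, ZR)
    theta2 = np.linalg.solve(A2, ZR.T @ np.linalg.solve(W2, Zy))
    g = Z.T @ (y - R @ theta2) / n       # moment vector at theta2
    J = n * g @ np.linalg.solve(W2, g)   # chi-squared with L - p df
    return theta2, J, L - R.shape[1]
```

Under the null that all moment conditions are valid, J is asymptotically χ² with L − dim(R) degrees of freedom, as in the display above.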

Note that the J-test is a robust score, or Lagrange multiplier, test for testing H₀: α_C = 0 in the just-identified specification
y = dβ + Z_B α_B + Z_C α_C + ε,
where Z_B is a set of k_B instruments included in the model and Z_C is any selection of L − k_B − 1 instruments from the L − k_B instruments not in Z_B, see, for example, Davidson and MacKinnon (1993, p. 235). This makes clear the link between the J-test and testing for additional invalid instruments of the form specified in model (3).

We can now combine the LARS/Lasso algorithm with the Hansen J-test in a directed downward testing procedure, in the terminology of Andrews (1999). Compute J_n(θ̂_{Â_n[j]}) at every LARS/Lasso step j = 0, 1, 2, …, where Â_n[0] = ∅ and |Â_n[1]| = 1, and compare it to a corresponding critical value ζ_{n,L−k} of the χ²(L − k) distribution, where k = dim(R_{Â_n[j]}). We then select the model with the largest degrees of freedom L − k for which J_n(θ̂_{Â_n[j]}) is smaller than the critical value. If two models of the same dimension pass the test, which can happen with a Lasso step, the model with the smallest value of the J-test gets selected. (If there is no empirical evidence at all for any invalid instruments, that is, if J_n(θ̂_{Â_n[0]}) is smaller than its corresponding critical value, then the model with all instruments treated as valid gets selected.) Clearly, this is a post-Lasso approach, in which the LARS/Lasso algorithm is used purely for the selection of the invalid instruments. For consistent model selection, the critical values ζ_{n,L−k} need to satisfy
(25) ζ_{n,L−k} → ∞ and ζ_{n,L−k} = o(n) as n → ∞,
see Andrews (1999).
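The downward testing rule can be sketched as follows. Here `path_sets` stands in for the candidate sets along the LARS/Lasso path and `j_stat` for a function returning the J statistic and its degrees of freedom for a candidate set; the behavior when no candidate passes is our own choice, not specified above.

```python
import numpy as np
from scipy.stats import chi2

def select_invalid_set(path_sets, j_stat, p_n):
    """Directed downward testing along a LARS/Lasso path (sketch).

    path_sets : list of candidate sets of invalid instruments, one per
        LARS/Lasso step (path_sets[0] is the empty set).
    j_stat    : callable mapping a candidate set to (J statistic,
        degrees of freedom), standing in for the Hansen J-test.
    p_n       : significance level, e.g. 0.1 / log(n).

    Selects the candidate with the largest degrees of freedom whose J
    statistic is below the chi-squared critical value; candidates of
    equal dimension are ranked by the smaller J value.
    """
    passing = []
    for A in path_sets:
        J, df = j_stat(A)
        if J < chi2.ppf(1 - p_n, df):
            passing.append((df, J, A))
    if not passing:
        return path_sets[-1]  # our fallback: keep the largest set tried
    passing.sort(key=lambda t: (-t[0], t[1]))  # largest df, then smallest J
    return passing[0][2]
```

Because the degrees of freedom L − k shrink as more instruments are declared invalid, preferring the largest degrees of freedom amounts to preferring the smallest set of invalid instruments that passes the test.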

As the oracle model is on the adaptive LARS/Lasso path in large samples, this approach leads to consistent selection, lim_{n→∞} P(Â_n,ah^ad = A) = 1, the subscript ah standing for Andrews/Hansen. As Guo et al. (2018, Theorem 2) showed, consistent selection implies that the limiting distribution of the 2SLS estimator β̂_{Â_n,ah}^ad is the same as that of the oracle 2SLS estimator, that is, √n(β̂_{Â_n,ah}^ad − β) →_d N(0, σ²_β,or). We call β̂_{Â_n,ah}^ad the post-ALassoah estimator. This approach also leads to consistent selection along the Lasso path when the irrepresentable condition (15) holds, resulting in oracle properties of the resulting post-Lassoah estimator.

Let ζ_{n,L−k} = χ²_{L−k}(p_n) be the 1 − p_n quantile of the χ²_{L−k} distribution, so that p_n is the p-value of the test. This combination of the Andrews/Hansen method with the LARS/Lasso steps therefore results in having to choose a p-value p_n instead of a penalty parameter λ_n. Keeping n fixed, choosing a larger value for p_n leads to selecting a larger set of invalid instruments than choosing a smaller value for p_n. Finite-sample inference will not be straightforward, as this method is essentially a sequential approach in which the model at step j is only considered when the model at step j − 1 is rejected. Using the consistent selection properties, we will investigate the behavior of the Wald test in the next section; in our simulation designs this method performs quite well, similarly to the ALassocvse method in the unequal instrument strength design, and the post-Lassoah estimator also performs well in the equal strength design.

Table 3 presents the estimation results using this stopping rule as a selection device for the Lasso estimator in the design with equal strength instruments, and for the adaptive Lasso estimator in the unequal instrument strength design, as above. We denote the resulting 2SLS estimators as "post-(A)Lassoah." The p-values here are chosen as p_n = 0.1/ln(n), following Belloni et al. (2012), and are equal to 0.0161, 0.0132, and 0.0109 for n equal to 500, 2000, and 10,000, respectively. For the equal strength design, the ah approach selects too few invalid instruments for n = 500, resulting in an upward bias, with bias, std dev, rmse, and mad very similar to those of the post-Lassocvse estimator. For n = 2000 and n = 10,000, this post-Lasso procedure performs well, with properties very similar to those of the oracle 2SLS estimator and with smaller bias, rmse, and mad than the post-Lassocvse method. For the unequal strength design, the results for n = 10,000 are virtually identical to those of the oracle and post-ALassocvse estimators, whereas the post-ALassoah estimator performs better in terms of bias, std dev, rmse, and mad than the post-ALassocvse estimator when n = 2000. Again, when n = 500, the method does not select the invalid instruments.
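The quoted p-values follow directly from the rule p_n = 0.1/ln(n):

```python
import math

# p_n = 0.1 / ln(n), the stopping-rule p-value following Belloni et al. (2012)
for n in (500, 2000, 10_000):
    print(n, round(0.1 / math.log(n), 4))
# prints:
# 500 0.0161
# 2000 0.0132
# 10000 0.0109
```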

Table 3. Results for post-(A)Lassoah 2SLS estimators for β; L = 10, s = 3.

5.3. Inference

From the limiting distribution result (21), a simple approach to estimating the asymptotic variance of the post-ALasso 2SLS estimator for β is to calculate the standard 2SLS variance estimator. The post-ALasso 2SLS estimator is given by
β̂_ad,post(n) = (d̂′M_{Z_Âad,n} d̂)⁻¹ d̂′M_{Z_Âad,n} y,
and its estimated variance by
(26) var̂(β̂_ad,post(n)) = σ̂_ε² (d̂′M_{Z_Âad,n} d̂)⁻¹,
where σ̂_ε² = ε̂′ε̂/n and ε̂ = y − dβ̂_ad,post(n) − Z_Âad,n α̂_Âad,n,post(n). Under the conditions of Proposition 3, the standard assumptions, and conditional homoscedasticity, n·var̂(β̂_ad,post(n)) →_p σ²_β,or. A standard robust version, robust to general forms of heteroscedasticity, is given by
var̂_r(β̂_ad,post(n)) = (d̂′M_{Z_Âad,n} d̂)⁻¹ d̂′M_{Z_Âad,n} Ĥ M_{Z_Âad,n} d̂ (d̂′M_{Z_Âad,n} d̂)⁻¹,
where Ĥ is an n × n diagonal matrix with diagonal elements Ĥ_ii = ε̂_i², for i = 1, …, n. The robust Wald test for the null H₀: β = β₀ is then given by
W_β,r = (β̂_ad,post(n) − β₀)² / var̂_r(β̂_ad,post(n)).
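A numpy sketch of this robust Wald computation follows. The variable names are ours: `d_hat` denotes the first-stage fitted values of the exposure from the full instrument set, and the annihilator M is applied by partialling the selected invalid instruments out of each variable rather than by forming the n × n matrix explicitly.

```python
import numpy as np

def robust_wald(y, d, d_hat, Z_A, beta0=0.0):
    """Post-Lasso 2SLS estimate of beta and robust Wald statistic (sketch).

    y, d   : outcome and exposure (length-n arrays)
    d_hat  : first-stage fitted values of d from the full instrument set
    Z_A    : n x k matrix of instruments selected as invalid, or None;
             M_{Z_A} is applied by regressing Z_A out of each variable
    """
    def partial_out(v):
        if Z_A is None or Z_A.shape[1] == 0:
            return v
        return v - Z_A @ np.linalg.lstsq(Z_A, v, rcond=None)[0]

    d_hat_m, y_m, d_m = partial_out(d_hat), partial_out(y), partial_out(d)
    denom = d_hat_m @ d_hat_m              # d_hat' M d_hat (M idempotent)
    beta = (d_hat_m @ y_m) / denom         # (d_hat' M d_hat)^{-1} d_hat' M y
    eps = y_m - d_m * beta                 # residuals use the actual exposure
    meat = np.sum((d_hat_m * eps) ** 2)    # d_hat' M H M d_hat, H_ii = eps_i^2
    var_r = meat / denom ** 2              # sandwich variance
    return beta, (beta - beta0) ** 2 / var_r
```

Compared against a χ²₁ critical value, the returned Wald statistic gives the robust test of H₀: β = β₀ described above.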

From the results for the post-ALassocvse and post-ALassoah estimators for the unequal strength instruments design, as presented in Tables 2 and 3, respectively, one would expect this approach to work well for the large sample case, n = 10,000, as there the estimation results are very close to those of the oracle 2SLS estimator. The robust Wald test for the null H0: β = 0, the true value of β, at the 10% level for n = 10,000 has a rejection frequency of 9.3% and 9.2% for the post-ALassocvse and post-ALassoah estimators, respectively, very close to that of the robust Wald test based on the oracle 2SLS estimator, which has a rejection frequency of 9.0%.

For the equal strength instruments design, we perform the same analysis for the post-Lasso estimators. Figure 1(a)–(c) shows the performance of the robust Wald test W_β,r, measured by its rejection frequency at the 10% level, as a function of the sample size in steps of 500, n = 500, 1000, …, 5000. Figure 1(a) and (b) shows the results for the post-Lasso and post-ALasso estimators for the equal strength instruments design, and Figure 1(c) the results for the post-ALasso estimators for the unequal strength instruments design.

Figure 1. (a–c) Rejection frequencies of robust Wald tests for H0: β = 0 at 10% level as a function of sample size, in steps of 500. Equal strength instruments design, Post-Lasso in (a), Post-ALasso in (b). Unequal strength instruments design, Post-ALasso in (c). Based on 1000 MC replications for each sample size.


Figure 1(a) clearly shows that the Lassocv and Lassocvse procedures do not result in consistent selection, and the resulting post-Lasso estimators do not have oracle properties: the Wald test rejection frequencies remain constant with increasing sample size and larger than those of the oracle estimator. In contrast, the post-Lassoah estimator behaves very similarly to the oracle estimator in this design from n = 1500 onward. Figure 1(b) shows that both the post-ALassocvse and post-ALassoah estimators behave like the oracle estimator, again from n = 1500 onward in this design. The results in Figure 1(c) show that for the unequal instrument strength design considered here, the performance of the post-adaptive Lasso estimators is far from that of the oracle estimator in small samples, as expected from the results above. The post-ALassoah estimator behaves like the oracle estimator here from n = 4000 onward, with the post-ALassocvse estimator behaving similarly, but with a larger rejection frequency for all sample sizes considered that are less than n = 5000.

These results show clearly that the information content in the data, given the parameter values chosen here, is insufficient at n = 500 for the (adaptive) Lasso procedures to correctly select the invalid instruments, and hence the resulting estimators have poor properties, far removed from those of the oracle estimator. At these levels of information, the ALassocv estimator is actually the preferred estimator, as it counteracts the tendency of the ALassocvse and ALassoah estimators to select too few invalid instruments. We further explore how the performance of the estimators depends on the information content of the data-generating process in Section B.2 in the supplementary materials.

6. The Effect of BMI on Diastolic Blood Pressure Using Genetic Markers as Instruments

We use data on 105,276 individuals from the UK Biobank and investigate the effect of BMI on diastolic blood pressure (DBP). See Sudlow et al. (2015) for further information on the UK Biobank. We use 96 single nucleotide polymorphisms (SNPs) as instruments for BMI, as identified in independent GWAS studies, see Locke et al. (2015).

With Mendelian randomization studies, the SNPs used as potential instruments can be invalid for various reasons, such as linkage disequilibrium, population stratification, and horizontal pleiotropy, see, for example, von Hinke et al. (2016) or Davey Smith and Hemani (2014). For example, an SNP has pleiotropic effects if it not only affects the exposure but also has a direct effect on the outcome. While we guard against population stratification by considering only individuals of white European origin in our data, the Lasso methods can be very useful here to identify the SNPs with direct effects on the outcome and to estimate the causal effect of BMI on diastolic blood pressure while taking these effects into account.

Because of skewness, we log-transformed both BMI and DBP. The linear model specification includes age, age², and sex, together with 15 principal components of the genetic relatedness matrix as additional explanatory variables. Table 4 presents the estimation results for the causal effect parameter, which is here the percentage change in DBP due to a 1% change in BMI. As the p-value for the Hansen test-based procedures we again take 0.1/ln(n) = 0.0086.

Table 4. Estimation results, the effect of ln(BMI) on ln(DBP).

The OLS estimate of the causal parameter is 0.206 (s.e. 0.003), whereas the 2SLS estimate treating all 96 instruments as valid is much smaller at 0.087 (s.e. 0.016), with a 95% confidence interval of [0.056, 0.118]. The J-test, however, rejects the null that all the instruments are valid. The Lassocv estimator identifies a large number of instruments, 56, as invalid; the Lassocv estimate is 0.126 and the post-Lassocv estimate is 0.145. The Lassocvse procedure identifies 20 instruments as invalid, with a Lassocvse estimate of 0.111. The post-Lassocvse estimate is larger, at 0.142, which is in line with our finding above that the Lasso estimator is biased toward the 2SLS estimator that treats all instruments as valid, due to shrinkage. The post-Lassoah procedure selects a subset of 12 instruments as invalid, and the post-Lassoah parameter estimate is 0.122.

The median estimate β̂_m is equal to 0.148. Using this estimate for the adaptive Lasso results in the cv method selecting 54 instruments as invalid and the cvse method selecting 17 instruments as invalid. The adaptive Lassoah method selects a subset of 11 instruments as invalid. The post-ALassocv, post-ALassocvse, and post-ALassoah estimates are equal to 0.161, 0.151, and 0.163, respectively, with the 95% confidence intervals of the post-ALassocvse and post-ALassoah estimators given by [0.113, 0.189] and [0.127, 0.198], respectively. These results indicate that the OLS estimator is less confounded than suggested by the 2SLS estimation results that treat all 96 instruments as valid.

The strongest potential instrument is the FTO SNP. All Lasso procedures in Table 4 select it as an invalid instrument. The value of π̂_FTO is −0.009, that is, negative, contrary to the direction of the estimated causal effect.

The F-test statistic for H₀: γ₂ = 0 in the model resulting from the ALassoah procedure is 18.21, with an associated estimate of the concentration parameter of 1547.81. The F-test result indicates that the 2SLS estimator may suffer from some many-weak-instruments bias, see Stock and Yogo (2005). However, the LIML (limited information maximum likelihood) estimator in this model is very similar to the 2SLS estimator, at 0.159 (s.e. 0.019), indicating that there is not a many-weak-instruments problem here, see Davies et al. (2015).

7. Conclusions

Instrumental variables estimation is a well-established procedure for the identification and estimation of causal effects of exposures on outcomes where the observed relationships are confounded by nonrandom selection of exposure. The main identifying assumption is that the instruments satisfy the exclusion restriction, that is, they only affect the outcomes through their relationship with the exposure. In an important contribution, Kang et al. (2016) showed that the Lasso method for variable selection can be used to select invalid instruments in linear IV models, even though there is no prior knowledge about which instruments are valid.

We have shown here that, even under the sufficient condition for identification that less than 50% of the instruments are invalid, the Lasso selection may select the valid instruments as invalid if the invalid instruments are relatively strong, that is, the case where an invalid instrument explains more of the exposure variance than a valid instrument. Consistent selection of invalid instruments also depends on the correlation structure of the instruments.

We show that a median estimator is consistent when less than 50% of the instruments are invalid, and its consistency does not depend on the relative strength of the instruments or their correlation structure. This initial consistent estimator can be used for the adaptive Lasso estimator of Zou (2006) and we show that it performs well for larger sample sizes/information settings in our simulations. This adaptive Lasso estimator has the same limiting distribution as the oracle 2SLS estimator, and solves the inconsistency problem of the Lasso method when the relative strength of the invalid instruments is such that the Lasso method selects the valid instruments as invalid.

Supplementary Materials

The document contains the proofs of Proposition 1 and Theorem 1 in Section A, and further simulation results and discussions in Section B.

The Stata module "SIVREG" implements the post-ALassoah method. Further details and documentation are provided in Farbmacher (2017).

Acknowledgments

Helpful comments were provided by Kirill Evdokimov, Chirok Han, Whitney Newey, Hyunseung Kang, Chris Skeels, Martin Spindler, Jonathan Temple, Ian White, and seminar participants at Amsterdam, Bristol, Lausanne, Monash, Oxford, Princeton, Seoul, Sydney, the RES Conference Brighton, the Info-Metrics Conference Cambridge, and the UK Causal Inference Meeting London.

Supplemental material

Supplemental Material

Download Zip (235.8 KB)

Additional information

Funding

This research was partly funded by the Medical Research Council, MC_UU_12013/1, MC_UU_12013/9, and MC_UU_00011/1. Neil Davies further acknowledges support from the Economics and Social Research Council via a Future Leaders Grant, ES/N000757/1. Helmut Farbmacher acknowledges funding from the Fritz Thyssen Stiftung.

References

  • Andrews, D. W. K. (1999), “Consistent Moment Selection Procedures for Generalized Method of Moments Estimation,” Econometrica, 67, 543–564.
  • Angrist, J. D., and Krueger, A. B. (1991), “Does Compulsory School Attendance Affect Schooling and Earnings?” Quarterly Journal of Economics, 106, 979–1014.
  • Bekker, P. A. (1994), “Alternative Approximations to the Distributions of Instrumental Variable Estimators,” Econometrica, 62, 657–681.
  • Belloni, A., Chen, D., Chernozhukov, V., and Hansen, C. (2012), “Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain,” Econometrica, 80, 2369–2429.
  • Belloni, A., and Chernozhukov, V. (2013), “Least Squares After Model Selection in High-dimensional Sparse Models,” Bernoulli, 19, 521–547.
  • Belloni, A., Chernozhukov, V., and Hansen, C. (2014), “Inference on Treatment Effects after Selection among High-Dimensional Controls,” Review of Economic Studies, 81, 608–650.
  • Bowden, J., Smith, G. D., Burgess, S. (2015), “Mendelian Randomization with Invalid Instruments: Effect Estimation and Bias Detection through Egger Regression,” International Journal of Epidemiology, 44, 512–525.
  • Bühlmann, P., and van de Geer, S. (2011), Statistics for High-Dimensional Data: Methods, Theory and Applications (Springer Series in Statistics), Heidelberg: Springer.
  • Burgess, S., Small, D. S., and Thompson, S. G. (2017), “A Review of Instrumental Variable Estimators for Mendelian Randomization,” Statistical Methods in Medical Research, 26, 2333–2355.
  • Cheng, X., and Liao, Z. (2015), “Select the Valid and Relevant Moments: An Information-based LASSO for GMM with Many Moments,” Journal of Econometrics, 186, 443–464.
  • Chernozhukov, V., Hansen, C., and Spindler, M. (2015), “Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments,” American Economic Review, 105, 486–490.
  • Clarke, P. S., and Windmeijer, F. (2012), “Instrumental Variable Estimators for Binary Outcomes,” Journal of the American Statistical Association, 107, 1638–1652.
  • Davey Smith, G., and Hemani, G. (2014), “Mendelian randomization: Genetic Anchors for Causal Inference in Epidemiological Studies,” Human Molecular Genetics, 23, R89–R98.
  • Davidson, R., and MacKinnon, J. G. (1993), Estimation and Inference in Econometrics, Oxford: Oxford University Press.
  • Davies, N. M., von Hinke Kessler Scholder, S., Farbmacher, H., Burgess, S., Windmeijer, F., and Smith, G. D. (2015), “The Many Weak Instruments Problem and Mendelian Randomization,” Statistics in Medicine, 34, 454–468.
  • Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), “Least Angle Regression,” The Annals of Statistics, 32, 407–451.
  • Farbmacher, H. (2017), SIVREG: Stata Module to Perform Adaptive Lasso with Some Invalid Instruments, Statistical Software Components S458394, Boston College Department of Economics.
  • Greenland, S. (2000), “An Introduction to Instrumental Variables for Epidemiologists,” International Journal of Epidemiology, 29, 722–729.
  • Guo, Z., Kang, H., Cai, T., and Small, D. (2018), “Confidence Intervals for Causal Effects with Invalid Instruments using Two-Stage Hard Thresholding with Voting,” Journal of the Royal Statistical Society, Series B, 80, 793–815.
  • Han, C. (2008), “Detecting Invalid Instruments using L1-GMM,” Economics Letters, 101, 285–287.
  • Hansen, C., Hausman, J., and Newey, W. K. (2008), “Estimation with Many Instrumental Variables,” Journal of Business & Economic Statistics, 26, 398–422.
  • Hansen, L. P. (1982), “Large Sample Properties of Generalized Method of Moments Estimators,” Econometrica, 50, 1029–1054.
  • Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning. Data Mining, Inference, and Prediction (Springer Series in Statistics, 2nd ed.), New York: Springer Science and Business Media.
  • Imbens, G. W. (2014), “Instrumental Variables: An Econometrician’s Perspective,” Statistical Science, 29, 323–358.
  • Kang, H., Zhang, A., Cai, T. T., and Small, D. S. (2016), “Instrumental Variables Estimation with some Invalid Instruments and its Application to Mendelian Randomization,” Journal of the American Statistical Association, 111, 132–144.
  • Kolesar, M., Chetty, R., Friedman, J., Glaeser, E., Imbens, G. W. (2015), “Identification and Inference with Many Invalid Instruments,” Journal of Business and Economic Statistics, 33, 474–484.
  • Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N., and Davey Smith, G. (2008), “Mendelian Randomization: Using Genes as Instruments for Making Causal Inferences in Epidemiology,” Statistics in Medicine, 27, 1133–1163.
  • Liao, Z. (2013), “Adaptive GMM Shrinkage Estimation with Consistent Moment Selection,” Econometric Theory, 29, 857–904.
  • Lin, W., Feng, R., and Li, H. (2015), “Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics,” Journal of the American Statistical Association, 110, 270–288.
  • Locke, A. E., et al. (2015), “Genetic Studies of Body Mass Index Yield New Insights for Obesity Biology,” Nature, 518, 197–206.
  • Meinshausen, N., and Bühlmann, P. (2006), “High-Dimensional Graphs and Variable Selection with the Lasso,” Annals of Statistics, 34, 1436–1462.
  • Newey, W. K., and Windmeijer, F. (2009), “Generalized Methods of Moments with Many Weak Moment Conditions,” Econometrica, 77, 687–719.
  • Rothenberg, T. J. (1984), “Approximating the Distributions of Econometric Estimators and Test Statistics,” in Handbook of Econometrics (Vol. 2), eds. Z. Griliches, and M. D. Intriligator, Amsterdam: North Holland, pp. 881–935.
  • Sargan, J. D. (1958), “The Estimation of Economic Relationships Using Instrumental Variables,” Econometrica, 26, 393–415.
  • Staiger, D., and Stock, J. H. (1997), “Instrumental Variables Regression with Weak Instruments,” Econometrica, 65, 557–586.
  • Stock, J. H., and Yogo, M. (2005), “Testing for Weak Instruments in Linear IV Regression,” in Identification and Inference for Econometric Models, Essays in Honor of Thomas Rothenberg, eds. D. W. K. Andrews, and J. H. Stock, New York: Cambridge University Press, pp. 80–108.
  • Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., Liu, B., Matthews, P., Ong, G., Pell, J., Silman, A., Young, A., Sprosen, T., Peakman, T., and Collins, R. (2015), “UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age,” PLoS Medicine, 12, e1001779.
  • von Hinke, S., Smith, G. D., Lawlor, D. A., Propper, C., and Windmeijer, F. (2016), “Genetic Markers as Instrumental Variables,” Journal of Health Economics, 45, 131–148.
  • Wainwright, M. J. (2009), “Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ1-Constrained Quadratic Programming (Lasso),” IEEE Transactions on Information Theory, 55, 2183–2202.
  • Zhao, P., and Yu, B. (2006), “On Model Selection Consistency of Lasso,” Journal of Machine Learning Research, 7, 2541–2563.
  • Zou, H. (2006), “The Adaptive Lasso and Its Oracle Properties,” Journal of the American Statistical Association, 101, 1418–1429.