Theory and Methods

Double Negative Control Inference in Test-Negative Design Studies of Vaccine Effectiveness

Received 28 Mar 2022, Accepted 31 Mar 2023, Published online: 10 Jul 2023

Abstract

The test-negative design (TND) has become a standard approach to evaluate vaccine effectiveness against the risk of acquiring infectious diseases in real-world settings, such as Influenza, Rotavirus, Dengue fever, and more recently COVID-19. In a TND study, individuals who experience symptoms and seek care are recruited and tested for the infectious disease which defines cases and controls. Despite TND’s potential to reduce unobserved differences in healthcare seeking behavior (HSB) between vaccinated and unvaccinated subjects, it remains subject to various potential biases. First, residual confounding may remain due to unobserved HSB, occupation as healthcare worker, or previous infection history. Second, because selection into the TND sample is a common consequence of infection and HSB, collider stratification bias may exist when conditioning the analysis on tested samples, which further induces confounding by latent HSB. In this article, we present a novel approach to identify and estimate vaccine effectiveness in the target population by carefully leveraging a pair of negative control exposure and outcome variables to account for potential hidden bias in TND studies. We illustrate our proposed method with extensive simulations and an application to study COVID-19 vaccine effectiveness using data from the University of Michigan Health System. Supplementary materials for this article are available online.

1 Introduction

1.1 Test-Negative Design Studies of Vaccine Effectiveness

The test-negative design (TND) has become a standard approach to evaluate real-world vaccine effectiveness (VE) against the risk of acquiring infectious diseases (Sullivan, Feng, and Cowling 2014; Chua et al. 2020). In an outpatient Influenza VE TND study, for example, symptomatic individuals seeking care and meeting eligibility criteria are enrolled and their Influenza virus infection status is subsequently confirmed via a laboratory test. VE against flu infection is then measured by comparing the prevalence of vaccination between the test-positive “cases” and test-negative “controls” (Jackson and Nelson 2013; Jackson et al. 2017). Besides Influenza, the TND and its variants have also been used to study VE against pneumococcal disease (Broome, Facklam, and Fraser 1980), dengue (Anders et al. 2018), rotavirus (Boom et al. 2010), and other infectious diseases. Recently, the TND has increasingly been used in post-licensure evaluation of COVID-19 VE (Patel, Jackson, and Ferdinands 2020; Dean, Hogan, and Schnitzer 2021; Hitchings et al. 2021; Thompson et al. 2021; Dagan et al. 2021; Olson et al. 2022).

Test-negative designs are believed to reduce unmeasured confounding bias due to healthcare-seeking behavior (HSB), whereby care seekers are more likely to be vaccinated, have healthier behaviors that reduce the risk of infection, and get tested when ill (Jackson et al. 2006). By restricting analysis to care seekers who are tested for the infection in view (e.g., Influenza or COVID-19), the vaccinated and unvaccinated are more likely to share similar HSB and underlying health characteristics. Misclassification of infection status is also reduced because the analysis is restricted to tested individuals (Jackson and Nelson 2013).

Sullivan, Tchetgen Tchetgen, and Cowling (2016) used directed acyclic graphs (DAGs) to illustrate the rationale behind the TND in the context of evaluating VE against flu infection, as shown in Figure 1(a). We denote flu vaccination status by A and flu infection by Y, so that the arrow A → Y represents VE against flu infection. Selection into the TND study sample, denoted by S, is triggered by a subject experiencing flu-like symptoms, seeking care at clinics or hospitals, and getting tested for Influenza infection; hence the Y → S edge. Healthcare-seeking behavior, denoted by HSB, may affect S, A, and Y because subjects with certain healthcare-seeking proclivities may be more likely to seek care, take annual flu shots, and participate in healthy and preventive behaviors. The above variables may be subject to effects of other baseline covariates, such as age, season, and high-risk conditions, included in Figure 1(a) as confounders X. The TND presumes that by restricting recruitment to care seekers, the study subjects essentially have identical HSB; in other words, conditioning the analysis on S = 1 is equivalent to conditioning on HSB = 1, which would then completely control for HSB (Figure 1(b)). Measured covariates X are further adjusted for by including these factors in a logistic regression model or by inverse probability weighting (Bond, Sullivan, and Cowling 2016; Thompson et al. 2021).

Fig. 1 Causal relationships of variables in a test-negative design. Sullivan, Tchetgen Tchetgen, and Cowling (2016) used (a) to illustrate the causal relationship between variables in a test-negative design in the general population, and used (b) to illustrate the assumption implicit in the common approach to estimate VE from the study data that study subjects have identical healthcare-seeking behavior (HSB) (Sullivan, Tchetgen Tchetgen, and Cowling 2016). (c) shows that if HSB remains partially unobserved, then the backdoor paths A←HSB→Y and A←HSB→S=1←Y indicate unmeasured confounding bias and selection bias, respectively. Other unmeasured confounders, such as occupation as a healthcare worker and previous infection, open additional backdoor paths between A and Y and result in additional confounding bias. (d) shows a simplified DAG from (c) that combines the unmeasured confounders into a single variable U. (e) illustrates our approach to estimate VE leveraging negative control exposure Z and outcome W. Dashed arrows indicate effects that are not required. (f) shows a scenario with the A→S arrow where the causal odds ratio can still be identified under additional assumptions.

However, the TND remains subject to potential hidden bias. First, it is unrealistic to lump all study subjects seeking care into a single category HSB = 1. It may be more realistic that HSB is not a deterministic function of S and remains a source of confounding bias even after conditioning on S. Furthermore, there might be other mismeasured or unmeasured confounders, denoted by U. For example, healthcare workers are at increased risk of flu infection due to higher exposure to flu patients and are more likely to seek care and receive vaccination due to health agency guidelines (Black et al. 2018). Prior flu infection history may also be a source of confounding if it alters the likelihood of vaccination and care-seeking, while also providing immunity against circulating strains (Sullivan, Tchetgen Tchetgen, and Cowling 2016; Krammer 2019). These potential sources of confounding, if not properly accounted for, can result in additional confounding bias, as illustrated in Figure 1(c). Finally, collider stratification bias is likely present due to conditioning on S, which is a common consequence of HSB, other risk factors (X, U), and Influenza infection Y (Lipsitch, Jha, and Simonsen 2016). That is, conditioning on S unblocks the backdoor path A ← (X, U, HSB) → S ← Y, which would in principle be blocked if study subjects had identical levels of HSB and other risk factors (Sullivan, Tchetgen Tchetgen, and Cowling 2016).

Accounting for these potential sources of bias is well known to be challenging, and potentially infeasible without additional assumptions or data. This can be seen in Figure 1(d), which is a simplified version of Figure 1(c) where the unmeasured confounders U include individuals’ occupation as a healthcare worker, previous flu infection, HSB, and so on. Figure 1(d) indicates that the unmeasured confounders U induce both confounding bias through the path A ← U → Y and collider stratification bias through the path A ← U → S ← Y. In the presence of both unmeasured confounding and collider bias, causal bounds may be available (Gabriel, Sachs, and Sjölander 2022) but are likely too wide to be informative; causal identification in the TND therefore remains to date an important open problem in the causal inference literature, which we aim to resolve.

1.2 Negative Control Methods

In recent years, negative control variables have emerged as powerful tools to detect, reduce, and potentially correct for unmeasured confounding bias (Lipsitch, Tchetgen Tchetgen, and Cohen 2010; Miao, Geng, and Tchetgen Tchetgen 2018; Shi, Miao, and Tchetgen Tchetgen 2020). The framework requires that at least one of two types of negative control variables is available, each a priori known to satisfy certain conditions: a negative control exposure (NCE), known to have no direct effect on the primary outcome; or a negative control outcome (NCO), known not to be an effect of the primary exposure. Such negative control variables are only valid and therefore useful to address unmeasured confounding in a given setting to the extent that they are subject to the same source of confounding as the exposure-outcome relationship of primary interest. Thus, an observed association between a valid NCE and the outcome (conditional on the exposure and observed covariates) or one between a valid NCO and the exposure can indicate the presence of residual confounding bias. For example, in a cohort study to investigate flu VE against hospitalization and death among seniors, to detect potential unmeasured confounding, Jackson et al. (2006) used hospitalization/death before and after the flu season as NCOs and found that the association between flu vaccination and hospitalization was virtually the same before and during the flu season, suggesting that the lower hospitalization rate observed among vaccinated seniors versus unvaccinated seniors was partially due to healthy-user bias.

Recently, new causal methods have been proposed to not only detect residual confounding when present, but also to potentially de-bias an observational estimate of a treatment causal effect in the presence of unmeasured confounders when both an NCE and an NCO are available, referred to as the double negative control (Miao, Geng, and Tchetgen Tchetgen 2018; Tchetgen Tchetgen et al. 2020). In this recent body of work, the double negative control design was extended in several important directions including settings in which confounding proxies routinely measured in well-designed observational studies may be used as negative control variables, a framework termed proximal causal inference; longitudinal settings where one is interested in the joint effects of time-varying exposures (Ying et al. in press), potentially subject to both measured and unmeasured confounding by time-varying factors; and settings where one aims to estimate direct and indirect effects in mediation analysis subject to unmeasured confounding or unmeasured mediators (Dukes, Shpitser, and Tchetgen Tchetgen in press; Ghassami, Shpitser, and Tchetgen Tchetgen 2023). Other recent papers in this fast-growing literature include Qi, Miao, and Zhang (2022), Egami and Tchetgen Tchetgen (in press), Kallus, Mao, and Uehara (2022), Imbens, Kallus, and Mao (2021), Deaner (2021), Ghassami et al. (2022), and Ghassami, Shpitser, and Tchetgen Tchetgen (2022). Notably, existing identification results in the negative control and proximal causal inference literature have been restricted to iid settings (Miao, Shi, and Tchetgen Tchetgen 2020) and time series settings (Shi et al. 2023), and to date, to the best of our knowledge, outcome-dependent sampling settings such as the TND have not been considered, particularly one where confounding and selection bias might co-exist.

1.3 Contribution and Outline

In this article, we introduce a novel double negative control approach to debias VE estimates from a test-negative design study. To our knowledge, this is the first work in the negative control method literature to address coexistent confounding and selection bias in the setting of a post-market vaccine effectiveness study. In Section 2.1, we introduce notation and review identification in the absence of both unmeasured confounding and selection bias. Next we develop our identification strategy and describe a new debiased estimator under a double negative control TND study in Sections 2.2–2.4, assuming no direct effect of vaccination on selection into the TND sample. In Section 2.5, we relax this assumption and introduce sufficient conditions under which VE remains identified. In Section 3, we demonstrate the performance of our method with simulations. In Section 4, the approach is further illustrated in an application to estimate COVID-19 VE against infection in a TND study nested within electronic health records from the University of Michigan Health System. We conclude with a discussion in Section 5. We relegate all proofs, derivations, additional tables and figures, and detailed discussions to Sections A–L of the supplementary materials.

2 Method

2.1 Preliminary: Estimation Under No Unmeasured Confounding and No Selection Bias

To fix ideas, we first review estimation assuming all confounders (U, X) are fully observed and the study sample is randomly drawn (rather than selected by testing) from a source population, referred to as the “target population.” That is, we observe data on $(A, Y, U, X)$ which are independent and identically distributed in the target population. For each individual, we write $Y(a)$ as the binary potential infection outcome had, possibly contrary to fact, the person’s vaccination status been $A = a$, $a = 0, 1$. Our goal is to provide identification and estimation strategies for the causal risk ratio (RR), defined as $RR = E[Y(1)]/E[Y(0)]$. Let $\beta_0$ denote the log causal RR, that is, $RR = \exp(\beta_0)$. Following Hudgens and Halloran (2006), we define VE as one minus the causal RR: $VE = 1 - \exp(\beta_0)$. Let $Q(A=a,U,X) = 1/P(A=a \mid U,X)$ denote the inverse of the probability of vaccination status $A = a$ given confounders. Under the standard assumptions of consistency (which involves the assumption of no interference: a subject’s potential outcome is not affected by the treatment of other subjects; Cole and Frangakis 2009), ignorability (given U, X), and positivity (Hernán and Robins 2020), it is well known that, if U were observed, the mean potential outcome $E[Y(a)]$ can be identified by inverse probability of treatment weighting (IPTW):

$$P[Y(a)=1] = E[I(A=a)\,Q(A=a,U,X)\,Y] \tag{1}$$

for $a = 0, 1$. Therefore, the log causal RR $\beta_0$ satisfies the following equation:

$$E[Q(A=1,U,X)\,A\,Y\exp(-\beta_0)] - E[Q(A=0,U,X)\,(1-A)\,Y] = 0.$$

Equivalently, we have

$$E[V_0(A,Y,U,X;\beta_0)] = 0 \tag{2}$$

where $V_0(A,Y,U,X;\beta) = (-1)^{1-A}\,Q(A,U,X)\,Y\exp(-\beta A)$ is an unbiased estimating function for $\beta_0$.

2.2 Tackling Selection Bias under a Semiparametric Risk Model

Next, consider a TND study for which data $(A, Y, X, U)$ are observed only for the tested individuals with $S = 1$. Because S is influenced by other factors such as infection, the estimating function $V_0(A,Y,U,X;\beta_0)$ may not be unbiased with respect to the study sample; that is, $E[V_0(A,Y,U,X;\beta_0) \mid U,X,S=1] \neq 0$ without another assumption about the selection process into the TND sample.

For a TND sample of size n, we denote the ith study subject’s variables as $(A_i, Y_i, U_i, X_i)$, $i = 1, \ldots, n$. For generalizability, we make the key assumption that vaccination A is unrelated to selection S other than through a subject’s infection status Y and confounders (U, X).

Assumption 1 (Treatment-independent sampling). $S \perp\!\!\!\perp A \mid Y, U, X$.

In a TND study, this assumption requires that an individual’s decision to seek care and get tested only depends on the presence of symptoms and their underlying behavioral or socioeconomic characteristics, including HSB (contained in (U, X)), and therefore vaccination status does not directly affect selection. Such an assumption is commonly made in previous works on test-negative design studies (Jackson and Nelson 2013; Sullivan, Tchetgen Tchetgen, and Cowling 2016) and selection bias (Didelez, Kreiner, and Keiding 2010; Bareinboim and Pearl 2012), although the latent factor U is typically assumed absent in such literature. The DAGs in Figure 1(a)–(e) in fact encode this conditional independence condition. We relax this assumption in Section 2.5. Outcome-dependent sampling distinguishes our work from standard proximal causal inference, which has exclusively assumed a random sample of subjects from the target population (Miao, Geng, and Tchetgen Tchetgen 2018; Tchetgen Tchetgen et al. 2020; Cui et al. in press). We further consider the following effect homogeneity condition.

Assumption 2 (No effect modification). For $a = 0, 1$,

$$P(Y=1 \mid A=a,U,X) = \exp(\beta_0 a)\,g(U,X) \tag{3}$$

where $g(U,X) = P(Y=1 \mid A=0,U,X)$ is an unknown function only restricted by $0 \leq P(Y=1 \mid A,U,X) \leq 1$.

Assumption 2 defines a semiparametric multiplicative risk model which states that vaccine effectiveness, measured on the RR scale, is constant across (U, X) strata in the target population. In other words, VE is not modified by U, X. This assumption is stronger than necessary for our methods but simplifies the exposition. In Section H of the supplementary materials, we relax the assumption to allow for effect modification by measured confounders X. Infection risk for control subjects P(Y=1|A=0,U,X)=g(U,X) is the nonparametric component of the model which is left unspecified.

Under Assumption 2, one can verify that $\exp(\beta_0) = E[Y(1)]/E[Y(0)]$, which is the marginal causal RR. Therefore, the estimating equation (2) implies that it is possible to identify $\beta_0$ even though the potential outcome means $E[Y(0)]$ and $E[Y(1)]$ cannot be identified due to selection bias. The following proposition indicates that the same is true when the data are subject to selection bias of a certain structure.

Proposition 1.

Under Assumptions 1 and 2, the parameter $\beta_0$ satisfies

$$E[V_0(A,Y,U,X;\beta_0) \mid U,X,S=1] = 0. \tag{4}$$

From Proposition 1, the IPTW estimating function $V_0$ derived from the target population remains unbiased in the TND sample. In principle, one could estimate $\beta_0$ with $\hat\beta$, the solution to

$$\frac{1}{n}\sum_{i=1}^{n} (-1)^{1-A_i}\,c(X_i)\,\hat Q(A_i,U_i,X_i)\,Y_i\exp(-\hat\beta A_i) = 0, \tag{5}$$

where $c(\cdot)$ is a user-specified function, and $\hat Q(A_i,U_i,X_i) = 1/\hat P(A=A_i \mid U_i,X_i)$ is the inverse of the estimated probability of having vaccination status $A = A_i$ given confounders $(U_i, X_i)$. Letting $c(X_i) = 1$, the resulting estimator

$$\hat\beta_0 = \log\left\{\frac{\sum_{i=1}^{n}\hat Q(A_i,U_i,X_i)\,A_i Y_i}{\sum_{i=1}^{n}\hat Q(A_i,U_i,X_i)\,(1-A_i)\,Y_i}\right\} \tag{6}$$

is essentially the IPTW estimator of the marginal RR in Schnitzer (2022), assuming the $(U_i, X_i)$ are all observed.
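Proposition 1 can be checked by exact enumeration in a small discrete example. The sketch below is illustrative only: the binary confounder U, all probabilities, and the names `select_ok` and `select_bad` are assumptions made for this illustration, not quantities from the study. Measured covariates X are suppressed and the oracle weight Q is used, which would not be available in practice.

```python
import math
from itertools import product

BETA0 = math.log(0.4)        # assumed log causal RR (VE = 60%)
P_A1 = {0: 0.3, 1: 0.7}      # assumed P(A = 1 | U = u)
G = {0: 0.05, 1: 0.20}       # assumed baseline risk g(u) = P(Y = 1 | A = 0, U = u)


def bern(p, v):
    """P(V = v) for a Bernoulli(p) variable V."""
    return p if v == 1 else 1.0 - p


def p_outcome(y, a, u):
    """Assumption 2: P(Y = 1 | A = a, U = u) = exp(BETA0 * a) * g(u)."""
    return bern(math.exp(BETA0 * a) * G[u], y)


def v0(a, y, u, beta):
    """Estimating function V0 with the oracle weight Q(a, u) = 1 / P(A = a | U = u)."""
    return (-1) ** (1 - a) * y * math.exp(-beta * a) / bern(P_A1[u], a)


def mean_v0_in_sample(u, p_select, beta=BETA0):
    """E[V0 | U = u, S = 1] when P(S = 1 | A, Y, U) is given by p_select(a, y, u)."""
    num = den = 0.0
    for a, y in product((0, 1), repeat=2):
        mass = bern(P_A1[u], a) * p_outcome(y, a, u) * p_select(a, y, u)
        num += mass * v0(a, y, u, beta)
        den += mass
    return num / den


def select_ok(a, y, u):
    """Selection depends on (Y, U) only, as Assumption 1 requires."""
    return 0.05 + 0.60 * y + 0.20 * u


def select_bad(a, y, u):
    """Selection also depends directly on A, which violates Assumption 1."""
    return select_ok(a, y, u) + 0.10 * a


for u in (0, 1):
    print(u, mean_v0_in_sample(u, select_ok), mean_v0_in_sample(u, select_bad))
```

In this toy setup the conditional mean of V0 vanishes (up to floating-point error) in every U stratum when selection is treatment-independent, and is clearly nonzero once vaccination affects selection directly.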

However, Q(A,U,X) cannot be estimated because U is unobserved. Furthermore, even if U were observed, Q̂(Ai,Ui,Xi) may not be identified from the TND sample due to selection bias. In the next section, we describe a new framework to account for unmeasured confounding in a TND setting, leveraging negative control exposure and outcome variables.

2.3 Tackling Unmeasured Confounding Bias Leveraging Negative Controls

2.3.1 Negative Control Exposure (NCE) and the Treatment Confounding Bridge Function

As shown in Figure 1(e), suppose that one has observed a valid, possibly vector-valued NCE, denoted as Z, which is a priori known to satisfy the following key independence conditions:

Assumption 3 (NCE independence conditions). $Z \perp\!\!\!\perp (Y, S) \mid A, U, X$.

Assumption 3 essentially states that any existing Z–Y association conditional on (X, A) in the target population must be a consequence of their respective association with U, therefore indicating the presence of confounding bias. Importantly, the NCE must a priori be known to have no causal effect on the infection status (Miao, Shi, and Tchetgen Tchetgen 2020). Likewise, the association between Z and S conditional on (X, A) is completely due to their respective association with U. Figure 1(e) presents a graphical illustration of an NCE that satisfies Assumption 3.

Shi, Miao, and Tchetgen Tchetgen (2020) provided some general guidelines and examples of how to select an NCE in different settings. In the Influenza VE setting, a candidate NCE can be vaccination status for the preceding year, or other vaccination status such as Tdap (Tetanus, Diphtheria, Pertussis) vaccine, as both are known to effectively provide no protection against the circulating flu strain in a given year. We emphasize that an appropriate NCE should have no direct effect on selection into the TND sample. In other words, the selected NCE should be irrelevant to the study’s inclusion/exclusion criteria other than through U, X. We now provide an intuitive description of our approach to leverage Z as an imperfect proxy of U for identification despite not directly observing U.

To motivate the rationale behind identification, ignore selection bias for now and suppose that $Q(A,U) = \alpha_0 + \alpha_1 A + \alpha_2 U$, also suppressing measured confounders X. Although U is unobserved, suppose further that Z satisfies $E[Z \mid A,U] = \gamma_0 + \gamma_1 A + \gamma_2 U$. Then we have $U = E[\tilde U(A,Z) \mid A,U]$, where $\tilde U(A,Z) = (Z - \gamma_0 - \gamma_1 A)/\gamma_2$. Replacing U with $\tilde U(A,Z)$ in $Q(A,U)$, we get $q(A,Z) = \alpha_0 + \alpha_1 A + \alpha_2 \tilde U(A,Z)$, which does not depend on the unmeasured confounder U and can recover the inverse probability of vaccination from $Q(A,U) = E[q(A,Z) \mid A,U]$. If all parameters of q were known, it would naturally follow that the IPTW method in (1) can be recovered by

$$E[Y(a)] = E\{I(A=a)\,E[q(A,Z) \mid A,U]\,Y\} \overset{\text{A.3}}{=} E[I(A=a)\,q(A,Z)\,Y],$$

where A.3 indicates that the second equality uses Assumption 3.
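The two substitution steps in this heuristic can be written out explicitly. The display below only restates the argument under the linear working models assumed in the paragraph, with the additional proviso (implicit in the definition of the proxy) that $\gamma_2 \neq 0$.

```latex
\begin{aligned}
E[\tilde U(A,Z) \mid A,U]
  &= \frac{E[Z \mid A,U] - \gamma_0 - \gamma_1 A}{\gamma_2}
   = \frac{(\gamma_0 + \gamma_1 A + \gamma_2 U) - \gamma_0 - \gamma_1 A}{\gamma_2}
   = U, \\
E[q(A,Z) \mid A,U]
  &= \alpha_0 + \alpha_1 A + \alpha_2\, E[\tilde U(A,Z) \mid A,U]
   = \alpha_0 + \alpha_1 A + \alpha_2 U
   = Q(A,U).
\end{aligned}
```

The first line is where the dependence of Z on U is needed: if $\gamma_2$ were zero, Z would carry no information about U and the proxy would be undefined.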

Therefore, β0 can be identified if the distribution of (A, Y, Z) in the target population is available provided that parameters indexing q can be identified. The above insight motivates the following assumption:

Assumption 4 (Treatment confounding bridge function). There exists a function $q(A,Z,X)$ that satisfies, for every a, u, and x,

$$Q(A=a,U=u,X=x) = E[q(A,Z,X) \mid A=a,U=u,X=x]. \tag{7}$$

A function q that satisfies (7) is called a treatment confounding bridge function (Cui et al. in press). Below we give two examples where (7) admits a closed-form solution.

Example 1

(Binary U and Z). Suppose that U is binary, and so is the NCE Z. For simplicity we suppress X. Write $p_{za.u} = P(Z=z, A=a \mid U=u)$. We prove in Section C of the supplementary materials that $q(a,z)$ has a closed form given by

$$q(a,z) = \frac{p_{1a.1} - p_{1a.0} + (p_{0a.0} - p_{0a.1} - p_{1a.1} + p_{1a.0})\,z}{p_{0a.0}\,p_{1a.1} - p_{0a.1}\,p_{1a.0}}. \tag{8}$$

The result can be similarly extended to polytomous Z.
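As a quick numerical sanity check of (8), the sketch below plugs hypothetical probabilities into the closed form and confirms that the result satisfies the defining equation (7). The values in `P_A1` and `P_Z1` and the function names are assumptions made for this illustration only.

```python
from itertools import product


def bern(p, v):
    """P(V = v) for a Bernoulli(p) variable V."""
    return p if v == 1 else 1.0 - p


def bridge_binary(p, a, z):
    """Closed form (8); p[(z, a, u)] holds P(Z = z, A = a | U = u)."""
    num = p[1, a, 1] - p[1, a, 0] + (p[0, a, 0] - p[0, a, 1] - p[1, a, 1] + p[1, a, 0]) * z
    den = p[0, a, 0] * p[1, a, 1] - p[0, a, 1] * p[1, a, 0]
    return num / den


# Hypothetical inputs (not from the study).
P_A1 = {0: 0.3, 1: 0.6}                                       # P(A = 1 | U = u)
P_Z1 = {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.3, (1, 1): 0.8}   # P(Z = 1 | A = a, U = u), keyed by (a, u)

p = {(z, a, u): bern(P_Z1[a, u], z) * bern(P_A1[u], a)
     for z, a, u in product((0, 1), repeat=3)}


def conditional_mean_of_bridge(a, u):
    """E[q(a, Z) | A = a, U = u], computed by summing over z."""
    p_a = bern(P_A1[u], a)
    return sum(bridge_binary(p, a, z) * p[z, a, u] / p_a for z in (0, 1))


for a, u in product((0, 1), repeat=2):
    # The two printed numbers should agree: E[q | A = a, U = u] and 1 / P(A = a | U = u).
    print(a, u, conditional_mean_of_bridge(a, u), 1.0 / bern(P_A1[u], a))
```

The denominator of (8) is zero exactly when Z carries no information about U given A, so the check also illustrates why the NCE must be U-relevant.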

Example 2

(Continuous U and Z). Suppose the unmeasured confounder U and the NCE Z are both continuous. Further assume that

$$A \mid U,X \sim \text{Bernoulli}\left([1 + \exp(-\mu_{0A} - \mu_{UA}U - \mu_{XA}X)]^{-1}\right), \qquad Z \mid A,U,X \sim N(\mu_{0Z} + \mu_{AZ}A + \mu_{UZ}U + \mu_{XZ}X,\ \sigma_Z^2).$$

The treatment confounding bridge function $q(A,Z,X)$ can then be shown to equal

$$q(A,Z,X) = 1 + \exp[(-1)^{A}(\tau_0 + \tau_1 A + \tau_2 Z + \tau_3 X)] \tag{9}$$

where $\tau_0 = \mu_{0A} - \dfrac{\mu_{UA}\,\mu_{0Z}}{\mu_{UZ}} - \dfrac{\sigma_Z^2\,\mu_{UA}^2}{2\mu_{UZ}^2}$, $\tau_1 = \dfrac{\sigma_Z^2\,\mu_{UA}^2}{\mu_{UZ}^2} - \dfrac{\mu_{UA}\,\mu_{AZ}}{\mu_{UZ}}$, $\tau_2 = \mu_{UA}/\mu_{UZ}$, and $\tau_3 = \mu_{XA} - \mu_{XZ}\,\mu_{UA}/\mu_{UZ}$.

Formally, (7) defines a Fredholm integral equation of the first kind, with $q(A,Z,X)$ as its solution (Cui et al. in press). Heuristically, the existence of a solution requires that variation in Z induced by U is sufficiently correlated with variation in A induced by U. For instance, in Example 2, the existence of a treatment confounding bridge function amounts to the condition $\mu_{UZ} \neq 0$, which again requires $Z \not\perp\!\!\!\perp U \mid A, X$. Cui et al. (in press) provided formal conditions sufficient for the existence of the treatment confounding bridge function satisfying Equation (7). These conditions are reproduced for completeness in Section B of the supplementary materials.
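The closed form (9) can be checked numerically against the defining equation (7). The sketch below uses hypothetical parameter values (every number is an assumption for illustration) and a plain trapezoidal rule to integrate the candidate bridge function over the normal distribution of Z.

```python
import math

# Hypothetical parameter values for the two working models of Example 2.
MU_0A, MU_UA, MU_XA = -0.5, 0.8, 0.3               # logistic model for A
MU_0Z, MU_AZ, MU_UZ, MU_XZ = 0.2, 0.4, 1.0, -0.6   # linear model for Z
SIGMA_Z = 1.5

# Coefficients of (9), written in terms of tau_2 = mu_UA / mu_UZ.
TAU2 = MU_UA / MU_UZ
TAU0 = MU_0A - TAU2 * MU_0Z - 0.5 * SIGMA_Z ** 2 * TAU2 ** 2
TAU1 = SIGMA_Z ** 2 * TAU2 ** 2 - TAU2 * MU_AZ
TAU3 = MU_XA - MU_XZ * TAU2


def bridge(a, z, x):
    """Candidate bridge function (9)."""
    return 1.0 + math.exp((-1) ** a * (TAU0 + TAU1 * a + TAU2 * z + TAU3 * x))


def inverse_propensity(a, u, x):
    """Q(a, u, x) = 1 / P(A = a | U = u, X = x) under the logistic model."""
    return 1.0 + math.exp((-1) ** a * (MU_0A + MU_UA * u + MU_XA * x))


def mean_bridge(a, u, x, steps=4000, width=12.0):
    """E[q(a, Z, x) | A = a, U = u, X = x] by the trapezoidal rule on mean +/- width * sd."""
    mean = MU_0Z + MU_AZ * a + MU_UZ * u + MU_XZ * x
    lo = mean - width * SIGMA_Z
    h = 2.0 * width * SIGMA_Z / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        density = math.exp(-0.5 * ((z - mean) / SIGMA_Z) ** 2) / (SIGMA_Z * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * bridge(a, z, x) * density * h
    return total


for a in (0, 1):
    for u in (-1.0, 0.5, 2.0):
        # The two printed numbers should agree closely.
        print(a, u, mean_bridge(a, u, x=1.0), inverse_propensity(a, u, x=1.0))
```

The integral matches Q at every (a, u, x) tried, which is the sense in which Z stands in for the unobserved U.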

Thus, under Assumption 4, we propose to construct a new unbiased estimating function for β0 by replacing Q(A,U,X) with q(A,Z,X) in V0(A,Y,U,X;β0).

Theorem 1

(Moment restriction for $\beta_0$). Under Assumptions 1–4, we have that

$$E[V_1(A,Y,Z,X;\beta_0) \mid U,X,S=1] = 0$$

where $V_1(A,Y,Z,X;\beta_0) = (-1)^{1-A}\,q(A,Z,X)\,Y\exp(-\beta_0 A)$.

Theorem 1 immediately implies that

$$E[(-1)^{1-A}\,c(X)\,q(A,Z,X)\,Y\exp(-\beta_0 A) \mid S=1] = 0$$

for any function $c(X)$. In practice, if one can consistently estimate the treatment confounding bridge function $q(A,Z,X)$ with $\hat q(A,Z,X)$, then $\beta_0$ can be estimated by solving the estimating equation

$$\frac{1}{n}\sum_{i=1}^{n} (-1)^{1-A_i}\,c(X_i)\,\hat q(A_i,Z_i,X_i)\,Y_i\exp(-\beta_0 A_i) = 0, \tag{10}$$

for unidimensional $c(\cdot) \neq 0$, which results in the closed-form estimator

$$\hat\beta_0 = \log\left(\frac{\sum_{i=1}^{n} c(X_i)\,\hat q(A_i,Z_i,X_i)\,A_i Y_i}{\sum_{i=1}^{n} c(X_i)\,\hat q(A_i,Z_i,X_i)\,(1-A_i)\,Y_i}\right).$$

Importantly, although (7) may not have a unique solution, any solution uniquely identifies the causal log RR $\beta_0$ for a fixed function $c(\cdot)$. Furthermore, although the choice of $c(\cdot)$ does not impair unbiasedness of (10), it does impact efficiency of the resulting estimator $\hat\beta_0$. In practice, one may simply set $c(X_i) = 1$.
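The closed-form estimator is a ratio of two weighted case counts, which the following minimal sketch makes concrete. The function name `estimate_log_rr` and the toy inputs are hypothetical, and the bridge values `q_hat` are simply taken as given here rather than estimated.

```python
import math


def estimate_log_rr(a, y, q_hat, c=None):
    """Closed-form solution of estimating equation (10) for a scalar weight c(X_i).

    a, y   : sequences of 0/1 vaccination and test-positive indicators in the TND sample
    q_hat  : fitted bridge values q_hat(A_i, Z_i, X_i)
    c      : optional values c(X_i); defaults to c = 1 for every subject
    """
    if c is None:
        c = [1.0] * len(a)
    vaccinated = sum(ci * qi * ai * yi for ci, qi, ai, yi in zip(c, q_hat, a, y))
    unvaccinated = sum(ci * qi * (1 - ai) * yi for ci, qi, ai, yi in zip(c, q_hat, a, y))
    if vaccinated <= 0 or unvaccinated <= 0:
        # The log ratio is undefined if either weighted total is not positive.
        raise ValueError("weighted case totals must be positive to take the log ratio")
    return math.log(vaccinated / unvaccinated)


# Toy inputs (hypothetical values, not study data).
a = [1, 1, 1, 0, 0, 0]
y = [1, 0, 0, 1, 1, 0]
q_hat = [2.0, 1.5, 3.0, 2.5, 1.5, 2.0]

beta_hat = estimate_log_rr(a, y, q_hat)
ve_hat = 1.0 - math.exp(beta_hat)   # VE = 1 - exp(beta)
print(beta_hat, ve_hat)
```

Because each term is multiplied by $Y_i$, only test-positive subjects enter the two sums directly; the test-negative controls contribute through the estimation of the bridge function.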

There remains the question of identifying and estimating the treatment confounding bridge function from the TND sample which we consider next.

2.3.2 Negative Control Outcome (NCO) for Identification of Treatment Confounding Bridge Function

For identification and estimation of q, we propose to leverage NCOs to construct feasible estimating equations for the treatment confounding bridge function as in Cui et al. (in press). Similar to NCEs, NCOs can be viewed as imperfect proxies of U. However, unlike NCEs, a valid NCO, denoted by W, is (i) known a priori not to be causally affected by either the primary exposure A or the NCE Z; and (ii) associated with (A, Z) conditional on X only to the extent that it is associated with U. Formally, we make the following assumption.

Assumption 5 (NCO independence conditions). (a) $W \perp\!\!\!\perp A \mid U, X$; (b) $W \perp\!\!\!\perp Z \mid A, U, X, Y$; (c) $S \perp\!\!\!\perp (A, Z) \mid U, X, W, Y$.

Assumption 5(a) and (b) formalize the requirement that neither the exposure nor the NCE can have direct effects on the NCO. Assumption 5(c) complements Assumption 3 and states that conditioning on W in addition to (A, U, X, Y) does not alter the conditional independence of Z with S. General guidelines for selecting NCOs are discussed by Shi, Miao, and Tchetgen Tchetgen (2020). In TND studies in which there may be multiple possible test-negative illnesses, an NCO may be selected as one or more of the test-negative illnesses. In flu VE studies, a candidate NCO can be an infection whose risk is not causally affected by either A or Z. For example, if the selected NCE is Tdap vaccination, then a potential NCO may be current-year respiratory syncytial virus infection, as its risk is unlikely to be affected by Influenza or Tdap vaccination. Recent outpatient visits for other acute illnesses can also serve as NCOs, such as blepharitis, wrist/hand sprain, lipoma, ingrowing nail, etc. (Leung et al. 2011). In contrast with an NCE, an NCO can have a direct effect on the selection S. Figure 1(e) illustrates an NCO W that satisfies Assumption 5(a) and (b).

Similar to Cui et al. (in press), we leverage an NCO as an additional proxy to identify the treatment confounding bridge function. However, a complication arises due to the lack of a random sample from the target population, a key requirement in the approach outlined in Cui et al. (in press). In general, it is not possible to obtain sufficient information about either the distribution of W or that of U in the target population from only the TND data without an additional structural assumption (Bareinboim and Pearl 2012). In the following, we avoid imposing such an additional structural assumption by leveraging an important feature of several infectious diseases such as Influenza and COVID-19: namely, that contracting such an infection at any point in time is a rare event in most target populations of interest, and therefore information from the target population relevant to estimating the treatment confounding bridge function can be recovered from the test-negative control group. Formally, we make the following rare disease assumption.

Assumption 6 (Rare infection). There exists a small positive number $\delta > 0$ such that

$$P(Y=1 \mid A=a,W=w,U=u,X=x) \leq \delta, \quad \text{for almost every } a, w, u, x. \tag{11}$$

Assumption 6 states that infected subjects, whether vaccinated or not and regardless of their NCO, only constitute a small proportion of each (U, X) stratum in the target population. This assumption implies that

$$\frac{1}{1-\delta} \geq \frac{P(A,Z \mid U=u,X=x,Y=0)}{P(A,Z \mid U=u,X=x)} \geq 1-\delta.$$

Thus, under Assumptions 1, 3, and 6, $P(A=a,Z=z \mid U=u,X=x) \approx P(A=a,Z=z \mid U=u,X=x,Y=0,S=1)$ for all $a, z, x, u$. We now introduce a key property of the treatment confounding bridge function in Theorem 2.

Theorem 2

(Identification of the treatment confounding bridge function). Under Assumptions 1, 3, 4, 5, and 6, for $a = 0, 1$ we have that

$$\frac{(1-\delta)^3}{P(A=a \mid W,X,Y=0,S=1)} < E[q(a,Z,X) \mid W,A=a,X,Y=0,S=1] < \frac{1}{(1-\delta)^3\,P(A=a \mid W,X,Y=0,S=1)}.$$

Thus, provided $\delta$ is small, Theorem 2 suggests that an approximation to the treatment confounding bridge function can be obtained by solving the following integral equation involving only observed data,

$$E[q^*(A,Z,X) \mid W,A=a,X,Y=0,S=1] = \frac{1}{P(A=a \mid W,X,Y=0,S=1)}, \tag{12}$$

provided a solution exists. Accordingly, hereafter suppose that the following assumption holds.

Assumption 7 (Existence of a unique solution to (12)). There exists a unique square-integrable function q*(A,Z,X) that satisfies (12).

Heuristically, uniqueness of a solution to (12) requires that variation in W is sufficiently informative about variation in Z, in the sense that there is no variation in W that is not associated with corresponding variation in Z. See Section D of the supplementary materials for further elaboration of the completeness condition, and Newey and Powell (2003) and D’Haultfoeuille (2011) for related use of the assumption in the literature. Below we briefly illustrate Assumption 7 in the examples of Section 2.3.1.

Example 1’. Suppose U and Z are both binary, and a binary NCO W is also observed. Let $p_{za.w} = P(Z=z, A=a \mid W=w, Y=0, S=1)$ for $z, a, w \in \{0, 1\}$; then solving (12) gives

$$q^*(a,z) = \frac{p_{1a.1} - p_{1a.0} + (p_{0a.0} - p_{0a.1} - p_{1a.1} + p_{1a.0})\,z}{p_{0a.0}\,p_{1a.1} - p_{0a.1}\,p_{1a.0}}.$$

The probabilities $p_{za.w}$ can all be estimated from the study sample.
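To see how close the control-group solution $q^*$ can be to the target-population bridge function $q$ of Example 1, the sketch below enumerates a hypothetical binary data-generating process exactly. All probabilities, the selection model `p_select_control`, and the function names are assumptions for illustration; the process is built to satisfy Assumptions 1, 3, and 5, with baseline infection risks well below 1%.

```python
from itertools import product


def bern(p, v):
    """P(V = v) for a Bernoulli(p) variable V."""
    return p if v == 1 else 1.0 - p


def closed_form(p, a, z):
    """Closed form shared by Examples 1 and 1'; p[(z, a, k)] is indexed by k = u or k = w."""
    num = p[1, a, 1] - p[1, a, 0] + (p[0, a, 0] - p[0, a, 1] - p[1, a, 1] + p[1, a, 0]) * z
    den = p[0, a, 0] * p[1, a, 1] - p[0, a, 1] * p[1, a, 0]
    return num / den


# Hypothetical data-generating process with binary U, A, Z, W, Y, S.
P_U1 = 0.5
P_A1 = {0: 0.3, 1: 0.6}      # P(A = 1 | U = u)
P_Z1 = {0: 0.2, 1: 0.7}      # P(Z = 1 | U = u)
P_W1 = {0: 0.1, 1: 0.5}      # P(W = 1 | U = u)
G = {0: 0.002, 1: 0.004}     # rare baseline risk g(u) = P(Y = 1 | A = 0, U = u)


def p_select_control(u, w):
    """P(S = 1 | Y = 0, U = u, W = w); free of (A, Z), as Assumptions 1, 3, and 5(c) require."""
    return 0.05 + 0.10 * u + 0.10 * w


def max_bridge_gap(rr):
    """Largest |q*(a, z) - q(a, z)| over (a, z) when the causal risk ratio is rr."""
    # Target-population inputs p_{za.u} = P(Z = z, A = a | U = u) for the true bridge q.
    p_u = {(z, a, u): bern(P_Z1[u], z) * bern(P_A1[u], a)
           for z, a, u in product((0, 1), repeat=3)}
    # Control-group inputs p_{za.w} = P(Z = z, A = a | W = w, Y = 0, S = 1) for q*.
    mass = {}
    for z, a, w, u in product((0, 1), repeat=4):
        p_y0 = 1.0 - (rr if a == 1 else 1.0) * G[u]
        joint = (bern(P_U1, u) * bern(P_A1[u], a) * bern(P_Z1[u], z)
                 * bern(P_W1[u], w) * p_y0 * p_select_control(u, w))
        mass[z, a, w] = mass.get((z, a, w), 0.0) + joint
    p_w = {}
    for w in (0, 1):
        total = sum(mass[z, a, w] for z, a in product((0, 1), repeat=2))
        for z, a in product((0, 1), repeat=2):
            p_w[z, a, w] = mass[z, a, w] / total
    return max(abs(closed_form(p_w, a, z) - closed_form(p_u, a, z))
               for a, z in product((0, 1), repeat=2))


print(max_bridge_gap(rr=0.3))   # small but nonzero when the vaccine is effective
print(max_bridge_gap(rr=1.0))   # numerically zero when there is no vaccine effect
```

In this toy process the gap is small when infection is rare and disappears when the risk ratio equals one; raising the values in `G` is a simple way to explore how the approximation degrades as the infection becomes more common.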

We emphasize that the solution to (12) is ultimately an approximation to the (nonidentifiable) treatment confounding bridge function in the target population. The accuracy of this approximation relies on the extent to which the rare disease assumption holds in the target population of interest. We study the potential bias due to departure from this key assumption in Section E of the supplementary materials. We further observe that, under the null hypothesis of no vaccine effectiveness, or if W has no direct effects on Y or S, then the function q*(A,Z,X) matches the treatment confounding bridge exactly, even if the disease is not rare, as stated in the corollary below.

Corollary 1.

Under the assumptions of Theorem 1, Assumption 7, and the null hypothesis of no vaccine effect against infection, such that $Y \perp\!\!\!\perp A \mid U, X$, then

$$E[q(A,Z,X) \mid W,A=a,X,Y=0,S=1] = \frac{1}{P(A=a \mid W,X,Y=0,S=1)}.$$

From Theorem 2, we immediately have the following corollary which provides a basis for estimation of q*(A,Z,X) from the observed TND sample.

Corollary 2.

Under the conditions of Theorem 2, for any function m(W,A,X), the solution q*(A,Z,X) to (12) also solves the population moment equation

$$E\left[m(W,A,X)\,q^*(A,Z,X)-m(W,1,X)-m(W,0,X)\mid Y=0,S=1\right]=0.\tag{13}$$

In practice, a parametric model $q^*(A,Z,X;\tau)$ for the treatment confounding bridge function might be appropriate, where $\tau$ is an unknown finite-dimensional parameter. Corollary 2 suggests that one can then estimate $\tau$ by solving the estimating equation

$$\frac{1}{n}\sum_{i=1}^{n}(1-Y_i)\left[m(W_i,A_i,X_i)\,q^*(A_i,Z_i,X_i;\hat\tau)-m(W_i,1,X_i)-m(W_i,0,X_i)\right]=0,\tag{14}$$

where m(W,A,X) is a user-specified function with dimension no smaller than that of $\tau$.

Example 1”. If Z and W are both binary, rather than solving the system of equations implied by (12), one can instead specify a saturated model

$$q^*(A,Z;\tau)=\tau_0+\tau_1 Z+\tau_2 A+\tau_3 ZA\tag{15}$$

and estimate $\tau=(\tau_0,\tau_1,\tau_2,\tau_3)^T$ by solving (14) with $m(W,A)=(1,W,A,WA)^T$. Extension to Z and X with multiple categories is relatively straightforward.
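Because the saturated model (15) is linear in τ, the estimating equation (14) with m(W,A)=(1,W,A,WA)^T reduces to a 4×4 linear system. The sketch below illustrates this on simulated data; the data-generating probabilities and the helper names (`m_fun`, `q_design`, `fit_tau`, `q_star`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def m_fun(w, a):
    """m(W, A) = (1, W, A, W*A)^T, evaluated row-wise."""
    return np.column_stack([np.ones_like(w), w, a, w * a])

def q_design(a, z):
    """Regressors of the saturated model (15): (1, Z, A, Z*A)."""
    return np.column_stack([np.ones_like(z), z, a, z * a])

def fit_tau(y, a, z, w):
    """Solve (14) for tau using test-negative controls (Y = 0) only."""
    ctrl = np.asarray(y) == 0
    a, z, w = (np.asarray(v, dtype=float)[ctrl] for v in (a, z, w))
    # sum_i m_i q_design_i^T tau = sum_i [m(W_i, 1) + m(W_i, 0)]
    lhs = m_fun(w, a).T @ q_design(a, z)
    rhs = (m_fun(w, np.ones_like(a)) + m_fun(w, np.zeros_like(a))).sum(axis=0)
    return np.linalg.solve(lhs, rhs)

def q_star(tau, a, z):
    """Evaluate the fitted bridge function q*(a, z; tau)."""
    return q_design(np.asarray(a, dtype=float), np.asarray(z, dtype=float)) @ tau

# Simulated data (hypothetical probabilities, for illustration only).
rng = np.random.default_rng(0)
n = 20_000
w = rng.integers(0, 2, size=n)
# Cell order is (z, a) = (0,0), (0,1), (1,0), (1,1); one row per value of W.
cell_probs = np.array([[0.40, 0.20, 0.10, 0.30],
                       [0.20, 0.15, 0.25, 0.40]])
cell = np.where(w == 0,
                rng.choice(4, size=n, p=cell_probs[0]),
                rng.choice(4, size=n, p=cell_probs[1]))
z, a = cell // 2, cell % 2
y = rng.binomial(1, 0.3, size=n)   # cases are ignored when fitting tau

tau_hat = fit_tau(y, a, z, w)
print(tau_hat)
```

With a saturated model and binary Z and W, the solution coincides with plugging empirical cell proportions into the closed form of Example 1’, so the linear solve is mainly a convenience that extends directly to richer choices of m.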

Example 2’. In case of continuous (U, X, Z), (9) suggests the model

$$q^*(A,Z,X;\tau)=1+\exp\left[(-1)^{A}(\tau_0+\tau_1 A+\tau_2 Z+\tau_3 X)\right].\tag{16}$$

If a univariate NCO W is available, we may solve (14) with m(W,A,X) defined as a vector including (W, A, X), their high-order interactions and an intercept.

2.4 Estimation and Inference

In previous sections, we have defined the structural parameter of interest β0 as stratum-specific log risk ratio, introduced the treatment confounding bridge function as a key ingredient to identification of β0, and presented a strategy to estimate the treatment confounding bridge function leveraging an NCO. We summarize the steps of our estimation framework in Algorithm 1 and present the large-sample properties of the resulting estimator (β̂,τ̂) in Theorem 3.

Algorithm 1

Negative control method to estimate vaccine effectiveness from a test-negative design

1: Identify the variables in the data according to the assumed causal diagram, in particular the NCEs and NCOs.

2: Estimate the treatment confounding bridge function by solving (14) with a suitable parametric model q*(A,Z,X;τ) and a user-specified function m(W,A,X). Write τ̂ as the resulting estimate of τ.

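3: Estimate $\beta_0$ by

$$\hat\beta=\log\left(\frac{\sum_{i=1}^{n}c(X_i)\,\hat q(A_i,Z_i,X_i)\,A_iY_i}{\sum_{i=1}^{n}c(X_i)\,\hat q(A_i,Z_i,X_i)\,(1-A_i)Y_i}\right),$$

where $\hat q(A,Z,X)=q^*(A,Z,X;\hat\tau)$ and $c(\cdot)$ is a user-specified one-dimensional function.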
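Step 3 of Algorithm 1 can be sketched as follows, assuming the fitted bridge values q̂(A_i, Z_i, X_i) have already been computed in step 2. The function names `beta_hat` and `vaccine_effectiveness` and the toy inputs are hypothetical; the default c(X) ≡ 1 is one admissible choice of the user-specified function, not a recommendation from the paper.

```python
import numpy as np

def beta_hat(y, a, q_hat, c=None):
    """Log ratio of bridge-weighted vaccinated to unvaccinated test-positive counts."""
    y, a, q_hat = (np.asarray(v, dtype=float) for v in (y, a, q_hat))
    c = np.ones_like(q_hat) if c is None else np.asarray(c, dtype=float)
    num = np.sum(c * q_hat * a * y)          # vaccinated test-positive subjects
    den = np.sum(c * q_hat * (1 - a) * y)    # unvaccinated test-positive subjects
    return np.log(num / den)

def vaccine_effectiveness(beta):
    """VE = 1 - exp(beta), with beta on the log risk ratio scale."""
    return 1.0 - np.exp(beta)

# Toy inputs (hypothetical): four cases and two controls.
y = [1, 1, 1, 1, 0, 0]
a = [1, 0, 0, 1, 1, 0]
q = [2.0, 2.0, 4.0, 2.0, 3.0, 3.0]

b = beta_hat(y, a, q)
print(b, vaccine_effectiveness(b))
```

Only test-positive subjects contribute to β̂ directly; the test-negative controls enter through the estimation of τ̂ in step 2.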

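Theorem 3 (Inference based on $(\hat\beta,\hat\tau)$). Under Assumptions 1–7 and suitable regularity conditions provided in Section F, the estimator $(\hat\beta,\hat\tau)$ in Algorithm 1, or equivalently, the solution to the estimating equation $\frac{1}{n}\sum_{i=1}^{n}G_i(\beta,\tau)=0$, is regular and asymptotically linear with the $i$th influence function $IF_i(\beta,\tau)=-\Omega(\beta,\tau)^{+}G_i(\beta,\tau)$, where $\Omega(\beta,\tau)^{+}$ denotes the Moore-Penrose inverse of $\Omega(\beta,\tau)$,

$$G_i(\beta,\tau)=\begin{pmatrix}(-1)^{1-A_i}\,c(X_i)\,q^*(A_i,Z_i,X_i;\tau)\,Y_i\exp(-\beta A_i)\\ (1-Y_i)\left[m(W_i,A_i,X_i)\,q^*(A_i,Z_i,X_i;\tau)-m(W_i,1,X_i)-m(W_i,0,X_i)\right]\end{pmatrix},$$

and

$$\Omega(\beta,\tau)=\left(E\left[\partial G_i(\beta,\tau)/\partial\beta^{T}\right],\;E\left[\partial G_i(\beta,\tau)/\partial\tau^{T}\right]\right).$$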

Large-sample standard errors and confidence intervals follow from standard estimating equation theory (see Van der Vaart Citation2000, Theorem 5.21).
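A generic sketch of the resulting sandwich standard errors is given below, assuming the influence function takes the standard form −Ω⁺G_i and approximating Ω by central finite differences. The function `sandwich_se` and its interface are hypothetical and not part of the paper; the toy check uses the sample mean, whose estimating function is G_i(θ) = x_i − θ.

```python
import numpy as np

def sandwich_se(G, theta_hat, eps=1e-6):
    """Standard errors from stacked estimating functions.

    G(theta) must return an (n, k) array whose i-th row is G_i(theta).
    """
    theta_hat = np.atleast_1d(np.asarray(theta_hat, dtype=float))
    g = G(theta_hat)
    n, k = g.shape
    p = theta_hat.size
    omega = np.empty((k, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = eps
        # Central finite-difference approximation to E[dG_i / d theta_j].
        omega[:, j] = (G(theta_hat + step).mean(axis=0)
                       - G(theta_hat - step).mean(axis=0)) / (2 * eps)
    infl = -g @ np.linalg.pinv(omega).T     # row i is the influence function IF_i
    cov = infl.T @ infl / n**2              # estimated covariance of theta_hat
    return np.sqrt(np.diag(cov))

# Toy check (hypothetical data): estimating a mean.
x = np.array([1.0, 2.0, 4.0, 7.0])
se = sandwich_se(lambda th: (x - th[0])[:, None], x.mean())
print(se)
```

A Wald-type 95% confidence interval for β0 is then β̂ ± 1.96 times the first entry of the returned vector, when β is placed first in θ.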

The estimator β̂ and its standard error are constructed under Assumption 6. If Assumption 6 fails to hold, β̂ may be biased and confidence intervals may not be well-calibrated. However, by Corollary 1, under the null hypothesis of no vaccine effect, the estimated q*(A,Z,X) converges to the true treatment confounding bridge function and β̂ is consistent for β0=0. This implies that while our methods are approximately asymptotically unbiased for rare infections, they provide a valid test of no vaccine causal effect even if the infection is not rare.

2.5 Estimating VE Under Treatment-Induced Selection

Thus far, unbiasedness of the estimating function V0 has crucially relied on Assumption 1 that A does not have a direct effect on S. In some settings, this assumption may be violated if an infected person who is vaccinated is on average more likely to present to the ER than an unvaccinated infected person with similar symptoms, so that treatment or vaccination-induced selection into the TND sample is said to be present. In such settings, the estimator β̂ produced by Algorithm 1 may be severely biased. Crucially, we note that this form of selection bias can be present even in context of a randomized trial in which vaccination/treatment is known to the analyst, if the outcome is ascertained using a TND, for example in recent cluster-randomized test-negative design studies of community-level dengue intervention effectiveness (Anders et al. Citation2018; Jewell et al. Citation2019; Dufault and Jewell Citation2020; Wang et al. in press). In this section, we provide sufficient conditions for identification under treatment-induced selection. In this vein, consider the following assumptions:

Assumption 1’. P(S=1|A=a,Y=1,U,X)/P(S=1|A=a,Y=0,U,X)=exp(h(U,X)) for a=0,1.

That is, the risk ratio association between infection status and selection into the TND sample is independent of vaccination status. Equivalently, the conditional probability of selection can be factorized as $P(S=1\mid A,Y,U,X)=h_1(A,U,X)\,h_2(Y,U,X)$. This assumption may be reasonable if, given (U, X), the treatment A and outcome Y affect sample selection through independent mechanisms such that A does not modify the effect of Y on sample selection S on the multiplicative scale. Assumptions similar to Assumption 1’ have been used previously to illustrate settings where two independent causes of sample selection remain conditionally independent in the selected sample (see Hernán, Hernández-Díaz, and Robins Citation2004, Appendix A.3). We highlight that Assumption 1 (which requires $A \perp\!\!\!\perp S \mid U,X,Y$) implies Assumption 1’ and is therefore more stringent than the latter. Furthermore, we assume the following:

Assumption 2’. (No effect modification by confounders on the OR scale).

$$\frac{P(Y=1\mid A=1,U,X)/P(Y=0\mid A=1,U,X)}{P(Y=1\mid A=0,U,X)/P(Y=0\mid A=0,U,X)}=\exp(\beta_0).$$

Recall that Assumption 2 posited a constant vaccination causal effect on the RR scale across levels of (U, X), while Assumption 2’ instead posits that the corresponding causal effect on the odds ratio scale is constant. In case of a rare infection in the target population, the OR and RR are approximately equal, in which case VE is well approximated by 1 − OR.
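The rare-disease approximation can be checked with a small numerical sketch. The risks below are hypothetical values chosen so that the risk ratio is 0.2 in both scenarios; only the baseline risk differs.

```python
def risk_ratio(p1, p0):
    """Risk ratio comparing vaccinated risk p1 with unvaccinated risk p0."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Odds ratio comparing vaccinated risk p1 with unvaccinated risk p0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Rare infection: unvaccinated risk 1%, vaccinated risk 0.2% (hypothetical).
rr_rare = risk_ratio(0.002, 0.01)
or_rare = odds_ratio(0.002, 0.01)

# Common infection: unvaccinated risk 50%, vaccinated risk 10% (hypothetical).
rr_common = risk_ratio(0.10, 0.50)
or_common = odds_ratio(0.10, 0.50)

print(f"rare:   RR={rr_rare:.3f}  OR={or_rare:.3f}  1-OR={1 - or_rare:.3f}")
print(f"common: RR={rr_common:.3f}  OR={or_common:.3f}  1-OR={1 - or_common:.3f}")
```

In the rare scenario 1 − OR is close to the VE of 0.8 implied by the risk ratio, whereas in the common scenario the odds ratio is noticeably further from the risk ratio.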

Furthermore, identification relies on the following modified definition of a treatment confounding bridge function:

Assumption 4’. There exists a treatment confounding bridge function $\tilde q$ such that for a = 0, 1,

$$E[\tilde q(a,Z,X)\mid A=a,U,X]=\frac{1}{P(A=a\mid U,X,Y=0,S=1)}\quad\text{almost surely.}\tag{17}$$

We now obtain the identification of the OR in the following theorem:

Theorem 1’. Under Assumptions 1’, 2’, 3, and 4’, we have that $E[\tilde V_1(A,Y,Z,X;\beta_0)\mid U,X,S=1]=0$, where $\tilde V_1(A,Y,Z,X;\beta)$ is defined as $V_1$ in Theorem 1 except with $\tilde q$ replacing $q$.

Importantly, the theorem establishes that the estimating function V1 previously developed in the paper can, under stated conditions, remain unbiased for the odds ratio association of vaccination with testing positive for the infection, even in the presence of treatment-induced selection into the TND sample.

Estimation of the treatment confounding bridge function $\tilde q(A,Z,X)$ requires a negative control outcome that satisfies:

Assumption 5’. (NCO Independence Conditions) $W \perp\!\!\!\perp (A,Z,S)\mid U,X,Y$.

In addition to Assumption 5, this last assumption requires that W has no causal effect on either Y or S. The corresponding figure illustrates a DAG that satisfies our assumptions regarding (Z, W). As can be verified in the graph, Assumption 5’ is needed to ensure that collider stratification bias induced by the path $A \rightarrow [S=1] \leftarrow W$ upon conditioning on S = 1 is no longer present. Identification of the function $\tilde q$ is given below:

Theorem 2’. Under Assumptions 3, 4’, and 5’, for a = 0, 1 we have that $E[\tilde q(a,Z,X)\mid A=a,W,X,Y=0,S=1]=1/P(A=a\mid W,X,Y=0,S=1)$.

As a result of Theorem 2’, the parameters in the treatment confounding bridge function can be estimated by solving Equation (14).

In summary, the above discussion suggests that one can continue to use Algorithm 1 to estimate VE in the presence of treatment-induced selection bias, albeit on the OR scale and under a modified set of negative control conditions. Theorem 3 continues to apply.

As a side note, Assumption 1’ automatically holds under Assumption 1; hence, the above results also apply to the setting considered in previous sections. We state this result in the following corollary.

Corollary 3.

Under Assumptions 1, 2’, 3, and 4’, we have $E[\tilde V_1(A,Y,Z,X;\beta_0)\mid U,X,S=1]=0$.

With Assumption 1, the treatment confounding bridge function $\tilde q$ can be estimated by solving Equation (14) either under Assumptions 5’ and 7, or under Assumptions 5, 6, and 7 as an approximation under the rare disease assumption. Corollary 3 leads to an interesting observation: under treatment-independent sampling (Assumption 1), the estimator $\hat\beta$ from Algorithm 1 can be viewed as either log RR or log OR, depending on the setting.

Finally, as Schnitzer (Citation2022) pointed out, under the additional assumptions that (a) subjects included in the TND analysis who do not test positive for the infection of interest are nevertheless subject to another infection, referred to as a test-negative infection, (b) test-positive and test-negative infections are mutually exclusive, and (c) the vaccination of interest has no causal effect on the test-negative infection, the conditional odds ratio exp(β0) also identifies the conditional causal risk ratio. We further discuss this result in Section G of the supplementary materials.

3 Simulation Study

To assess the empirical performance of our proposed method, we perform simulation studies mimicking a TND with binary vaccination, infection, and testing. We generate a target population of N=7,000,000 according to the assumed causal diagram. We consider two scenarios where U, Z, and W are binary or continuous. In each scenario, we consider the true value of the log risk ratio β0=log(0.2), log(0.5), log(0.7), and 0. We set the baseline log risk η0=log(0.01) so that the outcome is rare in the target population. The generated TND samples had sizes ranging between n=43,000 and n=52,000.

For each scenario, we evaluated the bias and the coverage rate of 95% confidence intervals for the proposed NC estimator of β0 over 500 Monte Carlo samples. As the estimated treatment confounding bridge function in Algorithm 1 is only an approximation under Assumption 6, whose bias may affect the estimation of β0, we also include the NC-Oracle estimator that uses the true treatment confounding bridge function. For comparison, we included two standard estimators for β0 in test-negative designs, logistic regression (Bond, Sullivan, and Cowling Citation2016) and the IPTW estimator (Schnitzer Citation2022), neither of which accounts for bias due to the latent factor U.

Due to the page limit, we relegate the details of the data generating process and the figures for the results to Section I of the supplementary materials. A figure in Section I reports the bias of the four estimators we considered and the coverage probability of their 95% confidence intervals. In both settings, NC and NC-Oracle are essentially unbiased, whereas logistic regression and IPTW give biased estimates in all scenarios. NC-Oracle exhibits slightly higher precision than NC, which implies that estimating the treatment confounding bridge function in the TND adds only modest variability. Confidence intervals of NC and NC-Oracle both attain nominal coverage, whereas logistic regression and IPTW based confidence intervals undercover severely.

To investigate the performance of our method for a nonrare infection, we repeat the simulation under the same setup but with increasing baseline log risk η0. In our simulation setting, exp(η0) equals δ in Assumption 6, which describes the prevalence of the infection outcome in the target population and determines the bias in estimating the treatment confounding bridge function. The results are reported in Figures S.2 and S.3 in Section I of the supplementary materials. While the NC-Oracle estimator remains unbiased with calibrated confidence intervals, the bias of the NC estimator increases with increasing δ and its 95% CIs become anti-conservative. With δ=0.5 and β0=log(0.2), the bias of the NC estimator increases to −0.27 (corresponding to 5% overestimation on the VE scale). The coverage of the 95% CI for the NC estimator drops to 0% when δ ≥ 0.2. Notably, the NC estimator remains unbiased with calibrated confidence intervals under the null hypothesis β0=0, which suggests a Wald test of no vaccine effect still has the correct size. Both logistic regression and IPTW estimators are severely biased in all settings.
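The translation of a bias on the log risk ratio scale into the VE scale can be reproduced with a short calculation, using VE = 1 − exp(β) and the values β0 = log(0.2) and bias −0.27 quoted above. The variable names are illustrative only.

```python
import math

beta_true = math.log(0.2)        # true log risk ratio
bias = -0.27                     # bias of the NC estimator at delta = 0.5

ve_true = 1 - math.exp(beta_true)            # VE implied by the true log RR
ve_biased = 1 - math.exp(beta_true + bias)   # VE implied by the biased estimate
ve_gap = ve_biased - ve_true                 # overestimation on the VE scale

print(f"true VE = {ve_true:.3f}, biased VE = {ve_biased:.3f}, gap = {ve_gap:.3f}")
```

The gap is a little under five percentage points, which is consistent with the roughly 5% overestimation reported in the text.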

4 Application

We applied our proposed method to a TND study of COVID-19 VE of the two-dose Moderna vaccine (mRNA-1273), the two-dose Pfizer-BioNTech vaccine (BNT162b2), and the single-dose Johnson & Johnson’s Janssen vaccine (Ad26.COV2.S) against SARS-CoV-2 infection nested in the University of Michigan Health System. The selected study sample included patients who interacted with the University of Michigan Health System and experienced COVID-19 symptoms, had suspected exposure to COVID-19 virus, or sought to screen for SARS-CoV-2 infection, between April 5, 2021 and December 7, 2021. In addition, selected test-positive subjects had at least one positive lab test for SARS-CoV-2 infection after April 5. Vaccination history was obtained through electronic health records. A study subject was considered fully vaccinated if they received at least one dose of Johnson & Johnson’s Janssen vaccine or at least two doses of Moderna or Pfizer-BioNTech vaccine. If a subject tested positive before or within 14 days after their first dose of Janssen vaccine or within 14 days after their second dose of Moderna or Pfizer-BioNTech vaccine, they were considered unvaccinated (Moline et al. Citation2021).

We selected immunization visits before December 2020 as the NCE, since COVID-19 vaccines were unavailable before December 2020 and prior immunization was unlikely to affect the risk of SARS-CoV-2 infection or of the selected NCOs described next. For the NCO, we selected a binary indicator of having at least one of the following “negative control outcome” conditions after April 5, 2021: arm/leg cellulitis, eye/ear disorder, gastro-esophageal disease, atopic dermatitis, injuries, and general adult examination visits. Such candidate NCE and NCO are likely to satisfy the requisite conditional independence conditions of valid negative control variables and to be related to a patient’s latent HSB. We adjusted for age group (<18, between 18 and 60, or ≥60), gender, race (white or non-white), Charlson comorbidity score ≥3, and the calendar month of a test-positive subject’s first positive COVID test or a test-negative subject’s last COVID test. Table S.2 in Section J of the supplementary materials summarizes the distribution of negative control variables, demographic variables, and SARS-CoV-2 infection among vaccinated and unvaccinated subjects.

Because the NCE is expected not to be associated with either the outcome or the NCO in a fully adjusted analysis unless there is unmeasured confounding, we first fit regression models to detect the presence of residual confounding bias. Conditioning on the baseline covariates, in both vaccinated and unvaccinated groups, the NCE was significantly associated with SARS-CoV-2 infection (p < 0.001) and the NCO (p < 0.001) in corresponding adjusted logistic regression models, suggesting the presence of hidden bias (see Tables S.3 and S.4 in Section J of the supplementary materials).

We implemented Algorithm 1 to estimate VE. We specified a linear model for the treatment confounding bridge function with an interaction term between COVID-19 vaccination and the NCE, and set the function m to include an intercept term, COVID-19 vaccination, the NCO, and baseline covariates, as well as all two-way interactions. For comparison, we implemented an adjusted logistic regression model and the IPTW estimator of the marginal risk ratio of Schnitzer (Citation2022), where the propensity score is estimated with logistic regression.

Table 1 shows the estimated VE and 95% confidence intervals for the three estimators. The VE estimates with our double NC method are notably higher than those of standard logistic regression and the IPTW estimator for all three vaccines.

Table 1 Estimated VE and 95% confidence intervals of the negative control estimator, logistic regression and IPTW estimator with the University of Michigan Health System data.

Recent TND studies estimated the mRNA COVID-19 vaccine (Pfizer-BioNTech and Moderna) effectiveness to range between 80% and 98% against lab-confirmed SARS-CoV-2 infection of different variants (Dagan et al. Citation2021; Bruxvoort et al. Citation2021; Israel et al. Citation2021). Our NC approach provided VE estimates that are closer to those of these prior studies. We hypothesize that the standard logistic regression and IPTW estimators underestimate the VE by overlooking residual confounding bias due to HSB and related factors, which our proposed double NC approach appears to control to some extent.

5 Discussion

In this article, we have introduced a statistical approach leveraging negative control variables to account for hidden bias due to residual confounding and/or selection mechanism in a test-negative design, both of which have raised major concerns (Sullivan, Tchetgen Tchetgen, and Cowling Citation2016; Schnitzer Citation2022). Negative control variables abound in practice, such as vaccination history which is routinely collected in insurance claims and electronic health records. Hence, the proposed method may be particularly useful in such real-world settings to obtain improved estimates of vaccine effectiveness. Beyond TND, our method is also applicable to other study designs with outcome-dependent sampling, such as a case-control study for a rare disease where unmeasured confounding is of concern. With simple modifications, our approach can also be applied to settings with polytomous treatment and/or outcome, as discussed in Section K of the supplementary materials.

The TND is a challenging setting in causal inference where selection bias and unmeasured confounding co-exist, the selection is outcome-dependent, and unmeasured confounders also impact selection. Jackson et al. (Citation2018) performed simulation studies to evaluate the selection bias of the VE estimate by logistic regression and concluded that selection bias due to healthcare-seeking behavior is unlikely to be meaningful in practice. However, their simulation study did not consider the effect of healthcare-seeking behavior on the infection outcome. Didelez, Kreiner, and Keiding (Citation2010) used graphical models to study conditions for the estimability of the odds ratio and testability of the hypothesis of null causal effect under outcome-dependent sampling, but they did not consider the setting where the unmeasured confounders also affect selection. A widely adopted approach to adjust for selection bias is to reweight each observation’s contribution by the corresponding inverse probability of selection into the sample (Hernán, Hernández-Díaz, and Robins Citation2004), but such weights are unlikely to be available in most TNDs without access to a random sample from the target population and accurate measurements of latent healthcare-seeking behavior. Bareinboim, Tian, and Pearl (Citation2014) formally showed that causal effects cannot be recovered from outcome-dependent sampling without an additional assumption. Bareinboim and Pearl (Citation2012) also showed that the causal odds ratio cannot be recovered under both confounding and selection bias. However, we have established that progress can be made under a semiparametric multiplicative model, provided the outcome is rare in the target population and double negative control variables are available. To this end, this article showcases the potential power of negative control methods and proximal causal inference in epidemiologic research (Shi, Miao, and Tchetgen Tchetgen Citation2020; Tchetgen Tchetgen et al. Citation2020).

We primarily focused on outpatient TND studies, where recruitment is restricted to subjects who seek care voluntarily. TNDs have also been applied to inpatient settings for studying VE against, for example, flu hospitalization (Foppa et al. Citation2016; Feng, Cowling, and Sullivan Citation2016). In inpatient TNDs, differential access to healthcare and underlying health characteristics between vaccinated and unvaccinated subjects are likely the main source of confounding bias (Feng, Cowling, and Sullivan Citation2016). Our methods are still applicable in such settings, but negative control variables should be selected to be relevant to this source of unmeasured confounding. For example, previous vaccination and hospitalization outside of the flu season or hospitalization due to other flu-like illness are viable candidate NCE and NCO, respectively (Jackson et al. Citation2006).

Our approach is also suitable for post-market TND studies where real-world VE is of interest and vaccination history is obtained retrospectively, possibly through electronic health records. For vaccine efficacy in a controlled trial setting, Wang et al. (in press) recently developed estimation and inference of the RR in cluster-randomized TNDs, aiming to correct for bias due to intervention-induced confounding by HSB arising from unblinding. They proposed a log-RR estimator which corrects for selection bias by leveraging a valid test-negative outcome, under an assumption that either (i) the vaccine does not have a causal effect in the population, and the causal impact of vaccination on selection is equal for test-positive and test-negative subsamples; or (ii) among care seekers, the incidence of test-negative outcomes does not differ between vaccinated and unvaccinated, and the intervention effect among care seekers is generalizable to the whole population. Even under randomization, the identification conditions given in Section 2.5 are neither stronger nor weaker than those of Wang et al. (in press) described above, as neither set of assumptions appears to imply the other. An important advantage of our proposed methods is that they can be used to account for selection bias in a TND study irrespective of randomization.

Our methods target the RR as a measure of VE instead of the more common OR (Jackson et al. Citation2006; Sullivan, Tchetgen Tchetgen, and Cowling Citation2016). These two measures are approximately equal for rare infections, as described in Section L of the supplementary materials. Schnitzer (Citation2022) recently considered estimation of a marginal causal RR in the TND sample and justified the use of an inverse probability of treatment weighted (IPTW) estimator in a setting in which an unmeasured common cause of infection and selection into the TND sample does not cause vaccination (and thus there is no unmeasured confounding). In contrast, our methods allow for an unmeasured common cause of vaccination, infection, and selection into the TND sample; however, in order to estimate a causal RR, we invoke both an assumption of no effect modification by an unmeasured confounder and a rare-disease condition. As we establish, the latter assumption can be relaxed to test the null hypothesis of no causal effect of the vaccine on infection risk. In Section 2.5, we establish that under a homogeneous OR condition and an alternative definition of the treatment bridge function, our methods can identify a causal effect of the vaccine on the OR scale without invoking the rare disease condition.

Throughout the article, we have assumed diagnostic tests are accurate and individuals who seek care are sparsely distributed, such that the vaccination of a given subject in the TND sample does not protect another study subject from infection, that is, there is no interference in the TND sample, a common assumption in TND literature. This latter assumption may be violated if members of the same households are present in the ER, in which case block interference must be accounted for using results from interference literature (Hudgens and Halloran Citation2008; Tchetgen Tchetgen and VanderWeele Citation2012).

We have proposed a parametric approach to estimate the treatment confounding bridge function. While we have provided examples where certain parametric models are appropriate, in general, such a parametric approach may result in model misspecification bias. Nonparametric methods that have been developed for proximal causal inference, such as kernel machine learning (Ghassami et al. Citation2022), may be adapted to our setting. Another potential nonparametric approach is the sieve generalized method of moments, which uses basis functions to approximate the nuisance function (Ai and Chen Citation2003; Chen Citation2007). We leave these topics for future research.

Supplementary Materials

The supplementary materials contain proofs and derivations of all technical results presented in the article and further discussions.

Disclosure Statement

The authors reported no conflict of interest.

Additional information

Funding

The authors gratefully acknowledge NIH grants R01AI27271, R01CA222147, R01AG065276, and R01GM139926.

References

  • Ai, C., and Chen, X. (2003), “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions,” Econometrica, 71, 1795–1843. DOI: 10.1111/1468-0262.00470.
  • Anders, K. L., Indriani, C., Ahmad, R. A., Tantowijoyo, W., Arguni, E., Andari, B., Jewell, N. P., Rances, E., O’Neill, S. L., Simmons, C. P., et al. (2018), “The AWED Trial (Applying Wolbachia to Eliminate Dengue) to Assess the Efficacy of Wolbachia-Infected Mosquito Deployments to Reduce Dengue Incidence in Yogyakarta, Indonesia: Study Protocol for a Cluster Randomised Controlled Trial,” Trials, 19, 1–16. DOI: 10.1186/s13063-018-2670-z.
  • Bareinboim, E., and Pearl, J. (2012), “Controlling Selection Bias in Causal Inference,” Artificial Intelligence and Statistics, 22, 100–108.
  • Bareinboim, E., Tian, J., and Pearl, J. (2014), “Recovering from Selection Bias in Causal and Statistical Inference,” in Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 28). DOI: 10.1609/aaai.v28i1.9074.
  • Black, C. L., Yue, X., Ball, S. W., Fink, R. V., de Perio, M. A., Laney, A. S., Williams, W. W., Graitcer, S. B., Fiebelkorn, A. P., Lu, P.-J., et al. (2018), “Influenza Vaccination Coverage Among Health Care Personnel—United States, 2017–18 Influenza Season,” Morbidity and Mortality Weekly Report, 67, 1050–1054. DOI: 10.15585/mmwr.mm6738a2.
  • Bond, H., Sullivan, S., and Cowling, B. (2016), “Regression Approaches in the Test-Negative Study Design for Assessment of Influenza Vaccine Effectiveness,” Epidemiology & Infection, 144, 1601–1611. DOI: 10.1017/S095026881500309X.
  • Boom, J. A., Tate, J. E., Sahni, L. C., Rench, M. A., Hull, J. J., Gentsch, J. R., Patel, M. M., Baker, C. J., and Parashar, U. D. (2010), “Effectiveness of Pentavalent Rotavirus Vaccine in a Large Urban Population in the United States,” Pediatrics, 125, e199–e207. DOI: 10.1542/peds.2009-1021.
  • Broome, C. V., Facklam, R. R., and Fraser, D. W. (1980), “Pneumococcal Disease After Pneumococcal Vaccination: An Alternative Method to Estimate the Efficacy of Pneumococcal Vaccine,” New England Journal of Medicine, 303, 549–552. DOI: 10.1056/NEJM198009043031003.
  • Bruxvoort, K. J., Sy, L. S., Qian, L., Ackerson, B. K., Luo, Y., Lee, G. S., Tian, Y., Florea, A., Aragones, M., Tubert, J. E., et al. (2021), “Effectiveness of mRNA-1273 against Delta, Mu, and Other Emerging Variants of SARS-CoV-2: Test Negative Case-Control Study,” BMJ, 375, e068848. DOI: 10.1136/bmj-2021-068848.
  • Chen, X. (2007), “Large Sample Sieve Estimation of Semi-Nonparametric Models,” Handbook of Econometrics, 6, 5549–5632.
  • Chua, H., Feng, S., Lewnard, J. A., Sullivan, S. G., Blyth, C. C., Lipsitch, M., and Cowling, B. J. (2020), “The Use of Test-Negative Controls to Monitor Vaccine Effectiveness: A Systematic Review of Methodology,” Epidemiology, 31, 43–64. DOI: 10.1097/EDE.0000000000001116.
  • Cole, S. R., and Frangakis, C. E. (2009), “The Consistency Statement in Causal Inference: A Definition or an Assumption?” Epidemiology, 20, 3–5. DOI: 10.1097/EDE.0b013e31818ef366.
  • Cui, Y., Pu, H., Shi, X., Miao, W., and Tchetgen Tchetgen, E. (in press), “Semiparametric Proximal Causal Inference,” Journal of the American Statistical Association, DOI: 10.1080/01621459.2023.2191817.
  • Dagan, N., Barda, N., Kepten, E., Miron, O., Perchik, S., Katz, M. A., Hernán, M. A., Lipsitch, M., Reis, B., and Balicer, R. D. (2021), “BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting,” New England Journal of Medicine, 384, 1412–1423. DOI: 10.1056/NEJMoa2101765.
  • Dean, N. E., Hogan, J. W., and Schnitzer, M. E. (2021), “Covid-19 Vaccine Effectiveness and the Test-Negative Design,” New England Journal of Medicine, 385, 1431–1433. DOI: 10.1056/NEJMe2113151.
  • Deaner, B. (2021), “Proxy Controls and Panel Data,” arXiv:1810.00283v7 [econ.EM].
  • Didelez, V., Kreiner, S., and Keiding, N. (2010), “Graphical Models for Inference Under Outcome-Dependent Sampling,” Statistical Science, 25, 368–387. DOI: 10.1214/10-STS340.
  • Dufault, S. M., and Jewell, N. P. (2020), “Analysis of Counts for Cluster Randomized Trials: Negative Controls and Test-Negative Designs,” Statistics in Medicine, 39, 1429–1439. DOI: 10.1002/sim.8488.
  • Dukes, O., Shpitser, I., and Tchetgen Tchetgen, E. (in press), “Proximal Mediation Analysis,” Biometrika.
  • D’Haultfoeuille, X. (2011), “On the Completeness Condition in Nonparametric Instrumental Problems,” Econometric Theory, 27, 460–471. DOI: 10.1017/S0266466610000368.
  • Egami, N., and Tchetgen Tchetgen, E. (in press), “Identification and Estimation of Causal Peer Effects Using Double Negative Controls for Unmeasured Network Confounding,” Journal of the Royal Statistical Society, Series B.
  • Feng, S., Cowling, B. J., and Sullivan, S. G. (2016), “Influenza Vaccine Effectiveness by Test-Negative Design–Comparison of Inpatient and Outpatient Settings,” Vaccine, 34, 1672–1679. DOI: 10.1016/j.vaccine.2016.02.039.
  • Foppa, I. M., Ferdinands, J. M., Chaves, S. S., Haber, M. J., Reynolds, S. B., Flannery, B., and Fry, A. M. (2016), “The Case Test-Negative Design for Studies of the Effectiveness of Influenza Vaccine in Inpatient Settings,” International Journal of Epidemiology, 45, 2052–2059. DOI: 10.1093/ije/dyw022.
  • Gabriel, E. E., Sachs, M. C., and Sjölander, A. (2022), “Causal Bounds for Outcome-Dependent Sampling in Observational Studies,” Journal of the American Statistical Association, 117, 939–950. DOI: 10.1080/01621459.2020.1832502.
  • Ghassami, A., Shpitser, I., and Tchetgen Tchetgen, E. (2022), “Combining Experimental and Observational Data for Identification of Long-Term Causal Effects,” arXiv:2201.10743v3 [stat.ME].
  • Ghassami, A., Shpitser, I., and Tchetgen Tchetgen, E. (2023), “Causal Inference with Hidden Mediators,” arXiv:2111.02927v2 [stat.ST].
  • Ghassami, A., Ying, A., Shpitser, I., and Tchetgen Tchetgen, E. (2022), “Minimax Kernel Machine Learning for A Class of Doubly Robust Functionals with Application to Proximal Causal Inference,” in International Conference on Artificial Intelligence and Statistics, pp. 7210–7239.
  • Hernán, M. A., Hernández-Díaz, S., and Robins, J. M. (2004), “A Structural Approach to Selection Bias,” Epidemiology, 15, 615–625. DOI: 10.1097/01.ede.0000135174.63482.43.
  • Hernán, M. A., and Robins, J. M. (2020), Causal Inference: What If, Boca Raton: Chapman & Hall/CRC.
  • Hitchings, M. D., Ranzani, O. T., Dorion, M., D’Agostini, T. L., de Paula, R. C., de Paula, O. F. P., de Moura Villela, E. F., Torres, M. S. S., de Oliveira, S. B., Schulz, W., et al. (2021), “Effectiveness of ChAdOx1 Vaccine in Older Adults during SARS-CoV-2 Gamma Variant Circulation in São Paulo,” Nature Communications, 12, 6220. DOI: 10.1038/s41467-021-26459-6.
  • Hudgens, M. G., and Halloran, M. E. (2006), “Causal Vaccine Effects on Binary Postinfection Outcomes,” Journal of the American Statistical Association, 101, 51–64. DOI: 10.1198/016214505000000970.
  • Hudgens, M. G., and Halloran, M. E. (2008), “Toward Causal Inference with Interference,” Journal of the American Statistical Association, 103, 832–842. DOI: 10.1198/016214508000000292.
  • Imbens, G., Kallus, N., and Mao, X. (2021), “Controlling for Unmeasured Confounding in Panel Data Using Minimal Bridge Functions: From Two-Way Fixed Effects to Factor Models,” arXiv:2108.03849v1 [stat.ME].
  • Israel, A., Merzon, E., Schäffer, A. A., Shenhar, Y., Green, I., Golan-Cohen, A., Ruppin, E., Magen, E., and Vinker, S. (2021), “Elapsed Time Since BNT162b2 Vaccine and Risk of SARS-CoV-2 Infection: Test Negative Design Study,” BMJ, 375, e067873. DOI: 10.1136/bmj-2021-067873.
  • Jackson, L. A., Jackson, M. L., Nelson, J. C., Neuzil, K. M., and Weiss, N. S. (2006), “Evidence of Bias in Estimates of Influenza Vaccine Effectiveness in Seniors,” International Journal of Epidemiology, 35, 337–344. DOI: 10.1093/ije/dyi274.
  • Jackson, M. L., Chung, J. R., Jackson, L. A., Phillips, C. H., Benoit, J., Monto, A. S., Martin, E. T., Belongia, E. A., McLean, H. Q., Gaglani, M., et al. (2017), “Influenza Vaccine Effectiveness in the United States during the 2015–2016 Season,” New England Journal of Medicine, 377, 534–543. DOI: 10.1056/NEJMoa1700153.
  • Jackson, M. L., and Nelson, J. C. (2013), “The Test-Negative Design for Estimating Influenza Vaccine Effectiveness,” Vaccine, 31, 2165–2168. DOI: 10.1016/j.vaccine.2013.02.053.
  • Jackson, M. L., Phillips, C. H., Benoit, J., Kiniry, E., Madziwa, L., Nelson, J. C., and Jackson, L. A. (2018), “The Impact of Selection Bias on Vaccine Effectiveness Estimates from Test-Negative Studies,” Vaccine, 36, 751–757. DOI: 10.1016/j.vaccine.2017.12.022.
  • Jewell, N. P., Dufault, S., Cutcher, Z., Simmons, C. P., and Anders, K. L. (2019), “Analysis of Cluster-Randomized Test-Negative Designs: Cluster-Level Methods,” Biostatistics, 20, 332–346. DOI: 10.1093/biostatistics/kxy005.
  • Kallus, N., Mao, X., and Uehara, M. (2022), “Causal Inference under Unmeasured Confounding with Negative Controls: A Minimax Learning Approach,” arXiv:2103.14029v4 [stat.ML].
  • Krammer, F. (2019), “The Human Antibody Response to Influenza A Virus Infection and Vaccination,” Nature Reviews Immunology, 19, 383–397. DOI: 10.1038/s41577-019-0143-6.
  • Leung, J., Harpaz, R., Molinari, N.-A., Jumaan, A., and Zhou, F. (2011), “Herpes Zoster Incidence among Insured Persons in the United States, 1993–2006: Evaluation of Impact of Varicella Vaccination,” Clinical Infectious Diseases, 52, 332–340. DOI: 10.1093/cid/ciq077.
  • Lipsitch, M., Jha, A., and Simonsen, L. (2016), “Observational Studies and the Difficult Quest for Causality: Lessons from Vaccine Effectiveness and Impact Studies,” International Journal of Epidemiology, 45, 2060–2074. DOI: 10.1093/ije/dyw124.
  • Lipsitch, M., Tchetgen Tchetgen, E., and Cohen, T. (2010), “Negative Controls: A Tool for Detecting Confounding and Bias in Observational Studies,” Epidemiology, 21, 383–388. DOI: 10.1097/EDE.0b013e3181d61eeb.
  • Miao, W., Geng, Z., and Tchetgen Tchetgen, E. (2018), “Identifying Causal Effects with Proxy Variables of An Unmeasured Confounder,” Biometrika, 105, 987–993. DOI: 10.1093/biomet/asy038.
  • Miao, W., Shi, X., and Tchetgen Tchetgen, E. (2020), “A Confounding Bridge Approach for Double Negative Control Inference on Causal Effects,” arXiv:1808.04945v3 [stat.ME].
  • Moline, H. L., Whitaker, M., Deng, L., Rhodes, J. C., Milucky, J., Pham, H., Patel, K., Anglin, O., Reingold, A., Chai, S. J., et al. (2021), “Effectiveness of COVID-19 Vaccines in Preventing Hospitalization Among Adults Aged ≥ 65 Years—COVID-NET, 13 states, February–April 2021,” Morbidity and Mortality Weekly Report, 70, 1088–1093.
  • Newey, W. K., and Powell, J. L. (2003), “Instrumental Variable Estimation of Nonparametric Models,” Econometrica, 71, 1565–1578. DOI: 10.1111/1468-0262.00459.
  • Olson, S. M., Newhams, M. M., Halasa, N. B., Price, A. M., Boom, J. A., Sahni, L. C., Pannaraj, P. S., Irby, K., Walker, T. C., Schwartz, S. P., et al. (2022), “Effectiveness of BNT162b2 Vaccine against Critical Covid-19 in Adolescents,” New England Journal of Medicine, 386, 713–723. DOI: 10.1056/NEJMoa2117995.
  • Patel, M. M., Jackson, M. L., and Ferdinands, J. (2020), “Postlicensure Evaluation of COVID-19 Vaccines,” Journal of the American Medical Association, 324, 1939–1940. DOI: 10.1001/jama.2020.19328.
  • Qi, Z., Miao, R., and Zhang, X. (2022), “Proximal Learning for Individualized Treatment Regimes Under Unmeasured Confounding,” Journal of the American Statistical Association (just-accepted), 1–33. DOI: 10.1080/01621459.2022.2147841.
  • Schnitzer, M. E. (2022), “Estimands and Estimation of Covid-19 Vaccine Effectiveness Under the Test-Negative Design: Connections to Causal Inference,” Epidemiology, 33, 325–333. DOI: 10.1097/EDE.0000000000001470.
  • Shi, X., Miao, W., Hu, M., and Tchetgen Tchetgen, E. (2023), “Theory for Identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework,” arXiv:2108.13935v4 [stat.ME].
  • Shi, X., Miao, W., and Tchetgen Tchetgen, E. (2020), “A Selective Review of Negative Control Methods in Epidemiology,” Current Epidemiology Reports, 109–202. DOI: 10.1007/s40471-020-00243-4.
  • Sullivan, S. G., Feng, S., and Cowling, B. J. (2014), “Potential of the Test-Negative Design for Measuring Influenza Vaccine Effectiveness: A Systematic Review,” Expert Review of Vaccines, 13, 1571–1591. DOI: 10.1586/14760584.2014.966695.
  • Sullivan, S. G., Tchetgen Tchetgen, E., and Cowling, B. J. (2016), “Theoretical Basis of the Test-Negative Study Design for Assessment of Influenza Vaccine Effectiveness,” American Journal of Epidemiology, 184, 345–353. DOI: 10.1093/aje/kww064.
  • Tchetgen Tchetgen, E., and VanderWeele, T. J. (2012), “On Causal Inference in the Presence of Interference,” Statistical Methods in Medical Research, 21, 55–75. DOI: 10.1177/0962280210386779.
  • Tchetgen Tchetgen, E., Ying, A., Cui, Y., Shi, X., and Miao, W. (2020), “An Introduction to Proximal Causal Learning,” arXiv:2009.10982v1 [stat.ME].
  • Thompson, M. G., Stenehjem, E., Grannis, S., Ball, S. W., Naleway, A. L., Ong, T. C., DeSilva, M. B., Natarajan, K., Bozio, C. H., Lewis, N., et al. (2021), “Effectiveness of COVID-19 Vaccines in Ambulatory and Inpatient Care Settings,” New England Journal of Medicine, 385, 1355–1371. DOI: 10.1056/NEJMoa2110362.
  • Van der Vaart, A. (2000), Asymptotic Statistics (Vol. 3), Cambridge: Cambridge University Press.
  • Wang, B., Dufault, S. M., Small, D. S., and Jewell, N. P. (in press), “Randomization Inference for Cluster-Randomized Test-Negative Designs with Application to Dengue Studies: Unbiased Estimation, Partial Compliance, and Stepped-Wedge Design,” Annals of Applied Statistics.
  • Ying, A., Miao, W., Shi, X., and Tchetgen Tchetgen, E. (in press), “Proximal Causal Inference for Complex Longitudinal Studies,” Journal of the Royal Statistical Society, Series B.