Full article: Unveiling the Unobservable: Causal Inference on Multiple Derived Outcomes

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

In many applications, the interest is in treatment effects on random quantities of subjects, where those random quantities are not directly observable but can be estimated based on data from each subject. In this article, we propose a general framework for conducting causal inference in a hierarchical data generation setting. The identifiability of causal parameters of interest is shown under a condition on the biasedness of subject level estimates and an ignorability condition on the treatment assignment. Estimation of the treatment effects is constructed by inverse propensity score weighting on the estimated subject level parameters. A multiple testing procedure able to control the false discovery proportion is proposed to identify the nonzero treatment effects. Theoretical results are developed to investigate the proposed procedure, and numerical simulations are carried out to evaluate its empirical performance. A case study of medication effects on brain functional connectivity of patients with Autism spectrum disorder (ASD) using fMRI data is conducted to demonstrate the utility of the proposed method. Supplementary materials for this article are available online.

Keywords:

1 Introduction

Studying treatment effects using observational data is one of the key questions in scientific, economic and social research. There is extensive research on causal inference under the ignorability assumption (Rosenbaum and Rubin Citation1983). However, the existing frameworks of causal inference are mainly built on the treatment effects of observable random variables. With the advance of automatic and high-throughput data collection techniques, collecting large-scale repeated measurements from each subject becomes more and more common in many studies, and the treatment effects of interest are typically on a large set of subject level parameters that take into account the heterogeneity among subjects. Compared to the classical setting which assumes variables of interest are observable and data from different subjects are iid, those subject level parameters are not directly observable, and the subject heterogeneity needs to be learned based on repeated measurements from each subject. The existing causal inference methods may not be directly applicable to the treatment effects on unobservable subject level parameters.

Our motivation example is to study treatment effects on brain connectivity using functional magnetic resonance imaging (fMRI) data. Here, brain functional connectivities are the subject level parameters of interest, which need to be estimated based on time course fMRI data from each subject (Qiu and Zhou Citation2022). To measure brain functional connectivity, Pearson’s correlation between distinct units within a brain nervous system (Andrews-Hanna et al. Citation2007; Palaniyappan et al. Citation2013) is widely used due to its simplicity and interpretability, which shows linear similarity/dissimilarity between brain signals arising from two regions. Other dependence measures have also been applied to study brain connectivity—such as precision coefficient or partial correlation (Marrelec et al. Citation2006), Spearman’s ρ and Kendall’s τ (Berto et al. Citation2022). Our interest is the average difference of brain connectivity under treatment and control. This is an important question in medical research as neurodevelopmental and neurodegenerative diseases, like Autism spectrum disorder and Alzheimer’s disease, may alter the connectivities among brain regions (Andrews-Hanna et al. Citation2007; Kana et al. Citation2014; Berto et al. Citation2022). Various treatments have been developed to maintain the brain functionality of patients, and their effects need to be evaluated using clinical trial data. Other examples include genotype effects on change rates of gene expression levels over plant growth and drug effects on variability of heart rate measured from wearable devices. To conduct statistical inference for treatment effects in this type of problems, it requires a new framework of causal inference on derived random variables under hierarchical models, and a multiple testing procedure to recover the nonzero treatment effects on the subject level parameters of interest.

Inverse probability weighting (IPW) estimators are commonly used to estimate average treatment effects (ATE) under the ignorability condition (Rosenbaum and Rubin Citation1983). This condition assumes the treatment selection is independent of the potential outcomes given the covariates, which means that there are no unobserved confounding variables that affect both the potential outcomes and the treatment selection. The effectiveness of IPW estimators was studied in Hirano, Imbens, and Ridder (Citation2003). See the literature review and the practice of IPW estimation methods in Austin and Stuart (Citation2015) and Imbens and Rubin (Citation2015).

If an outcome regression model is also assumed, the augmented inverse probability weighted estimator (Bang and Robins Citation2005; Funk et al. Citation2011) can be used to estimate ATEs, which is doubly robust and is the efficient estimator for ATE if both the outcome and propensity score models are correctly specified (Cao, Tsiatis, and Davidian Citation2009). However, imposing valid outcome models could be challenging in our problem, due to the high dimensionality of subject level parameters and possible restricted parameter spaces. The restriction on the models of subject level parameters could further put constraints on the models of observed responses for each subject. Taking modeling correlations for fMRI data as an example, although an inverse Wishart distribution can be made on individual covariance matrices if the covariances are independent of covariates (Marrelec et al. Citation2006), it is unclear how to model the relationship between covariates and correlations that satisfies the positive definiteness constraint of the correlation matrix under any value of the covariates. Bayesian is another approach for causal inference under a joint potential outcome model and a propensity score model (Rubin Citation1978). Indeed, Marrelec et al. (Citation2006) used Bayesian inference to model partial correlations among brain regions. Hierarchical models may also be employed for the estimation of treatment effect (Feller and Gelman Citation2015) and high-dimensional covariates can be added as well (Brown, Vannucci, and Fearn Citation1998). However, due to the nontrivial outcome models under restricted parameter spaces and the possible computational burden of Bayesian approaches, we choose the IPW estimation to study treatment effects on brain connectivity in this work.

Compared to the existing causal inference problems under the ignorability condition, our problem faces three new challenges. First, the subject level parameters of interest are not observable. The traditional identifiability conditions can not be directly applied to treatment effects on subject level parameters. New conditions for identifiability under a hierarchical data generation setting need to be developed. Second, the number of subject level parameters is usually large. A multiple testing procedure needs to be constructed to identify the nonzero treatment effects from a large set of candidates. Third, various types of dependence may exist in the data, which include the dependence among subject level parameters from a population distribution in a hierarchical model, the variable dependence among high-dimensional responses, and the temporal dependence of repeated measurements within each subject. Moreover, all the estimated treatment effects are dependent as they share common propensity score estimates. A valid testing procedure needs to control the dependence among the estimates of treatment effects.

Although multiple hypotheses testing has been well studied in the literature, how to control the false discovery proportion (FDP) under arbitrary dependence among test statistics is still an open question. Genovese and Wasserman (Citation2004) studied the stochastic properties of FDP. van der Laan, Dudoit, and Pollard (Citation2004a) constructed a step-down process on maximum statistics to control the family-wise error rate (FWER). van der Laan, Dudoit, and Pollard (Citation2004b) and Genovese and Wasserman (Citation2006) proposed an augmentation step to control the FDP exceedance rate. It is also worthwhile to mention the multiple testing procedures for sparse correlation and partial correlation matrices of one subject based on iid data (Cai and Liu Citation2016; Qiu and Zhou Citation2020). However, those works typically assume independence or sparse dependence among the test statistics for different hypotheses. Some works also assume exact Uniform(0, 1) distribution for the p-values under the null hypotheses. Our problem does not satisfy those assumptions due to the massive dependence among the data as mentioned above. Although van der Laan, Dudoit, and Pollard (Citation2004a) considered controlling the FWER for dependent statistics by bootstrapping or by estimating the covariances of their statistics, this procedure is unlikely to succeed for high-dimensional parameters since the sample covariance matrix is no longer consistent in a high-dimensional setting, and no theoretical justification is provided for their bootstrap procedure.

In this article, we propose a general framework for studying treatment effects on derived random variables from each subject. Our framework is built on a hierarchical setting that assumes the observed data from each subject follow a subject-specific distribution, and the parameters of the subject level distributions are drawn from a population distribution. This setting allows heterogeneity among subjects so that the subject level distributions could be different. The interest is in the causal effect of treatment on the subject level parameters. As those subject level parameters are not observable, the ignorability condition of treatment assignment is not directly made on those parameters. Instead, the identifiability of the causal effect is derived under an ignorability condition on the potential responses of each subject and a condition regulating the bias of the estimates for the subject level parameters of interest. The estimation of the treatment effect is constructed by inverse propensity score weighting on each of the estimated subject level parameters. Simultaneous confidence intervals for all treatment effects are constructed based on Gaussian approximation (GA) for the maximum of the estimated effects. Our GA result is valid under general dependence among the IPW estimates, which extends the original GA for sample means (Chernozhukov, Chetverikov, and Kato Citation2013) to IPW estimates with derived outcomes. A multiple testing procedure built on multiplier bootstrap is proposed to identify the nonzero treatment effects among a large set of parameters. Theoretical results are established which show that the proposed test is able to control the exceedance rate of the FDP. Examples of treatment effects on subject level correlations and coefficients from nodewise regressions are provided. Numerical simulations are carried out to evaluate the empirical performance of the proposed multiple testing procedure. A case study of medication effects on brain functional connectivity of patients with Autism spectrum disorder (ASD) using fMRI data is conducted to demonstrate the utility of the proposed method.

This article is organized as follows. Section 2 introduces the hierarchical setting of data generation and the causal parameters of interest in our problem. Section 3 presents the conditions and shows the identifiability of the causal parameters on derived random variables. Section 4 proposes the IPW estimator for the treatment effects and constructs the simultaneous confidence intervals. Section 5 proposes the multiple testing procedure and derives its theoretical results. Section 6 and 7 present the simulation results and a case study of medication effects on brain connectivity for ASD patients. Section 8 concludes the article. The technical proofs are provided in the supplementary material (SM).

2 Preliminary and Parameter of Interest

Suppose that we have a random sample of size n from a target population. For the ith subject, let $D_{i} \in {0, 1}$ be the treatment indicator, where D_i = 1 if this subject receives the treatment and 0 if not. Let $W_{i} = {(W_{i, 1}, \dots, W_{i, q})}^{T}$ be the q-dimensional time-independent covariates associated with the ith subject. Let denote the potential response of the jth variable at time t from the ith subject for d = 1 and 0, where $i = 1, \dots, n, j = 1, \dots, p$ and $t = 1, \dots, m$ . Let $X_{i, t} (d) = {(X_{i, t, 1} (d), \dots, X_{i, t, p} (d))}^{T}$ be the p-dimensional time series of potential responses, $X_{i}^{m} (d) = {X_{i, t} (d)}_{t = 1}^{m}$ be the collection of $X_{i, t} (d)$ over time, and $X_{i, t} = D_{i} X_{i, t} (1) + (1 - D_{i}) X_{i, t} (0)$ be the observed response vector. Given the covariates W_i, let $F_{d, i} (\cdot) = F_{d, i} (\cdot, T_{i} (d) | W_{i})$ be the subject level distribution function of $X_{i, t} (d)$ , where $T_{i} (d)$ denotes the set of all subject level parameters that determine the distribution $F_{d, i} (\cdot)$ for the ith subject and d = 0, 1. We assume $X_{i, t} (d) | T_{i} (d), W_{i} \sim F_{d, i} (\cdot) forall t, and T_{i} (d) | W_{i} \sim T_{d} (W_{i}),$ where $T_{d} (W_{i})$ is the super population of $T_{i} (d)$ . Under this model, ${X_{i, t} (d)}_{t = 1}^{m}$ are stationary with the same subject level distribution $F_{d, i} (\cdot)$ . We do not assume specified parametric forms for $F_{d, i} (\cdot)$ or $T_{d} (W_{i})$ .

Let $Y_{i} (d) = {(Y_{i, 1} (d), \dots, Y_{i, p_{0}} (d))}^{T}$ be the potential outcomes of p₀-dimensional subject level parameters of interest, where $Y_{i} (1)$ and $Y_{i} (0)$ denote the target parameters under treatment and control, respectively. Note that $Y_{i} (d)$ is a function or a part of $T_{i} (d)$ for d = 0, 1, and $T_{i} (d)$ also includes all other nuisance parameters for each subject. As only one level of D_i can be observed for each subject, in our context, we use potential responses and potential outcomes to denote the subject level data $X_{i, t} (d)$ and parameters of interest $Y_{i} (d)$ under D_i = d, respectively. Let $Y_{i} = D_{i} Y_{i} (1) + (1 - D_{i}) Y_{i} (0)$ . Different from the classical causal inference literature (Rubin Citation1974), $Y_{i}$ is not observable, but can be estimated based on the observed measurements ${X_{i, t}}_{t = 1}^{m}$ of each subject. We are interested in the population average treatment effect (ATE) on the subject level parameters, (2.1) $τ = E {Y_{i} (1) - Y_{i} (0)},$ (2.1) where $τ = {(τ_{1}, \dots, τ_{p_{0}})}^{T}$ . Here, the numbers of responses p and parameters of interest p₀ could be much larger than n and m, but the number of covariates q is a fixed constant.

For example, in a resting-state fMRI study, D_i could indicate the treatment group of the ith subject. The covariates $W_{i}$ may include age, gender, medical records, preexisting conditions and genetic information, and the observed responses ${X_{i, t}}$ are the measurements extracted from the time course fMRI data, representing the blood flow intensities in p brain regions of interest at time t. The potential outcomes $Y_{i} (d)$ could be the subject level correlations or partial correlations among the brain regions. For resting-state fMRI data, it is reasonable to assume $X_{i, t}$ being stationary so that $Y_{i} (d)$ is fixed over time. We are interested in studying the treatment effects on brain connectivity, reflected by the average differences of the potential outcomes between treatment and control.

Note that the subject level parameters of interest $Y_{i} (d)$ and the distributions of the potential responses ${X_{i, t} (d)}_{t = 1}^{m}$ depend on the covariates W_i and vary across subjects. Our setting allows subject heterogeneity, where the data are generated from a hierarchical system. In the first hierarchy, each subject has a specific distribution $F_{d, i} (\cdot)$ depending on the covariates and treatment; in the second hierarchy, repeated measurements from the subject-specific distribution are taken over time. This causal inference framework under the hierarchical model can be applied to general applications with multi-layer structured data, for example, assessing drug effects using data from wearable devices in healthcare, testing genotype effects of crops on gene expression patterns changing over time, and studying treatment effects using repeated measurements or longitudinal data of subjects. In the following, we provide two brain imaging applications under our framework in detail. In both examples, we consider resting-state fMRI data.

Example 1

(Treatment Effects on Correlations). We are interested in the effects of certain treatments on brain connectivity measured by Pearson’s correlations among brain regions. In this case, the causal parameters of interest are the subject level correlations, where $Y_{i} (d) = {Y_{i, j_{1} j_{2}} (d) : 1 \leq j_{1} < j_{2} \leq p}$ and (2.2) $Y_{i, j_{1} j_{2}} (d) = Corr {X_{i, t, j_{1}} (d), X_{i, t, j_{2}} (d) | W_{i}, T_{i} (d)}$ (2.2) for d = 0, 1. Here, the correlation is taken with respect to the subject level distribution $F_{d, i} (\cdot)$ . The ATEs on correlations are (2.3) $τ = {(τ_{12}, τ_{13}, \dots, τ_{p - 1 p})}^{T} where τ_{j_{1} j_{2}} = E {Y_{i, j_{1} j_{2}} (1) - Y_{i, j_{1} j_{2}} (0)}$ (2.3) for $1 \leq j_{1} < j_{2} \leq p$ . The parameter $τ_{j_{1} j_{2}}$ represents the average difference of the correlation between the j₁th and j₂th brain regions for taking the treatment. If $τ_{j_{1} j_{2}} \neq 0$ , the treatment has a nonzero average effect on the marginal correlation between the two brain regions over the whole population. Despite the sample correlations for each pair of brain regions can be readily obtained for each subject, the simultaneous inference procedure for the estimated treatment effects on subject level correlations is nontrivial due to the dependence among the large set of estimates.

Example 2

(Treatment Effects on Regression Coefficients). While Pearson’s correlation measures marginal dependence among variables, in many applications, we are also interested in the conditional association between two variables given all other variables. This motivates us to consider the nodewise regression (Meinshausen and Bühlmann Citation2006) (2.4) $X_{i, t, j_{1}} (d) = Y_{i, j_{1}}^{*} (d) + \sum_{j_{2} \neq j_{1}} X_{i, t, j_{2}} (d) Y_{i, j_{1} j_{2}}^{*} (d) + ϵ_{i, t, j_{1}} (d),$ (2.4) where $ϵ_{i, t, j_{1}} (d)$ and $X_{i, t, j_{2}} (d)$ are uncorrelated for all $j_{2} \neq j_{1}$ . Let $Σ_{i} (d) = (σ_{i, j_{1} j_{2}} (d)) = cov {X_{i, t, j_{1}} (d), X_{i, t, j_{2}} (d) | W_{i}, T_{i} (d)}$ and $Ω_{i} (d) = (ω_{i, j_{1} j_{2}} (d)) = Σ_{i}^{- 1} (d)$ be the subject level covariance and precision matrices of $X_{i, t} (d)$ with respect to the distribution $F_{d, i} (\cdot)$ . Lemma 1 in Peng et al. (Citation2009) shows that both the regression coefficients and error covariances of the nodewise regression are related to the precision matrix $Ω_{i} (d)$ where $\begin{matrix} Y_{i, j_{1} j_{2}}^{*} (d) = - \frac{ω_{i, j_{1} j_{2}} (d)}{ω_{i, j_{1} j_{1}} (d)} and cov {ϵ_{i, t, j_{1}} (d), ϵ_{i, t, j_{2}} (d) | W_{i}, T_{i} (d)} \\ = \frac{ω_{i, j_{1} j_{2}} (d)}{ω_{i, j_{1} j_{1}} (d) ω_{i, j_{2} j_{2}} (d)} . \end{matrix}$

Moreover, the partial correlation between $X_{i, t, j_{1}} (d)$ and $X_{i, t, j_{2}} (d)$ is equal to scaled coefficient $Y_{i, j_{1} j_{2}}^{*} (d) {ω_{i, j_{1} j_{1}} (d) / ω_{i, j_{2} j_{2}} (d)}^{1 / 2}$ . Based on this relationship, precision coefficient and partial correlation can be estimated from the nodewise regression (Qiu and Zhou Citation2020).

In brain connectivity analysis, $Y_{i, j_{1} j_{2}}^{*} (d)$ represents the conditional effect of the j₂th brain region on the j₁th region under the group d, after controlling the effects of other brain regions. We are interested in the ATE $τ_{j_{1} j_{2}} = E {Y_{i, j_{1} j_{2}}^{*} (1) - Y_{i, j_{1} j_{2}}^{*} (0)}$ on the regression coefficients $Y_{i, j_{1} j_{2}}^{*} (d)$ for $j_{2} \neq j_{1}$ , which shows whether certain treatment could improve the conditional association between the j₁th and j₂th brain regions.

To identify nonzero ATEs in Examples 1 and 2, we consider the multiple hypotheses (2.5) $H_{0, j_{1} j_{2}} : τ_{j_{1} j_{2}} = 0 versus H_{a, j_{1} j_{2}} : τ_{j_{1} j_{2}} \neq 0$ (2.5) for $1 \leq j_{1} < j_{2} \leq p$ . In the following sections, we only use the subscript j in τ_j and $Y_{i, j}$ to denote the jth parameter of interest for $j = 1, \dots, p_{0}$ when there is no confusion, despite the double subscript $j_{1} j_{2}$ is used for correlation and regression coefficient between the j₁th and j₂th variables in Examples 1 and 2, where $p_{0} = p (p - 1) / 2$ .

3 Conditions and Identifiability

As the outcomes of interest ${Y_{i}}$ are not directly observable, we make the following conditions for the identifiability of the treatment effect $τ$ .

Condition 1. (i) The collection of covariates, treatment assignment and subject level parameters ${W_{i}, D_{i}, T_{i} (0), T_{i} (1)}$ are independent among subjects and identically distributed (iid) from a super population for all $i = 1, \dots, n$ .

(ii) The variable D_i takes a value of either 1 (treatment) or 0 (control) for all i with potential responses denoted by $X_{i}^{m} (1)$ and $X_{i}^{m} (0)$ , and for each subject, there are no different forms or versions of each treatment level, which lead to different potential responses.

(iii) The potential responses ${X_{i_{1}}^{m} (0)$ , $X_{i_{1}}^{m} (1)}$ and ${X_{i_{2}}^{m} (0)$ , $X_{i_{2}}^{m} (1)}$ from different subjects are independent for $i_{1} \neq i_{2}$ and all $m \geq 1$ .

The first two parts of Condition 1 are the stable unit treatment value assumption, or SUTVA (Imbens and Rubin Citation2015) for the problem. As the subject level parameters of interest $Y_{i} (d)$ is a function of $T_{i} (d)$ , the independence assumption in Part (i) implies no interference, which means that the potential outcomes for any subject do not vary with the treatments assigned to other subjects. Part (ii) assumes no hidden variations of treatments. Part (iii) assumes the potential response variables ${X_{i, t} (d)}_{t = 1}^{m}$ are independent between subjects, but allows temporal dependence among ${X_{i, t} (d)}$ over t within a subject.

To identify the causal parameter τ of interest, we make the strict overlap assumption on the propensity score $P (D_{i} = 1 | W_{i})$ and the unconfoundedness assumption (Rosenbaum and Rubin Citation1983; Hirano, Imbens, and Ridder Citation2003) on the independence of the treatment indicator D_i and the potential responses $X_{i}^{m} (d)$ given the covariates $W_{i}$ .

Condition 2. (Strict Overlap): There exists a small positive constant $c_{ϵ} \in (0, 1 / 2)$ such that $c_{ϵ} \leq P (D_{i} = 1 | W_{i}) \leq 1 - c_{ϵ}$ for all $i = 1, \dots, n$ .

Condition 3. (Unconfounded Treatment Assignment): Given the covariates $W_{i}$ , (3.1) $D_{i} ⊥ {X_{i}^{m} (0), X_{i}^{m} (1)} | W_{i}$ (3.1) for all $i = 1, \dots, n$ and all $m \geq 1$ .

Condition 2 is the positivity assumption that every subject has some positive probability to be assigned to every level of D_i , which ensures the possibility to find both treated and control units for any given value of the covariates. Condition 3 implies that conditioning on the covariates, the probability of receiving the treatment is independent of the potential responses $X_{i, t} (d)$ over the repeated measurements for each subject. This means that except for a random error independent of the potential responses, the covariates $W_{i}$ include all the variables that determine D_i . Compared with the classical unconfoundedness assumption, we require conditional independence between the treatment assignment and the entire time course data of responses. This is because the subject level parameters $Y_{i}$ are unobservable, and need to be estimated using the responses ${X_{i, t}}$ from each subject.

For any estimate ${\hat{Y}}_{i} (d) = {({\hat{Y}}_{i, 1} (d), \dots, {\hat{Y}}_{i, p_{0}} (d))}^{T}$ of $Y_{i} (d)$ based on $X_{i}^{m} (d)$ , let ${\hat{Y}}_{i} = D_{i} {\hat{Y}}_{i} (1) - (1 - D_{i}) {\hat{Y}}_{i} (0)$ . As ${\hat{Y}}_{i} (d)$ is a function of $X_{i}^{m} (d)$ , Condition 3 leads to the conditional independence between D_i and ${{\hat{Y}}_{i} (0), {\hat{Y}}_{i} (1)}$ given the covariates, which implies $E ({\hat{Y}}_{i} | D_{i} = 1, W_{i}) - E ({\hat{Y}}_{i} | D_{i} = 0, W_{i}) = E {{\hat{Y}}_{i} (1) | W_{i}} - E {{\hat{Y}}_{i} (0) | W_{i}}$ . Let $Δ_{i, m} (d) = E {{\hat{Y}}_{i} (d) | T_{i} (d), W_{i}} - Y_{i} (d)$ be the bias of the estimator ${\hat{Y}}_{i} (d)$ for the ith subject, where $Δ_{i, m} (d) = {(Δ_{i, 1, m} (d), \dots, Δ_{i, p_{0}, m} (d))}^{T}$ , and the expectation $E {{\hat{Y}}_{i} (d) | T_{i} (d), W_{i}}$ is with respect to the subject level distribution $F_{d, i} (\cdot)$ given all the subject level parameters and covariates. If there exists an estimate that satisfies (3.2) $\max_{1 \leq j \leq p_{0}} E {Δ_{i, j, m} (d)} = o (1) for d = 0, 1$ (3.2) as $m \to \infty$ , the difference between $E {{\hat{Y}}_{i} (d)}$ and $E {Y_{i} (d)}$ converges to zero, and hence the population ATE $τ$ in (2.1) can be identified based on ${D_{i}, W_{i}, {\hat{Y}}_{i}}$ . This identification result is summarized in the following lemma.

Lemma 1.

Assume that there exists an estimate ${\hat{Y}}_{i} (d)$ of $Y_{i} (d)$ satisfying (3.2). Under Conditions 1–3, the population ATE $τ$ can be identified by $E {E ({\hat{Y}}_{i} | D_{i} = 1, W_{i}) - E ({\hat{Y}}_{i} | D_{i} = 0, W_{i})}$ , where the outer expectation is with respect to $W_{i}$ .

In our context, since the outcomes of interest $Y_{i}$ are not observable, the identifiability of $τ$ needs to be implied by an proper estimator ${\hat{Y}}_{i} (d)$ of $Y_{i} (d)$ that satisfies (3.2). This is an additional condition compared with the classical argument for the identifiability of causal parameters with observable outcomes under the unconfoundedness assumption. Note that the identification of $τ$ is independent of the sample size n, and if there exists an exactly unbiased estimator ${\hat{Y}}_{i} (d)$ of $Y_{i} (d)$ such that $Δ_{i, m} (d) = 0$ for all $i = 1, \dots, n$ , it does not require $m \to \infty$ either. Here, the condition in (3.2) could be understood as a model assumption for the identification of $τ$ . To identify the treatment effect on an unobservable random variable $Y_{i, j}$ , we need to assume the existence of an approximately unbiased estimator of $Y_{i, j}$ in the model. Note that the existence of an estimator satisfying the condition (3.2) may not be trivially true for any parameter or subject distribution as the potential responses $X_{i}^{m} (d)$ could be time-dependent and the dimension of the parameters could be very large.

Also notice that Lemma 1 holds under a weaker SUTVA condition on the derived potential outcomes ${\hat{Y}}_{i} (d)$ and a weaker unconfoundedness condition $D_{i} ⊥ {\hat{Y}}_{i} (d) | W_{i}$ for d = 0, 1 than Conditions 1 and 3. Since SUTVA and unconfoundedness are properties of the designs of the experiments or observational studies that generate the data, we make those two conditions on the potential responses in Conditions 1 and 3, instead of on an estimator ${\hat{Y}}_{i} (d)$ chosen by the investigator.

To conduct statistical inference for $τ$ , we need a stronger condition than (3.2), which specifies the uniform $l_{1}$ -norm convergence rate of the estimate ${\hat{Y}}_{i} (d)$ to $Y_{i} (d)$ .

Condition 4. For the estimate ${\hat{Y}}_{i} (d)$ of $Y_{i} (d)$ , we have $\max_{1 \leq j \leq p_{0}} E | Δ_{i, j, m} (d) | = δ_{m}$ for d = 0, 1, and $n^{1 / 2} δ_{m} \to 0$ as $n, m \to \infty$ .

The theoretical results in Sections 4 and 5 show that the proposed inference procedure for τ is valid if the bias term is small enough such that $δ_{m} = o (n^{- 1 / 2})$ . Compared with (3.2), this additional requirement on the bias term is needed to achieve $n^{- 1 / 2}$ convergence rate for the estimated $τ$ . Condition 4 can be satisfied for various types of estimators if the repeated measurements $X_{i}^{m} (d)$ are weakly dependent in time and the sample size m for each subject is relatively large. Also notice that for unbiased estimators with $δ_{m} = 0$ , Condition 4 holds for any m > 0, and m is not required to diverge to infinity.

Take Example 1 in Section 2 as an illustration. Let $v_{i, j_{1} j_{2}} (d) = \sum_{t = 1}^{m} {X_{i, t, j_{1}} (d) - {\bar{X}}_{i, j_{1}} (d)} {X_{i, t, j_{2}} (d) - {\bar{X}}_{i, j_{2}} (d)} / m$ be the sample variance of the ith subject, where ${\bar{X}}_{i, j} (d) = \sum_{t = 1}^{m} X_{i, t, j} (d) / m$ . It can be shown that the subject level correlation $Y_{i, j_{1} j_{2}} (d)$ can be well approximated by the sample correlation ${\hat{Y}}_{i, j_{1} j_{2}} (d) = v_{i, j_{1} j_{2}} (d) {v_{i, j_{1} j_{1}} (d) v_{i, j_{2} j_{2}} (d)}^{- 1 / 2}$ such that $E {{\hat{Y}}_{i, j_{1} j_{2}} (d) | T_{i} (d), W_{i}} = Y_{i, j_{1} j_{2}} (d) + O_{p} {{(\log p + \log n)}^{1 / 2} / m}$ if the sequence of potential responses ${X_{i, t} (d)}_{t = 1}^{m}$ satisfy certain mixing property and moment condition for each subject. For Example 2, under low-dimensional cases, since the least square estimator of the regression coefficient $Y_{i, j_{1} j_{2}}^{*} (d)$ is unbiased, Condition 4 is valid for this estimator under any m > p. Under high-dimensional cases, regulated regression estimators like lasso (Tibshirani Citation1996) are biased and are not suitable for our purpose. Instead, the de-sparsified lasso estimator (Van de Geer et al. Citation2014) can be applied to estimate $Y_{i, j_{1} j_{2}}^{*} (d)$ , which also satisfies Condition 4 under certain sparsity conditions on the coefficients. The justification of Condition 4 for the sample correlation and the de-sparsified lasso estimator for Examples 1 and 2 is provided in the SM.

In the next section, we propose an estimator of the ATE $τ$ via inverse propensity score weighting and a multiple testing procedure for the hypotheses (2.5) with FDP control.

4 Simultaneous Confidence Interval

To solve the challenges of many target parameters in $τ$ , high-dimensional $X_{i, t}$ and multi-source dependence in the data, in this section, we develop the Gaussian approximation result on the maximum of the estimates of $τ$ . We assume the logistic regression for the propensity score model, where D_i follows a Bernoulli distribution with the probability (4.1) $P (D_{i} = 1 | W_{i}) = Λ (W_{i}^{T} β)$ (4.1) for $Λ (a) = \exp (a) / {1 + \exp (a)}$ and unknown parameters $β = {(β_{1}, \dots, β_{q})}^{T} \in R^{q}$ . Let $β_{0} = {(β_{0, 1}, \dots, β_{0, q})}^{T}$ denote the true value of $β$ . The maximum likelihood estimator (MLE) $\hat{β}$ of the logistic regression (4.1) solves (4.2) $\begin{matrix} \hat{β} & = \underset{β \in R^{q}}{\arg \max} \sum_{i = 1}^{n} ρ_{β} (W_{i}, D_{i}) for \\ ρ_{β} (W_{i}, D_{i}) & = D_{i} \log Λ (W_{i}^{T} β) + (1 - D_{i}) \log {1 - Λ (W_{i}^{T} β)} . \end{matrix}$ (4.2)

Let $Λ (W_{i}^{T} \hat{β})$ be the estimated propensity score for the ith subject. Recall that ${\hat{Y}}_{i}$ is an estimate of $Y_{i}$ based on the responses ${X_{i, t}}$ . The ATE τ_j in (2.1) can be estimated by the inverse probability weighted estimator for derived random variables ${{\hat{Y}}_{i, j}}_{i = 1}^{n}$ as (4.3) ${\hat{τ}}_{j} = \frac{1}{n} \sum_{i = 1}^{n} {\frac{{\hat{Y}}_{i, j} D_{i}}{Λ (W_{i}^{T} \hat{β})} - \frac{{\hat{Y}}_{i, j} (1 - D_{i})}{1 - Λ (W_{i}^{T} \hat{β})}} .$ (4.3)

If we have the outcome models for $E {Y_{i, j} (d) | W_{i}} = g_{d} (W_{i})$ , the augmented inverse probability weighted estimator (AIPWE) $\begin{matrix} {\hat{τ}}_{j}^{AIPW} = \frac{1}{n} \sum_{i = 1}^{n} {{\hat{g}}_{1} (W_{i}) - {\hat{g}}_{0} (W_{i})} \\ + \frac{1}{n} \sum_{i = 1}^{n} {\frac{({\hat{Y}}_{i, j} - {\hat{g}}_{1} (W_{i})) D_{i}}{Λ (W_{i}^{T} \hat{β})} - \frac{({\hat{Y}}_{i, j} - {\hat{g}}_{0} (W_{i})) (1 - D_{i})}{1 - Λ (W_{i}^{T} \hat{β})}} \end{matrix}$ can be applied to estimate τ_j , where ${\hat{g}}_{d} (W_{i})$ is the estimated outcome model by regressing ${\hat{Y}}_{i, j}$ on D_i = d and $W_{i}$ . The AIPWE is doubly robust and is the efficient estimator for τ_j if both the outcome models and the propensity score model in (4.1) are correctly specified (Cao, Tsiatis, and Davidian Citation2009). However, in our problem, specifying the outcome model may not be easy as the number of parameters is large and the parameter space could be restricted. In Example 1, correlations are bounded in $(- 1, 1)$ and the subject level correlation matrices are positive definite for all subjects. In Example 2, coefficients of the nodewise regression are proportional to partial correlations, which also have the positive definiteness constraint.

In the following, we derive the asymptotic expansion of ${\hat{τ}}_{j}$ , which implies the consistency of ${\hat{τ}}_{j}$ to τ_j and its limiting distribution. Let $\dot{Λ} = Λ (1 - Λ)$ and $\ddot{Λ} = \dot{Λ} (1 - 2 Λ)$ be the first and second derivatives of the logistic function $Λ (\cdot)$ , respectively, where $\dot{Λ} (W_{i}^{T} β)$ is the conditional variance of D_i given $W_{i}$ . Let ${\dot{ρ}}_{β} (W_{i}, D_{i}) = {D_{i} - Λ (W_{i}^{T} β)} W_{i}$ be the first derivative of $ρ_{β} (W_{i}, D_{i})$ with respect to $β$ . Then, $I_{β} = E [{\dot{ρ}}_{β} (W_{i}, D_{i}) {\dot{ρ}}_{β}^{T} (W_{i}, D_{i})] = E [\dot{Λ} (W_{i}^{T} β) W_{i} W_{i}^{T}]$ is the Fisher information (FI) matrix of $β$ in the model (4.1). Let $Ξ_{β} = I_{β}^{- 1}$ be the inverse of the FI matrix, and (4.4) $H = E ([{1 - Λ (W_{i}^{T} β_{0})} Y_{i} (1) + Λ (W_{i}^{T} β_{0}) Y_{i} (0)] W_{i}^{T})$ (4.4) be the $p_{0} \times q$ dimensional influence matrix of $\hat{β}$ on $\hat{τ}$ . The following proposition provides the asymptotic expansion of ${\hat{τ}}_{j}$ in (4.3).

Proposition 1.

Assume the subject level parameters of interest $Y_{i, j}$ , their estimates ${\hat{Y}}_{i, j}$ , the covariates $W_{i, k}$ and the true propensity score parameter $β_{0}$ are bounded such that $\max_{1 \leq k \leq q} | W_{i, k} | \leq C, \max_{1 \leq j \leq p_{0}} {| Y_{i, j} |, | {\hat{Y}}_{i, j} |} < C$ and $\max_{1 \leq k \leq q} | β_{0, k} | \leq C$ for all $i = 1, \dots, n$ and a positive constant C. Under Conditions 1–4 and (4.1), we have that (4.5) $\hat{τ} - τ = \frac{1}{n} \sum_{i = 1}^{n} {\tilde{η}}_{i} + O_{p} {{(\log p_{0})}^{1 / 2} n^{- 1} + δ_{m} n^{- 1 / 2}},$ (4.5) where ${\tilde{η}}_{i} = {({\tilde{η}}_{i 1}, \dots, {\tilde{η}}_{i p_{0}})}^{T}$ , (4.6) $\begin{matrix} {\tilde{η}}_{i j} = \frac{{\hat{Y}}_{i, j} D_{i}}{Λ (W_{i}^{T} β_{0})} - \frac{{\hat{Y}}_{i, j} (1 - D_{i})}{1 - Λ (W_{i}^{T} β_{0})} \\ - H_{j} Ξ_{β_{0}} {D_{i} - Λ (W_{i}^{T} β_{0})} W_{i} - τ_{j} \end{matrix}$ (4.6) with $E ({\tilde{η}}_{i j}) = O (δ_{m})$ for all $j = 1, \dots, p_{0}$ , and $H_{j}$ is the jth row of $H$ .

Proposition 1 implies asymptotic normality of ${\hat{τ}}_{j}$ with the influence functions ${{\tilde{η}}_{i j}}$ . Let (4.7) $\begin{matrix} θ_{j} = \frac{1}{n} \sum_{i = 1}^{n} E [{\frac{{\hat{Y}}_{i, j} D_{i}}{Λ (W_{i}^{T} β_{0})} - \frac{{\hat{Y}}_{i, j} (1 - D_{i})}{1 - Λ (W_{i}^{T} β_{0})} - τ_{j}} \\ - H_{j} Ξ_{β_{0}} {D_{i} - Λ (W_{i}^{T} β_{0})} W_{i}]^{2} \end{matrix}$ (4.7) be the main order term of $var (n^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{η}}_{i j})$ . Under the conditions $(\log p_{0}) / n = o (1)$ and $n^{1 / 2} δ_{m} = o (1)$ , we have $n^{1 / 2} ({\hat{τ}}_{j} - τ_{j}) \to N (0, θ_{j})$ in distribution for all $j = 1, \dots, p_{0}$ , as $n, m, p \to \infty$ . Notice that the variance θ_j includes the variation of subject level parameters $Y_{i, j}$ , the estimation variation of ${\hat{Y}}_{i, j}$ for each subject, and the variation of the estimated propensity score. The decomposition of θ_j into each of those sources is provided in (S.7) in the SM. Let ${\hat{θ}}_{j} = \sum_{i = 1}^{n} {\hat{η}}_{i j}^{2} / n$ be the estimator of θ_j , where (4.8) $\begin{matrix} {\hat{η}}_{i j} = \frac{{\hat{Y}}_{i, j} D_{i}}{Λ (W_{i}^{T} \hat{β})} - \frac{{\hat{Y}}_{i, j} (1 - D_{i})}{1 - Λ (W_{i}^{T} \hat{β})} \\ - {\hat{τ}}_{j} - {\hat{H}}_{j} {\hat{I}}^{- 1} {D_{i} - Λ (W_{i}^{T} \hat{β})} W_{i}, \end{matrix}$ (4.8) $\begin{matrix} \hat{I} = \frac{1}{n} \sum_{i = 1}^{n} Λ (W_{i}^{T} \hat{β}) {1 - Λ (W_{i}^{T} \hat{β})} W_{i} W_{i}^{T} and \\ {\hat{H}}_{j} = \frac{1}{n} \sum_{i = 1}^{n} {\frac{1 - Λ (W_{i}^{T} \hat{β})}{Λ (W_{i}^{T} \hat{β})} D_{i} + \frac{Λ (W_{i}^{T} \hat{β})}{1 - Λ (W_{i}^{T} \hat{β})} (1 - D_{i}) } {\hat{Y}}_{i, j} W_{i}^{T} . \end{matrix}$

The following proposition provides the uniform convergence rate of ${\hat{θ}}_{j}$ to θ_j .

Proposition 2.

Under the conditions of Proposition 1, it can be shown that $\max_{1 \leq j \leq p_{0}} | {\hat{θ}}_{j} - θ_{j} | = O_{p} {{(\log p_{0})}^{1 / 2} n^{- 1 / 2} + δ_{m}}$ .

Although the responses ${X_{i, t}}_{t = 1}^{m}$ could be time dependent within each subject, due to the independence among subjects, Proposition 2 shows that θ_j can be consistently estimated by the sample variance of ${{\hat{η}}_{i j}}_{i = 1}^{n}$ without estimating the within-subject temporal dependence. The results in Propositions 1 and 2 directly imply the asymptotic normality of the standardized treatment effect $n^{1 / 2} ({\hat{τ}}_{j} - τ_{j}) {\hat{θ}}_{j}^{- 1 / 2}$ and the confidence interval of τ_j , which is presented in the following theorem.

Theorem 1.

Under the conditions of Proposition 1, $\log p_{0} = o (n)$ , and $θ_{j} \geq b_{1}$ for a small positive constant b₁, we have $n^{1 / 2} ({\hat{τ}}_{j} - τ_{j}) {\hat{θ}}_{j}^{- 1 / 2} \to N (0, 1)$ in distribution as $n, m, p \to \infty$ .

From Theorem 1, the $1 - α$ confidence interval for τ_j can be constructed as $({\hat{τ}}_{j} - z_{α / 2} {({\hat{θ}}_{j} / n)}^{1 / 2}, {\hat{τ}}_{j} + z_{α / 2} {({\hat{θ}}_{j} / n)}^{1 / 2})$ , where $z_{α / 2}$ is the upper $α / 2$ quantile of the standard normal distribution. The condition $θ_{j} \geq b_{1}$ stands for nondegenerate variance of $n^{1 / 2} {\hat{τ}}_{j}$ . To construct simultaneous confidence intervals of $τ$ , we need to screen out the noninformative components with θ_j converging to zero. Note that, from the decomposition of θ_j in (S.7) in the SM, $θ_{j} \to 0$ if $Y_{i, j} (d) = 0$ for d = 0, 1 and all $i = 1, \dots, n$ , and the estimation variation of ${\hat{Y}}_{i, j} (d)$ diminishes to zero with the increase of m. Let $S_{AZ} = {j : θ_{j} \to 0 as n \to \infty}$ be the non-informative components of the subject level parameters. In Example 1 of causal effects on correlations, this means a common sparsity structure on all subject level correlation matrices such that some of the correlations are zero uniformly across all subjects.

Let $R_{0} = {j : 1 \leq j \leq p_{0}, {\hat{θ}}_{j} \leq \tilde{c}}$ for a small positive constant $\tilde{c}$ , and $R_{1} = {j : 1 \leq j \leq p_{0}, {\hat{θ}}_{j} > \tilde{c}}$ be the complement set of R₀. Here, the lower bound $\tilde{c}$ is needed to derive the asymptotic results in Theorems 1 and 2, which can be chosen arbitrarily small to exclude the components with $θ_{j} \to 0$ . Note that θ_j also includes the estimation variation $var {{\hat{Y}}_{i} (d) | T_{i} (d), W_{i}}$ which does not diminish to zero for bounded m. This screening procedure will typically not exclude any component for small m in real applications. In practice, we choose $\tilde{c} = 0.01$ so that ${\hat{θ}}_{j}^{- 1 / 2} \leq 10$ for $j \in R_{1}$ . In the real data analysis in Section 7, we find all ${\hat{θ}}_{j}$ are larger than 0.03 and $R_{0} = \emptyset$ .

Let $S_{AZ}^{c}$ be the complement set of $S_{AZ}$ . To construct the simultaneous confidence intervals of τ_j over $S_{AZ}^{c}$ , we consider the maximal standardized estimates of treatment effects (4.9) $V = \max_{j \in R_{1}} \sqrt{n} | ({\hat{τ}}_{j} - τ_{j}) {\hat{θ}}_{j}^{- 1 / 2} |$ (4.9) over $j \in R_{1}$ . From the expansion (4.5), different components in $\hat{τ}$ are dependent due to correlations among the subject level parameters ${Y_{i, j} : 1 \leq j \leq p_{0}}$ from the population distribution and the common estimate $Λ (W_{i}^{T} \hat{β})$ of the propensity scores used in all ${{\hat{τ}}_{j}}$ . The complex dependence among ${{\hat{τ}}_{j}}$ makes the asymptotic analysis of $V$ challenging.

To tackle this problem, we derive a Gaussian approximation result for 𝒱, and use the multiplier bootstrap to estimate its quantile and construct the simultaneous confidence intervals for ${τ_{j} : j \in R_{1}}$ . To establish this approximation result, we make the following assumption on the lower bound of θ_j and the upper bound of the correlations of ${{\tilde{η}}_{i j}}$ for $j \in S_{AZ}^{c}$ . Conditions on bounds of variances and covariances are needed for Gaussian approximation results (Chernozhukov, Chetverikov, and Kato Citation2013).

Condition 5. There exist small positive constants b₁ and b₂ such that $θ_{j} \geq b_{1}$ for all $j \in S_{AZ}^{c}$ and $| cor ({\tilde{η}}_{i j_{1}}, {\tilde{η}}_{i j_{2}}) | \leq 1 - b_{2}$ for all $j_{1}, j_{2} \in S_{AZ}^{c}$ and $j_{1} \neq j_{2}$ .

Notice that the variance θ_j of ${\tilde{η}}_{i j}$ is bounded from above under the conditions of Proposition 1. Condition 5 assumes θ_j being bounded from below for all the components except those noninformative ones in $S_{AZ}$ , and the correlations of ${{\tilde{η}}_{i j}}$ among those components are bounded away from 1. This implies nondegenerate influence functions ${{\tilde{η}}_{i j}}$ of ${\hat{τ}}_{j}$ for all $j \in S_{AZ}^{c}$ . From Proposition 1, $V$ is asymptotically equivalent to $\max_{j \in R_{1}} {(n {\hat{θ}}_{j})}^{- 1 / 2} | \sum_{i = 1}^{n} {\tilde{η}}_{i j} |$ , where ${\tilde{η}}_{i j}$ can be well approximated by ${\hat{η}}_{i j}$ . Let ${\tilde{η}}_{i, 1}$ and ${\hat{η}}_{i, 1}$ be the vectorization of ${{\tilde{η}}_{i j} : j \in R_{1}}$ and ${{\hat{η}}_{i j} : j \in R_{1}}$ , respectively. Note that $\sum_{i = 1}^{n} {\hat{η}}_{i j} = 0$ for all $j = 1, \dots, p_{0}$ . Let $S_{n, 1} = \sum_{i = 1}^{n} {\hat{η}}_{i, 1} {\hat{η}}_{i, 1}^{T} / n$ be the sample covariance of ${{\hat{η}}_{i, 1}}$ , and $U_{n, 1}$ be the diagonal matrix of ${{\hat{θ}}_{j}^{1 / 2} : j \in R_{1}}$ . Let $Z = {W_{i}, D_{i}, X_{i, 1}, \dots, X_{i, m}}_{i = 1}^{n}$ be the collection of all data. The following theorem shows the distribution function of $V$ can be approximated by that of the maximum of a Gaussian random vector when the sample size n is large.

Theorem 2.

Under the conditions of Proposition 1, Condition 5 and $\log p_{0} = o (n)$ , we have $\sup_{x > 0} | P (V > x) - P (| z_{1} |_{\infty} > x | Z) | \overset{p}{\to} 0 as n \to \infty,$ where $z_{1} \sim N (0, U_{n, 1}^{- 1} S_{n, 1} U_{n, 1}^{- 1})$ is a multivariate normal random vector.

Theorem 2 extends the original Gaussian approximation result for sample means (Chernozhukov, Chetverikov, and Kato Citation2013) to inverse probability weighted estimates with derived outcomes. From this result, the quantiles of $V$ can be approximated by those of $| z_{1} |_{\infty}$ . To simulate $z_{1} \sim N (0, U_{n, 1}^{- 1} S_{n, 1} U_{n, 1}^{- 1})$ , we propose a multiplier bootstrap procedure that generates $g_{1, u}, \dots, g_{n, u} \overset{iid}{\sim} N (0, 1)$ and calculates $z_{1, u} = U_{n, 1}^{- 1} \sum_{i = 1}^{n} g_{i, u} {\hat{η}}_{i, 1} / \sqrt{n}$

for $u = 1, \dots, B$ . It can be easily shown that $z_{1, 1}, \dots, z_{1, B} \overset{iid}{\sim} N (0, U_{n, 1}^{- 1} S_{n, 1} U_{n, 1}^{- 1})$ . The upper α quantile of $V$ can be estimated by (4.10) ${\hat{q}}_{1} (α) = \inf {x : \frac{1}{B} \sum_{u = 1}^{B} I (| z_{1, u} |_{\infty} \leq x) \geq 1 - α} .$ (4.10)

The $1 - α$ simultaneous confidence intervals for ${τ_{j}}$ can be constructed as (4.11) $({\hat{τ}}_{j} - {\hat{q}}_{1} (α) {({\hat{θ}}_{j} / n)}^{1 / 2}, {\hat{τ}}_{j} + {\hat{q}}_{1} (α) {({\hat{θ}}_{j} / n)}^{1 / 2})$ (4.11) for $j \in R_{1}$ , where $P (R_{1} = S_{AZ}^{c}) \to 1$ as $n \to \infty$ .

As the data from different individuals are assumed to be independent, the empirical bootstrap that re-samples individuals might also be able to estimate the distribution function of $V$ . Let ${\hat{β}}^{EBT}, {\hat{τ}}_{j}^{EBT}$ , and ${\hat{θ}}_{j}^{EBT}$ be the re-sample estimates of $β$ , τ_j and θ_j , respectively. The empirical bootstrap uses $V^{EBT} = \max_{j \in R_{1}} \sqrt{n} | ({\hat{τ}}_{j}^{EBT} - {\hat{τ}}_{j}) {({\hat{θ}}_{j}^{EBT})}^{- 1 / 2} |$ to approximate the distribution of $V$ . Chernozhukov, Chetverikov, and Kato (Citation2013) (Appendix K) showed that the empirical bootstrap can consistently estimate the distribution function of the maximum of sample means. However, the consistency of the empirical bootstrap for the maximum of the standardized IPW estimates in our problem is unclear as $V$ requires the estimation of the propensity scores compared with sample means. We will investigate this question in future work.

5 Multiple Testing Procedure

In this section, we consider the multiple hypotheses $H_{0, j} : τ_{j} = 0$ versus $H_{a, j} : τ_{j} \neq 0$ for $j = 1, \dots, p_{0}$ . Note that the simultaneous confidence interval constructed in (4.11) can be directly used to identify nonzero τ_j . This procedure controls the FWER, defined as the probability of rejecting any true null hypothesis, at α asymptotically. However, the FWER control procedures are too conservative which lose power in detecting nonzero treatment effects. We aim to propose a more powerful multiple testing method that controls the FDP.

FDP is the ratio of the number of false positives over the number of discoveries, and false discovery rate (FDR) is the expectation of FDP. Given a small constant $c \in (0, 1)$ , the FDP exceedance rate (FDPex) is defined as $P (FDP > c)$ , which gives the probability of the FDP exceeding level c in a random sample. FDR control procedures target on the average FDP over repeated random samples, which fail to consider the variation of FDP in a single experiment. To introduce a more stringent measure of Type I errors for one random sample, Genovese and Wasserman (Citation2006) suggested to control the FDP exceedance rate $P (FDP > c) \leq α$ for pre-specified small positive constants c and α.

Following the step-down testing process with augmentation proposed in van der Laan, Dudoit, and Pollard (Citation2004b); Genovese and Wasserman (Citation2006), we use a similar construction of the multiple testing procedure to control the FDPex. However, the existing methods fail to work in our problem as any pair of the estimated ATEs ${{\hat{τ}}_{j}}$ are dependent. To solve this problem, we build the testing procedure on the maximum of ${\sqrt{n} | {\hat{τ}}_{j} | {\hat{θ}}_{j}^{- 1 / 2}}$ , and couple the step-down process with multiplier bootstrap, which approximates the distribution of the maximum statistic of standardized ${\hat{τ}}_{j}$ based on the Gaussian approximation result.

Specifically, we test the overall hypotheses $H_{0, (l)} : τ_{j} = 0 forall j \in R_{l} vs . H_{a, (l)} : τ_{j} \neq 0 forsome j \in R_{l}$ sequentially by the maximum statistics (5.1) $M_{l} = \max_{j \in R_{l}} \sqrt{n} | {\hat{τ}}_{j} {\hat{θ}}_{j}^{- 1 / 2} |,$ (5.1) where $R_{l} \subset {1, \dots, p_{0}}$ is a subset of indices. Let ${\hat{η}}_{i, l}$ be the vectorization of ${{\hat{η}}_{i j} : j \in R_{l}}$ , and $U_{n, l}$ be the diagonal matrix of ${{\hat{θ}}_{j}^{1 / 2} : j \in R_{l}}$ , where ${\hat{η}}_{i j}$ and ${\hat{θ}}_{j}$ are given in (4.8). Let $S_{n, l} = \sum_{i = 1}^{n} {\hat{η}}_{i, l} {\hat{η}}_{i, l}^{T} / n$ be the sample covariance of ${{\hat{η}}_{i, l}}$ . Similar to Theorem 2, it can be shown that the quantile of $M_{l}$ under $H_{0, (l)}$ can be consistently estimated by the quantile of $| z_{l} |_{\infty}$ where $z_{l} \sim N (0, U_{n, l}^{- 1} S_{n, l} U_{n, l}^{- 1})$ . Therefore, multiplier bootstrap is applied to determine the rejection region of $M_{l}$ for testing hypotheses $H_{0, (l)}$ . If $H_{0, (l)}$ is rejected, we remove the most significant index with the largest $\sqrt{n} | {\hat{τ}}_{j} | {\hat{θ}}_{j}^{- 1 / 2}$ from $R_{l}$ and continue this testing process until the null hypothesis is not rejected. It can be shown this step-down process can control the FWER. By augmenting the rejected set with the next few most significant components, the final discovered set of nonzero τ_j is able to control the FDPex.

Let $| U |$ be the cardinality of a set U, and $V_{1} = \emptyset$ . Starting from the set $R_{1} = {j : 1 \leq j \leq p_{0}, {\hat{θ}}_{j} > \tilde{c}}$ and the maximum statistic $M_{1}$ , the proposed procedure is as follows.

Generate $g_{1}, \dots, g_{n} \overset{iid}{\sim} N (0, 1)$ . Let $z_{l} = U_{n, l}^{- 1} \sum_{i = 1}^{n} g_{i} {\hat{η}}_{i, l} / \sqrt{n}$ .
Repeat the above step B times to generate iid Monte Carlo samples $z_{l, 1}, \dots, z_{l, B}$ from $N (0, U_{n, l}^{- 1} S_{n, l} U_{n, l}^{- 1})$ , and estimate the upper α quantile of $M_{l}$ under $H_{0, (l)}$ by
${\hat{q}}_{l} (α) = \inf [x : \frac{1}{B} \sum_{u = 1}^{B} I {| z_{l, u} |_{\infty} \leq x} \geq 1 - α] .$ (5.2)
Let $h_{l} = {argmax}_{j \in R_{l}} | {\hat{τ}}_{j} {\hat{θ}}_{j}^{- 1 / 2} |$ and $R_{l + 1} = {j \in R_{l} : j \neq h_{l}}$ . If $M_{l} > {\hat{q}}_{l} (α)$ , let $V_{l + 1} = V_{l} \cup {h_{l}}$ ; otherwise, declare the treatment effects τ_j in $R_{l}$ are not significant.
Repeat the first three steps until $R_{k_{0}}$ is not significant for some $k_{0} \geq 1$ . Then, $V_{k_{0}}$ is the set of discovered nonzero treatment effects by the step-down process.
Augment $V_{k_{0}}$ with the next $⌊ c | V_{k_{0}} | / (1 - c) ⌋$ largest $| {\hat{τ}}_{j} {\hat{θ}}_{j}^{- 1 / 2} |$ in $R_{k_{0}}$ . Let V be the final discovery set from both the step-down process and the augmentation step.

Let $S_{*} = {j : τ_{j} \neq 0}$ be the set of true nonzero causal effects. The following theorem states the asymptotic size and power of the proposed multiple testing procedure.

Theorem 3.

Under the conditions of Proposition 1, Condition 5 and $\log p_{0} = o (n)$ , for testing the hypotheses $H_{0, j} : τ_{j} = 0$ vs. $H_{a, j} : τ_{j} \neq 0$ for $j = 1, \dots, p_{0}$ , we have that

$\lim_{n, m, p \to \infty} P (FDP > c) \leq α$ , where $FDP = | V \cap S_{*}^{c} | / | V |$ ;
$P (S_{*} \subset V) \to 1$ if $\min_{j \in S_{*}} | τ_{j} | \geq c {\log (p_{0}) / n}^{1 / 2}$ for a positive constant c.

Theorem 3 demonstrates that the proposed multiple testing procedure is able to control the FDP exceedance rate at α and has a power converging to 1 if all the nonzero τ_j is large than $c {\log (p_{0}) / n}^{1 / 2}$ for some positive constant c. Compared with the existing methods, we have theoretically shown the Type I error control of the proposed testing procedure under general dependence among ${{\hat{τ}}_{j}}$ based on the Gaussian approximation result. This theoretical result is justified by the empirical size and power of the proposed procedure reported in the simulation section.

6 Simulation

In this section, we conducted simulation studies to evaluate the false discovery proportion and power of the proposed multiple testing procedure for the hypotheses (2.5). We considered the treatment effects on subject level correlations and regression coefficients in Examples 1 and 2 in Section 2. As a comparison of the proposed procedure, we also implemented the Benjamini-Hochberg (BH) procedure (Benjamini and Hochberg Citation1995) on the standardized estimated treatment effect $n^{1 / 2} {\hat{τ}}_{j} {\hat{θ}}_{j}^{- 1 / 2}$ based on Theorem 1.

For each subject $i \in {1, \dots, n}$ , the covariates $W_{i}$ were iid generated from a q-dimensional multivariate normal distribution with zero mean and covariance matrix $Σ_{w} = {(σ_{w, j_{1} j_{2}})}_{q \times q}$ , where $σ_{w, j_{1} j_{2}} = {0.7}^{| j_{1} - j_{2} |} / 2$ . Let $1_{q}$ be the q-dimensional vector of one. The treatment assignment D_i was generated from a logistic regression model with $P (D_{i} = 1 | W_{i}) = Λ (W_{i}^{T} β)$ , where $β = 0.5 1_{q}$ . To generate the potential responses $X_{i, t} (d)$ of each subject, we first constructed the subject level correlation matrix $Y_{i} (d) = {(Y_{i, j_{1} j_{2}} (d))}_{p \times p}$ . To model subject heterogeneity on correlations, let $δ_{i} = Λ (| W_{i}^{T} γ |) - 0.5 + ε_{δ, i}$ , where $γ = 0.4 1_{q}$ and $ε_{δ, i}$ was generated from $N (0, {0.1}^{2})$ independent with D_i . We set the potential covariance matrix under control (d = 0) to be fixed among all subjects where $Y_{i, j_{1} j_{2}} (0) = ρ_{0}^{| j_{1} - j_{2} |}$ . For the potential outcomes under treatment (d = 1), we considered block-diagonal and super-diagonal settings to construct signals on the individual covariance matrices as follows.

Block-diagonal: $Y_{i, j_{1} j_{2}} (1) = ρ_{0}^{| j_{1} - j_{2} |} + 2 ρ_{0} | δ_{i} |$ for $(j_{1}, j_{2}) \in S_{bd}$ and $Y_{i, j_{1} j_{2}} (1) = ρ_{0}^{| j_{1} - j_{2} |}$ for $(j_{1}, j_{2}) \notin S_{bd}$ , where $S_{bd} = {(j_{1}, j_{2}) : j_{1} \neq j_{2}, k s_{0} + 1 \leq j_{1}, j_{2} \leq (k + 1) s_{0} for k = 0, \dots, ⌊ {p/s}_{0} ⌋ - 1}$ , and $⌊ \cdot ⌋$ denotes the integer truncation function.
Super-diagonal: $Y_{i, j_{1} j_{2}} (1) = ρ_{0}^{| j_{1} - j_{2} |} + 2 ρ_{0} | δ_{i} |$ for $(j_{1}, j_{2}) \in S_{sd}$ and $Y_{i, j_{1} j_{2}} (1) = ρ_{0}^{| j_{1} - j_{2} |}$ for $(j_{1}, j_{2}) \notin S_{sd}$ , where $S_{sd} = {(j_{1}, j_{2}) : | j_{1} - j_{2} | = s_{1}}$ .

Under the block-diagonal setting, the nonzero treatment effects appear in the blocks of size s₀ along the main diagonal of the correlation matrices. There are no causal effects on subject level correlations outside those blocks. All the nonzero treatment effects have the same strength $2 ρ_{0} | δ_{i} |$ for the ith subject, and $2 ρ_{0} E {| Λ (| W_{i}^{T} γ |) - 0.5 + ε_{δ, i} |}$ is the average treatment effect on correlations. Under the super-diagonal setting, the nonzero causal effects are only on the s₁th super diagonal of the correlation matrices. Comparing the two structures, the block-diagonal setting has more signals than the super-diagonal setting. In our settings, as D_i was independent of the subject level covariances ${Y_{i} (0), Y_{i} (1)}$ conditioning on the covariates $W_{i}$ , the unconfounded treatment assignment condition was satisfied. We chose $s_{0} = 10$ and $s_{1} = 6$ in the simulation.

Denote $Y_{i} = D_{i} Y_{i} (1) + (1 - D_{i}) Y_{i} (0)$ . For each individual, the time course data $X_{i, t}$ over repeated measurements were generated via a vector autoregressive (VAR) model. Let $X_{i, 1}$ and $ϵ_{i, 2}, \dots, ϵ_{i, m}$ be iid random vectors from the p-dimensional normal distribution $N (0, Y_{i})$ with mean zero and covariance matrix $Y_{i}$ , where m is the number of repeated measurements for each subject over time. As the number of fMRI scans from each subject is usually around 200 to 300, we set m = 300 in our study. Let $X_{i, t} = ρ_{time} X_{i, t - 1} + \sqrt{1 - ρ_{time}^{2}} ϵ_{i, t}$ for $t = 2, \dots, m$ , where the parameter $ρ_{time}$ specifies the strength of time dependence. Under this VAR model, the temporal correlation between $X_{i, t, j}$ and $X_{i, t - 1, j}$ is equal to $ρ_{time}$ for all $j = 1, \dots, p$ . We considered both $ρ_{time} = 0$ and $ρ_{time} = 0.3$ to evaluate the effect of time dependence on the proposed procedure. We set the number of covariates q = 4, n = 100, 200 and p = 50, 100. All the simulation settings were repeated 1000 times.

For the proposed multiple testing procedure, we aimed to control the FDPex rate over 0.1 at 0.05, namely, $P (FDP > 0.1) \leq 0.05$ . The number B of bootstrap samples in (5.2) was set to be 1000. The nominal level of FDR for the BH procedure was set as 0.05. Over the 1000 repetitions of each simulation setting, we calculated the average FDR, the proportion of FDP larger than 0.1, and the average true positive rate (TPR) as the power of the test, which is the number of true positives over the total number of nonzero signals. Note that the BH procedure targets to control the FDR, which may not be able to control the FDPex rate, and the dependence among the standardized estimates $n^{1 / 2} {\hat{τ}}_{j} {\hat{θ}}_{j}^{- 1 / 2}$ due to sharing the common estimated propensity scores may also cause an inaccurate Type I error control for the BH procedure. The average FDPex rate, FDR and power of the two methods are reported in and .

Fig. 1 The average FDPex rate $P (FDP > 0.1)$ , FDR and power of the proposed (red) and BH (blue) procedures for testing the hypotheses (2.5) to recover the nonzero treatment effects on the subject level correlations under the block-diagonal signal setting with $s_{0} = 10$ , m = 300, n = 100, 200, $p = 50, 100$ and $ρ_{time} = 0$ and 0.3. The nominal level was 0.05.

Fig. 2 The average FDPex rate $P (FDP > 0.1)$ , FDR and power of the proposed (red) and BH (blue) procedures for testing the hypotheses (2.5) to recover the nonzero treatment effects on the subject level correlations under the super-diagonal signal setting with $s_{1} = 6$ , m = 300, n = 100, 200, $p = 50, 100$ and $ρ_{time} = 0$ and 0.3. The nominal level was 0.05.

From and , we observe that the proposed method can control the FDPex rate for testing the hypotheses (2.5) smaller than 0.05 under most of the cases, except for the super-diagonal signal setting under n = 100, where the FDPex rate was around 0.1. But this error rate decreased to less than 0.05 when the number of subjects n was increased to 200. Meanwhile, the FDR of the proposed method was well controlled, around 5% under all the cases. In comparison, the empirical FDPex rate of the BH procedure was quite large, which reached over 0.7 and 0.9 under the block-diagonal and super-diagonal signal settings with n = 100, respectively. The FDPex rate of BH became much smaller under the block-diagonal setting when n was increased to 200, but it was still larger than 5%. This shows that the BH procedure can not control the FDP exceedance rate for testing many treatment effects derived from the same group of subjects. Also, notice that BH encountered a slight inflation on FDR. Its FDR was around 0.1 larger than the nominal level 5%. It reached around 0.2 under the super-diagonal setting and n = 100. This may be due to the dependence among the estimated treatment effects ${{\hat{τ}}_{j}}$ in (4.3) as they all use the common propensity scores.

The overall power of the proposed test was reasonable, and it was much increased with the increase of the sample size from 100 to 200. Compared with the BH procedure, the power of the proposed procedure was lower, especially under n = 100. This is because of the size inflation of the BH procedure and the more stringent Type I error control of our procedure. Notice that both procedures use the same estimates of the average treatment effects. The difference is solely from the ways of choosing the cutoff value for testing the multiple hypotheses (2.5). When n was increased to 200, the power of the proposed procedure became much closer to those of BH. Especially, under the super-diagonal setting and $n = 200$ , the power of both methods reached around 0.9. Also notice that the results under $ρ_{time} = 0$ and $ρ_{time} = 0.3$ were comparable for both methods. When $ρ_{time} = 0$ , the subject level data were temporally independent. The power under $ρ_{time} = 0.3$ was slightly lower, which was due to the larger variation of the estimated subject level correlations under time-dependent data.

To evaluate the treatment effects on regression coefficients in Example 2, we compared the performance of the proposed method with the lasso estimate and the de-sparsified lasso estimate described in Section S1 in the SM. Note that the lasso estimate is biased, which does not satisfy Condition 4. The covariates $W_{i}$ , treatment assignment D_i and the subject level covariance matrix $Y_{i} (0)$ under control were generated the same as the simulation settings for Example 1. Let $Ω_{i} (d) = (ω_{i, j_{1} j_{2}} (d)) = Y_{i}^{- 1} (d)$ be the precision matrix. We added the signals on $Ω_{i} (1)$ in the block-diagonal setting where $ω_{i, j_{1} j_{2}} (1) = ω_{i, j_{1} j_{2}} (0) + ρ_{0} | δ_{i} |$ for $(j_{1}, j_{2}) \in S_{bd}$ and $ω_{i, j_{1} j_{2}} (1) = ω_{i, j_{1} j_{2}} (0)$ for $(j_{1}, j_{2}) \notin S_{bd}$ . The subject level data $X_{i, t}$ were generated by the VAR model with the error distribution $N (0, Ω_{i}^{- 1})$ , where $Ω_{i} = D_{i} Ω_{i} (1) + (1 - D_{i}) Ω_{i} (0)$ . From Example 2, the support of the coefficients ${Y_{i, j_{1} j_{2}}^{*} (d)}$ of the nodewise regressions is the same as that of the precision matrix $Ω_{i} (d)$ . The lasso penalty parameters λ_j and $λ_{j_{1} j_{2}}$ were set as $\sqrt{2 (\log p) / m}$ for all $j, j_{1}, j_{2}$ .

The average FDPex rate, FDR and power are reported in Figure S1 in the SM. We observe that the proposed method with the de-sparsified lasso estimate was able to control the FDPex rate under the nominal level 5% with good power. Its FDR was also controlled at 5%. In contrast, the proposed method with the lasso estimate failed to work. Its power was lower than 0.1 for all the cases due to the underestimation of the true regression coefficients. These results show that the asymptotic unbiasedness condition (Condition 4) is essential for the validity of the proposed method. Simply using any consistent estimator for the subject level parameters may not lead to desirable estimates of the treatment effects.

7 Case Study

It is well recognized that neurodevelopmental and neurodegenerative disorders, like Autism spectrum disorder (ASD), are associated with altered patterns of over- and under-connectivity in brains (Kana et al. Citation2014). In this section, we studied medication effects on the brain functional connectivity of patients with ASD. We used the fMRI data of 79 ASD patients from New York University Langone Medical Center. The preprocessed data were downloaded from the Autism Brain Imaging Data Exchange I initiative (Craddock et al. Citation2013). Among the 79 patients, 16 of them had taken some kind of medications for alleviating the related symptoms of ASD, and the rest 60 had not. We estimated and identified the nonzero treatment effects on brain functional connectivity, measured by correlations among brain regions, between the groups taking and not taking medications within three months before taking fMRI scans.

For each subject, there were 175 scans of brain overtime in the fMRI data, and a set of 116 regions-of-interests (ROI) were extracted according to the automated anatomical labeling atlas. Following Liu and Huang (Citation2020), those 116 ROIs can be divided into six common functional networks: the default mode network (DMN), the execution and attention network (EAN), the sensorimotor network (SMN), the visual network (Visual), the subcortical nuclei (SBN) regions and the cerebellum (Cerebel), which consist of 22, 17, 18, 14, 19, 26 regions, respectively. Data of three patients were deleted, one of which was due to missing covariates and the other two were due to missing values in the ROIs. The final data included 76 ASD patients.

Four covariates including age, sex, handedness, and FIQ were chosen in the analysis. Age and sex are binary variables, handedness is an integer score between –100 and 100 with a higher score implying more use of the right hand and vice versa, and FIQ is a positive integer score representing the standard estimate of intelligence. Note that the development of ASD symptoms is related to age, and ASD is a disorder of sexual differentiation (Alaerts, Swinnen, and Wenderoth Citation2016). IQ scores also affect the decision of taking medicine. An ASD patient may be treated if there is some abnormality in IQ. Meanwhile, those covariates are also related to the differences in brain connectivity among subjects. Therefore, age, sex, and IQ can be regarded as common causes for the treatment indicator and the potential responses. We make the unconfoundedness assumption that the treatment assignment and the potential fMRI readings are conditionally independent given the covariates.

We applied the proposed procedure for testing the multiple hypotheses (2.5) to identify the nonzero treatment effects on the subject level correlations among brain ROIs. Similar to the simulation study, we controlled the probability of FDP exceeding 0.1 at 0.05, namely, c = 0.1 and $α = 0.05$ . We discovered four significant pairs of brain connections with nonzero ATE of medication on ASD patients, which are reported in together with the corresponding functional network. The BH procedure that controls the FDR at 5% based on the standardized estimated treatment effects gives the same set of discovered connections as the proposed method for this dataset. The estimated treatment effects for the four discovered pairs, their 95% confidence intervals, the IPW averages for the treatment and control groups, $n^{- 1} \sum_{i = 1}^{n} {\hat{Y}}_{i, j} D_{i} / Λ (W_{i}^{T} \hat{β})$ and $n^{- 1} \sum_{i = 1}^{n} {\hat{Y}}_{i, j} (1 - D_{i}) / {1 - Λ (W_{i}^{T} \hat{β})}$ , are also reported in . The visualization of those connections is shown in Figure S2 in the SM. All six functional networks except the cerebellum network were involved in those significant connections. Notice that the correlation between SMA.L and HES.R were altered in the positive direction on average by medication, while the correlations of the other three pairs were negatively enhanced. Among the identified ROIs, HES.R was also reported to be strongly associated with the severity of ASD in Liu and Huang (Citation2020). The discovered connections were mainly related to the execution and attention network (EAN). According to CDC, signs and symptoms of ASD are associated with execution and attention. Bernas et al. (Citation2018) and Snyder and Troiani (Citation2020) also found evidence that relates the abnormalities of neurodynamics and connectivity of ASD brains to brain regions associated with attention and execution. Our method reveals the medication effects on brain connectivities of some regions in EAN, which may alleviate the symptoms of ASD.

Table 1 The four pairs of ROIs (the functional networks they belong to in the parentheses) with the identified nonzero treatment effects on subject level correlations by the proposed multiple testing procedure, the estimated treatment effect, its 95% confidence interval, and the IPW averages for the treatment group (Ave-trt) and the control group (Ave-cl).

Download CSV Display Table

We also applied the two-sample t-test directly on the subject level sample correlations ${{\hat{Y}}_{i, j_{1} j_{2}}}$ . This is equivalent to treating the propensity score in (4.3) being 16/76 for all subjects, as the ratio of the number of treated patients over the total sample size. Such a procedure is commonly used in many neuroscience studies. However, no significant difference in correlations between the treatment and control groups was found without the propensity score adjustment. This shows that simply conducting two-sample t-tests may not be able to identify treatment effects in observational studies for brain connectivity.

8 Discussion

In this article, we consider estimating the ATEs of unobservable random variables from each subject under the ignorability assumption. Those unobservable random variables need to be estimated using subject level data. Our theoretical analysis shows that an additional unbiasedness condition is needed for the identifiability and estimation of the ATEs, compared with the classical causal inference problems under the ignorability assumption. This implies that not all consistent estimators of the subject level parameters of interest can be used to estimate the population treatment effects. A de-biased estimator satisfying Condition 4 needs to be constructed.

We study the treatment effects on brain functional connectivity measured by Pearson’s correlation in the case study. The proposed method can be applied to other measures of brain connectivity, for example, Spearman’s ρ and Kendall’s τ for nonlinear association and partial correlation for conditional association. The proposed procedure can be extended to dynamic treatment effects over time as well, where the subject level potential outcomes $Y_{i} (d, t)$ and the average treatment effects $τ (t) = E {Y_{i} (1, t) - Y_{i} (0, t)}$ are functions of time (Bojinov, Rambachan, and Shephard Citation2021). This setting is suitable for task-based fMRI data where the brain activity patterns are likely to change with respect to the given tasks. Developing more powerful multiple testing procedures able to control the FDP under a general dependence structure among the estimated ATEs is also worth investigating in future work.

Supplemental material

UASA_A_2252135_Supplement.zip

Download Zip (96.3 KB)

supplement.pdf

Download PDF (1.8 MB)

Acknowledgments

The authors would like to thank the anonymous referees, the Associate Editor and the Editor for thoughtful comments and suggestions which have improved the presentation of the paper. Part of the work reported in the paper was conducted while Qiu was affiliated with Iowa State University. He thanks ISU’s statistics department for providing various support during the course of the project.

Supplementary Materials

The supplementary material provides justification of Condition 4 for the de-sparsified lasso estimator and sample correlation, proofs for all theoretical results and some additional numerical results.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Zhou was partially supported by National Key Research and Development Program of China, grant No. 2021YFF0901400, National Natural Science Foundation of China, grant No. 12026606, and Novo Nordisk A/S.

References

Alaerts, K., Swinnen, S. P., and Wenderoth, N. (2016), “Sex Differences in Autism: A Resting-State fMRI Investigation of Functional Brain Connectivity in Males and Females,” Social Cognitive and Affective Neuroscience, 11, 1002–1016. DOI: 10.1093/scan/nsw027.
PubMed Web of Science ®Google Scholar
Andrews-Hanna, J. R., Snyder, A. Z., Vincent, J. L., Lustig, C., Head, D., Raichle, M. E., and Buckner, R. L. (2007), “Disruption of Large-Scale Brain Systems in Advanced Aging,” Neuron, 56, 924–935. DOI: 10.1016/j.neuron.2007.10.038.
PubMed Web of Science ®Google Scholar
Austin, P. C., and Stuart, E. A. (2015), “Moving Towards Best Practice When Using Inverse Probability of Treatment Weighting (IPTW) Using the Propensity Score to Estimate Causal Treatment Effects in Observational Studies,” Statistics in Medicine, 34, 3661–3679. DOI: 10.1002/sim.6607.
PubMed Web of Science ®Google Scholar
Bang, H., and Robins, J. M. (2005), “Doubly Robust Estimation in Missing Data and Causal Inference Models,” Biometrics, 61, 962–973. DOI: 10.1111/j.1541-0420.2005.00377.x.
PubMed Web of Science ®Google Scholar
Benjamini, Y., and Hochberg, Y. (1995), “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society, series B, 57, 289–300. DOI: 10.1111/j.2517-6161.1995.tb02031.x.
Web of Science ®Google Scholar
Bernas, A., Barendse, E. M., Aldenkamp, A. P., Backes, W. H., Hofman, P. A., Hendriks, M. P., Kessels, R. P., Willems, F. M., de With, P. H., Zinger, S., et al. (2018), “Brain Resting-State Networks in Adolescents with High-Functioning Autism: Analysis of Spatial Connectivity and Temporal Neurodynamics,” Brain and Behavior, 8, e00878. DOI: 10.1002/brb3.878.
PubMed Web of Science ®Google Scholar
Berto, S., Treacher, A. H., Caglayan, E., Luo, D., Haney, J. R., Gandal, M. J., Geschwind, D. H., Montillo, A. A., and Konopka, G. (2022), “Association between Resting-State Functional Brain Connectivity and Gene Expression is Altered in Autism Spectrum Disorder,” Nature Communications, 13, 1–11. DOI: 10.1038/s41467-022-31053-5.
PubMed Web of Science ®Google Scholar
Bojinov, I., Rambachan, A., and Shephard, N. (2021), “Panel Experiments and Dynamic Causal Effects: A Finite Population Perspective,” Quantitative Economics, 12, 1171–1196. DOI: 10.3982/QE1744.
Web of Science ®Google Scholar
Brown, P. J., Vannucci, M., and Fearn, T. (1998), “Multivariate Bayesian Variable Selection and Prediction,” Journal of the Royal Statistical Society, Series B, 60, 627–641. DOI: 10.1111/1467-9868.00144.
Google Scholar
Cai, T. T., and Liu, W. (2016), “Large-Scale Multiple Testing of Correlations,” Journal of the American Statistical Association, 111, 229–240. DOI: 10.1080/01621459.2014.999157.
PubMed Web of Science ®Google Scholar
Cao, W., Tsiatis, A. A., and Davidian, M. (2009), “Improving Efficiency and Robustness of the Doubly Robust Estimator for a Population Mean with Incomplete Data,” Biometrika, 96, 723–734. DOI: 10.1093/biomet/asp033.
PubMed Web of Science ®Google Scholar
Chernozhukov, V., Chetverikov, D., and Kato, K. (2013), “Gaussian Approximations and Multiplier Bootstrap for Maxima of Sums of High-Dimensional Random Vectors,” The Annals of Statistics, 41, 2786–2819. DOI: 10.1214/13-AOS1161.
Web of Science ®Google Scholar
Craddock, C., Benhajali, Y., Chu, C., Chouinard, F., Evans, A., Jakab, A., Khundrakpam, B. S., Lewis, J. D., Li, Q., Milham, M., Yan, C., and Bellec, P. (2013), “The Neuro Bureau Preprocessing Initiative: Open Sharing of Preprocessed Neuroimaging Data and Derivatives,” Frontiers in Neuroinformatics, 7, 27.
PubMedGoogle Scholar
Feller, A., and Gelman, A. (2015), Hierarchical Models for Causal Effects, Hoboken, NJ: Wiley.
Google Scholar
Funk, M. J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M. A., and Davidian, M. (2011), “Doubly Robust Estimation of Causal Effects,” American Journal of Epidemiology, 173, 761–767. DOI: 10.1093/aje/kwq439.
PubMed Web of Science ®Google Scholar
Genovese, C., and Wasserman, L. (2004), “A Stochastic Process Approach to False Discovery Control,” The Annals of Statistics, 32, 1035–1061. DOI: 10.1214/009053604000000283.
Web of Science ®Google Scholar
Genovese, C. R., and Wasserman, L. (2006), “Exceedance Control of the False Discovery Proportion,” Journal of the American Statistical Association, 101, 1408–1417. DOI: 10.1198/016214506000000339.
Web of Science ®Google Scholar
Hirano, K., Imbens, G. W., and Ridder, G. (2003), “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 71, 1161–1189. DOI: 10.1111/1468-0262.00442.
Web of Science ®Google Scholar
Imbens, G., and Rubin, D. (2015), Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Cambridge: Cambridge University Press.
Google Scholar
Kana, R. K., Uddin, L. Q., Kenet, T., Chugani, D., and Müller, R.-A. (2014), “Brain Connectivity in Autism,” Frontiers in Human Neuroscience, 8, 349. DOI: 10.3389/fnhum.2014.00349.
PubMed Web of Science ®Google Scholar
Liu, X., and Huang, H. (2020), “Alterations of Functional Connectivities Associated with Autism Spectrum Disorder Symptom Severity: A Multi-Site Study Using Multivariate Pattern Analysis,” Scientific Reports, 10, 1–13. DOI: 10.1038/s41598-020-60702-2.
PubMed Web of Science ®Google Scholar
Marrelec, G., Krainik, A., Duffau, H., Pélégrini-Issac, M., Lehéricy, S., Doyon, J., and Benali, H. (2006), “Partial Correlation for Functional Brain Interactivity Investigation in Functional MRI,” Neuroimage, 32, 228–237. DOI: 10.1016/j.neuroimage.2005.12.057.
PubMed Web of Science ®Google Scholar
Meinshausen, N., and Bühlmann, P. (2006), “High-Dimensional Graphs and Variable Selection with the Lasso,” The annals of statistics, 34, 1436–1462. DOI: 10.1214/009053606000000281.
Web of Science ®Google Scholar
Palaniyappan, L., Simmonite, M., White, T. P., Liddle, E. B., and Liddle, P. F. (2013), “Neural Primacy of the Salience Processing System in Schizophrenia,” Neuron, 79, 814–828. DOI: 10.1016/j.neuron.2013.06.027.
PubMed Web of Science ®Google Scholar
Peng, J., Wang, P., Zhou, N., and Zhu, J. (2009), “Partial Correlation Estimation by Joint Sparse Regression Models,” Journal of the American Statistical Association, 104, 735–746. DOI: 10.1198/jasa.2009.0126.
PubMed Web of Science ®Google Scholar
Qiu, Y., and Zhou, X.-H. (2020), “Estimating c-level Partial Correlation Graphs with Application to Brain Imaging,” Biostatistics, 21, 641–658. DOI: 10.1093/biostatistics/kxy076.
PubMed Web of Science ®Google Scholar
Qiu, Y., and Zhou, X.-H. (2022), “Inference on Multi-Level Partial Correlations based on Multi-Subject Time Series Data,” Journal of the American Statistical Association, 117, 2268–2282.
Web of Science ®Google Scholar
Rosenbaum, P. R., and Rubin, D. B. (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55. DOI: 10.1093/biomet/70.1.41.
Web of Science ®Google Scholar
Rubin, D. B. (1974), “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies,” Journal of Educational Psychology, 66, 688–701. DOI: 10.1037/h0037350.
Web of Science ®Google Scholar
Rubin, D. B. (1978), “Bayesian Inference for Causal Effects: The Role of Randomization,” The Annals of statistics, 6, 34–58.
Web of Science ®Google Scholar
Snyder, W., and Troiani, V. (2020), “Behavioural Profiling of Autism Connectivity Abnormalities,” BJPsych Open, 6, e11. DOI: 10.1192/bjo.2019.102.
PubMed Web of Science ®Google Scholar
Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288. DOI: 10.1111/j.2517-6161.1996.tb02080.x.
Google Scholar
Van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. (2014), “On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models,” The Annals of Statistics, 42, 1166–1202. DOI: 10.1214/14-AOS1221.
Web of Science ®Google Scholar
van der Laan, M. J., Dudoit, S., and Pollard, K. S. (2004a), “Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate,” Statistical Applications in Genetics and Molecular Biology, 3, Article14. DOI: 10.2202/1544-6115.1041.
PubMedGoogle Scholar
van der Laan, M. J., Dudoit, S., and Pollard, K. S. (2004b), “Multiple Testing. Part III. Procedures for Control of the Generalized Family-Wise Error Rate and Proportion of False Positives,” U.C. Berkeley Division of Biostatistics Working Paper Series.
Google Scholar

Unveiling the Unobservable: Causal Inference on Multiple Derived Outcomes

Abstract

1 Introduction

2 Preliminary and Parameter of Interest

3 Conditions and Identifiability

4 Simultaneous Confidence Interval

5 Multiple Testing Procedure

6 Simulation

7 Case Study

8 Discussion

UASA_A_2252135_Supplement.zip

supplement.pdf

Acknowledgments

Supplementary Materials

Disclosure Statement

References

Information for

Open access

Opportunities

Help and information

Unveiling the Unobservable: Causal Inference on Multiple Derived Outcomes

Abstract

1 Introduction

2 Preliminary and Parameter of Interest

3 Conditions and Identifiability

4 Simultaneous Confidence Interval

5 Multiple Testing Procedure

6 Simulation

7 Case Study

8 Discussion

UASA_A_2252135_Supplement.zip

supplement.pdf

Acknowledgments

Supplementary Materials

Disclosure Statement

Additional information

Funding

References

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date