Research Paper

A region-based method for causal mediation analysis of DNA methylation data

Pages 286-296 | Received 16 Jul 2020, Accepted 10 Feb 2021, Published online: 23 Mar 2021

ABSTRACT

Exposure to environmental factors can affect DNA methylation at a 5'-cytosine-phosphate-guanine-3' (CpG) site or in a genomic region, which can in turn affect an outcome. In other words, environmental effects on an outcome could be mediated by DNA methylation. To date, single CpG-site-based mediation analysis has been employed extensively. More recently, however, there has been considerable interest in studying differentially methylated regions (DMRs), both because DMRs are more likely to have functional effects than single CpG sites and because testing DMRs reduces the multiple testing burden. In this report, we propose a novel causal mediation approach under the counterfactual framework to test the significance of the total (TE), direct (DE), and indirect effects (IE) of predictors on a response variable with a methylated region (MR) as the mediator (denoted as MR-Mediation). Functional linear transformation is used to reduce the potentially high dimension of the CpG sites in a predefined MR and to account for their location information. In our simulation studies, MR-Mediation retained the desired Type I error rates for the TE, DE, and IE tests. Furthermore, MR-Mediation had better power than testing the mean methylation level as the mediator in most of the scenarios considered, especially for the IE (i.e., mediated effect) test, which is often of greater interest than the other two effect tests. We further illustrate our proposed method by analysing the methylation-mediated effect of exposure to gun violence on total immunoglobulin E or atopic asthma among participants in the Epigenetic Variation and Childhood Asthma in Puerto Ricans study.

Introduction

Epigenetic studies are key to understanding regulatory mechanisms of gene expression. Of the available epigenetic markers, DNA methylation is the most stable and widely studied epigenetic modification [Citation1]. DNA methylation refers to the addition of a methyl group at the 5′ location of cytosine nucleotides (C), with this modification occurring predominantly at Cs that are immediately followed by a guanine (G) in the 5′ to 3′ direction, denoted 5'-cytosine-phosphate-guanine-3' (CpG). DNA methylation often affects gene transcription and has been associated with complex diseases [Citation2–5] and cancers [Citation6–10].

DNA methylation analysis has traditionally focused on individual CpG sites. Over the last few years, there has been great interest in developing region-based methods to detect differentially methylated regions (DMRs) [Citation11,Citation12]. Such a region-based approach is supported by strong correlation in DNA methylation levels across regions of the genome [Citation13,Citation14], as well as by the fact that methylated regions (MRs) such as CpG islands [Citation15], CpG island shores [Citation16], or generic 2-kb regions [Citation17] are more often linked to functionally relevant findings than single CpG sites. Statistically, if there are multiple causative CpG sites with small individual effects, single-marker testing may have limited power to detect those weak signals, whereas region-based approaches can achieve higher power by combining the effects of multiple CpG sites. Moreover, region-based methods greatly reduce the burden of multiple testing in a genome-wide study.

DMR identification methods can be classified into two categories: 1. supervised methods, which first calculate the p-values for the association between each single CpG site and the outcome of interest and then identify regions of adjacent CpG sites with small p-values; and 2. unsupervised methods, which predefine the regions without using any outcome information and then test the association between the methylation level in each genomic region and the outcome of interest.

Exposure to environmental factors can affect DNA methylation, which can then affect a disease or condition of interest. In other words, environmental effects on a disease or condition could be mediated by DNA methylation (Figure 1). Mediation analysis investigates the relationship between an exposure and an outcome, while examining how the exposure and outcome relate to the mediator. Although Baron and Kenny’s ‘four-step’ method [Citation18] has been the most commonly used approach to mediation analysis, it has recently been extended using a counterfactual framework [Citation19–21] to allow for an interaction between the exposure and mediator, and to decompose the estimate of the total effect (TE) into estimates of the direct (DE) and indirect effects (IE). To date, single CpG-site-based mediation analysis has been employed extensively. This approach has indeed shown that DNA methylation can mediate the effects of environmental exposure on complex traits [Citation22–24], complex diseases [Citation25–27], and cancers [Citation28,Citation29].

Figure 1. Mediation model with DNA methylation as the mediator.

In this study, we propose a novel causal mediation approach under the counterfactual framework to test the significance of the TE, DE, and IE of an exposure on an outcome of interest, with DNA methylation in a genomic region as the mediator. In this approach, a group of CpG sites from a predefined region is utilized as the mediator, and functional linear transformation [Citation30] is used to reduce the potentially high dimension of the region’s CpG sites and to account for their location information. We denote the method as MR-Mediation. To evaluate the performance of the proposed MR-Mediation statistic, we first conduct extensive simulation studies to assess Type I error rates and power. We then illustrate our proposed method by analysing whether DNA methylation mediates the association between exposure to gun violence and total immunoglobulin E (IgE) or atopic asthma among children participating in the Epigenetic Variation and Childhood Asthma in Puerto Ricans (EVA-PR) study [Citation31].

Methods

Functional linear transformation for region-based CpG sites

The idea is to linearly combine a few basis functions to fit a curve that closely represents the methylation levels of the CpG sites in the region (an illustrative example is shown in Figure S1). The dimension is thereby reduced, and the location information is captured by the fitted curve. Specifically, we consider $n$ subjects who have $m$ CpG sites in a predefined region. The physical locations of the $m$ CpG sites are normalized to the range $[0, 1]$ (i.e., $0 \le l_1 \le \cdots \le l_m \le 1$). We use $M_i(l_j)$ to denote the methylation level of the CpG site at the $j$th location for the $i$th subject. We assume that the $n \times 1$ vector of the continuous outcome $y$ follows a linear regression model. For the $i$th subject,

$$y_i = Z_i\beta_0 + \left(\int_0^1 M_i(l)\,dl\right)\beta_M + \varepsilon_i, \qquad (1)$$

where $Z_i$ is a $1 \times p$ covariate vector, $\beta_0$ is a $p \times 1$ parameter vector (an intercept and $p - 1$ covariates), $M_i(l)$ is a function of CpG site locations and includes the $m$ CpG site values, $\beta_M$ is the effect of $M_i(l)$, and $\varepsilon_i$ is the random error. Using the ordinary linear smoother [Citation32], we let

$$M_i(l) = \left(M_i(l_1), \ldots, M_i(l_m)\right)\Phi\left(\Phi^{\top}\Phi\right)^{-1}\phi(l),$$

where $(M_i(l_1), \ldots, M_i(l_m))$ is a $1 \times m$ vector for the $m$ CpG sites in the region, $\phi(l)$ is a $K_M \times 1$ vector of $K_M$ basis functions, and $\Phi$ is an $m \times K_M$ matrix containing the values of $\phi(l)^{\top}$ at $l_1, \ldots, l_m$. Thus, Formula (1) can be rewritten as

$$y_i = Z_i\beta_0 + R_i\beta_M + \varepsilon_i, \qquad (2)$$

where $R_i = (M_i(l_1), \ldots, M_i(l_m))\,\Phi\left(\Phi^{\top}\Phi\right)^{-1}\int_0^1 \phi(l)\,dl$ is a scalar. Therefore, $R_i$ can be viewed as a summary of the methylation level in the region. In this study, we focus on cubic B-spline basis functions.
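Because $\Phi$ and $\int_0^1 \phi(l)\,dl$ depend only on the CpG locations, $R_i$ is a fixed weighted sum of the subject's methylation values. The Python sketch below computes those weights for a cubic B-spline basis. It is not the MRmediation package's code: the helper names `bspline_basis` and `regional_summary` are hypothetical, the interior knots are assumed to be equally spaced on [0, 1], and the integral uses the standard B-spline identity $\int B_j = (t_{j+k} - t_j)/k$ for a basis of order $k$.

```python
import numpy as np


def bspline_basis(x, n_basis, order=4):
    """Clamped B-spline basis of the given order (4 = cubic) on [0, 1].

    Returns the (len(x), n_basis) basis matrix and the knot vector.
    Equally spaced interior knots are an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    interior = np.linspace(0.0, 1.0, n_basis - order + 2)[1:-1]
    knots = np.concatenate([np.zeros(order), interior, np.ones(order)])
    # Order-1 (piecewise-constant) splines; the last non-empty interval is closed at 1.
    B = np.zeros((x.size, knots.size - 1))
    for j in range(knots.size - 1):
        if knots[j] < knots[j + 1]:
            inside = (x >= knots[j]) & (x < knots[j + 1])
            if knots[j + 1] == 1.0:
                inside |= x == 1.0
            B[:, j] = inside
    # Cox-de Boor recursion up to the requested order.
    for k in range(2, order + 1):
        B_next = np.zeros((x.size, knots.size - k))
        for j in range(knots.size - k):
            left = knots[j + k - 1] - knots[j]
            right = knots[j + k] - knots[j + 1]
            if left > 0:
                B_next[:, j] += (x - knots[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (knots[j + k] - x) / right * B[:, j + 1]
        B = B_next
    return B, knots


def regional_summary(M, positions, n_basis=4, order=4):
    """R_i = (M_i(l_1), ..., M_i(l_m)) Phi (Phi^T Phi)^{-1} int_0^1 phi(l) dl per subject.

    M is an (n, m) matrix of methylation values (one row per subject) whose
    columns follow `positions`, the genomic coordinates of the m CpG sites.
    """
    pos = np.asarray(positions, dtype=float)
    l = (pos - pos.min()) / (pos.max() - pos.min())      # normalise locations to [0, 1]
    Phi, knots = bspline_basis(l, n_basis, order)         # m x K_M
    # Exact integral of each B-spline over [0, 1]: (t_{j+order} - t_j) / order.
    phi_int = (knots[order:order + n_basis] - knots[:n_basis]) / order
    weights = Phi @ np.linalg.solve(Phi.T @ Phi, phi_int)  # one weight per CpG site
    return np.asarray(M, dtype=float) @ weights
```

Because cubic B-splines reproduce polynomials of degree 3 or less exactly, a subject whose values are constant across the region gets $R_i$ equal to that constant, and values that change linearly with location give the average of that line over [0, 1]. This makes a convenient sanity check for any implementation.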

Association between the mediator and the exposure variable

To test the TE, DE, and IE of the exposure on the outcome variable, with DNA methylation as the mediator, we first need to ensure that there is an association between the mediator and the exposure variable. To do so, we consider a linear regression model for $n$ independent subjects,

$$R_i = Z_i\alpha_0 + X_i\alpha_X + \varepsilon_{Ri}, \qquad (3)$$

where $R_i$ is a scalar representing the regional methylation level for the $i$th subject, $Z_i$ is a $1 \times p$ covariate vector, $\alpha_0$ is a $p \times 1$ parameter vector (an intercept and $p - 1$ covariates), $X_i$ is a scalar for the exposure variable, $\alpha_X$ is the parameter for the exposure, and $\varepsilon_{Ri}$ is a random error. In this study, we first test $H_0\colon \alpha_X = 0$, and then test for methylation-mediated effects between the exposure and outcome variables only when this null hypothesis is rejected (i.e., $\alpha_X \neq 0$).

Counterfactual approach to the causal mediation model

The TE of the exposure on the outcome can be decomposed into the IE, mediated through DNA methylation, and the DE, not mediated by DNA methylation, which could reflect other mechanisms or a direct link between exposure and outcome. For a continuous outcome, the mediation model for the $i$th subject can be written as

$$E(y_i \mid Z_i, X_i, R_i) = Z_i\beta_0 + X_i\beta_X + R_i\beta_M + X_iR_i\beta_{XM}, \qquad (4)$$

where $Z_i\beta_0$ is for the covariates, $X_i\beta_X$ is for the exposure variable, $R_i\beta_M$ is for the regional methylation level, and $X_iR_i\beta_{XM}$ is for the interaction between the exposure variable and the regional methylation level. To use the counterfactual approach to the causal mediation model, four identifiability assumptions need to be satisfied: 1. no unmeasured confounding of the exposure and outcome relationship; 2. no unmeasured confounding of the mediator and outcome relationship; 3. no unmeasured confounding of the exposure and mediator relationship; and 4. the exposure must not cause any known confounder of the mediator and outcome relationship [Citation33]. If all assumptions hold, the DE, IE, and TE for a change in exposure from level $x_0$ to $x_1$ are given by

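$$
\begin{aligned}
\mathrm{DE} &= E[y_i \mid Z_i, x_1, R_i(x_0)] - E[y_i \mid Z_i, x_0, R_i(x_0)] \\
&= (x_1 - x_0)\left\{\beta_X + (Z_i\alpha_0 + x_0\alpha_X)\beta_{XM}\right\}, \\
\mathrm{IE} &= E[y_i \mid Z_i, x_1, R_i(x_1)] - E[y_i \mid Z_i, x_1, R_i(x_0)] \\
&= (x_1 - x_0)\,\alpha_X(\beta_M + x_1\beta_{XM}), \\
\mathrm{TE} &= E[y_i \mid Z_i, x_1, R_i(x_1)] - E[y_i \mid Z_i, x_0, R_i(x_0)] \\
&= \mathrm{DE} + \mathrm{IE}.
\end{aligned}
$$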
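The counterfactual contrasts can be checked numerically against the closed forms. The plain-Python sketch below uses illustrative parameter names and sets the mediator to its mean under model (3); that is sufficient here because, for a fixed exposure, model (4) is linear in $R_i$.

```python
def mean_outcome(x, x_for_mediator, p):
    """E[y_i | Z_i, x, R_i(x')] under model (4), omitting Z_i beta_0 (it cancels).

    The mediator is replaced by its counterfactual mean under exposure x'
    from model (3): Z_i alpha_0 + x' alpha_X.
    """
    mean_r = p["z_alpha0"] + x_for_mediator * p["alpha_x"]
    return x * p["beta_x"] + mean_r * p["beta_m"] + x * mean_r * p["beta_xm"]


def effects_continuous(p, x0, x1):
    """DE, IE and TE from the counterfactual contrasts, plus the closed forms."""
    de = mean_outcome(x1, x0, p) - mean_outcome(x0, x0, p)
    ie = mean_outcome(x1, x1, p) - mean_outcome(x1, x0, p)
    de_closed = (x1 - x0) * (p["beta_x"] + (p["z_alpha0"] + x0 * p["alpha_x"]) * p["beta_xm"])
    ie_closed = (x1 - x0) * p["alpha_x"] * (p["beta_m"] + x1 * p["beta_xm"])
    return de, ie, de + ie, de_closed, ie_closed
```

For example, with $\beta_X = 0.5$, $\beta_M = 1$, $\beta_{XM} = 0.2$, $\alpha_X = 0.8$, $Z_i\alpha_0 = 0.3$ and a change from $x_0 = 0$ to $x_1 = 1$, both routes give DE = 0.56, IE = 0.96 and TE = 1.52.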

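Since we only focus on scenarios with $\alpha_X \neq 0$, it can be shown that

$$
\begin{aligned}
H_0\colon \mathrm{DE} = 0 &\iff H_0\colon \beta_X = \beta_{XM} = 0, \\
H_0\colon \mathrm{IE} = 0 &\iff H_0\colon \beta_M = \beta_{XM} = 0, \qquad (5) \\
H_0\colon \mathrm{TE} = 0 &\iff H_0\colon \beta_X = \beta_M = \beta_{XM} = 0.
\end{aligned}
$$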

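Thus, we can test $H_0\colon \mathrm{DE} = 0$ by an F-test with $(2, n - p - 3)$ degrees of freedom, $H_0\colon \mathrm{IE} = 0$ by an F-test with $(2, n - p - 3)$ degrees of freedom, and $H_0\colon \mathrm{TE} = 0$ by an F-test with $(3, n - p - 3)$ degrees of freedom, where $n - p - 3$ is the residual degrees of freedom of model (4), which has $p + 3$ parameters.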
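Each of these F-tests compares the full outcome model (4) with the model obtained by dropping the constrained terms. The sketch below does this with ordinary least squares in NumPy and SciPy; `mediation_f_tests` is a hypothetical name, and the code illustrates the nested-model comparison and its degrees of freedom rather than reproducing the MRmediation implementation.

```python
import numpy as np
from scipy import stats


def _rss(D, y):
    """Residual sum of squares of the OLS fit of y on the columns of D."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ coef
    return r @ r


def mediation_f_tests(y, Z, X, R):
    """Nested-model F-tests of H0: DE = 0, IE = 0 and TE = 0 under model (4).

    Z is an (n, p) covariate matrix including the intercept column; X and R
    are length-n vectors (exposure and regional methylation summary).
    Returns {name: (F, numerator df, denominator df, p-value)}.
    """
    n, p = Z.shape
    full = np.column_stack([Z, X, R, X * R])      # p + 3 parameters
    rss_full = _rss(full, y)
    df_resid = n - p - 3
    reduced = {
        "DE": np.column_stack([Z, R]),            # beta_X = beta_XM = 0
        "IE": np.column_stack([Z, X]),            # beta_M = beta_XM = 0
        "TE": Z,                                  # beta_X = beta_M = beta_XM = 0
    }
    results = {}
    for name, D in reduced.items():
        q = full.shape[1] - D.shape[1]            # number of constrained parameters
        F = ((_rss(D, y) - rss_full) / q) / (rss_full / df_resid)
        results[name] = (F, q, df_resid, stats.f.sf(F, q, df_resid))
    return results
```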

Similarly, when the outcome is binary, the mediation model for the ith subject can be written as

$$\operatorname{logit} P(y_i = 1 \mid Z_i, X_i, R_i) = Z_i\beta_0 + X_i\beta_X + R_i\beta_M + X_iR_i\beta_{XM}. \qquad (6)$$

If all identifiability assumptions hold, the DE, IE, and TE for a change in exposure from level $x_0$ to $x_1$ are given by

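$$
\begin{aligned}
\log \mathrm{OR}_{\mathrm{DE}} &= \operatorname{logit} P(y_i = 1 \mid Z_i, x_1, R_i(x_0)) - \operatorname{logit} P(y_i = 1 \mid Z_i, x_0, R_i(x_0)), \\
\log \mathrm{OR}_{\mathrm{IE}} &= \operatorname{logit} P(y_i = 1 \mid Z_i, x_1, R_i(x_1)) - \operatorname{logit} P(y_i = 1 \mid Z_i, x_1, R_i(x_0)), \\
\log \mathrm{OR}_{\mathrm{TE}} &= \operatorname{logit} P(y_i = 1 \mid Z_i, x_1, R_i(x_1)) - \operatorname{logit} P(y_i = 1 \mid Z_i, x_0, R_i(x_0)).
\end{aligned}
$$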

According to Valeri and VanderWeele (2013) [Citation33], a rare outcome assumption (i.e., low disease prevalence) is needed so that the odds ratio (OR) in the case-control design is equivalent to the OR in the population, which in turn approximates the relative risk (RR) in the population. We also adopt the rare outcome assumption in this study. The DE, IE, and TE can then be derived as

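$$
\begin{aligned}
\log \mathrm{OR}_{\mathrm{DE}} &\approx (x_1 - x_0)\left\{\beta_X + \left(Z_i\alpha_0 + x_0\alpha_X + \beta_M\sigma_R^2\right)\beta_{XM}\right\} + \tfrac{1}{2}\left(x_1^2 - x_0^2\right)\beta_{XM}^2\sigma_R^2, \\
\log \mathrm{OR}_{\mathrm{IE}} &\approx (x_1 - x_0)\,\alpha_X(\beta_M + x_1\beta_{XM}), \\
\log \mathrm{OR}_{\mathrm{TE}} &= \log \mathrm{OR}_{\mathrm{DE}} + \log \mathrm{OR}_{\mathrm{IE}},
\end{aligned}
$$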

where $\sigma_R^2$ is the variance of the random error in the mediator regression (Formula (3)). The $\sigma_R^2$ terms arise because $\exp(\varepsilon_{Ri}) \sim \mathrm{lognormal}(0, \sigma_R^2)$. The null hypotheses for testing DE = 0, IE = 0, and TE = 0 for a binary outcome are the same as those given for a continuous outcome. We can test $H_0\colon \mathrm{DE} = 0$ and $H_0\colon \mathrm{IE} = 0$ each by Rao’s score statistic, which is asymptotically $\chi^2$-distributed with 2 degrees of freedom, and $H_0\colon \mathrm{TE} = 0$ by a score statistic with 3 degrees of freedom. Note that the mediator regression (Formula (3)) is run only for controls to account for the case-control design [Citation33]: because of the rare outcome assumption, the exposure distribution in the controls from the case-control design approximates the exposure distribution in the whole population.
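For the binary outcome, the sketch below fits each restricted logistic model by Newton-Raphson and evaluates Rao's score statistic $U^{\top} I^{-1} U$ at the restricted estimate, together with a helper that evaluates the rare-outcome approximations of the log ORs. All names are hypothetical. In a case-control analysis, the `alpha_x`, `z_alpha0` and `sigma2_r` inputs would come from fitting Formula (3) in controls only; the simulated outcome used to exercise the code is not rare.

```python
import numpy as np
from scipy import stats


def fit_logistic(D, y, n_iter=100, tol=1e-10):
    """Logistic-regression MLE by Newton-Raphson (no step-halving, so the
    sketch assumes the data are not separable)."""
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(D @ beta)))
        step = np.linalg.solve(D.T @ ((mu * (1 - mu))[:, None] * D), D.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def score_test(y, D_null, D_extra):
    """Rao score test that the coefficients of D_extra are zero in a logistic model."""
    mu = 1.0 / (1.0 + np.exp(-(D_null @ fit_logistic(D_null, y))))
    D_full = np.column_stack([D_null, D_extra])
    U = D_full.T @ (y - mu)                                  # score at the restricted MLE
    info = D_full.T @ ((mu * (1 - mu))[:, None] * D_full)    # Fisher information
    S = U @ np.linalg.solve(info, U)
    q = D_full.shape[1] - D_null.shape[1]
    return S, q, stats.chi2.sf(S, q)


def mediation_score_tests(y, Z, X, R):
    """Score tests of H0: DE = 0, IE = 0 and TE = 0 under model (6)."""
    XR = X * R
    return {
        "DE": score_test(y, np.column_stack([Z, R]), np.column_stack([X, XR])),
        "IE": score_test(y, np.column_stack([Z, X]), np.column_stack([R, XR])),
        "TE": score_test(y, Z, np.column_stack([X, R, XR])),
    }


def approx_log_or(beta_x, beta_m, beta_xm, alpha_x, z_alpha0, sigma2_r, x0, x1):
    """Rare-outcome approximations of log OR_DE, log OR_IE and log OR_TE."""
    de = ((x1 - x0) * (beta_x + (z_alpha0 + x0 * alpha_x + beta_m * sigma2_r) * beta_xm)
          + 0.5 * (x1 ** 2 - x0 ** 2) * beta_xm ** 2 * sigma2_r)
    ie = (x1 - x0) * alpha_x * (beta_m + x1 * beta_xm)
    return de, ie, de + ie
```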

The EVA-PR study

EVA-PR is a case-control study of asthma in subjects aged 9–20 years. Subject recruitment and study procedures for EVA-PR have been described elsewhere [Citation31]. Genome-wide DNA methylation was measured in 488 nasal epithelial samples using Infinium HumanMethylation450 BeadChip arrays (Illumina, San Diego, CA). The preprocessing and quality control (QC) procedures were described in our prior study [Citation31]. After QC, CpG sites with an overall mean β-value within [0.1, 0.9] were kept, leaving 227,901 CpG sites in the final dataset. Methylation M-values were calculated as log2(β/(1 − β)). This methylation dataset was used in both the simulation studies and the real data analysis.
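The site filter and the M-value transformation are simple array operations. The sketch below assumes a samples-by-CpG matrix of β-values strictly between 0 and 1 (array pipelines often add a small offset when β can reach 0 or 1; none is added here); `filter_and_transform` is a hypothetical helper.

```python
import numpy as np


def filter_and_transform(beta, low=0.1, high=0.9):
    """Keep CpG sites (columns) whose mean beta-value lies in [low, high] and
    return the kept column indices with M = log2(beta / (1 - beta))."""
    beta = np.asarray(beta, dtype=float)
    site_means = beta.mean(axis=0)
    keep = np.flatnonzero((site_means >= low) & (site_means <= high))
    b = beta[:, keep]
    return keep, np.log2(b / (1.0 - b))
```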

The child’s lifetime exposure to gun violence was treated as the exposure variable in the real data analysis. Lifetime exposure to gun violence was derived from the exposure to violence scale [Citation34–36] and analysed as a binary variable (having heard gunshots at least twice vs. no more than once) [Citation37]. Atopy was defined as a specific IgE ≥0.35 IU/mL to at least one of five common allergens (Der p 1, Bla g 2, Fel d 1, Can f 1, and Mus m 1). Asthma was defined as a physician’s diagnosis plus at least one episode of wheeze in the previous year. Atopic asthma was defined as the presence of both atopy and asthma and was treated as the binary outcome variable in the real data analysis; controls were non-asthmatic subjects. Total plasma IgE was log10-transformed and treated as a continuous outcome.

Simulation studies

We used the CpG sites from the 488 nasal epithelial samples in EVA-PR for our simulation studies. The CpG sites were grouped into their corresponding genes, and retained CpG sites were required to lie within 5 kb of their nearest gene. The histogram (Figure S2) showed that most genes included fewer than 20 CpG sites, although some genes had several hundred CpG sites. We therefore selected genes of different sizes for the simulation studies. Specifically, we selected (1) 705 genes with 10 CpG sites; (2) 306 genes with 15 CpG sites; (3) 152 genes with 20 CpG sites; and (4) 21 genes with 40 CpG sites.

Furthermore, we needed to choose the type and number of basis functions used to reduce the dimension of each gene. For all simulated exposure $X$ and outcome $y$ variables, we used the same cubic (i.e., order 4) B-spline basis functions for $M_i(l)$, and we considered $K_M = 4, \ldots, 8$ for all genes.

For each gene, we simulated a $488 \times 1$ vector for the binary exposure $X$ via the model

$$\operatorname{logit} P(X = 1) = M\alpha_X,$$

where $M$ includes 15% of the gene’s CpG sites, taken as consecutive sites covering the longest distance in the gene; each element of $\alpha_X$ equals $1.2/m_{15\%}$, where $m_{15\%}$ is the number of these CpG sites. The purpose is to make the exposure and the mediator (i.e., methylation) associated, so that we can evaluate the power of testing $\alpha_X = 0$ in Formula (3).
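The exposure model can be sketched as follows. The text does not say how 15% of $m$ is rounded or how the window is found, so this sketch assumes windows of `round(0.15 * m)` consecutive sites (at least one), picks the window with the largest genomic span, and treats `M` as the subjects-by-CpG methylation matrix for the gene; `select_window` and `simulate_exposure` are hypothetical names.

```python
import numpy as np


def select_window(positions, frac=0.15):
    """Indices of the run of consecutive CpG sites (about `frac` of the gene)
    spanning the longest genomic distance.

    Assumes `positions` is sorted; the rounding rule is an assumption.
    """
    pos = np.asarray(positions, dtype=float)
    w = max(1, int(round(frac * pos.size)))
    spans = pos[w - 1:] - pos[:pos.size - w + 1]   # span of every window of w sites
    start = int(np.argmax(spans))
    return np.arange(start, start + w)


def simulate_exposure(M, positions, rng, total_effect=1.2, frac=0.15):
    """Binary exposure from logit P(X = 1) = M alpha_X, each alpha_X = 1.2 / m_15%."""
    idx = select_window(positions, frac)
    alpha_x = np.full(idx.size, total_effect / idx.size)
    eta = M[:, idx] @ alpha_x                      # no intercept, as in the formula
    return rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))), idx
```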

For each gene, we then simulated continuous and binary outcomes $y$ for the 488 samples via the models

$$y = \beta_0 + Z_1\beta_1 + Z_2\beta_2 + X\beta_X + M\beta_M + XM\beta_{XM} + \varepsilon$$

and

$$\operatorname{logit} P(y = 1) = \beta_0 + Z_1\beta_1 + Z_2\beta_2 + X\beta_X + M\beta_M + XM\beta_{XM},$$

where $Z_1$, $Z_2$, $X$, and $M$ were the same as described above, $\varepsilon$ was generated from the standard normal distribution, $\beta_0 = 0.1$, $\beta_1 = 0.1$, $\beta_2 = 0.1$ and $\beta_0 = 1$. We considered (1) all $m_{15\%}$ effective CpG sites in the same direction (100%+) and (2) 50% of the effective CpG sites in one direction and the other 50% in the opposite direction (50%+/50%–). We then considered eight settings for $\beta_X$, $\beta_M$, and $\beta_{XM}$ to simulate $y$ (Table 1), which were used to evaluate the Type I error rates and power of testing DE, IE, and TE with the null hypotheses described in Formula (5). As shown in Table 1, Setting (1) was used to assess the Type I error rates of testing DE, IE, and TE; Setting (2) was used to assess the Type I error rate of testing DE and the power of testing IE and TE; Setting (3) was used to assess the Type I error rate of testing IE and the power of testing DE and TE; and Settings (4)–(8) were used to assess the power of testing DE, IE, and TE. Note that Setting (2) may not be the true null hypothesis for DE: although $\beta_X = 0$, there could still be an implicit $X$ effect, because $X$ and $M$ are associated and $\beta_M \neq 0$. Analogously, Setting (3) may not be the true null hypothesis for IE.
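The outcome models can be sketched in the same style. The covariates $Z_1$ and $Z_2$ are not specified above, so standard normal draws stand in for them; the per-site effect sizes from Table 1 are left as inputs, and the same sign pattern is applied to $\beta_M$ and $\beta_{XM}$. These are assumptions of this hypothetical `simulate_outcome` helper, not the authors' settings.

```python
import numpy as np


def simulate_outcome(X, M_eff, rng, beta_x, beta_m, beta_xm, binary=False,
                     mixed_signs=False, b0=0.1, b1=0.1, b2=0.1):
    """One outcome vector from the continuous or logistic simulation model.

    M_eff holds the effective CpG sites (n x m_15%); beta_m and beta_xm are
    per-site magnitudes. With mixed_signs=True the second half of the sites
    gets the opposite sign (the 50%+/50%- pattern). Z1 and Z2 are drawn as
    standard normal covariates (an assumption of this sketch).
    """
    n, k = M_eff.shape
    signs = np.ones(k)
    if mixed_signs:
        signs[k // 2:] = -1.0
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    eta = (b0 + z1 * b1 + z2 * b2 + X * beta_x
           + M_eff @ (signs * beta_m) + (X[:, None] * M_eff) @ (signs * beta_xm))
    if binary:
        return rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return eta + rng.normal(size=n)                # epsilon ~ N(0, 1)
```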

Table 1. The eight settings for βX, βM, and βXM and their corresponding null hypothesis (H0) and alternative hypothesis (Ha) for DE, IE, and TE

For each gene, we simulated 10 sets of $y$ for genes with 10 CpG sites (705 × 10 = 7,050 in total); 30 sets for genes with 15 CpG sites (306 × 30 = 9,180); 50 sets for genes with 20 CpG sites (152 × 50 = 7,600); and 400 sets for genes with 40 CpG sites (21 × 400 = 8,400). Thus, we simulated 32,230 datasets for the continuous outcome and another 32,230 datasets for the binary outcome. After simulating the exposure $X$ and outcome $y$ variables, we used two approaches to conduct the analyses: (1) the basis-function-fitted regional methylation level as the mediator (i.e., MR-Mediation) and (2) the mean CpG methylation level as the mediator. We assessed the Type I error rates of testing DE, IE, and TE only for datasets in which the p-value for $\alpha_X$ was less than 0.05, and we evaluated power as the rate at which the p-values of both the corresponding effect (i.e., DE, IE, or TE) and $\alpha_X$ were less than 0.05.
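The bookkeeping in this paragraph can be written down directly. The sketch reproduces the dataset count and implements the two rates with hypothetical helper names; treating power as the proportion of all replicates in which both p-values are below 0.05 is our reading of the description above.

```python
import numpy as np

# CpG sites per gene -> (number of genes, outcome sets simulated per gene)
DESIGN = {10: (705, 10), 15: (306, 30), 20: (152, 50), 40: (21, 400)}
TOTAL = sum(genes * sets for genes, sets in DESIGN.values())   # datasets per outcome type


def type1_rate(p_alpha, p_effect, level=0.05):
    """Rejection rate of a null effect among datasets with p(alpha_X) < level."""
    p_alpha, p_effect = np.asarray(p_alpha), np.asarray(p_effect)
    selected = p_alpha < level
    return float(np.mean(p_effect[selected] < level))


def power(p_alpha, p_effect, level=0.05):
    """Proportion of all datasets in which both p-values fall below level."""
    p_alpha, p_effect = np.asarray(p_alpha), np.asarray(p_effect)
    return float(np.mean((p_alpha < level) & (p_effect < level)))
```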

Results

Simulation of the Type I error rate

For both continuous and binary outcomes, both approaches retained the desired Type I error rates for DE, IE, and TE across all considered gene sizes and basis numbers in Setting (1) (Figure 2 and Figures S3–S5 and S8–S10). Because Setting (2) might not be the true null hypothesis for DE, both MR-Mediation and the mean CpG methylation approach had inflated Type I error rates in scenarios with 100%+ effective CpG sites (Figures S6 and S11) and correct Type I error rates in scenarios with 50%+/50%– effective CpG sites (Figures S13 and S15). Although Setting (3) might not be the true null hypothesis for IE, both approaches retained the desired Type I error rates under all simulated conditions for both continuous and binary outcomes (Figures S7, S12, S14, and S16). Thus, both approaches are valid under their true null hypotheses.

Figure 2. QQ plot of the p-values from Setting (1) with (a) continuous outcome and (b) binary outcome. The first column shows DE results, the second column shows IE results, and the third column shows TE results. Each row shows one gene size (i.e., 10, 15, 20 or 40 CpG sites). Two approaches were compared: (1) MR-Mediation; and (2) Mean CpG methylation approach. A 95% pointwise confidence band (grey area) was computed under the assumption that the p-values were drawn independently from a uniform [0, 1] distribution.

Simulation of power

In all considered scenarios, the power of MR-Mediation decreased as the basis number increased when the gene size was relatively small, and remained similar as the basis number increased when the gene size was large (Figures S17–S28). With a basis number of 4, MR-Mediation was the most powerful method for testing IE in all scenarios and was more powerful than the mean CpG methylation method for testing TE and DE in almost all scenarios (Figures 3 and 4). In real data studies, IE is often of greater interest than TE and DE. Thus, we recommend MR-Mediation with a basis number of 4.

Figure 3. Power comparison for continuous outcome. (a) All positively effective CpG sites (100%+) and (b) 50% positively and 50% negatively effective CpG sites (50%+/50%–). The first column shows DE results, the second column shows IE results, and the third column shows TE results. Each row shows one gene size (i.e., 10, 15, 20, or 40 CpG sites). Two approaches were compared: (1) MR-Mediation; and (2) Mean CpG methylation approach.

Figure 4. Power comparison for binary outcome. (a) All positively effective CpG sites (100%+) and (b) 50% positively and 50% negatively effective CpG sites (50%+/50%–). The first column shows DE results, the second column shows IE results, and the third column shows TE results. Each row shows one gene size (i.e., 10, 15, 20, or 40 CpG sites). Two approaches were compared: (1) MR-Mediation; and (2) Mean CpG methylation approach.

Results in EVA-PR

We used the proposed MR-Mediation method to study whether the association between exposure to gun violence and atopic asthma (a binary outcome) was mediated by DNA methylation. In this analysis, we considered 5720 genes including at least 10 CpG sites in 407 subjects without any missing covariates. Subjects who had asthma but no atopy were excluded from this analysis, as atopic asthma is the most common type of asthma in children. This analysis was adjusted for age, sex, annual household income (a measure of socioeconomic status), the top five principal components from genotypic data, methylation batch, and latent factors between exposure to gun violence and methylation, and between atopic asthma and methylation, estimated using sva [Citation38]. The mediator regression with four cubic B-spline basis functions (Formula (3)) was run only for controls to account for the case-control design, and the top 20 genes were selected for further mediation analysis. We focused on the IE test to determine whether the association between exposure to gun violence and atopic asthma was mediated by methylation. CFD on chromosome 19 was the top gene associated with exposure to gun violence (P = 1.93 × 10⁻⁵, FDR = 0.1102; Table S1). Although this association did not reach genome-wide significance, CFD methylation could mediate the effect of exposure to gun violence on atopic asthma risk (P = 3.09 × 10⁻²). The CFD gene has been reported to be associated with asthma [Citation39]. The expression of CFD can be reduced by IL-17A, which is required for the development of airway hyperresponsiveness (a key intermediate phenotype of asthma).
In addition, TBC1D14 (P = 7.28 × 10⁻³) on chromosome 4, FAM120B (P = 3.33 × 10⁻²) on chromosome 6, CRB1 (P = 1.75 × 10⁻²) on chromosome 10, LOC339166 (P = 1.42 × 10⁻²) on chromosome 17, ZGPAT (P = 2.55 × 10⁻³) on chromosome 20, MED16 (P = 2.05 × 10⁻²) on chromosome 19, PLEKHG6 (P = 1.24 × 10⁻³) on chromosome 12, and TNNI1 (P = 8.85 × 10⁻³) on chromosome 1 could also mediate the effect of exposure to gun violence on atopic asthma risk at the nominal level among the top 20 genes.

We also applied MR-Mediation to study whether the association between exposure to gun violence and total IgE (a continuous outcome) was mediated by DNA methylation. We again used the 5720 genes including at least 10 CpG sites, here in 473 subjects without any missing covariates. This analysis was adjusted for age, sex, annual household income, the top five principal components from genotypic data, methylation batch, and latent factors between exposure to gun violence and methylation, and between total IgE and methylation, estimated using sva [Citation38]. We selected the top 20 genes by p-values for the association between exposure to gun violence and gene-based methylation (Formula (3)) and conducted mediation analysis (Formula (4)). Again, we focused on the IE test to determine whether the association between exposure to gun violence and total IgE was mediated by methylation. The results showed that PIP5K1C (P = 4.33 × 10⁻³) on chromosome 19 could mediate the effect of exposure to gun violence on total IgE at the nominal level among the top 20 genes (Table S2).

Discussion

In this study, we developed a novel causal mediation approach, MR-Mediation, under the counterfactual framework to test the significance of DE, IE, and TE. We implemented MR-Mediation in R (http://www.r-project.org), and the R package is available at https://cran.r-project.org/web/packages/MRmediation/index.html. The major advantage of the counterfactual framework for mediation analysis is that it allows the decomposition of TE into DE and IE, even in models with nonlinearities and interactions. The counterfactual framework extends the widely used Baron and Kenny mediation approach by allowing an interaction term between the exposure and mediator in the outcome regression.

We focused on testing the significance of DE, IE, and TE rather than estimating their magnitudes. For example, for IE, we test $H_0\colon \beta_M = \beta_{XM} = 0$. When we reject this null hypothesis, it is possible that $\beta_M = 0$ and $\beta_{XM} \neq 0$, which cannot be determined from the proposed test; in that case, the estimate of $\beta_M$ may not be appropriate for calculating IE. Moreover, to calculate the estimates of DE and TE, the covariate parameters would also need to be significant, because DE (and hence TE) depends on $Z_i\alpha_0$. In this approach, a group of CpG sites from a predefined region is utilized as the mediator, and the functional transformation is used to reduce the potentially high dimension of the region-based CpG sites and to account for their location information. Region-based analysis methods can improve power by combining weak signals and by reducing the multiple testing penalty. Users can define regions of interest in ways other than the genes used in this study, such as CpG islands and CpG shores.

For binary outcomes, we used the rare disease assumption, so that the OR in the case-control design is equal to the OR in the population, which approximates the RR in the population. When the disease is common in the population, the OR does not approximate the RR. In that case, a log link should be used instead of a logit link in the generalized linear model for binary outcomes, but a log-binomial model often fails to converge. Zou’s modified Poisson regression [Citation40] could be applied in this situation, although it is an approximate approach.
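As a rough illustration of the modified Poisson idea, the sketch below fits a log-link Poisson model to a 0/1 outcome by Newton-Raphson and replaces the model-based variance with a sandwich estimator. It uses only NumPy; `modified_poisson` is a hypothetical helper that omits the safeguards (step-halving, fitted risks above 1) a production implementation would need, and it is not part of MR-Mediation.

```python
import numpy as np


def modified_poisson(D, y, n_iter=100, tol=1e-10):
    """Log-link Poisson fit to a 0/1 outcome with a robust (sandwich) variance,
    in the spirit of Zou's modified Poisson regression.

    D must contain an intercept in column 0. Returns the coefficients (log
    relative risks for the non-intercept columns) and robust standard errors.
    """
    beta = np.zeros(D.shape[1])
    beta[0] = np.log(np.mean(y))                    # start from the overall risk
    for _ in range(n_iter):
        mu = np.exp(D @ beta)
        step = np.linalg.solve(D.T @ (mu[:, None] * D), D.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(D @ beta)
    bread = np.linalg.inv(D.T @ (mu[:, None] * D))  # inverse model-based information
    meat = D.T @ (((y - mu) ** 2)[:, None] * D)     # empirical covariance of the score
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))
```

With a single binary predictor the model is saturated, so $\exp(\hat\beta_1)$ equals the observed risk ratio and the robust standard error matches the familiar $\sqrt{(1-p_1)/(n_1 p_1) + (1-p_0)/(n_0 p_0)}$ for a log risk ratio.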

In the simulation studies, we showed that MR-Mediation retains the correct Type I error rate for testing DE, IE, and TE for both continuous and binary outcomes. In the power comparisons, which focused mainly on IE, MR-Mediation with 4 basis functions achieved better power than the mean CpG methylation approach. Our real data results show that MR-Mediation could help to identify potential methylation-mediated effects. The genome-wide analysis could be completed within an hour using one CPU.

Because MR-Mediation is a joint significance test, the exposure–regional methylation relationship and the regional methylation–outcome relationship are tested in two steps. It is possible that the CpG sites associated with the exposure differ from those associated with the outcome, which cannot be detected by the region-based mediator method; a further mediation analysis using single CpG sites may then be needed. The single CpG site mediation function is also available in our R package. In addition, the summarized methylation level in the region can be viewed as the area under the fitted curve. However, different curves can have the same area. In future studies, we will consider automatically partitioning the area into several sub-areas, each with its own summarized methylation value. In such methods, $\alpha_X$, $\beta_M$, and $\beta_{XM}$ are vectors instead of scalars as in this study, and it is possible that $\alpha_X^{\top}(\beta_M + x_1\beta_{XM}) = 0$ even if $\alpha_X \neq 0$ and $\beta_M \neq 0$ or $\beta_{XM} \neq 0$. We will need to address this potential problem.
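A small hypothetical example, with values chosen purely for illustration, shows how this cancellation can happen with two sub-areas:

$$
\alpha_X = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad
\beta_M = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
\beta_{XM} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\;\Longrightarrow\;
\alpha_X^{\top}(\beta_M + x_1\beta_{XM}) = (1)(1) + (-1)(1) = 0,
$$

so the indirect effect $(x_1 - x_0)\,\alpha_X^{\top}(\beta_M + x_1\beta_{XM})$ is zero even though the exposure shifts methylation in both sub-areas and both sub-areas affect the outcome.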

MR-Mediation can be extended to high-dimensional predictors, such as multiple SNPs from a region, to investigate DNA methylation-mediated genetic effects on an outcome. These SNPs can be from the same region as the CpG sites. Such a region-based approach on both SNPs and CpG sites can boost statistical power, because it combines small individual effects and reduces multiple testing. Moreover, region-based CpG sites can also be treated as predictors and gene expression as a mediator, in order to understand how DNA methylation regulates gene expression that in turn affects diseases or traits.

Disclosure of Potential Conflict of Interest

No potential conflicts of interest were disclosed.

Supplemental material

Supplemental data for this article can be accessed here.

Additional information

Funding

This research was supported by grant HL138098 from the U.S. National Institutes of Health (NIH) to Q.Y. J.C.C. was supported by grants HL079966, HL117191, MD011764 and HL150431 from the U.S. NIH. W.C. was supported by grant HL150431 from the U.S. NIH. Funders: National Heart, Lung, and Blood Institute [HL079966, HL117191, HL138098, HL150431]; National Institute on Minority Health and Health Disparities [MD011764].

References

  • Laurent L, Wong E, Li G, et al. Dynamic changes in the human methylome during differentiation. Genome Res. 2010;20:320–331.
  • Portela A, Esteller M. Epigenetic modifications and human disease. Nat Biotechnol. 2010;28:1057–1068.
  • Melotte V, Lentjes MH, Van Den Bosch SM, et al. N-Myc downstream-regulated gene 4 (NDRG4): a candidate tumor suppressor gene and potential biomarker for colorectal cancer. J Natl Cancer Inst. 2009;101:916–927.
  • Schmidt B, Liebenberg V, Dietrich D, et al. SHOX2 DNA methylation is a biomarker for the diagnosis of lung cancer based on bronchial aspirates. BMC Cancer. 2010;10:600.
  • Jain S, Chen S, Chang KC, et al. Impact of the location of CpG methylation within the GSTP1 gene on its specificity as a DNA marker for hepatocellular carcinoma. PLoS One. 2012;7:e35789.
  • Lord J, Cruchaga C. The epigenetic landscape of Alzheimer’s disease. Nat Neurosci. 2014;17:1138–1140.
  • De Jager PL, Srivastava G, Lunnon K, et al. Alzheimer’s disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci. Nat Neurosci. 2014;17:1156–1163.
  • Lunnon K, Smith R, Hannon E, et al. Methylomic profiling implicates cortical deregulation of ANK1 in Alzheimer’s disease. Nat Neurosci. 2014;17:1164–1170.
  • Pidsley R, Viana J, Hannon E, et al. Methylomic profiling of human brain tissue supports a neurodevelopmental origin for schizophrenia. Genome Biol. 2014;15:483.
  • Jaffe AE, Gao Y, Deep-Soboslay A, et al. Mapping DNA methylation across development, genotype and schizophrenia in the human frontal cortex. Nat Neurosci. 2016;19:40–47.
  • Pedersen BS, Schwartz DA, Yang IV, et al. Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values. Bioinformatics. 2012;28:2986–2988.
  • Butcher LM, Beck S. Probe Lasso: a novel method to rope in differentially methylated regions with 450K DNA methylation data. Methods. 2015;72:21–28.
  • Eckhardt F, Lewin J, Cortese R, et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat Genet. 2006;38:1378–1385.
  • Irizarry RA, Ladd-Acosta C, Carvalho B, et al. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Res. 2008;18:780–790.
  • Jaenisch R, Bird A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet. 2003;33 Suppl:245–254.
  • Irizarry RA, Ladd-Acosta C, Wen B, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet. 2009;41:178–186.
  • Lister R, Pelizzola M, Dowen RH, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–322.
  • Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 1986;51:1173–1182.
  • Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychol Methods. 2010;15:309–334.
  • Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3:143–155.
  • VanderWeele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol. 2010;172:1339–1348.
  • Bozack AK, Cardenas A, Quamruzzaman Q, et al. DNA methylation in cord blood as mediator of the association between prenatal arsenic exposure and gestational age. Epigenetics. 2018;13:923–940.
  • Cardenas A, Lutz SM, Everson TM, et al. Mediation by placental DNA methylation of the association of prenatal maternal smoking and birth weight. Am J Epidemiol. 2019;188:1878–1886.
  • Barfield R, Shen J, Just AC, et al. Testing for the indirect effect under the null for genome-wide mediation analyses. Genet Epidemiol. 2017;41:824–833.
  • Tobi EW, Slieker RC, Luijk R, et al. DNA methylation as a mediator of the association between prenatal adversity and risk factors for metabolic disease in adulthood. Sci Adv. 2018;4:eaao4364.
  • Yang CF, Karmaus WJJ, Yang CC, et al. Bisphenol A exposure, DNA methylation, and asthma in children. Int J Environ Res Public Health. 2020;17:298.
  • Neophytou AM, Oh SS, Hu D, et al. In utero tobacco smoke exposure, DNA methylation, and asthma in Latino children. Environ Epidemiol. 2019;3:e048.
  • Wu D, Yang H, Winham SJ, et al. Mediation analysis of alcohol consumption, DNA methylation, and epithelial ovarian cancer. J Hum Genet. 2018;63:339–348.
  • Battram T, Richmond RC, Baglietto L, et al. Appraising the causal relevance of DNA methylation for risk of lung cancer. Int J Epidemiol. 2019;48:1493–1504.
  • Fan R, Wang Y, Mills JL, et al. Functional linear models for association analysis of quantitative traits. Genet Epidemiol. 2013;37:726–742.
  • Forno E, Wang T, Qi C, et al. DNA methylation in nasal epithelium, atopy, and atopic asthma in children: a genome-wide study. Lancet Respir Med. 2019;7:336–346.
  • Ramsay JO, Silverman BW. Functional data analysis. New York: Springer; 1996.
  • Valeri L, VanderWeele TJ. Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods. 2013;18:137–150.
  • Sternthal MJ, Jun HJ, Earls F, et al. Community violence and urban childhood asthma: a multilevel analysis. Eur Respir J. 2010;36:1400–1409.
  • Suglia SF, Ryan L, Wright RJ. Creation of a community violence exposure scale: accounting for what, who, where, and how often. J Trauma Stress. 2008;21:479–486.
  • Thomson CC, Roberts K, Curran A, et al. Caretaker-child concordance for child’s exposure to violence in a preadolescent inner-city population. Arch Pediatr Adolesc Med. 2002;156:818–823.
  • Ramratnam S, Han Y, Rosas-Salazar C, et al. Exposure to gun violence and asthma among children in Puerto Rico. Respir Med. 2015;109:975–981.
  • Leek JT, Johnson WE, Parker HS, et al. The SVA package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883.
  • Mathews JA, Wurmbrand AP, Ribeiro L, et al. Induction of IL-17A precedes development of airway hyperresponsiveness during diet-induced obesity and correlates with complement factor D. Front Immunol. 2014;5:440.
  • Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol. 2004;159:702–706.
