Editorial

Harnessing the Power of Twins in Epigenetic Association Studies: Causal Inference and More

Pages 1-3 | Received 26 Nov 2019, Accepted 27 Nov 2019, Published online: 13 Dec 2019

The literature on epigenome-wide association studies (EWAS) is growing rapidly in biomedical research. Many sites have been reported in significant association with clinical and health traits of interest, and the number of publications has increased nearly exponentially since 2010 [Citation1]. The reported sites are usually claimed to be epigenetic biomarkers of biological significance or potential targets for intervention. Unfortunately, one important conceptual issue is often missing when these claims are made: a significant association demonstrates a correlation, not necessarily a causal effect. Although hypothesis-free EWASs enable a comprehensive and perhaps unbiased analysis of the epigenome, they are observational studies by nature. In epidemiology, observational studies are considered to have less probative force because of their inherent limitations in controlling confounding factors that influence both the explanatory (exposure) and the (clinical or health) outcome variables, a situation that can produce biased, confusing and even misleading results [Citation2]. It is true that association does not prove causation, but it is not true that association refutes causation. Considerable effort has gone into inferring causality from observational studies [Citation3], and some of the resulting methods have proven effective and valuable. One example is propensity score matching, which uses fitted logistic probabilities to match samples between comparison groups so as to approximate a randomized controlled trial [Citation4], the gold standard for assessing causality in biomedicine.

Among the different causal inference methods, the genetically informed method is a promising approach that fosters efficient causal assessment [Citation5]. By engaging genetically related individuals such as twins, siblings and other family members, unobserved genetic and familial environmental confounding can be controlled to achieve or approach exchangeability, which is essential for consistent causal inference. With the identical or monozygotic (MZ) twin design [Citation6,Citation7], exchangeability can be sufficiently approximated, thanks to the twins' perfect sharing of DNA sequence variation and rearing environment. In fact, the efficient control of genetic and common environmental factors in the MZ twin design substantially increases statistical power in EWAS, as shown by our recent computer simulation study [Citation8]. This is one good reason for the popularity of disease-discordant MZ twin pairs in epigenetic association studies. Here, it is necessary to point out that matching out genetic and nongenetic (common-exposure) variables in the discordant MZ twin design does not mean that the effects of such variables on disease can no longer be assessed, although this is described in the current literature as a limitation of the design [Citation5]. By simple mathematics, we show that including pair-specific or common-exposure variables in the analysis of MZ twin data allows estimation of valuable interaction effects with the disease. Take a pair-specific variable, age, as an example. If there is an age-specific effect of DNA methylation (Me) on the disease (i.e. an interaction effect), we have, for the healthy (−) twin,

$$\log Me^{(-)} = \alpha_0 + \beta_1\,\mathrm{age} + \varepsilon^{-}.$$

For the affected (+) twin,

$$\log Me^{(+)} = \alpha_0 + \alpha_{+} + (\beta_1 + \beta_1^{+})\,\mathrm{age} + \varepsilon^{+}.$$

Here, $\varepsilon^{-}$ and $\varepsilon^{+}$ are random error terms, $\alpha_0$ is the mean methylation level in healthy controls and $\alpha_{+}$ is the mean methylation difference between affected and unaffected healthy twins; $\beta_1$ is the main effect of age on DNA methylation and $\beta_1^{+}$ represents the additional age effect on DNA methylation specific to the affected twins, in other words, an interaction effect. Taking the intrapair difference, the shared terms $\alpha_0$ and $\beta_1\,\mathrm{age}$ cancel and we have:

$$\log\left[\frac{Me^{(+)}}{Me^{(-)}}\right] = \alpha_{+} + \beta_1^{+}\,\mathrm{age} + \varepsilon,$$

where $\varepsilon = \varepsilon^{+} - \varepsilon^{-}$ is again a random error term.

In the above model, an estimated $\beta_1^{+}$ significantly greater or less than 0 indicates that the differential DNA methylation between affected and healthy twins increases or decreases, respectively, with increasing age. As another simple example, male twin pairs are, on average, more discordant than female pairs if the coefficient of sex (male = 0, female = 1) is significantly <0, an indication of a sex-dependent effect of DNA methylation on the disease. Even more interestingly, when the genotype at a specific locus is available and included as a pair-specific variable, a significant estimate of its coefficient could suggest a methylation quantitative trait locus associated with the disease.
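The intrapair-difference model can be tried out on simulated data. What follows is a minimal sketch, not the author's analysis code. The function name `fit_intrapair_model`, all simulated parameter values, and the pair-shared term `shared` (standing in for genetic and common environmental effects) are assumptions introduced for illustration. It fits the log methylation ratio of affected to healthy co-twins on two pair-specific variables, age and sex, by ordinary least squares.

```python
import numpy as np

def fit_intrapair_model(log_ratio, covariates):
    """OLS fit of log[Me(+)/Me(-)] on pair-specific covariates.

    Returns (coef, se): coef[0] estimates alpha_plus, the remaining
    entries estimate the interaction effects (e.g. beta1_plus for age).
    """
    covariates = np.asarray(covariates, dtype=float)
    X = np.column_stack([np.ones(len(log_ratio)), covariates])
    coef, *_ = np.linalg.lstsq(X, log_ratio, rcond=None)
    resid = log_ratio - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

# --- Simulated disease-discordant MZ pairs (all values are assumptions) ---
rng = np.random.default_rng(0)
n_pairs = 500
age = rng.uniform(30, 80, n_pairs)        # pair-specific variable
sex = rng.integers(0, 2, n_pairs)         # male = 0, female = 1
shared = rng.normal(0.0, 0.5, n_pairs)    # pair-shared effect, cancels in the difference

alpha0, beta1 = -1.0, 0.002
alpha_plus, beta1_plus, beta_sex_plus = 0.05, 0.004, -0.10

log_me_healthy = alpha0 + beta1 * age + shared + rng.normal(0, 0.1, n_pairs)
log_me_affected = (alpha0 + alpha_plus + (beta1 + beta1_plus) * age
                   + beta_sex_plus * sex + shared
                   + rng.normal(0, 0.1, n_pairs))

# Intrapair difference: alpha0, beta1*age and the shared term drop out.
log_ratio = log_me_affected - log_me_healthy
coef, se = fit_intrapair_model(log_ratio, np.column_stack([age, sex]))

for name, b, s in zip(["alpha_plus", "beta1_plus (age)", "beta_sex_plus"], coef, se):
    print(f"{name:>18}: {b: .4f} (SE {s:.4f})")
```

In this simulation the main effect of age and the pair-shared term never enter the fitted model, yet the age and sex interaction effects remain estimable from the intrapair differences, which is the point made in the text.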

The frequent use of twins, especially MZ twins, in epigenetic association studies indeed helps to control unobserved confounding factors and minimize false-positive findings [Citation9]. It does not, however, guarantee causality as a randomized controlled trial does, because some potential confounding factors lie beyond the control of the MZ twin design, for example nonshared or individual environmental factors. Moreover, even if a causal relationship is established, the statistical models for association analysis do not provide the direction of causation, an important consideration given possible reverse causation. Reverse causation can be ruled out when the exposure variable cannot be influenced by the outcome, as with a germline genetic variant in a genome-wide association study, but unfortunately not in EWAS. In fact, the specific genetic variants a person is born with serve as instrumental variables in Mendelian randomization, a well-known approach for causal inference from observational studies [Citation10]. Determining the direction of causation is of high clinical and etiological relevance. In aging research, for example, one could easily ask whether an identified age-dependent methylation change at a genomic site is a cause of aging or merely a response to it. By examining the cross-trait cross-pair correlation, Hopper and colleagues performed inference on causation from examination of familial confounding (ICE FALCON), using regression analysis with the co-twin assigned as a ‘negative control’ [Citation11]. If the associations of twin A's outcome with twin A's own predictor and with co-twin B's predictor remain unchanged before and after adjusting for each other, there is no evidence of a causal relationship. If, on the other hand, the cross-trait cross-twin association is significantly attenuated after conditioning on twin A's own predictor (self), there is evidence ‘consistent with’ some causation. ICE FALCON is analogous to Mendelian randomization in causal assessment. Mendelian randomization requires genotype data but can be applied to unrelated individuals. ICE FALCON can use twin (monozygotic or dizygotic), sibling or other relative pairs, with the efficiency of inference intuitively decreasing as relatedness decreases. The method has recently been applied in EWASs, which reported causal effects of smoking [Citation12] and body mass index [Citation13] on site-specific DNA methylation variation. We have applied ICE FALCON to infer causality between DNA methylation and gene expression in MZ twins and discovered a large number of genes whose promoter methylation displays causal effects on expression activity.
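The regression logic described above can be illustrated on simulated twin pairs. This is a simplified sketch under stated assumptions: the function names `ols_slopes`, `ice_falcon_sketch` and `bootstrap_attenuation` are hypothetical, the data are simulated so that the predictor truly causes the outcome, and ordinary least squares with a pair-level bootstrap is used as a stand-in. It makes no claim to reproduce the estimation procedure of the cited studies.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Return OLS slope estimates (intercept dropped) of y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

def ice_falcon_sketch(x_a, x_b, y_a, y_b):
    """Fit the three regressions, letting each twin act as 'self' once."""
    x_self = np.concatenate([x_a, x_b])
    x_co = np.concatenate([x_b, x_a])
    y_self = np.concatenate([y_a, y_b])
    beta_self = ols_slopes(y_self, x_self)[0]       # outcome on own predictor
    beta_co = ols_slopes(y_self, x_co)[0]           # outcome on co-twin's predictor
    beta_self_adj, beta_co_adj = ols_slopes(y_self, x_self, x_co)  # mutually adjusted
    return {"beta_self": beta_self, "beta_co": beta_co,
            "beta_self_adj": beta_self_adj, "beta_co_adj": beta_co_adj}

def bootstrap_attenuation(x_a, x_b, y_a, y_b, n_boot=200, seed=0):
    """Percentile 95% interval for (beta_co - beta_co_adj), resampling whole pairs."""
    rng = np.random.default_rng(seed)
    n = len(x_a)
    changes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        r = ice_falcon_sketch(x_a[idx], x_b[idx], y_a[idx], y_b[idx])
        changes[i] = r["beta_co"] - r["beta_co_adj"]
    return np.percentile(changes, [2.5, 97.5])

# --- Simulated pairs: x causes y, and x is correlated within pairs ---
rng = np.random.default_rng(1)
n_pairs = 1000
familial = rng.normal(size=n_pairs)            # shared familial factor acting on x only
x_a = familial + rng.normal(size=n_pairs)
x_b = familial + rng.normal(size=n_pairs)
y_a = 0.5 * x_a + rng.normal(size=n_pairs)     # assumed causal effect of 0.5
y_b = 0.5 * x_b + rng.normal(size=n_pairs)

res = ice_falcon_sketch(x_a, x_b, y_a, y_b)
ci = bootstrap_attenuation(x_a, x_b, y_a, y_b)
print(res)
print("95% interval for attenuation of the co-twin coefficient:", ci)
```

Under this simulated causal scenario the co-twin coefficient is clearly nonzero on its own and shrinks toward zero once the twin's own predictor is included, the pattern the text describes as ‘consistent with’ some causation.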

Given the frequent use of the twin design in EWAS, it is highly recommended to take further advantage of twins by performing causal inference on important sites, so that causal relationships are established before the sites are claimed as molecular targets. Noncausal markers are meaningless for clinical or preventive intervention, although they could still be used for prediction.

Financial & competing interests disclosure

The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

References

  • Li M, Zou D, Li Z et al. EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic Acids Res. 47(D1), D983–D988 (2019).
  • Rush CJ, Campbell RT, Jhund PS, Petrie MC, McMurray JJV. Association is not causation: treatment effects cannot be estimated from observational data in heart failure. Eur. Heart J. 39(37), 3417–3438 (2018).
  • Listl S, Jürges H, Watt RG. Causal inference from observational data. Community Dent. Oral Epidemiol. 44(5), 409–415 (2016).
  • Nguyen MH, Yang HI, Le A et al. Reduced incidence of hepatocellular carcinoma in cirrhotic and noncirrhotic patients with chronic hepatitis B treated with Tenofovir – a propensity score-matched study. J. Infect. Dis. 219(1), 10–18 (2019).
  • Pingault JB, O’Reilly PF, Schoeler T, Ploubidis GB, Rijsdijk F, Dudbridge F. Using genetic data to strengthen causal inference in observational research. Nat. Rev. Genet. 19(9), 566–580 (2018).
  • McGue M, Osler M, Christensen K. Causal inference and observational research: the utility of twins. Perspect. Psychol. Sci. 5(5), 546–556 (2010).
  • Tan Q. The epigenome of twins as a perfect laboratory for studying behavioural traits. Neurosci. Biobehav. Rev. 107, 192–195 (2019).
  • Li W, Christiansen L, Hjelmborg J, Baumbach J, Tan Q. On the power of epigenome-wide association studies using a disease-discordant twin design. Bioinformatics 34(23), 4073–4078 (2018).
  • Tan Q, Christiansen L, von Bornemann Hjelmborg J, Christensen K. Twin methodology in epigenetic studies. J. Exp. Biol. 218, 134–139 (2015).
  • Koellinger PD, de Vlaming R. Mendelian randomization: the challenge of unobserved environmental confounds. Int. J. Epidemiol. 48(3), 665–671 (2019).
  • Dite GS, Gurrin LC, Byrnes GB et al. Predictors of mammographic density: insights gained from a novel regression analysis of a twin study. Cancer Epidemiol. Biomarkers Prev. 17(12), 3474–3481 (2008).
  • Li S, Wong EM, Bui M et al. Causal effect of smoking on DNA methylation in peripheral blood: a twin and family study. Clin. Epigenetics 10, 18 (2018).
  • Li S, Wong EM, Bui M et al. Inference about causation between body mass index and DNA methylation in blood from a twin family study. Int. J. Obes. (Lond.) 43(2), 243–252 (2019).
