652
Views
1
CrossRef citations to date
0
Altmetric
Research paper

Spatiotemporal specificity of correlated DNA methylation and gene expression pairs across different human tissues and stages of brain development

, , , , , , , & ORCID Icon show all
Pages 1110-1127 | Received 26 May 2021, Accepted 08 Oct 2021, Published online: 21 Oct 2021

ABSTRACT

DNA methylation (DNAm) that occurs on promoter regions is primarily considered to repress gene expression. Previous studies indicated that DNAm could also show positive correlations with gene expression. Both DNAm and gene expression profiles are known to be tissue- and development-specific. This study aims to investigate how DNAm and gene expression are coordinated across different human tissues and developmental stages, as well as the biological significance of such correlations. By analyzing 2,239 samples with both DNAm and gene expression data in the same human subjects obtained from six published datasets, we evaluated the correlations between gene and CpG pairs (GCPs) at cis-regions and compared significantly correlated GCPs (cGCPs) across different tissues and brains at different age groups. A total of 37,363 cGCPs was identified in the six datasets; approximately 38% of the cGCPs were positively correlated. The majority (>90%) of cGCPs was tissue- or development-specific. We also observed that the correlation direction can be opposite in different tissues and ages. Further analysis highlights the importance of cGCPs for their cellular functions and potential roles in complex traits and human diseases. For instance, the early developmental brain possessed a highly unique set of cGCPs that were associated with neurogenesis and psychiatric disorders. By assessing the epigenetic factors involved in cGCPs, we discovered novel regulatory mechanisms of positive cGCPs distinct from negative cGCPs, which were related to multiple factors, such as H3K27me3, CTCF, and JARD2. The catalogue of cGCPs compiled can be used to guide functional interpretation of genetic and epigenetic studies.

Introduction

DNA methylation (DNAm) at CpG sites is a major epigenetic marker that regulates gene expression and thus participates in many essential biological processes and human diseases [Citation1–4]. DNAm is commonly understood to suppress gene expression by blocking the binding sites of transcription factors at gene promoter regions [Citation5,Citation6]. However, studies revealed that DNAm in the gene body could be positively correlated with gene expression [Citation7,Citation8]. Genome-wide evaluation of DNAm-expression associations also supported the presence of positive correlations [Citation9–12]. There is multiple evidence suggesting a spatiotemporal relationship between DNAm and gene expression. DNAm is reported to be development- or tissue-specific and correlated with the expression of corresponding genes [Citation13–18]. Comparison of methylation patterns in fibroblasts, T-cells, and lymphoblastoid cells suggested that differentially methylated regions during development could contribute to delineating gene expression in different cell types [Citation19,Citation20]. Bonder et al. [Citation21]. assessed this correlation by analysing 158 liver samples from two studies and compared the liver to the muscle and adipose tissues. Their results indicated more negative DNAm-expression correlations than positive correlations and the correlations showed strong tissue-specificity. However, the developmental specificity of the DNAm-expression relationships has not been studied to date. Moreover, these studies did not examine the implications of DNAm in cellular functions nor the connection between DNAm-expression correlations and human diseases.

The DNAm-expression relationship is affected by numerous biological factors. The relative location of CpG within the gene body may influence regulation of expression and thus DNAm-gene expression relationships. It has been reported that CpGs on promoter regions are negatively correlated with gene expression, while those on gene bodies function quite the opposite [Citation2]. DNAm on CpG islands (CGI) has been found to be associated with transcriptional repression and gene silencing [Citation2,Citation22]. In addition, some transcription factors (TFs) such as RFX5 preferred DNAm and could interact with methylated-DNA, while some others such as NRF1 have been reported to be more sensitive to DNAm and only bind unmethylated sequences, thereby affecting the gene expression [Citation22–25]. A comprehensive epigenomic study revealed that chromatin state could also influence DNAm and gene expression [Citation26]. Accessible chromatin, marked by promoter DNAm depletion, histone acetylation, and H3K4 methylation, is believed to facilitate transcription [Citation27–29]. In contrast, histone modification such as H3K9me3 is linked to the establishment of DNAm and is involved in heterochromatin and long-term gene repression [Citation30,Citation31]. Furthermore, a study of blood found that CpGs negatively correlated with gene expression are enriched in transcriptionally active regions, such as enhancers, while CpGs positively correlated with gene expression occur in repressed regions, such as Polycomb-repressed regions [Citation32]. Chromatin structure alterations can also change the relationship between DNAm and gene expression [Citation33]. Taken together, numerous factors might influence DNAm-gene expression relationships. Yet, the exact epigenetic mechanisms of the positive or negative correlation of DNAm-expression remain elusive.

In this study, we intend to confirm the spatiotemporal specificity of the correlated relationships between DNAm and gene expression in multiple tissues, particularly brain tissue that showed highly dynamic changes of gene expression and DNAm during development [Citation34,Citation35], and to explore its biological significance as related to disease association, and to identify epigenomic features related to positively and negatively correlated DNAm-expression relationships. Spatial specificity refers to DNAm-expression correlations that can only be detected in specific tissues but not the others, while temporal specificity refers to correlated relationships that change with development and aging processes. We constructed gene and CpG pairs (GCPs) using gene expression and DNAm data from the same human subject available in six public datasets () and further compared significantly correlated GCPs (cGCPs, also referred as eQTMs in several previous publications [Citation11,Citation12,Citation19,Citation21]) firstly across a group of different tissues (brain, liver, monocytes, and T-cells), and secondly across exclusively brain tissue at three different age groups (‘developmental brain’ with age ranging from foetus to age 25 years old (y.o.); ‘aging brain’ with age of 25 y.o. to 78 y.o.; and ‘aged brain’ from all senior donors with age at 88 ± 7 y.o. and limited age variation). Gene functions and disease associations of these cGCPs were subsequently analysed. Finally, we investigated the enrichment of epigenomic features on both positively and negatively correlated GCPs to identify the potential driver of opposite correlations. Our genome-wide analysis revealed unique DNAm-expression profiles in different tissues and stages of brain development consistent with expected tissue-specific functions and developmental processes.

Figure 1. The overview of study design. (a) Design of the study. (b) Gene and CpG pairs (GCPs) are defined as CpG sites located within the 10kb flanking region of the corresponding gene.

Figure 1. The overview of study design. (a) Design of the study. (b) Gene and CpG pairs (GCPs) are defined as CpG sites located within the 10kb flanking region of the corresponding gene.

Materials and methods

Data collection

To decipher the relationships between DNAm and gene expression in different tissues, we focused on the human tissues, which had both methylation and matched expression data available in various databases and publications. Aged brain dorsolateral prefrontal cortex (DLPFC) data were obtained from the Rush University’s Religious Orders Study and Memory and Ageing Project (ROSMAP) study (https://www.synapse.org/#!Synapse:syn3219045)[Citation36]. Monocyte and T-cell data were both obtained from the MESA Epigenomics and Transcriptomics Study [Citation17]. Liver data were obtained from Horvath et al. [Citation37]. To investigate the relationships between DNAm and gene expression across different ages, we used developmental and ageing brain DLPFC data from Jaffe et al., with samples across the developmental stages ranging from prenatal to later postnatal life [Citation38]. The developmental brain was defined as stages from early foetal to 25 years old when brain development is complete [Citation39] and the ageing brain was from 25 years old to aged stages. The data source, sample size, and platform for each dataset are summarized in Supplemental Table S1.

Data preprocessing

DNA methylation measurements were performed using Illumina HumanMethylation450 BeadChips to interrogate more than 485,000 methylation sites in all datasets. Raw idat files were available for aged, developmental, and ageing brain; raw intensity files were available for monocyte, T-cell, and liver. For methylation data analysis, we used the ChAMP package (Version 1.8.2) implemented in R for filtering and normalization. We first defined the probe with a detection p-value of more than 0.01 as an ‘absent’ signal and the probe with a bead number less than 3 as a ‘failed’ probe. The samples with ‘absent’ probes (more than 1% of total probes), the low-quality probes with ‘absent’ signal (more than 10%), and ‘failed’ probes (more than 5% across samples) were removed. The probes containing common SNPs as identified in Zhou et al. [Citation40] or mapped to multiple genome locations as identified in Nordlund et al. [Citation41] were also discarded. Next, the beta value was used as the methylation measurement, and the beta-mixture quantile normalization (BMIQ) method [Citation42] was used to adjust the bias between type I and type II probes.

The expression data were collected from different platforms containing microarrays and RNA-seq. All platforms were able to capture expression of more than 26,000 unique Refseq genes. There is no systematic bias in terms of coverage of genes or pathways. For the expression data from different platforms, we used different values as the expression measurement. Fragments per kilobase million (FPKM) was used to normalize RNA-seq data of the aged brain for correcting sequencing depth and the length of the gene. Genes were selected based on the expression thresholds of >0.1 FPKM in at least 10% of the samples for further analysis. For microarray expression data, filtering, normalization, and summarization were performed based on the platforms. For monocyte and T-cell data, we removed probes with a detection p-value above 0.06 in more than 20% of the samples, and samples with more than 20% of the absent probes were filtered out. The quantile normalized values were used for monocyte and T-cell data. For liver data, we used the R package oligo (Version 1.36.1) to get the log2 RMA normalization for the gene-level expression values from CEL files. For developmental and aging brain data, the least normalized log2 (sample/reference) ratios were used to quantify gene expression. Afterwards, for the convenience of comparison, only protein-coding genes annotated in Gencode V19 [Citation43] were kept.

For both expression and methylation datasets, several methods were used for quality control. Principal component analysis (PCA) [Citation44] and hierarchical cluster analysis was used to filter outliers of samples. ComBat [Citation45] was used to control for position and batch effects. Moreover, surrogate variable analysis (SVA) [Citation46] was applied to identify surrogate variables, and the confounding effects of known and unknown covariates were evaluated using principal variance component analysis (PVCA) [Citation47] and then regressed out on the data with the exception of age for the developmental and aging brain data. To target these better-powered high-variable probes, we remove half of the less variable methylation probes by filtering out 50% of CpG probes based on a soft threshold of median absolute deviation (MAD) for each data.

Statistical analysis of Gene and CpG pairs (GCPs)

For each region, we defined pairs of methylation CpG site and gene expression as CpG sites located within the 10kb flanking region of the corresponding gene (). Spearman’s rank correlation test was used to assess the correlation of methylation-expression pairs. Furthermore, we used the Benjamini–Hochberg (BH) method to correct for multiple hypothesis testing. We defined the significant threshold as a BH adjusted p-value <0.05.

Cross- and downsampling- validation of cGCPs

To validate the power of the GCPs analysis, we randomly split each dataset into two-thirds percent as ‘discovery data’ and one-third percent as ‘replication data.’ We performed the GCPs correlation test in discovery data and replication data independently as described above. We then calculated the Pearson correlation of the cGCPs’ rho value between discovery data and replication data. In total, we repeated the analysis one hundred times and found the Pearson correlation between discovery data and replication data to be very high (mean: 0.90 ~ 0.95, Supplemental Fig. S1A, Supplemental Table S3).

Because the sample sizes between each dataset differed (Supplemental Table S1), we randomly selected 75 individuals (the size of the liver dataset) from the aged brain, monocyte, and T-cell groups, and performed cGCPs correlation tests with these downsampled data sets to match the liver sample size. The Pearson correlation of the cGCPs’ rho value was highly correlated between the full and downsampled populations (mean: 0.86 ~ 0.96, Supplemental Fig. S1B, Supplemental Table S3).

Functional annotation of cGCPs

For the genes of cGCPs, Gene Ontology (GO) enrichment was completed by the WEB-based GEne SeT AnaLysis Toolkit (WebGestalt) [Citation48]. Biological processes, cellular components, and molecular functions were tested using the overrepresentation enrichment analysis (ORA) method and genome protein-coding as a reference gene set. The BH method was used for multiple testing adjustment.

To explore the role of cGCPs in complex diseases and traits, we annotated the genes and CpGs of cGCPs using GWAS and EWAS results, respectively. For the genes of cGCPs, we performed the enrichment analysis of GWAS catalogue of reported gene sets using FUMA GWAS v1.3.6a [Citation49]. The minimum overlapping genes with gene-sets was set to 2. Protein-coding genes were used as background genes and the BH method was used for multiple testing corrections. For the CpGs of GCPs, we downloaded the EWAS results (updated 03–07-2019) from the EWAS catalogue [Citation50] (http://www.ewascatalog.org). We used the results with p < 2.4 × 10–7 for the 450k array [Citation51]. We tested the CpG sets with a minimum number of overlapped CpGs to 2. Hypergeometric tests were performed to test if the CpGs were overrepresented in any of the EWAS CpG sets. Significant enrichment was defined as BH adjusted p-value <0.05.

Epigenomic features of cGCPs

The CpGs and genes involved in cGCPs were annotated using multiple epigenomic features. CpGs of cGCPs were defined accordingly: (1) by location relative to CpG Islands, CpGs were classified as CpG island (CGI), CGI shores (0–2,000 bp up- or down-stream of CGI), CGI shelves (2,000–4,000 bp up- or down-stream of CGI), and open sea region (>4,000 bp up- or down- stream away from CGI); (2) by location relative to the matched gene, CpGs were classified as gene promoter (2kb upstream to transcription start sites (TSS) defined in Gencode v.19), gene body region (TSS to transcription end site (TES)), outgene region (out of promoter and gene body). To study the contribution of upstream and downstream regions of the gene body on cGCPs, the intergenic regions were further classified into upstream and downstream regions relative to the gene body. Fold enrichment was tested against the total tested GCPs in each tissue. The significance of enrichment was determined by the hypergeometric test.

To assess CpGs for epigenetic state enrichment, we downloaded the histone marks and DNaseI hypersensitivity sites (DHS) data of the matched tissues from the Roadmap Epigenomics Project [Citation26] (http://www.roadmapepigenomics.org) and the ENCODE Project [Citation52] (https://www.encodeproject.org) (Supplemental Table S20). In addition, the 15-epigenetic state model of the matched tissue from the Roadmap Epigenomics Project was used to generate the epigenetic state table (Supplemental Table S20). Using the CpGs of total GCPs analysed in each tissue as a background, we performed the epigenetic state enrichment of the CpGs from cGCPs using the R LOLA package [Citation53].

We downloaded the Hi-C defined topological domains (TADs) of H1-hESC for GCPs annotation. The data were downloaded from Dixon et al. [Citation54], and the coordinates were converted to hg19 using UCSC’s liftOver tool by Ho et al. [Citation55]. The cGCPs within the same TAD were annotated using bedtools (v2.17.0) [Citation56].

To test the effect of transcription factors (TFs) on cGCPs, we downloaded the human cistromes which provide the genome-wide maps of regions bound by TFs from Vorontsov et al. [Citation57]. To achieve the highest reliability, we only used the cistrome regions detected in at least two experimental data sets and by at least two peak calling tools. A total number of 138 TFs were tested using the R LOLA package [Citation53]. Q-value in R q-value package was used to control for the false discovery rate. Significant TF enrichment was defined as q-value <0.1.

To estimate the importance of functional features on cGCPs, the random forest algorithm, implemented by the R package ‘randomForest,’ was applied to pcGCPs and ncGCPs, respectively. Feature importance was tested on the basis of the average decrease in accuracy by permuting each variable.

Genetic control of the cGCPs

To study whether the cGCPs were related to the genetic factors as that can be detected by methylation quantitative trait loci (mQTL) and expression quantitative trait loci (eQTL), we downloaded both mQTL and eQTL data of brain, liver, monocytes, and T-cells from previous publications (Supplemental Table S21). First, we tested the enrichment of cGCPs against mGTLs using the exact CpG sites, considering the total tested CpGs in each tissue separately. Second, we tested the enrichment of cGCPs against eQTLs by the genes considering all the tested genes in each tissue separately. Finally, we tested the enrichment of cGCPs against eQTLs and mQTLs pairs linked by the exact SNPs considering all the tested GCPs in each tissue separately. The significance of enrichment was determined by the hypergeometric test p values.

Results

More than 30% of the cGCPs were positively correlated in different tissues and developmental brain

We collected six datasets that have both methylation and matched expression data. After quality evaluation and filtering, we retained the data of 502 brains (also referred to as brain-aged group), 75 livers, 214 T-cell, 1201 monocyte samples from adult human subjects, as well as 129 human brains from donors that were foetus to 25 years old, referred to as the brain developmental group, and 118 human brains from donors that were 25 years old to 78 years old, referred to as the brain ageing group (Supplemental Table S1, ). DNA methylation measurements were performed using Illumina HumanMethylation450 BeadChips to interrogate more than 485,000 CpG sites in all datasets. CpGs with variations under 50% of the probes were filtered out based on the median absolute deviation (MAD) of DNAm levels to focus GCP analysis on highly variable loci. The MAD thresholds for each data range from 0.01 in T-cells to 0.02 in liver. The numbers of CpGs and genes used in the analysis are shown in (criteria described in Methods).

Table 1. Summary of significantly correlated GCPs (BH-adjusted p-value <0.05) in each tissue. GCPs, gene and CpG pairs; cGCPs, correlated GCPs; ncGCPs, negatively correlated GCPs; pcGCPs, positively correlated GCPs.

GCPs were defined as CpG sites located within the 10kb flanking region of the corresponding gene (). As a result, the number of GCPs tested in each tissue varied from 230,087 in the aged brains to 46,831 in monocytes (). Using Spearman’s rank correlation test and the Benjamini–Hochberg (BH) multiple-testing correction, we obtained thousands of cGCPs (BH adjusted p-value < 0.05) in different tissues and stages of the developmental brain (, Supplemental Table S2). The largest number of cGCPs (16,500) was obtained from the brain developmental group and the smallest (261) from the brain ageing group. Both positively correlated GCPs (pcGCPs) and negatively correlated GCPs (ncGCPs) were detected in each dataset (). The pcGCPs accounted for 27%~41% of GCPs across different tissues and stages of the brain (). Cross- and downsampling-validation of cGCPs supported the robustness of the results described herein (Supplemental Table S3, Supplemental Fig. S1A, S1B).

Majority of the cGCPs were tissue-specific

Comparison across different tissues revealed more than 90% of ncGCPs (10,015/11,125) or pcGCPs (6,243/6,696) to be specific to the liver, T-cells, monocytes, or brain (, Supplemental Fig. S2). Genes and CpGs in cGCPs were named cGenes and cCpGs, respectively. Then, Gene Ontology (GO) functional analysis was performed for the cGenes that harboured tissue-specific cGCPs. Different cGene sets showed significant enrichment for tissue-specific functions in the corresponding tissue (, Supplemental Table S4-7). For example, the cGenes in brain were enriched for cell adhesion (BH adjusted p-value = 2.85E-12) in GO biological process (BP) and neuron-specific part (BH adjusted p-value = 2.82E-11) in GO cellular component (CC), innate immune response in monocyte cGenes (BH adjusted p-value = 7.03E-10) in BP, leukocyte activation in T-cell cGenes (BH adjusted p-value = 2.53E-12) in BP, and oxidoreductase activity in liver cGenes (BH adjusted p-value = 3.14E-3) in GO molecular function (MF). The functional annotation of cGenes in cGCPs specific to one tissue was similar to that in total cGCPs in the corresponding tissues (Supplemental Fig. S3, Supplemental Table S8-11). This is mainly the result of the tissue-specific character of the cGCPs.

Figure 2. cGCPs are tissue-specific and the cGenes are enriched in tissue-specific functions. (a) The Venn plot to compare negatively correlated GCPs (ncGCPs) across four tissues. Brain, aged brain. (b) The Venn plot to compare positively correlate GCPs (pcGCPs) across four tissues. (c) The top-ranked GO enrichment items for cGenes across four tissues. The number in parenthesis under tissue name refers to cGCPs count. Mono, monocytes.

Figure 2. cGCPs are tissue-specific and the cGenes are enriched in tissue-specific functions. (a) The Venn plot to compare negatively correlated GCPs (ncGCPs) across four tissues. Brain, aged brain. (b) The Venn plot to compare positively correlate GCPs (pcGCPs) across four tissues. (c) The top-ranked GO enrichment items for cGenes across four tissues. The number in parenthesis under tissue name refers to cGCPs count. Mono, monocytes.

Even though thousands of cGCPs were identified in different tissues, only 208 of them were shared by three or more tissue types; of which 148 were negatively correlated and 60 were positively correlated (). Notably, four cGCPs (CD86-cg04387658, GSTM1-cg10950028, GSTT1-cg04234412, GSTT1-cg17005068) were shared in all tissues and are consistently negatively correlated (, ). The cGenes in the 208 cGCPs shared in at least three tissues were enriched in glutathione derivative metabolic process (BH adjusted p-value = 1.8e-04) in BP, MHC class II protein complex (BH adjusted p-value = 1.13e-06) in CC, and glutathione binding (BH adjusted p-value = 1.36e-04) in MF, which were the top pathways in the three GO categories (Supplemental Fig. S4, Supplemental Table S12). GSTT1 and GSTM1 in cGCPs shared in four tissues were both related to glutathione derivative metabolic process (two of 22 GO-termed genes in total, BH adjusted p-value = 2.24e-02) and glutathione transferase activity (two of 34 GO-termed genes in total, BH adjusted p-value = 2.30e-02).

Figure 3. An example of cGCPs shared across four tissues. Scatter plots of GSTT1 expression level (X-axis) and cg04234412 methylation level (Y-axis) in the aged brain (a), liver (b), monocyte (c), and T-cell (d). Brain, aged brain.

Figure 3. An example of cGCPs shared across four tissues. Scatter plots of GSTT1 expression level (X-axis) and cg04234412 methylation level (Y-axis) in the aged brain (a), liver (b), monocyte (c), and T-cell (d). Brain, aged brain.

cGCPs in brain exhibit age specificity

To capture temporal regulation, we examined the age-dependent GCPs throughout the development and ageing process of the brain. The brain developmental data were collected from individuals with ages ranging from post-conception (14 weeks) to 25 years old when brain development is complete [Citation39], and the brain ageing data were from 25 years old to 78 years old (Supplemental Table S1). In brain developmental and ageing group, 133,901 and 133,218 GCPs were tested and 16,500 and 261 GCPs were significantly correlated, respectively (). Of these cGCPs, 6,803 were pcGCPs in brain developmental group and 71 were pcGCPs in brain ageing group.

Furthermore, we compared the cGCPs across the developmental brain, ageing brain, and the aged brain groups. The majority (96%) of the cGCPs found in the developmental brain were specific to the developmental age group (). The cGenes in the developmental brain were uniquely enriched for neurogenesis (BH adjusted p-value <2.2E-16) and neuron differentiation (BH adjusted p-value <2.2E-16) pathways in BP that were not present in the other brain groups (, Supplemental Table S13). For the brain ageing group, 23% of the cGCPs were specific. No GO terms were enriched for the cGenes of ageing brain data. The functional annotation of specific cGenes from the developmental or aged brain was found to be highly analogous to those of the total cGenes in the corresponding brain group (Supplemental Fig. S5, Supplemental Table S14, S15), which is most likely due to development-specific cGCPs in the brain tissue.

Figure 4. cGCPs in the brain are development-specific and cGenes in the developmental brain are enriched for neurogenesis and neuron differentiation. (a) The Venn plot to compare ncGCPs across the developmental, ageing, and aged brain. devBrain, brain developmental group; agingBrain, brain ageing group; agedBrain, brain aged group. (b) The Venn plot to compare pcGCPs across the developmental, ageing, and aged brain. (c) Comparison of the top-ranked GO enrichment items for cGCPs between the developmental brain and aged brain. The number in parenthesis under stage name refers to cGCPs count.

Figure 4. cGCPs in the brain are development-specific and cGenes in the developmental brain are enriched for neurogenesis and neuron differentiation. (a) The Venn plot to compare ncGCPs across the developmental, ageing, and aged brain. devBrain, brain developmental group; agingBrain, brain ageing group; agedBrain, brain aged group. (b) The Venn plot to compare pcGCPs across the developmental, ageing, and aged brain. (c) Comparison of the top-ranked GO enrichment items for cGCPs between the developmental brain and aged brain. The number in parenthesis under stage name refers to cGCPs count.

Additionally, 92 ncGCPs and 17 pcGCPs were shared across three brain age groups (). For example, PCDHB4 gene expression is negatively correlated with cg08626876 methylation level in developmental, ageing, and aged brain data (Supplemental Fig. S6). The cGenes in these 109 cGCPs were enriched by L-lactate dehydrogenase activity (BH adjusted p-value = 3.16E-02) in MF (Supplemental Table S16), which may imply the importance of brain energy use medicated by DNAm.

The Genes and CpGs of the cGCPs are enriched in traits-relevant tissues

To study the potential role of cGCPs in human diseases, we performed enrichment analysis using genome-wide association study (GWAS) for cGenes and epigenome-wide association study (EWAS) for cCpGs. The enrichments were performed for all phenotypes reported in GWAS and EWAS catalogue to avoid potential selection bias.

GWAS enrichment analysis revealed that cGenes of specific tissue types were enriched in GWAS gene sets of the most relevant phenotype. For example, monocyte cGenes were significantly enriched for monocyte count-related gene sets (BH adjusted p-value = 4.08E-05), and liver cGenes were significantly enriched for liver enzyme levels-related gene sets (BH adjusted p-value = 4.93E-02, , Supplemental Table S17). Intriguingly, cGenes in monocytes and T-cells were also enriched for the autism spectrum disorder or schizophrenia-associated genes. Compared to the ageing and aged brain, cGenes in the developmental brain were more specifically enriched for genes associated with schizophrenia (BH adjusted p-value = 1.35E-07), bipolar disorder (BH adjusted p-value = 8.61E-04), and educational attainment (BH adjusted p-value = 1.24E-02). Examples of schizophrenia-related cGCPs in three brain groups are shown in . In contrast to TAP2-cg14812313 pairs correlated in all brain age groups (), MDK gene expression is only correlated with cg21009265 methylation level in the brain developmental group ().

Figure 5. cGenes of specific tissue types are enriched in GWAS of the most relevant phenotype. (a) Top-ranked overrepresented GWAS catalogue gene sets in cGCPs across different tissues and brain stages. agedBrain, brain aged group; devBrain, brain developmental group; agingBrain, brain ageing group. (b-g) Examples of schizophrenia-related cGCPs in the aged brain developmental brain, and/or ageing brain.

Figure 5. cGenes of specific tissue types are enriched in GWAS of the most relevant phenotype. (a) Top-ranked overrepresented GWAS catalogue gene sets in cGCPs across different tissues and brain stages. agedBrain, brain aged group; devBrain, brain developmental group; agingBrain, brain ageing group. (b-g) Examples of schizophrenia-related cGCPs in the aged brain developmental brain, and/or ageing brain.

Our EWAS enrichment analysis indicated that liver cCpGs were enriched for liver development-related CpGs (BH adjusted p-value = 3.46E-02), and T-cell cCpGs were enriched for the CpGs related to autoantibody production in systemic lupus erythematosus (BH adjusted p-value = 4.65E-02) (Supplemental Table S18). In contrast, cCpGs in other tissues were not enriched for phenotype associated CpGs after multiple testing correction.

Gene expression is correlated both positively and negatively with nearby CpGs, and the correlation direction can be changed

We explored whether expression of specific genes could be correlated with CpGs in both positive and negative manners. In the same data set, an averaged 15.7%±7.4% of genes have been identified in both positive and negative correlations with different CpG sites (Supplemental Table S19). For example, in the aged brain, QSOX2 is negatively correlated with cg14228683 but positively correlated with cg14381623. Furthermore, when considering all the spatiotemporal cGCPs, a gene may be negatively or positively correlated with a CpG site in one data set while showing an opposite correlation with the same or different CpG site in another data set. For instance, HLA-DQB1 is negatively correlated with cg03202060 in the liver, but they show the opposite correlation in T-cells. Among all correlated GCPs across all four tissue types and the developmental and ageing brain, up to 36.6% of the genes were linked to CpG sites with both positive and negative correlations.

The ncCpGs and pcCpGs are associated with distinct epigenomic features

To investigate whether mechanisms (termed here as ‘epigenomic features’) associated with gene regulation may influence the DNAm-expression correlation and their direction, we collected various functional features (CGIs, gene location, chromatin states, histone modification, DHS, TF binding regions, and chromatin structures) for CpG annotation. To compare the difference between positive and negative relationships, we performed the enrichment analysis of CpG in ncGCPs and pcGCPs in all data sets. cCpGs in ncGCPs and pcGCPs were named ncCpGs and pcCpGs, respectively.

To assess whether cCpGs location relative to CGI is related to the DNAm-gene expression correlation, we tested the enrichment of cCpGs in CpG island, shore, shelf, and open sea regions. Both pcCpGs and ncCpGs were significantly more likely to be located on CpG island and shore, though they occasionally also lie in the open sea regions (). For example, the ncCpGs and pcCpGs in developmental and aged brain were enriched in CpG island and shore (p-value <0.05), whereas ncCpGs and pcCpGs in T-cells and ncCpGs in the liver were enriched in open sea regions (p-value <0.05). However, no instances were found on the CpG shelf.

Figure 6. ncCpGs and pcCpGs are associated with multiple specific epigenomic features. (a) The bar plot for CpGs by location relative to CpG islands (CGI). The stars (*) represent significant enrichment in the hypergeometric test (p-value<0.05). agedBrain, brain aged group; devBrain, brain developmental group; agingBrain, brain ageing group. (b) The bar plot for CpGs by location relative to gene body of the matched genes. The stars (*) represent significant enrichment in the hypergeometric test (p-value<0.05). (c) Overrepresentation of 15 epigenetic states in cGCPs. The active states are labelled as red colour and the repressive states are labelled as blue colour. (d) Overrepresentation of histone modification and DHS regions in cGCPs. The active markers are labelled as red and the repressive markers are labelled as blue. The markers which are not available are labelled as grey. (e) Overrepresentation of TF binding regions in ncGCPs and pcGCPs, respectively. The stars (*) represent significant enrichment in Fisher’s exact test (q value<0.1). (f) The feature importance for ncGCPs and pcGCPs, respectively.

Figure 6. ncCpGs and pcCpGs are associated with multiple specific epigenomic features. (a) The bar plot for CpGs by location relative to CpG islands (CGI). The stars (*) represent significant enrichment in the hypergeometric test (p-value<0.05). agedBrain, brain aged group; devBrain, brain developmental group; agingBrain, brain ageing group. (b) The bar plot for CpGs by location relative to gene body of the matched genes. The stars (*) represent significant enrichment in the hypergeometric test (p-value<0.05). (c) Overrepresentation of 15 epigenetic states in cGCPs. The active states are labelled as red colour and the repressive states are labelled as blue colour. (d) Overrepresentation of histone modification and DHS regions in cGCPs. The active markers are labelled as red and the repressive markers are labelled as blue. The markers which are not available are labelled as grey. (e) Overrepresentation of TF binding regions in ncGCPs and pcGCPs, respectively. The stars (*) represent significant enrichment in Fisher’s exact test (q value<0.1). (f) The feature importance for ncGCPs and pcGCPs, respectively.

Furthermore, we evaluated the effects of the location of the cCpGs relative to their associated cGenes. pcCpGs and ncCpGs were found at different gene locations (). Regardless of tissue specificity, the pcCpGs were more frequently located in the gene body while ncCpGs tended to be located on the promoter or gene body (p-value <0.05). We also found that pcCpGs in the developmental brain were uniquely enriched in the outer regions of gene promoters and bodies (p-value <2.2E-16), which were not observed in other tissues or brain of other age groups, suggesting a distinct regulatory mechanism in development. Further enrichment analysis of the intergenic regions showed that pcCpGs were enriched in gene downstream regions in monocytes, T-cells, and developmental brain; ncCpGs were enriched in developmental brain for gene upstream regions (Supplemental Fig. S7).

In addition, we annotated the CpGs using the Roadmap chromatin state model and histone markers from matched tissues. The pcCpGs and ncCpGs possess different occurrence frequencies (p-value <0.05) in multiple chromatin state regions and histone markers (). The ncCpGs tended to be located in transcriptionally active regions and associated with actively epigenetic markers, such as histone acetylation and H3K4 methylation. In contrast, the pcCpGs were largely located in non-transcribed regions, especially in regions of repressed polyComb and bivalent enhancer, and associated with repressing markers like H3K27me3. Surprisingly, H3K4me1 as an active marker also showed enrichment on the pcGCPs.

The correlation direction can differ from one tissue to another because of the changed epigenetic environment. For example, cg03202060, cg24593918, and cg23464743 were all correlated with HLA-DQB1 with negative correlations in the liver but positive correlations in T-cells, accompanied by different chromatin states and histone modification of these genomic regions in liver and T-cells (). Histone modifications and chromatin state differences could also contribute to the DNAm-expression correlation directionality of multiple CpGs correlated with the same gene in one tissue (Supplemental Fig. S8-S11). For example, in monocytes, RPS16-cg18487508 is a negative correlation, while RPS16-cg20641794 is a positive correlation. Correspondingly, the cg18487508 and cg20641794 possess distinct chromatin states and histone modification in monocytes (Supplemental Fig. S11).

Figure 7. Examples of cGCPs with opposite correlations in liver or T-cells. (a) An IGV plot shows cCpGs of HLA-DQB1 and epigenomic features. cCpGs in blue and red mean negative and positive correlations, respectively. The blue and red arrows point the examples of negative and positive cCpGs. (b) Scatter plots of HLA-DQB1 expression level (X-axis) and cg03202060 methylation level (Y-axis) in the liver. (c) Scatter plots of HLA-DQB1 expression level (X-axis) and cg03202060 methylation level (Y-axis) in the T-cell.

Figure 7. Examples of cGCPs with opposite correlations in liver or T-cells. (a) An IGV plot shows cCpGs of HLA-DQB1 and epigenomic features. cCpGs in blue and red mean negative and positive correlations, respectively. The blue and red arrows point the examples of negative and positive cCpGs. (b) Scatter plots of HLA-DQB1 expression level (X-axis) and cg03202060 methylation level (Y-axis) in the liver. (c) Scatter plots of HLA-DQB1 expression level (X-axis) and cg03202060 methylation level (Y-axis) in the T-cell.

To investigate the relationship between TFs, DNAm, and gene expression, we annotated the cCpGs using the human cistrome (the genome-wide map of regions bound by TFs). The ncCpGs were enriched in 6 TF binding regions in at least four of the six datasets, which included binding sites for MYC, PPARG, SPI1, RUNX1, CEBPA, and MYB (). The pcCpGs were enriched in the JARD2 and CTCF binding regions in at least four of the six datasets (). Some TFs-cCpG enrichments were observed only in specific tissue types, such as PO5F1-pcCpG in the developmental brain and FOXA1-ncCpG in liver only (Supplemental Fig. S12).

Topologically associated domains (TADs), a high level of 3D chromosome structure, are considered to mediate long-distance transcriptional regulation within their boundaries [Citation58]. To confirm whether cGCPs are generally located within the same TAD, we downloaded the TAD regions of H1-hESC from ENCODE and found that most cCpGs and their corresponding cGenes (>85%) were landed within the same TADs (Supplemental Fig. S13).

Finally, we determined the importance of epigenomic features in cGCPs using the random-forest method. We used the CpG location relative to CGI, gene body, chromatin state, histone modifications, TF binding sites, and the TADs mentioned above as input. The average decrease in accuracy measured by error rate was estimated after permuting each variable in ncGCPs and pcGCPs, respectively (). As demonstrated in , CpG location relative to CGI and gene body was found to be the two most important effectors for ncGCPs. Additionally, active histone modifications, such as H3K27ac also, have an important effect on the ncGCPs. The established gene repression marker, H3K27me3, was identified to be the most important effector for pcGCPs.

The cGCPs are controlled by genetic factors

Previous studies have demonstrated that genetic factors could regulate DNAm and the gene expression [Citation59]. We examined the genetic effect of cGCPs using mQTL and eQTL data of brain, liver, monocytes, and T-cells. By comparing the cCpGs to the CpG site in mQTLs, we observed a significant enrichment of the cCpGs in mQTL CpG sites (p < 0.05, Supplemental Table S21). The proportion of cCpGs that are associated with SNP genotypes range from 9% in liver to 52% in monocytes. The cGenes were also significantly enriched for the eQTL genes (p < 0.05, proportion: 24%~55%, Supplemental Table S21). Finally, cGCPs were enriched by pairs of eQTLs and mQTLs linked by the exact SNPs (p < 0.05, proportion: 5%~23%, Supplemental Table S21). Moreover, the eQTLs-mQTLs pairs also showed strong tissue specificity (~83% in single tissue, Supplemental Fig. S14).

Discussion

In this study, we evaluated DNAm and gene expression across different tissue types and brains of different age ranges, and found tissue- and age-specific DNAm-expression correlations. Thousands of GCPs were found to be correlated in various tissue types, but only four pairs were consistently correlated across different tissue types. Three of the four cGCPs were related to glutathione metabolic functions. DNAm-related regulation highlighted the functional importance of glutathione, which is known to be critical in antioxidant defence, nutrient metabolism, and basic cellular functions [Citation60,Citation61]. Moreover, glutathione level is directly related to methyl-donor and DNAm level [Citation62]. Thousands of cGCPs were identified in the developmental brain cohort, but only 4% of these developmental cGCPs were also present in the aged brain. Most cGCPs in the developmental brain were found to be specific to neurogenesis. We further discovered new epigenetic signature for pcGCPs suggesting novel regulation mechanisms. Our findings suggest that DNA methylation patterns are essential for the maintenance of tissue- and development-specific gene expression.

This study confirmed the presence of both positive and negative relationships between DNAm and gene expression and systematically assessed their associated epigenomic features. Our results support the well-known repressive model of DNAm in gene regulation by ncGCPs, including that DNAm at promoter and CGI regions repress gene expression [Citation6], the involvement of active chromatin markers in negative cGCPs [Citation10] and the association with a large number of methylation-sensitive [Citation23,Citation63]. For the activator function of DNAm in regulating gene expression represented by pcGCPs, we discovered several associated epigenomic features. The DNAm activator model may be that increasing DNAm level could recruit the methylation-preferred activating TFs to promote gene expression. It is reported that some TFs, such as PO5F1 (POU5F1), preferred to bind the methylated DNA sequences [Citation24]. Another alternative model of DNAm activator may be modulated by the reduced binding of repressive TFs, like CTCF, which contributes to the establishment of the chromatin loops and genome topology [Citation64]. DNAm could block the CTCF binding and open chromatin loop, which increased expression of the gene outside the loop [Citation33]. H3K27me3 regulated by PRC2 combined with JARD2 (JARID2) might be another example of pcGCPs (Supplemental Fig. S15). Our results showed the strong association of pcGCPs with the H3K27me3 marker regions and the JARD2. JARD2 recruits PRC2 to chromatin and may also regulate DNA methylation [Citation65–68]. PRC2 is the only identified methyltransferase that catalyzes H3K27me3 and this process may be negatively regulated by DNAm [Citation69–72]. In the H3K27me3 regions, increasing DNAm may inhibit the recruitment of PRC2 by JARD2 and prevent the deposition of H3K27me3, resulting in an open chromatin and gene activation. Our results suggested novel insight of DNAm-mediated expression regulation for the positively associated GCPs.

Our study highlights the functional relevance of cGCPs by their enriched biological functions and human diseases. The cGCPs identified in our study were found to be associated with tissue- and age-specific disease phenotypes, suggesting an important contribution of methylation-correlated gene expression to complex traits. Based on our results, schizophrenia- and bipolar disorder-related cGCPs were preferentially enriched in developmental brain tissue, suggesting that these phenotypes are established during certain developmental stages. For example, MDK was only correlated with DNAm in the developmental brain but not in the ageing and aged brains. MDK has been reported as a schizophrenia-related gene in several studies. It was identified as a schizophrenia-related gene by a Pascal gene-based test [Citation73]; a differentially expressed gene in the brains of patients with schizophrenia compared to healthy controls [Citation74]; and a schizophrenia-associated gene in a transcriptome-wide association study (TWAS) in both adult brain [Citation75] and foetal brain [Citation76]. The cGCPs associated with immune functions were also found to be linked to multiple psychiatric disorders, suggesting other important candidates for further investigation [Citation77,Citation78]. cCpGs related to lupus erythematosus were enriched in T-cells, suggesting candidates for autoimmune disorders [Citation79]. These results jointly imply that concerted DNA-expression is indicative of the biological functions of gene and CpG sites involved in the cGCPs.

We compiled a catalogue of the correlated GCPs as references for other genetic and epigenetic studies (Supplemental Table S2). Genetic and transcriptome studies have yielded a large number of disease-associated candidate genes. DNAm abnormalities have also been identified in multiple diseases [Citation80–85]. With the cGCP catalogue, we were able to connect disease-related genes with their putative DNAm regulators. This will reduce the search space for disease-relevant genes, a decrease in false positives, and increase statistical power.

Limitations in this study exist. First, the cGCPs identification is based on the correlation analysis, which may be affected by factors, such as the sample size and variation of data [Citation86]. Small sample size reduced the power in correlation tests and thus led to fewer significant cGCPs identified in these tissues such as the liver dataset. The regulatory relationships could be missed if both levels of DNAm and expression are constant in tissues. Correlation analysis also cannot decipher the causality. Though we have investigated the correlated DNAm-expression in large samples, more research is still needed to detect the correlated GCPs and interpret their causality in much more tissues and samples. Second, though we used the uniform preprocess pipeline, a batch factor may still confound the data from different laboratories and different platforms. Thus, we can only conclude on genes and CpGs that have been consistently captured by different platforms. The tissue specificity has an unavoidable problem for the cross-tissue comparison: we cannot evaluate loci that are unique to one tissue or have dramatically different variations across data. Our study focuses on the CpG sites that can be measured consistently with sufficient variation across tissues. We can only conclude on the DNAm-expression relationships based on the 50% CpG sites that have been investigated. Only the genes expressed in both data in the comparison were covered. The low-variation CpG sites and gene expressed uniquely in one data point may not necessarily obey the rules discovered here. Third, since high-resolution Hi-C data was not available for the studied tissues, we used the H1-hESC TAD regions, which may not map the cGCPs interaction in all tissue precisely. Finally, we found that up to 23% of cGCPs were likely driven by CpGs and genes associated with the same genetic factors. However, given that the fixed genotype is coupled with highly dynamic relationships between gene expression and DNAm that vary across tissues or developmental stages, genetic factors explained a relative small portion some of the cGCPs. Majority of cGCPs was not driven by the shared genetic regulators.

In summary, our study provides strong evidence that concerted DNAm-gene expression relationship is tissue- and development-specific. We also found that the positive and negative correlation directions can be switched in different tissues or age periods. We discovered a novel epigenetic signature of pcGCPs, which suggested new DNAm-mediated regulatory mechanisms through CTCF and JARD2. Our findings highlight the functional importance of specific cGCPs for their cellular functions, and potential contribution to diverse disease susceptibility. Early developmental brain employed a highly unique set of cGCPs. MDK is one of the candidates for schizophrenia revealed by cGCPs. cGCPs identified in our study may facilitate further classification of regulatory elements and prioritization of tissue- and development-specific functional genes.

Author contributions

KW, RD, YX, and CJ collected and preprocessed the datasets. KW performed the GCPs analysis. KW and JH performed the feature and GWAS enrichment analysis. KW, CC, TM, CZ, and CL participated in the revision of the manuscript. CC and CL conceived the study, participated in its design, and supervised the entire project. All authors read and approved the final manuscript.

Supplemental material

Supplemental Material

Download Zip (18.3 MB)

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants Nos. 82022024, 31970572, 31871276), the National Key R&D Project of China (Grants No. 2016YFC1306000 and 2017YFC0908701), Innovation-driven Project of Central South University (Grant Nos. 2020CX003), and NIH grants U01 MH122591, 1U01MH116489, 1R01MH110920. We sincerely thank Liz Kuney, Richard F. Kopp, and other colleagues for scientific and language advice that helped improve our manuscript.

Disclosure statement

The authors declare no competing interests.

Supplementary material

Supplemental data for this article can be accessed here

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [82022024]; National Natural Science Foundation of China [31970572]; National Natural Science Foundation of China [31871276]; Innovation-driven Project of Central South University [2020CX003]; National Institutes of Health [1U01MH116489]; National Institutes of Health [1R01MH110920]; National Institutes of Health [U01 MH122591]; the National Key R&D Project of China [2017YFC0908701]; the National Key R&D Project of China [2016YFC1306000].

References

  • Barlow DP. Genomic imprinting: a mammalian epigenetic discovery model. Annu Rev Genet. 2011;45(1):379–403.
  • Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012;13(7):484–492.
  • Reik W. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature. 2007;447(7143):425–432.
  • Baylin SB, Jones PA. A decade of exploring the cancer epigenome - biological and translational implications. Nat Rev Cancer. 2011;11(10):726–734.
  • Stadler MB, Murr R, Burger L, et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature. 2011;480(7378):490–495.
  • Jaenisch R, Bird A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet. 2003;33(S3):245–254. Suppl:245-54.
  • Hellman A, Chess A. Gene body-specific methylation on the active X chromosome. Science. 2007;315(5815):1141–1143.
  • Lister R, Pelizzola M, Dowen RH, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462(7271):315–322.
  • Olsson AH, Volkov P, Bacos K, et al. Genome-wide associations between genetic and epigenetic variation influence mRNA expression and insulin secretion in human pancreatic islets. PLoS Genet. 2014;10(11):e1004735.
  • Wagner JR, Busche S, Ge B, et al. The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome Biol. 2014;15(2):R37.
  • Tasaki S, Gaiteri C, Mostafavi S, et al. Multi-omic directed networks describe features of gene regulation in aged brains and expand the set of genes driving cognitive decline. Front Genet. 2018;9:294.
  • Taylor DL, Jackson AU, Narisu N, et al. Integrative analysis of gene expression, DNA methylation, physiological traits, and genetic variation in human skeletal muscle. Proc Natl Acad Sci U S A. 2019;116(22):10883–10888.
  • Bell JT, Tsai PC, Yang TP, et al. Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population. PLoS Genet. 2012;8(4):e1002629.
  • Varley KE, Gertz J, Bowling KM, et al. Dynamic DNA methylation across diverse human cell lines and tissues. Genome Res. 2013;23(3):555–567.
  • Smith ZD, Meer A. DNA methylation: roles in mammalian development. Nat Rev Genet. 2013;14(3):204–220.
  • Liang P, Song F, Ghosh S, et al. Genome-wide survey reveals dynamic widespread tissue-specific changes in DNA methylation during development. BMC Genomics. 2011;12(1):231.
  • Reynolds LM, Taylor JR, Ding J, et al. Age-related variations in the methylome associated with gene expression in human monocytes and T cells. Nat Commun. 2014;5(1):5366.
  • Blake LE, Roux J, Hernando-Herraez I, et al. A comparison of gene expression and DNA methylation patterns across tissues and species. Genome Res. 2020;30(2):250–262.
  • Gutierrez-Arcelus M, Lappalainen T, Montgomery SB, et al. Passive and active DNA methylation and the interplay with genetic variation in gene regulation. Elife. 2013;2:e00523.
  • Gutierrez-Arcelus M, Ongen H, Lappalainen T, et al. Tissue-specific effects of genetic and epigenetic variation on gene regulation and splicing. PLoS Genet. 2015;11(1):e1004958.
  • Bonder MJ, Kasela S, Kals M, et al. Genetic and epigenetic regulation of gene expression in fetal and adult human livers. BMC Genomics. 2014;15(1):860.
  • Schubeler D. Function and information content of DNA methylation. Nature. 2015;517(7534):321–326.
  • Zhu H, Wang G, Qian J. Transcription factors as readers and effectors of DNA methylation. Nat Rev Genet. 2016;17(9):551–565.
  • Yin Y, Morgunova E, Jolma A, et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science. 2017;356(6337).
  • Wang G, Luo X, Wang J, et al. MeDReaders: a database for transcription factors that bind to methylated DNA. Nucleic Acids Res. 2018;46:D146–D51.
  • Roadmap Epigenomics C, Kundaje A, Meuleman W, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–330.
  • Lovkvist C, Sneppen K, Haerter JO. Exploring the link between nucleosome occupancy and DNA methylation. Front Genet. 2017;8:232.
  • Spektor R, Tippens ND, Mimoso CA, et al. methyl-ATAC-seq measures DNA methylation at accessible chromatin. Genome Res. 2019;29(6):969–977.
  • Collings CK, Anderson JN. Links between DNA methylation and nucleosome occupancy in the human genome. Epigenetics Chromatin. 2017;10(1):18.
  • Cedar H, Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet. 2009;10(5):295–304.
  • Du J, Johnson LM, Jacobsen SE, et al. DNA methylation pathways and their crosstalk with histone methylation. Nat Rev Mol Cell Biol. 2015;16(9):519–532.
  • Bonder MJ, Luijk R, Zhernakova DV, et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat Genet. 2017;49(1):131–138.
  • Liu XS, Wu H, Ji X, et al. Editing DNA methylation in the mammalian genome. Cell. 2016;167(1):233–47 e17.
  • Kang HJ, Kawasawa YI, Cheng F, et al. Spatio-temporal transcriptome of the human brain. Nature. 2011;478(7370):483–489.
  • Numata S, Ye T, Hyde TM, et al. DNA methylation signatures in development and aging of the human prefrontal cortex. Am J Hum Genet. 2012;90(2):260–272.
  • De Jager PL, Srivastava G, Lunnon K, et al. Alzheimer’s disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci. Nat Neurosci. 2014;17(9):1156–1163.
  • Horvath S, Erhart W, Brosch M, et al. Obesity accelerates epigenetic aging of human liver. Proc Natl Acad Sci U S A. 2014;111(43):15538–15543.
  • Jaffe AE, Gao Y, Deep-Soboslay A, et al. Mapping DNA methylation across development, genotype and schizophrenia in the human frontal cortex. Nat Neurosci. 2016;19(1):40–47.
  • Pujol J, Vendrell P, Junque C, et al. When does human brain development end? Evidence of corpus callosum growth up to adulthood. Ann Neurol. 1993;34(1):71–75.
  • Zhou W, Laird PW, Shen H. Comprehensive characterization, annotation and innovative use of infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2017;45:e22.
  • Nordlund J, Backlin CL, Wahlberg P, et al. Genome-wide signatures of differential DNA methylation in pediatric acute lymphoblastic leukemia. Genome Biol. 2013;14(9):r105.
  • Teschendorff AE, Marabita F, Lechner M, et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina infinium 450 k DNA methylation data. Bioinformatics. 2013;29(2):189–196.
  • Harrow J, Frankish A, Gonzalez JM, et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 2012;22(9):1760–1774.
  • Martin N, Maes H. Multivariate analysis. London: Academic press; 1979.
  • Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–127.
  • Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3(9):1724–1735.
  • Li J, Bushel PR, Chu TM, et al. Principal variance components analysis: estimating batch effects in microarray gene expression data. Batch Effects and Noise in Microarray Experiments: Sources and Solutions. 2009: 141–154.
  • Wang J, Vasaikar S, Shi Z, et al. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res. 2017;45:W130–W7.
  • Watanabe K, Taskesen E, van Bochoven A, et al. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017;8(1):1826.
  • Battram T, Yousefi P, Crawford G, et al. The EWAS catalog: a database of epigenome-wide association studies. OSF Preprints. 2021; February 4. DOI: 10.31219/osf.io/837wn.
  • Saffari A, Silver MJ, Zavattari P, et al. Estimation of a significance threshold for epigenome-wide association studies. Genet Epidemiol. 2018;42(1):20–33.
  • Consortium EP. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306(5696):636–640.
  • Sheffield NC, Bock C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and bioconductor. Bioinformatics. 2015;32(4):587–589.
  • Dixon JR, Selvaraj S, Yue F, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–380.
  • Ho JW, Jung YL, Liu T, et al. Comparative analysis of metazoan chromatin organization. Nature. 2014;512(7515):449–452.
  • Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–842.
  • Vorontsov IE, Fedorova AD, Yevshin IS, et al. Genome-wide map of human and mouse transcription factor binding sites aggregated from ChIP-Seq data. BMC Res Notes. 2018;11(1):756.
  • Acemel RD, Maeso I, Gomez-Skarmeta JL. Topologically associated domains: a successful scaffold for the evolution of gene regulation in animals. WIRES Dev Biol. 2017;6(3):e265.
  • Ng B, White CC, Klein HU, et al. An xQTL map integrates the genetic architecture of the human brain’s transcriptome and epigenome. Nat Neurosci. 2017;20(10):1418–1426.
  • Wu G, Fang YZ, Yang S, et al. Glutathione metabolism and its implications for health. J Nutr. 2004;134(3):489–492.
  • Forman HJ, Zhang H, Rinna A. Glutathione: overview of its protective roles, measurement, and biosynthesis. Mol Aspects Med. 2009;30(1–2):1–12.
  • Lertratanangkoon K, Wu CJ, Savaraj N, et al. Alterations of DNA methylation by glutathione depletion. Cancer Lett. 1997;120(2):149–156.
  • Maurano MT, Wang H, John S, et al. Role of DNA methylation in modulating transcription factor occupancy. Cell Rep. 2015;12(7):1184–1195.
  • Ong CT, Corces VG. CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet. 2014;15(4):234–246.
  • Li G, Margueron R, Ku M, et al. Jarid2 and PRC2, partners in regulating gene expression. Genes Dev. 2010;24(4):368–380.
  • Pasini D, Cloos PA, Walfridsson J, et al. JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature. 2010;464(7286):306–310.
  • Peng JC, Valouev A, Swigut T, et al. Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell. 2009;139(7):1290–1302.
  • Dixon G, Pan H, Yang D, et al. QSER1 protects DNA methylation valleys from de novo methylation. Science. 2021;372(6538). DOI:10.1126/science.abd0875.
  • Laugesen A, Hojfeldt JW, Helin K. Molecular mechanisms directing PRC2 recruitment and H3K27 methylation. Mol Cell. 2019;74(1):8–18.
  • Bartke T, Vermeulen M, Xhemalce B, et al. Nucleosome-interacting proteins regulated by DNA and histone methylation. Cell. 2010;143(3):470–484.
  • Jermann P, Hoerner L, Burger L, et al. Short sequences can efficiently recruit histone H3 lysine 27 trimethylation in the absence of enhancer activity and DNA methylation. Proc Natl Acad Sci U S A. 2014;111(33):E3415–21.
  • Wachter E, Quante T, Merusi C, et al. Synthetic CpG islands reveal DNA sequence determinants of chromatin structure. Elife. 2014;3:e03397.
  • Wu Y, Yao YG, Luo XJ. SZDB: a database for Schizophrenia genetic research. Schizophr Bull. 2017;43:459–471.
  • Gandal MJ, Haney JR, Parikshak NN, et al. Shared molecular neuropathology across major psychiatric disorders parallels polygenic overlap. Science. 2018;359(6376):693–697.
  • Gandal MJ, Zhang P, Hadjimichael E, et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science. 2018;362(6420). 10.1126/science.aat8127.
  • Walker RL, Ramaswami G, Hartl C, et al. Genetic control of expression and splicing in developing human brain informs disease mechanisms. Cell. 2019;179(3):750–71 e22.
  • Birnbaum R, Jaffe AE, Chen Q, et al. Investigating the neuroimmunogenic architecture of schizophrenia. Mol Psychiatry. 2018;23(5):1251–1260.
  • Hartwig FP, Borges MC, Horta BL, et al. Inflammatory biomarkers and risk of Schizophrenia: a 2-sample Mendelian randomization study. JAMA Psychiatry. 2017;74(12):1226–1233.
  • Sharabi A, Tsokos GC. T cell metabolism: new insights in systemic lupus erythematosus pathogenesis and therapy. Nat Rev Rheumatol. 2020;16(2):100–112.
  • Robertson KD. DNA methylation and human disease. Nat Rev Genet. 2005;6(8):597–610.
  • Rakyan VK, Down TA, Balding DJ, et al. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12(8):529–541.
  • Chen C, Zhang C, Cheng L, et al. Correlation between DNA methylation and gene expression in the brains of patients with bipolar disorder and schizophrenia. Bipolar Disord. 2014;16(8):790–799.
  • Paul DS, Beck S. Advances in epigenome-wide association studies for common diseases. Trends Mol Med. 2014;20(10):541–543.
  • Li M, Zou D, Li Z, et al. EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic Acids Res. 2019;47(D1):D983–D8.
  • Semick SA, Bharadwaj RA, Collado-Torres L, et al. Integrated DNA methylation and gene expression profiling across multiple brain regions implicate novel genes in Alzheimer’s disease. Acta Neuropathol. 2019;137(4):557–569.
  • Goodwin LD, Leech NL. Understanding correlation: factors that affect the size of r. The Journal of Experimental Education. 2006;74(3):249–266

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.