1,537
Views
0
CrossRef citations to date
0
Altmetric
Research article

Pleiotropic influence of DNA methylation QTLs on physiological and ageing traits

ORCID Icon, , ORCID Icon, , ORCID Icon &
Article: 2252631 | Received 01 May 2023, Accepted 16 Aug 2023, Published online: 10 Sep 2023

ABSTRACT

DNA methylation is influenced by genetic and non-genetic factors. Here, we chart quantitative trait loci (QTLs) that modulate levels of methylation at highly conserved CpGs using liver methylome data from mouse strains belonging to the BXD family. A regulatory hotspot on chromosome 5 had the highest density of trans-acting methylation QTLs (trans-meQTLs) associated with multiple distant CpGs. We refer to this locus as meQTL.5a. Trans-modulated CpGs showed age-dependent changes and were enriched in developmental genes, including several members of the MODY pathway (maturity onset diabetes of the young). The joint modulation by genotype and ageing resulted in a more ‘aged methylome’ for BXD strains that inherited the DBA/2J parental allele at meQTL.5a. Further, several gene expression traits, body weight, and lipid levels mapped to meQTL.5a, and there was a modest linkage with lifespan. DNA binding motif and protein–protein interaction enrichment analyses identified the hepatic nuclear factor, Hnf1a (MODY3 gene in humans), as a strong candidate. The pleiotropic effects of meQTL.5a could contribute to variations in body size and metabolic traits, and influence CpG methylation and epigenetic ageing that could have an impact on lifespan.

Introduction

Genome-wide patterns in DNA methylation (DNAm) are established during development and are critical for cell differentiation and cell identity [Citation1]. The canonical form of DNAm involves the addition of a methyl group to the cytosine residue at CG dinucleotides (i.e., CpG methylation). The methylation status of CpGs is a part of the epigenetic landscape that serves as a stable and yet reprogrammable form of gene expression regulation [Citation2,Citation3]. On the one hand, methylation of CpGs is important for sustaining and perpetuating gene expression signatures and in giving each organ and tissue its functional identity. On the other hand, some of the CpG regions are dynamic and represent modifiable epigenetic sites that enable the genome to respond and adapt to ever-changing environmental and nutritional states [Citation4,Citation5]. The epigenetic clocks that are now widely used as biomarkers of health and ageing are based on a handful of CpGs that are altered by the passage of time and track closely with age and ageing [Citation6–9].

Along with ageing and modifications by extrinsic factors, CpG methylation can also be influenced by underlying genetic sequence variants. Several studies in humans have identified genetic loci that are associated with DNAm [Citation10–12]. Analogues to gene expression quantitative trait loci (eQTLs) [Citation13], a genetic region that influences the quantitative variation in CpG methylation is referred to as a methylation QTL, or meQTL (alternatively also shortened to ‘mQTL’, although that can be confused with ‘metabolite QTL’ and ‘module QTL’) [Citation14,Citation15]. Several human studies have performed genome-wide association studies (GWAS) for CpG methylation, and there are now large-scale multi-tissue meQTL atlases available for humans [Citation16,Citation17]. Similar to the classification of eQTLs into cis- and trans-effects, meQTLs are also categorized into cis-meQTLs or trans-meQTLs, depending on the distance between the meQTL regulatory locus and the target CpG [Citation17,Citation18]. Cis-meQTLs are highly enriched for genetic loci that have been associated with complex traits, and genetic variation in methylation levels are implicated in disease risk [Citation11,Citation12,Citation19,Citation20]. An meQTL region can be associated with multiple distal CpGs in trans, and similar to trans-eQTL hotspots, such sites represent trans-meQTL hotspots [Citation10,Citation12,Citation21]. Trans-meQTL hotspots implicate causal modulators with widespread influence on CpGs. There is now growing evidence that DNA binding factors and transcription factors (TFs) play a role in shaping the methylome by exerting a trans-regulatory influence on CpGs [Citation12,Citation22–24]. For instance, during hepatocyte differentiation, TFs such as the hepatocyte nuclear factors (HNFs) and GATA family are reported to regulate the dynamic spatial and temporal patterns in DNAm [Citation25].

Model organisms provide a powerful tool for interrogating the interactions between meQTLs, eQTLs, and experimental conditions such as diets and drugs. However, although genome-scale meQTL studies in humans date back to the 2010s [Citation26], there is an over 10-year lag in methylome-wide meQTL studies in rodent populations. This is partly due to the lack of a cost-effective and scalable DNAm microarray for model organisms that is comparable to the Illumina Human Methylation Infinium BeadChips [Citation27]. A few of us attempted to repurpose the human arrays to measure methylation in mice [Citation28–30]. Indeed, a small subset of the CpG probes on the human arrays map to conserved sequences and could be considered as ‘pan-mammalian’ interrogators of the epigenome. More recently, a truly pan-mammal array, the HorvathMammalMethylChip40, was custom developed [Citation31,Citation32]. A unique aspect of the array is that the probes map to conserved sequences, and this has opened up new avenues for large multi-species comparative epigenomics [Citation33,Citation34]. We used this array to track epigenetic changes with ageing in mice subjected to two different dietary conditions [Citation6]. Our data incorporated genetic diversity as we profiled members of the BXD family. In the present work, we use the methylome data for an meQTL mapping study.

BXDs are a family of recombinant inbred (RI) and advanced intercross (AI) mouse strains. We have previously described the BXDs in greater detail [Citation35–38]. In brief, the BXD family consists of about 150 inbred members derived from two progenitor strains: C57BL/6J (B6) and DBA/2J (D2). The BXDs have a long history in quantitative genetics, and the earlier sets of RI strains were used to map simpler Mendelian traits [Citation39,Citation40]. Subsequently, additional sets of RI and AI strains were added to the growing family, and over the years, the BXDs have accrued a vast compendium of phenotypic data ranging from metabolic, physiologic, lifespan, to behavioural and neural traits, and multi-omic datasets (e.g., transcriptomics, proteomics, and metabolomics) [Citation37,Citation41–43]. This is matched by deep genome sequence data with over 6 million genetic variants segregating in the family, making the BXDs a powerful mammalian panel for systems genetics and systems epigenetics [Citation6,Citation32,Citation44–46].

In our previous work, we studied the genetic regulation of epigenetic clocks in the BXDs and examined metabolic and dietary factors that are related to the age-dependent methylation changes [Citation6]. Here, we focus on meQTLs that influence individual CpGs and evaluate the genetic architecture of CpG methylation in liver tissue. We identify meQTL hotspots that influence multiple distal CpGs. The region on chromosome (Chr) 5 harboured the highest number of co-localized trans-meQTLs, and we refer to this genetic interval as meQTL.5a. This region also contains a high-density of QTLs linked to gene expression both at the transcriptomic (eQTLs) and proteomic (pQTLs) levels. The CpGs that are trans-modulated by meQTL.5a are enriched in early developmental genes, and the pattern of variance indicates a genotype-dependent susceptibility to the effects of ageing. Specifically, we find a more aged methylome for strains that have the D2 allele at meQTL.5a, and this may contribute to variations in lifespan (represented as a graphical summary in ). Furthermore, we find a modest pleiotropic effect of this locus on the body weight, and for this, the B6 allele was associated with a positive additive effect. Based on DNA binding motif enrichment and protein–protein interaction (PPI), we identify the hepatic nuclear factor, Hnf1a, as one of the important candidate genes in meQTL.5a. In humans, mutations in HNF1A results in MODY3 (maturity onset diabetes of the young 3) [Citation47], and our results indicate a trans-modulatory effect of meQTL.5a on CpGs located in several other genes that are part of the MODY pathway. Overall, our results suggest a convergent effect of age on CpGs that are also partly influenced by an meQTL. We propose a model in which the meQTL.5a has both a horizontal and vertical pleiotropic effect on physiological traits, DNAm, and lifespan.

Figure 1. Graphical summary. Genetic variation in the trans-meQTL hotspot, meQTL.5a (black and pale-brown represent the genotype variants of meQTL.5a), influences CpG methylation at multiple developmental genes. DD in meQTL.5a is associated with higher methylation in the downstream targets. As these CpGs also gain methylation with aging, the DD genotype has a more “aged methylome.” The variation in methylation could contribute to genotype-dependent variation in lifespan. (Emojis downloaded from https://dryicons.com).

Figure 1. Graphical summary. Genetic variation in the trans-meQTL hotspot, meQTL.5a (black and pale-brown represent the genotype variants of meQTL.5a), influences CpG methylation at multiple developmental genes. DD in meQTL.5a is associated with higher methylation in the downstream targets. As these CpGs also gain methylation with aging, the DD genotype has a more “aged methylome.” The variation in methylation could contribute to genotype-dependent variation in lifespan. (Emojis downloaded from https://dryicons.com).

Results

Overview of meQTL distribution in the mouse genome

The HorvathMammalMethylChip40 array contains probes for 27,966 CpGs that have been validated for the mouse genome [Citation6,Citation32]. We have used this to assay the liver methylome in a population of BXDs (details on the samples are in Data S1). To uncover genetic loci that modulate methylation variation at these CpGs, we performed linkage mapping across the autosomal chromosome (Chrs) [Citation35]. QTL mapping was implemented in R/qtl2 and adjusted for age, diet, the top methylome-wide principal component, and the BXD’s kinship matrix [Citation48]. For each of the 27,966 CpGs, we plotted its genome-wide highest LOD score, and the distance between the maximal meQTL marker, and the CpG location (; genome-wide peak LOD data for each CpG is in Data S2). Strong meQTLs tended to reside within 10 Mb of the corresponding CpGs, and we classified these regulatory loci as cis-meQTLs and the targeted CpGs as cis-CpGs. Note that due to the family-based population and the comparatively larger haplotypes in the BXDs [Citation35], we have used a much larger interval rather than the <1 Mb interval that is typically used to assign cis-effects in human association studies [Citation10,Citation17]. In total, 3921 CpGs (14% of the CpGs we examined) mapped to at least one meQTL at a nominal LOD ≥ 3.5. Many of the CpGs were polygenic and mapped to more than one locus (in other words, a CpG with a strong cis-meQTL may also have lower QTLs in trans). The meQTLs showed an uneven distribution with some loci having a trans-modulatory linkage to many distal CpGs that potentially signify a regulatory hotspot (). At such trans-meQTL hotspots, there is an imbalance in which parental allele increased methylation (i.e., has a positive additive effect). For instance, the majority of the CpGs that have meQTLs on markers on Chr 5 (~115 Mb) are associated with higher methylation for the allele from the D2 parent (D allele) (). On Chr2 (~110 Mb), it is the allele from the B6 parent (B allele) that is associated with higher methylation. This is consistent with reports from human studies that SNPs associated with multiple CpGs in trans have the same direction of allelic effect [Citation24]. The allelic effects are clearly visible when we consider only the genome-wide peak LOD markers for each CpG (i.e., each CpG linked to only its genome-wide strongest meQTL marker). plots the locations of 1416 peak LOD markers against the location of 3921 CpGs that mapped at LOD ≥ 3.5. Of these, 1833 CpGs mapped as cis-meQTLs (meQTLstomarker ratio of 1.98). The remaining 2088 CpGs had peak LOD at 691 unique markers that were distant from the location of the CpG (meQTL-to-marker ratio of 3.02). These QTLs are classified as trans-meQTLs, and the CpGs are referred to as trans-CpGs. For the cis-meQTLs, the number of loci in which the B allele had the positive addictive effect (909 cis-meQTLs) was similar to the number of loci in which the D allele had the positive additive effect (924 cis-meQTLs). However, for the trans-meQTLs, there was a preponderance for higher methylation for the D allele (1413 or 68% of the trans-CpGs).

Figure 2. Overview of methylation QTL (meQTLs) in the liver. (a) plot of the genome-wide peak LOD score for the 27,966 CpGs, and distance between the CpG and the maximum LOD marker. Yellow: D allele (DBA/2J genotype) has positive additive effect; blue: B allele (C57BL/6J genotype) has positive additive effect. (b) meQTL location (x-axis; from chromosomes 1–19) and counts of meQTLs with LOD ≥ 3.5. (c) the genome graph plots location of the genome-wide peak QTL marker (x-axis), and location of the linked CpG (y-axis). Shows only the 3921 CpGs that map at LOD ≥ 3.5. Relative enrichment or depletion in (d) genomic location, and (e) predicted chromatin states for CpGs that map as cis-meQTLs (black), and as trans-meQTLs (grey). Asterisks denote hypergeometric enrichment p < 0.001. (f) enrichment or depletion in differentially methylated CpGs among the cis- and trans-modulated CpGs. (Asterisks denote hypergeometric enrichment p < 0.001; hash denotes p = 0.003) (g) Portion of the peak interval in the chromosome 5 meQTL hotspot: meQTL.5a (from UCSC Genome browser GRCm38/mm10).

Figure 2. Overview of methylation QTL (meQTLs) in the liver. (a) plot of the genome-wide peak LOD score for the 27,966 CpGs, and distance between the CpG and the maximum LOD marker. Yellow: D allele (DBA/2J genotype) has positive additive effect; blue: B allele (C57BL/6J genotype) has positive additive effect. (b) meQTL location (x-axis; from chromosomes 1–19) and counts of meQTLs with LOD ≥ 3.5. (c) the genome graph plots location of the genome-wide peak QTL marker (x-axis), and location of the linked CpG (y-axis). Shows only the 3921 CpGs that map at LOD ≥ 3.5. Relative enrichment or depletion in (d) genomic location, and (e) predicted chromatin states for CpGs that map as cis-meQTLs (black), and as trans-meQTLs (grey). Asterisks denote hypergeometric enrichment p < 0.001. (f) enrichment or depletion in differentially methylated CpGs among the cis- and trans-modulated CpGs. (Asterisks denote hypergeometric enrichment p < 0.001; hash denotes p = 0.003) (g) Portion of the peak interval in the chromosome 5 meQTL hotspot: meQTL.5a (from UCSC Genome browser GRCm38/mm10).

In terms of genomic locations and chromatin states [Citation49,Citation50], the cis-CpGs were enriched for introns and intergenic regions, and were located in transcriptionally permissive states (Tr-P), but were highly depleted in active and bivalent promoters (Pr-A and Pr-B, respectively), transcriptionally strong states (Tr-S) (; Table S1), and gene exons, promoters, and 3’ and 5’ UTRs. Trans-CpGs were enriched in enhancer sites and depleted in promoter regions (, e; Table S1).

To examine how the genetic variation in methylation relate to variance associated with ageing, diet, body weight, and genotype-dependent longevity (variables that we have reported in detail in [Citation6]), we examined the proportion of differentially methylated CpGs (DMCs) that map as cis- or trans-meQTLs. The cis-CpGs were only modestly enriched in DMCs associated with genotype-dependent lifespan (lifespan differentially methylated CpGs or LS-DMC; hypergeometric enrichment p = 0.003) and were depleted in DMCs related to age, weight, and diet (; Table S2). This indicates that the variance of CpGs that are under cis-modulation is largely due to genetics. In contrast, the trans-CpGs were highly enriched in CpGs that gained methylation with age (age-gain), and CpGs associated with weight, diet, and LS-DMCs. This suggests that the variance of CpGs that are under trans-modulation is multi-factorial and influenced by both genetic and non-genetic factors.

meQTL hotspots and association with gene expression

To define regions that contain a high density of meQTLs, we took the 3921 CpGs with maximal LOD ≥ 3.5 and counted the number of meQTLs linked to each genotype marker. Eighteen markers were associated with 20 or more meQTLs, and we classified these as putative meQTL hotspots (Table S3). A few of these are mostly cis-meQTL regions. For example, the two neighbouring markers on Chr19, rs30567369 (47.51 Mb) and rs31157694 (47.94 Mb), were linked to 58 CpGs in cis. We have previously reported this region as a QTL for liver epigenetic age acceleration (distal portion of ‘epigenetic age acceleration QTL on Chr19’ or Eaa19) [Citation6].

For trans effects, the highest number of trans-meQTLs per marker was on Chr5, ~115 Mb (). Here, the SNP marker rs29733222 (115.43 Mb; coordinates based on GRCm38/mm10) is linked to 230 genome-wide peak trans-meQTLs, and only three genome-wide peak cis-meQTLs (Table S3). As several neighbouring markers in this Chr5 region were linked to multiple meQTLs, we roughly delineated the 10 Mb interval (110–120 Mb) as a liver meQTL hotspot and referred to this region as meQTL.5a. In total, 535 meQTLs mapped to meQTL.5a at the 3.5 LOD score threshold (502 trans-meQTLs, 33 cis-meQTLs; ; Data S2). The majority of the trans-meQTLs in meQTL.5a (435 of the 502) were associated with higher methylation for the D allele, and only 67 trans-meQTLs were associated with higher DNAm for the B allele.

Table 1. Liver meQTL hotspots.

Similarly, we demarcated broad 10 Mb intervals around the other putative meQTL hotspots and counted the number of CpGs that have peak LOD scores in these intervals. We tabulated eight putative meQTL hotspots that are on Chrs 2, 4, 5, 7, 14, and 19 (). To examine if these meQTL intervals also influence gene expression we referred to existing BXD liver RNA-seq data (previously reported in [Citation43] and searched for transcripts that map to the 10 Mb intervals listed in at eQTL LOD ≥ 3.5). meQTL.5a had the largest number of eQTLs, followed by Eaa19 (; lists of transcripts that map to meQTL.5a and Eaa19 are in Data S3 and Data S4). meQTL.5a had 376 liver eQTLs that included 36 cis-eQTLs from positional candidate genes such as Cit, Sirt4, and Hspb8. Eaa19, despite being primarily a cis-meQTL locus, had an abundance of trans-eQTLs (). Somewhat surprisingly, for all the meQTL intervals, there was very little overlap between meQTLs and eQTLs even for the strong cis-effects, and this suggests limited co-regulation of the methylome and the transcriptome. Only a few genes (listed in ) had concordant meQTLs and eQTLs in the same locus, and of these, only the QTLs for Clcn3 and Tenm3 in meQTL.5a were trans-effects. For both Clcn3 and Tenm3, the trans-modulated CpGs (cg16842643 and cg24399106, respectively) are in the 5’UTR. The remaining few genes with overlapping me/eQTLs were cis-effects.

We prioritized the meQTL.5a and Eaa19 intervals and referred to the liver proteomic data (also reported in [Citation43]) to search for protein QTLs (pQTLs) in meQTL.5a and Eaa19. At the same LOD ≥ 3.5 threshold, 104 protein variants from 83 unique genes mapped as pQTLs to meQTL.5a (32 cis-pQTLs). There was more consistency between pQTLs and eQTLs, and Hsd17b4, Psmb8, Psmb9, and Psmb10 had trans-eQTLs and trans-pQTLs, and Pebp1 had cis-eQTL and cis-pQTL in meQTL.5a (Data S3). Similarly, the overlap between eQTLs and pQTLs was higher for Eaa19. In total, 138 protein variants (57 cis) mapped to Eaa19, and of these, Abcc2, Cutc, Cyp2c70, Gsto, and Sfxn2 had cis-acting QTLs for both mRNA and protein, and Cyp1a1 and Naga had trans-eQTLs and trans-pQTLs (Data S4).

Genetic modulation of co-methylation networks in mouse liver

We applied a weighted gene co-methylation network analysis (WGCNA) to evaluate whether the meQTL hotspots could be detected at the network level [Citation51–54]. WGCNA was carried out on the set of ~28K CpGs. At a soft-threshold power of 6, the CpGs were grouped into 14 modules that range in size from 62 to 13821 CpG members that form tightly correlated networks (Data S5; Fig S1a; the module membership for each of the CpGs is in Data S2). For each module, the module eigengene (ME) is the top principal component of the co-methylation network and is the representative methylation pattern [Citation51]. The inter-module correlations between the MEs provide a view of the covariance among the CpG networks (i.e., meta-network) (Data S5 and displayed in Fig S1b). The MEs can also be tested for association with major variables such as age and diet, and this is a convenient way to assess the network-level impact of these variables [Citation54]. Unsurprisingly, age was a significant correlate of the CpG networks, and 6 of the 14 modules were significantly correlated with age (p < 0.001, |r| ≥0.18; Data S5). Of these, the Green module (2092 CpG members), followed by the Lightgreen module (1761 CpGs), had the tightest correlation with age (r = 0.69 and 0.49, respectively; Data S5 and Fig S1c, S1d). The large Blue module with 5067 CpG members was significantly anti-correlated with age (r = −0.34).

Our primary focus is on the genetic modulation of these CpG networks, and we performed QTL mapping for each of the MEs with age, diet, and body weight as co-factors. The module-level QTL mapping was done using the Genome-wide Efficient Mixed Model (GEMMA) algorithm implemented on the webtool GeneNetwork [Citation55–57]. The strongest QTL was for the small Royalblue module, which mapped at LOD = 27 to distal Eaa19 (QTL plots for select modules in ; full QTL results in Data S6 and Fig S2). The age-associated modules, Blue and Lightgreen, also had modest QTLs in Eaa19 (). The large Blue module that is anticorrelated with age mapped at a LOD = 4.6 to meQTL.5a (). The Black and age-associated Lightgreen modules also had modest QTLs in meQTL.5a (Data S6; ). For the Royalblue module, 45 of the 62 CpGs members were located in Eaa19, and this module mostly represents a network of CpGs that are cis-modulated by variants in Eaa19. In contrast, only 29 of the 5067 CpGs in the Blue module were located in meQTL.5a and indicates that the Blue ME captures CpGs that have shared covariance due to a trans-effect from meQTL.5a. The Blue module members include 443 of the 502 CpGs that had genome-wide peak trans-meQTLs at LOD ≥ 3.5 in meQTL.5a.

Figure 3. Genetics of co-methylation CpG networks. QTL maps for four module eigengenes (MEs) are shown. Mapping was done using a linear mix model. The horizontal dashed red line marks a relatively lenient threshold of –log10p = 3.5. The Blue, black and Lightgreen modules share suggestive overlapping QTLs in meQtl.5a. Eaa19 has a strong cis-regulatory effect on the Royalblue module. The chromosome 2 peak for the black module is proximal to meQTL.2b in .

Figure 3. Genetics of co-methylation CpG networks. QTL maps for four module eigengenes (MEs) are shown. Mapping was done using a linear mix model. The horizontal dashed red line marks a relatively lenient threshold of –log10p = 3.5. The Blue, black and Lightgreen modules share suggestive overlapping QTLs in meQtl.5a. Eaa19 has a strong cis-regulatory effect on the Royalblue module. The chromosome 2 peak for the black module is proximal to meQTL.2b in Table 1.

Overall, the WGCNA shows that multiple distal CpGs can form tightly correlated networks partly due to shared genetic modulation, and partly due to variables such as age and diet. The network analysis once again highlights meQTL.5a as a CpG regulatory hotspot.

Characterizing the CpGs trans-modulated by meQTL.5a

To uncover common biological pathways among the set of trans-CpGs linked to meQTL.5a, we performed a genomic regions enrichment analysis using the GREAT tool [Citation58,Citation59]. Compared to the background array, the 502 trans-CpGs were highly enriched in developmental and cell differentiation genes (Data S7; ). This included several genes involved in early development, as well as epigenetic regulators such as the histone demethylase, Kdm6b, the histone methyltransferase, Prdm16, and the methyl-CpG binding gene, Tet3 [Citation60–63]. Other important developmental genes associated with the trans-modulated CpGs include Sox2, Sox5, Foxp2, and Esrrb [Citation64–66]. The meQTL plots for a few of these are shown in Fig S3. The trans-modulated CpG regions were enriched in promoter motifs including sequences that are downstream targets of the hepatocyte nuclear factor 4, alpha (HNF4A). Additionally, the FOXA1 (HNF3A) TF network was an enriched pathway among the meQTL.5a trans-CpGs. A regional enrichment analysis for the CpGs in the Blue module highlighted the same pathways and TF networks (Data S7; ), and this collectively suggests that the CpGs modulated by meQTL.5a are related to development, and may be targeted by related DNA binding factors, particularly the hepatocyte nuclear factors. In terms of genomic context, compared to the background array, the trans-CpGs targeted by meQTL.5a were highly enriched in predicted enhancer states (e.g., En-Pd, En-Pp, En-Sd, En-Sp, and En-W) [Citation6,Citation49,Citation50] and were mostly located in introns (Table S4).

Figure 4. Enriched functional pathways genomic regions enrichment analysis of (a) CpGs that are trans-modulated by meQTL.5a, and (b) CpGs that are members of the Blue module. Gene ontology and pathway enrichment among (c) transcripts, and (d) proteins that map to meQTL.5a in liver.

Figure 4. Enriched functional pathways genomic regions enrichment analysis of (a) CpGs that are trans-modulated by meQTL.5a, and (b) CpGs that are members of the Blue module. Gene ontology and pathway enrichment among (c) transcripts, and (d) proteins that map to meQTL.5a in liver.

To test if we find intersecting biological functions in the transcriptome and proteome, we performed a gene ontology (GO) enrichment analysis of the trans-modulated mRNAs and proteins that map to meQTL.5a (Data S8). There were no functionally enriched categories after FDR correction. At a nominal p-value, the top 10 GO (for biological processes) and top KEGG pathways for both the trans-modulated proteins and transcripts were related to protein transport and protein catabolism (; Data S8). The trans-pQTLs were also nominally enriched in telomere maintenance, and the trans-eQTLs in mitophagy and autophagy. While not enriched, there were a few notable development genes among the transcripts with trans-eQTL in meQTL.5a. This includes Notch1, Hdac5, and the methyl-CpG binding gene Mbd3 (Fig S3) [Citation67–69]. However, there was limited overlap in functional pathways between the trans-modulated CpGs and trans-modulated mRNAs/proteins.

High-priority candidate genes in meQTL.5a

The peak markers in linkage disequilibrium within the meQTL.5a interval are between 114.5 and 116.5 Mb on Chr5 (Data S2). This is the location of genes such as the hepatic TF Hnf1a, the sirtuin gene Sirt4, heat shock protein Hspb8, and the coenzyme Coq5 (). For candidate gene ranking, we narrowed to the 114.5–116.5 Mb peak interval and retained positional candidate genes that (1) have missense or protein truncating variants that segregate in the BXDs, and/or (2) are modulated in expression by a cis-eQTL. This identified 20 positional candidates located in the peak region of meQTL.5a (). Fourteen of these had missense mutations, and we further used the SIFT (Sorting Intolerant From Tolerant) score to predict the potential deleterious effects on protein function [Citation70]. SIFT scores range from 0 to 1 with low values (<0.05) predicted to be deleterious. Variants with low SIFT scores are in Oasl2, Srsf9, Pxn, Rab35, Cit, and Prkab1, and the variant in Hnf1a also had a comparatively low SIFT score ().

Table 2. Candidate genes in meQtl.5a.

Our next goal was to determine which of these candidate genes formed the most cohesive functional network(s) with the trans-modulated CpGs. For this, we took the list of genes cognate to the trans-CpGs (i.e., gene in which the CpG is located or the nearest gene if CpG is intergenic) and the list of positional candidates, and searched the STRING database for protein–protein interactions (PPI) [Citation71]. This resulted in a large and highly connected network with an average node degree of 4.21 (PPI enrichment p < 1e-16), and a high enrichment in developmental genes. The central hub was around the trans-modulated Crebbp, which had the highest degree of connected nodes at 47 (Fig S4). In this CpG-based network, the candidate gene with the highest degree of connections was Hnf1a (13 nodes; Table S5). Notably, HNF1A is one of the direct interaction partners of CREBBP [Citation72]. The most enriched KEGG pathway was ‘maturity onset diabetes of the young’ or MODY (mmu04950), and six members in the central hub were members of this pathway (). This included the positional candidate, HNF1A, which is the causal gene for MODY3 [Citation47]. Enriched GO terms included metabolic processes, cell differentiation, and developmental processes (Data S9). PXN was another candidate gene in meQTL.5a with high connectivity, but it formed a more peripheral cluster (Fig S4). Other candidate genes such as Sirt4, Coq5, Oasl2 and Hspb8 had connections of only 0 to 2 (Table S5). A CpG located in an exon of the hub gene, Crebbp (cg27201505), mapped as a strong trans-meQTL to meQTL.5a. However, in terms of how this relates to gene expression, the Crebbp transcript had low expression in adult liver, and the mRNA had only a weak eQTL in meQTL.5a (–Log10p = 2.09 at the meQTL.5a peak interval) (). Similarly, a CpG (cg12712768), located in the promoter of Hnf1a, mapped as a strong cis-meQTL, but the Hnf1a mRNA only had weak evidence of cis-modulation (). The cis-modulated CpG in Hnf1a had a strong positive correlation with the trans-modulated CpG in Crebbp (), and there was also a strong positive correlation between their transcripts (). However, the CpG-to-mRNA correlation between them was relatively modest, and although CREBBP formed the central node in the PPI network, the expression of Crebbp was uncorrelated with its cognate CpG, and instead, the Crebbp transcript had a modestly significant inverse correlation with the Hnf1a CpG (). The Hnf1a transcript was also modestly correlated with its CpGs ().

Figure 5. Interaction networks that connect the trans-meQTLs to the candidate genes in meQTL.5a. (a) the trans-modulated CpGs and candidate genes in meQTL.5a form a highly connected and functionally enriched protein–protein interaction network. HNF1A (yellow triangle) is the most connected candidate in the central sub-network, and member of the enriched MODY (maturity onset diabetes of the young) pathway (red nodes). (b) CREBBP is the hub gene in the meQTL-based network, and a CpG in its exon is trans-modulated by meQtl.5a (top). Its mRNA (bottom of mirrored Manhattan plot) has a weak peak in meQTL.5a. Crebbp is located on chromosome 16 (red triangle). (c) the CpG in the 5’UTR of Hnf1a is cis-modulated, and its expression has a weak cis-eQTL. Strong positive correlation between the CpGs (d), and between the mRNAs (e) of Hnf1a and Crebbp. Weak inverse correlation between (f) the Hnf1a mRNA and Crebbp methylation, and (g) between the mRNA and methylation of Hnf1a.

Figure 5. Interaction networks that connect the trans-meQTLs to the candidate genes in meQTL.5a. (a) the trans-modulated CpGs and candidate genes in meQTL.5a form a highly connected and functionally enriched protein–protein interaction network. HNF1A (yellow triangle) is the most connected candidate in the central sub-network, and member of the enriched MODY (maturity onset diabetes of the young) pathway (red nodes). (b) CREBBP is the hub gene in the meQTL-based network, and a CpG in its exon is trans-modulated by meQtl.5a (top). Its mRNA (bottom of mirrored Manhattan plot) has a weak peak in meQTL.5a. Crebbp is located on chromosome 16 (red triangle). (c) the CpG in the 5’UTR of Hnf1a is cis-modulated, and its expression has a weak cis-eQTL. Strong positive correlation between the CpGs (d), and between the mRNAs (e) of Hnf1a and Crebbp. Weak inverse correlation between (f) the Hnf1a mRNA and Crebbp methylation, and (g) between the mRNA and methylation of Hnf1a.

We performed a similar PPI analysis for the lists of genes with trans-eQTLs and trans-pQTLs in meQTL.5a. The trans-modulated mRNAs formed a network with an average node degree of 2.42 (PPI enrichment = 1e-06; Fig S5). The trans-modulated proteins formed a smaller but highly connected network with an average node degree of 2.34 (PPI enrichment = 5.2e-10; Fig S6). At both the transcriptomic and proteomic levels, HNF1A no longer occupied a central position (only one degree of connection for HNF1A in both), and instead, OASL2, a member of the interferon-induced signalling pathway [Citation73], was the most connected positional candidate for networks defined from the trans-eQTL and trans-pQTL (Fig S5, Fig S6). The functional profiles of the networks were also altered, and the most enriched KEGG pathway in the eQTL-based PPI network was autophagy, and the pQTL-based PPI network was enriched in metabolic pathways (Data S8). The eQTL and pQTL networks shared similarities; for instance, there was a clique of proteasome subunits (PSMB8, PSMB9 and PSMB10) connected to OASL2, and suggests overlapping interactional and regulatory networks at the transcriptomic and proteomic levels that are disconnected from the developmental networks at the methylome level. Overall, this suggests that Hnf1a is a strong positional candidate for the trans-meQTLs, but not for the pathways that connect the trans-modulated expression traits.

While mitochondria-related genes such Sirt4 and Coq5, and autophagy genes such as Hspb8 and Prkab1 are very relevant to ageing and the epigenome [Citation74–76], these were peripheral nodes with low connectivity within the CpG-based network (; Table S5; Fig S4). Instead, the PPI network built from the trans-modulated CpGs highlighted the MODY sub-network as an important regulatory node. Due to the apparent centrality of HNF1A within the CpG network, we searched the STRING database for the top 10 high-scoring interaction partners for HNF1A (). The present array targets only a few highly conserved CpGs in each of these genes. However, even with this sparse profiling of CpGs, six of the top 10 PPI interaction partners of HNF1A mapped as trans-meQTLs to meQTL.5a (two members, PCBD1 and PPARA, did not have meQTL data as no CpG probes in the mammalian array targeted these genes). We performed pair-wise expression correlations for these 10 interaction partners and Hnf1a using the liver RNA-seq data, and the transcripts formed a highly interconnected network in which the mRNA for Hnf1a was connected to 9 of the 10 PPI-based members at |r| ≥ 0.5 (). As was the case for the Crebbp and Hnf1a transcripts, the mRNAs of Foxa2 () and Hnf4a also mapped as weak trans-eQTLs to the same interval (GEMMA-based linkage statistics in Data S6). For Gata4, its mRNA had a relatively strong trans-eQTL in meQTL.5a ().

Figure 6. Primary protein–protein interaction partners of HNF1A and their trans-modulation (a) the network shows the top 10 interaction partners of HNF1A based on protein–protein interactions. CpGs in CREBBP, FOXA2 (HNF3B), GATA4, HNF4A, ONECUT1, and PCBD2 are trans-modulated by the meQTL.5a locus, making Hnf1a a prime candidate. (b) at the transcriptomic level, expression of these genes in the liver are also highly intercorrelated. Only correlations |r| > 0.45 are shown (line thickness conveys strength of correlation; blue: negative; red: positive). Mirrored meQTL (top), and eQTL (bottom) for two example gene: (c) Foxa2, and (d) Gata4. Red triangles mark the location of genes.

Figure 6. Primary protein–protein interaction partners of HNF1A and their trans-modulation (a) the network shows the top 10 interaction partners of HNF1A based on protein–protein interactions. CpGs in CREBBP, FOXA2 (HNF3B), GATA4, HNF4A, ONECUT1, and PCBD2 are trans-modulated by the meQTL.5a locus, making Hnf1a a prime candidate. (b) at the transcriptomic level, expression of these genes in the liver are also highly intercorrelated. Only correlations |r| > 0.45 are shown (line thickness conveys strength of correlation; blue: negative; red: positive). Mirrored meQTL (top), and eQTL (bottom) for two example gene: (c) Foxa2, and (d) Gata4. Red triangles mark the location of genes.

While we certainly cannot dismiss the other genes highlighted in , Hnf1a stands out as a strong candidate for the trans-meQTLs. Our observations suggest that meQTL.5a modulates the methylation and, to a lesser extent, the expression of genes that functionally interact with HNF1A. The missense mutation in Hnf1a (rs33234601) results in a proline to serine substitution, with proline as the conserved amino acid across most mammals (based on the comparative genomics track on the UCSC Genome Browser) [Citation77].

Interaction with ageing, diet, and potential impact on longevity

We next examined how the trans-CpGs interface with ageing and diet, and how these may potentially influence physiological traits and lifespan. Based on the level of overlap with DMCs that we have previously defined [Citation6], the CpGs trans-modulated by meQTL.5a had threefold higher enrichment in age-gain CpGs (hypergeometric p = 5.4e-92). Notably, these age-gain CpGs included the developmental genes, and some of the known interaction partners of HNF1A (e.g., Tet3, Sox2, Prdm16, Kdm6b, Hnf4a, Foxa2, etc.). There were also significant enrichments in weight- and diet-CpGs and modest enrichment in age-loss CpGs, but no DMCs related to strain-dependent life expectancy (; Table S6). Since 502 of the 2088 trans-meQTLs (24%) are in meQTL.5a, we find that after excluding the 502 CpGs, the remaining trans-CpGs are no longer enriched in age-gain CpGs (see ), and instead, the trans-CpGs become significantly depleted in both age-gain and age-loss CpG (Table S2). The comparison of DMC enrichment among the trans-CpGs with and without the meQTL.5a modulated set indicates that the trans-modulation of age-associated CpGs is specific to meQTL.5a. Intriguingly, for the age-dependent trans-CpGs, whether a site gained or lost methylation with ageing depended on the allele effect of meQTL.5a (). Trans-CpGs with D positive additive effect were more likely to gain methylation with age, whereas the few trans-CpGs with higher methylation for the B allele were associated with a decrease in methylation with ageing (; Data S2). This is not due to spurious co-segregation between age and genotype in meQTL.5a since there is no difference in mean age between the samples with the DD genotype (421 ± 170 days) and those with the BB genotype (425 ± 184) (Data S1). This pattern of allele-dependent effect of age is exemplified by the CpG located in the 5’UTR of Jarid2, a canonical member of the Polycomb-Repressive Complex 2 (PRC2) () [Citation78]. This CpG gained methylation with age, and across all ages, mice with the D allele in meQTL.5a tended to have higher methylation. An meQTL map for the Jarid2 CpG using GEMMA is displayed in . The expression of Jarid2 in adult liver was low and showed no covariance with age, but the mRNA mapped as a weak trans-eQTL to meQTL.5a (p = 0.01). We have previously reported that being fed HFD augments the age-dependent gains in methylation such that HFD results in a more aged methylome [Citation6,Citation45]. For the meQTL.5a trans-CpGs, all the CpGs associated with higher methylation for the D allele were also increased in methylation by HFD (). These trans-CpGs with higher methylation for the D allele were also more likely to inversely correlate with body weight (). Overall, this pattern suggests a genotype-dependent susceptibility of CpGs to the effects of ageing and diet.

Figure 7. Joint modulation of CpGs by genetic and non-genetic variables (a) CpGs with trans-meQTLs in meQTL.5a are enriched in differentially methylated CpGs (DMCs) associated with aging, diet, and body weight. Asterisks denote hypergeometric enrichment p ≤ 0.001; hash denotes p = 0.006. (b) trans-CpGs that are increased in methylation by the D allele in meQTL.5a gain methylation with age (positive regression estimate on the y-axis) while those with higher methylation for the B allele lose methylation with age. Dashed red horizontal line indicate Bonferroni p ≤ 0.05 for DMC. (c) CpG in the Jarid2 5’UTR gains methylation with age and has higher methylation in BXDs with the DD genotype in meQTL.5a. (d) QTL plots for the Jarid2 CpG (top Manhattan plot), and mRNA (bottom). Red triangle marks the location of Jarid2. (e) allele dependent increase in methylation at the meQTL.5a trans-CpGs due to HFD. (f) allele dependent association with body weight for the trans-CpGs. (g) Overall mean methylation of the trans-modulated CpG is higher for BXDs with DD genotype in meQTL.5a for both control diet (CD; 0.35 ± 0.02 for BB; 0.38 ± 0.03 for DD; pair-wise p < 0.0001) and high fat diet (HFD; 0.36 ± 0.03 for BB; 0.38 ± 0.03 for DD pair-wise p < 0.005). (h) allele dependent increase in mean methylation for the CpGs with trans-meQTL in meQTL.5a. (i) body weight is slightly higher for BXDs with BB in meQTL.5a for both CD (27 ± 6 g for BB; 26 ± 5 g for DD; pair-wise p < 0.1) and HFD (47 ± 13 g for BB; 42 ± 13 g for DD; pair-wise p < 0.02).

Figure 7. Joint modulation of CpGs by genetic and non-genetic variables (a) CpGs with trans-meQTLs in meQTL.5a are enriched in differentially methylated CpGs (DMCs) associated with aging, diet, and body weight. Asterisks denote hypergeometric enrichment p ≤ 0.001; hash denotes p = 0.006. (b) trans-CpGs that are increased in methylation by the D allele in meQTL.5a gain methylation with age (positive regression estimate on the y-axis) while those with higher methylation for the B allele lose methylation with age. Dashed red horizontal line indicate Bonferroni p ≤ 0.05 for DMC. (c) CpG in the Jarid2 5’UTR gains methylation with age and has higher methylation in BXDs with the DD genotype in meQTL.5a. (d) QTL plots for the Jarid2 CpG (top Manhattan plot), and mRNA (bottom). Red triangle marks the location of Jarid2. (e) allele dependent increase in methylation at the meQTL.5a trans-CpGs due to HFD. (f) allele dependent association with body weight for the trans-CpGs. (g) Overall mean methylation of the trans-modulated CpG is higher for BXDs with DD genotype in meQTL.5a for both control diet (CD; 0.35 ± 0.02 for BB; 0.38 ± 0.03 for DD; pair-wise p < 0.0001) and high fat diet (HFD; 0.36 ± 0.03 for BB; 0.38 ± 0.03 for DD pair-wise p < 0.005). (h) allele dependent increase in mean methylation for the CpGs with trans-meQTL in meQTL.5a. (i) body weight is slightly higher for BXDs with BB in meQTL.5a for both CD (27 ± 6 g for BB; 26 ± 5 g for DD; pair-wise p < 0.1) and HFD (47 ± 13 g for BB; 42 ± 13 g for DD; pair-wise p < 0.02).

To explore this further, we computed the mean methylation value for all 502 trans-CpGs targeted by meQTL.5a. As expected, the DD genotype in meQTL.5a had higher average methylation than the BB genotype for both diet groups () and similar to Jarid2, the mean methylation increased with age but was overall higher for the DD genotype (). Body weight was not directly correlated with the mean methylation of the trans-CpGs, despite the enrichment in DMCs inversely correlated with body weight among the trans-CpGs. A multivariable regression showed that the strongest predictor of mean methylation of the trans-CpGs was age, followed by the meQTL.5a genotype, and then diet, but not body weight (). If we treat body weight as the outcome variable, there is only a slightly higher weight for the BB genotype (), and a multivariable regression shows a modestly significant association between weight and meQTL.5a (). This suggests a pleiotropic effect of meQTL.5a on DNAm and body weight, with contrasting allelic effects.

Table 3. Multivariable regressions for mean methylation and weight.

We obtained bodyweight and lifespan data from a different BXD cohort that were allowed to survive until natural mortality [Citation37]. We segregated the samples into two groups based on homozygous genotype at the meQTL.5a marker, rs29733222: BB (n = 801) or DD (n = 972), and tested two predictions: (1) that the BB genotype in meQTL.5a will be associated with higher body weight at age 6 months, and (2) although higher body weight is associated with shorter lifespan, we predicted that at this locus, based on a more ‘aged methylome’, the DD genotype will have slightly shorter lifespan. As the longevity cohort has no DNAm data, we could not directly verify whether the DD in this group indeed has a more aged methylome, and this is purely an assumption based on the meQTL data. As predicted, the BB genotype had significantly higher mean body weight (, ; histograms of body weight distribution by genotype are provided in Fig S7). Also, consistent with prediction, the DD genotype was associated with a slightly higher risk of death between the ages of 650 and 1100 days compared to the BB genotype, but only in the CD group (Log-Rank p = 0.008 for CD; Log-Rank p = 0.57 for HFD; ; histograms in Fig S7). This small difference in the CD group resulted in a median lifespan of 702 days and maximum LS of 1250 days in the BB genotype (n = 399), compared to a median of 685 days and maximum of 1197 days in the DD genotype (n = 510). Using a multivariable regression model with genotype, diet, and body weight at 6 months as predictors, we find that the strongest predictors of lifespan were weight at 6 months, followed by diet, and then by the meQTL.5a genotype (). Given the strong association between body weight at a young age and longevity [Citation37], the lifespan advantage for the BB genotype becomes more apparent when we adjust the longevity data for the weight at 6 months (). For the CD mice, after adjusting for weight, the BB mice were predicted to have a median lifespan that was 46 days longer than the DD mice (log-rank p < 0.0001). In HFD, the median lifespan of BB was only 7 days longer than that of DD mice (log-rank p < 0.03). Our results suggest a pleiotropic effect of a genetic locus on multiple traits that are also modified by diet and ageing. Here, we use the terminology by Tyler et al. [Citation79,Citation80], and the model in depicts a horizontal pleiotropic effect on body weight and CpG methylation that can moderate the association between this locus and lifespan. We suggest a modest vertical pleiotropic effect on lifespan that is mediated by the methylome and the genotype with the less aged methylome (BB) having a slight lifespan advantage.

Figure 8. Pleiotropic influence on meQTL.5a on body weight at young age and lifespan (a) body weight at 6 months (mos) from a separate cohort of BXD mice show higher mean weight for strains with BB genotype in meQTL.5a for control diet (CD; 26 ± 6 g for BB; 25 ± 5 g for DD; pair-wise p < 0.004) and high fat (HFD; 34 ± 9 g for BB; 31 ± 8 g for DD; pair-wise p < 0.0001). Samples numbers: 383 BB and 503 DD for CD; 392 BB and 457 DD for HFD. (b) Kaplan-Meir survival plots by genotype at meQTL.5a for CD mice (399 BB and 510 DD). Median lifespan in days (MedianLS) for the genotypes shown (log-rank p = 0.008). (c) similar survival plot for HFD shows no significant difference between genotypes (402 BB and 462 DD). (d) Kaplan-Meir survival after adjusting lifespan for 6 mos weight. Adjusted median lifespan in days shown for each genotype-by-diet below the graph. Within each diet, the BB genotype has longer lifespan compared to DD (based on pairwise comparison, log-rank p < 0.0001 for CD, and p = 0.03 for HFD). (e) model depicting horizontal pleiotropic influence on CpG methylation and weight, and vertical pleiotropic influence on lifespan mediated by CpG, which are also under the influence of ageing and diet.

Figure 8. Pleiotropic influence on meQTL.5a on body weight at young age and lifespan (a) body weight at 6 months (mos) from a separate cohort of BXD mice show higher mean weight for strains with BB genotype in meQTL.5a for control diet (CD; 26 ± 6 g for BB; 25 ± 5 g for DD; pair-wise p < 0.004) and high fat (HFD; 34 ± 9 g for BB; 31 ± 8 g for DD; pair-wise p < 0.0001). Samples numbers: 383 BB and 503 DD for CD; 392 BB and 457 DD for HFD. (b) Kaplan-Meir survival plots by genotype at meQTL.5a for CD mice (399 BB and 510 DD). Median lifespan in days (MedianLS) for the genotypes shown (log-rank p = 0.008). (c) similar survival plot for HFD shows no significant difference between genotypes (402 BB and 462 DD). (d) Kaplan-Meir survival after adjusting lifespan for 6 mos weight. Adjusted median lifespan in days shown for each genotype-by-diet below the graph. Within each diet, the BB genotype has longer lifespan compared to DD (based on pairwise comparison, log-rank p < 0.0001 for CD, and p = 0.03 for HFD). (e) model depicting horizontal pleiotropic influence on CpG methylation and weight, and vertical pleiotropic influence on lifespan mediated by CpG, which are also under the influence of ageing and diet.

Table 4. Multivariable regressions for body weight at 6 months of age and strain lifespan.

Phenome-wide association analysis for Hnf1a

The BXDs have accrued a vast collection of traits over decades, and we next performed a phenome-wide association analysis (PheWAS) to identify other higher-order traits that may be modulated by the meQTL.5a interval [Citation81]. Note that although we used ‘Hnf1a’ as the term in the PheWAS search [Citation82], the results are from family-based linkage mapping (not allelic associations), and the linkages are to relatively large QTL intervals close to the Hnf1a locus, and not to a Hnf1a allele. At -Log10p ≥ 3, there were 10 BXD traits that included one immune related phenotype, four traits related to the nervous system and five metabolic traits related to fat content and amino acid ratios (). The strongest QTL was for susceptibility to rickettsia infection [Citation83]; however, the peak region for this trait was proximal to the meQTL.5a interval (~104 Mb; GEMMA-based linkage statistics is Data S6). The brain-related traits were also little distant from meQTL.5a (at ~107 Mb for brain activity measured by Ito et al. [Citation84] and at 117 Mb for cell proliferation [Citation85]). The metabolic traits peaked at the meQTL interval and these included measures of fat content in liver and ratio of branched chain amino acids to total amino acids (Data S5, Fig S8) [Citation86–88]. Although the QTLs for the metabolic traits are suggestive, they do indicate a potential role for the meQTL.5a region in higher-order metabolism, and similar to the higher body weight, the BB genotype had higher liver fat content.

Table 5. Phenotypes that map to meQtl.5a.

Complementing the murine family-based PheWAS, we also searched for GWAS hits associated with variants in HNF1A using two GWAS databases: GWAS Atlas, and the NHGRI-EBI GWAS Catalog [Citation89,Citation90]. At minimum p < 1e-05, the GWAS Atlas identified 85 traits, and the GWAS Catalog identified over 400 traits associated with variants in HNF1A and HNF1A-AS1, the antisense RNA transcribed from HNF1A (Data S10, Data S11). The trait with the strongest association was C-reactive protein levels, a measure of inflammation and cardiovascular health, followed by levels of Gamma glutamyl transpeptidase (GGT), a measure of liver damage [Citation91–94]. These were followed by lipid levels and coronary artery disease [Citation95–97]. Other traits associated with HNF1A included age at puberty, birth weight, and diabetes [Citation98–102]. While not a replicated genome-wide significant hit, a variant near HNF1A (rs6489785) is reported to be one of the 37 ‘longevity SNPs’ that have a small effect on human lifespan [Citation103]. This is consistent with our model, where the meQTL.5a locus could make an indirect and modest contribution to lifespan variation.

Discussion

We have provided an overview of the regulatory loci that influence methylation of conserved CpGs in the murine liver. Overall, the results show complex interrelationships and genetic pleiotropy that influence DNA methylation and physiological traits. As expected, cis-meQTLs are associated with higher LOD scores compared to the trans-meQTLs [Citation16]. Given the strong genetic contribution to the variance of cis-CpGs, the cis-modulated CpGs were depleted in DMCs related to ageing and diet. In contrast, the trans-CpGs were enriched in DMCs related to both genetic traits (lifespan, body weight) and non-genetic variables (ageing and diet). In terms of genomic location, the cis-CpGs were enriched in intergenic sites, which is consistent with reports from human studies [Citation10,Citation17], and also in intronic regions, and were highly depleted in 5’UTR and bivalent promoter states (PrB), which are regions that are strongly modified by ageing and typically show methylation gains over time [Citation6]. This suggests that ageing has a limited impact on CpGs that are under strong cis-modulation. On the other hand, trans-CpGs have multifactorial variations and could present key sites for gene-by-environment interactions.

The cohort we used for the meQTL mapping consisted of 41 BXD progeny strains, F1 hybrids, and the parent strains, and had a variable number of biological replicates per genotype. Due to this, we performed the meQTL mapping using a stringent regression model that adjusted for age, diet, weight, and genetic relatedness, and corrected for unmeasured variance by including the top principal component as a cofactor. The use of linear mixed models improves both precision and power [Citation46]. Nonetheless, due to the modest sample size and the family-based design, we used a rather relaxed threshold of LOD ≥ 3.5 for both cis- and trans-effects. If we increase the stringency for the trans-meQTLs to LOD ≥ 4.5, then only 309 strong trans-effects remained and 76 of these (i.e., nearly 25%) were in meQTL.5a. The relaxed statistical threshold is a caveat to keep in mind. For additional evidence, we evaluated the trans-meQTLs for biological coherence and overlap with eQTLs. For instance, many of the immediate interaction partners of HNF1A have weak trans-eQTLs overlapping the trans-meQTLs. The other strategy we used was to reduce the dimensionality of the methylome data by performing an unsupervised clustering of the CpGs into modules and treating the module eigengenes as the representative quantitative traits, and this too supported meQTL.5a as a modulator of functionally connected CpG networks.

The meQTL hotspot Eaa19 is also noteworthy. Eaa19 is linked to epigenetic clock acceleration, and potentially influences susceptibility to entropy accumulation in the liver methylome [Citation6]. In our 2022 paper [Citation6], we identified several candidate genes in Eaa19. However, like meQTL.5a, Eaa19 is gene dense and harbours several positional candidates. In the present work, we focused on meQTL.5a, and used several strategies to prioritize the most plausible candidates. There are several candidate genes within this interval, including a few that contain variants predicted to be deleterious (e.g., Oasl2, Srsf9, Pxn, Rab35, Cit, Prkab1). Among these, Prkab1 codes for the regulatory subunit of AMP-activated protein kinase (AMPK), and the AMPK signalling pathway, along with the Sirtuins, play important roles in metabolic regulation and ageing, and possibly in epigenomic maintenance [Citation104,Citation105]. The other relevant positional candidates include Sirt4 and Coq5, both of which map as strong cis-eQTLs in the liver [Citation74,Citation75]. Another candidate gene, Oasl2, involved in the interferon-induced signalling pathway [Citation73], forms a central hub in the transcriptome and proteome-based networks, but not in the CpG-based network. These alternate candidate genes cannot be dismissed; however, our analysis of the trans-modulated CpGs, and the high enrichment of MODY genes, led us to Hnf1a as a functionally highly relevant prime suspect in meQTL.5a, and one of the potential links between development, ageing, and the epigenome.

Developmental genes, the methylome, and ageing

The list of genes with CpGs trans-modulated by meQTL.5a included several key players in early development and cell differentiation (e.g., Kdm6b, Prdm16, Sox2, Foxp2, Esrrb, Tet3, Foxa2, etc.). The prime candidate in meQTL.5a, Hnf1a, is a member of the hepatocyte nuclear factor family of TFs, and primarily expressed in the liver, kidney, and pancreas [Citation106]. The other HNF members include HNF4A, FOXA2 (aka, HNF3B), and ONECUT1, which all have trans-meQTLs in meQTL.5a. During embryonic development, the HNFs and GATA TFs participate in complex autoregulatory networks that modulate the spatial and temporal expressions of downstream genes [Citation25,Citation106,Citation107]. Targets of HNF1A include the metabolic and longevity gene, Igf1 (insulin-like growth factor 1) [Citation107,Citation108]. MODY3, which is caused by mutations in HNF1A, is the most common form of maturity onset diabetes of the young [Citation109,Citation110]. HNF1A mutations also lead to dysregulation in fatty acid synthesis and transport, which can cause fatty acid accumulation in the liver [Citation106]. A GWAS study also found that a variant in HNF1A (rs6489785) is one of the 169 variants that jointly contribute to human longevity [Citation103]. Some mutations in HNF1A do not cause MODY but increase the susceptibility to type 2 diabetes and lower BMI [Citation111,Citation112]. In mice, the deletion of Hnf1a causes Laron dwarfism and hyperglycaemia [Citation113–115].

Although HNF1A is not a direct regulator of DNAm, there is some intriguing evidence that it contributes to the epigenetic state. For instance, deletion of Hnf1a in mice causes a change in the local chromatin structure and affects the spatial location of its target regions in the nucleus [Citation116]. Furthermore, a study from 2008 showed that CpGs located in HNF1A binding motifs were hypomethylated in the liver and had tissue-dependent differential methylation that correlated with gene expression [Citation117]. This suggests that the binding affinity of HFN1A at these sites could influence CpG methylation. Generally, the binding of protein factors (e.g., GATA6, CTCF and REST) to motifs that contain CpGs results in low methylation [Citation22,Citation25]. In the case of the BXDs, the D2 allele in rs33234601 (Pro423Ser) is the unusual variant as almost all vertebrate species have a proline at this amino acid position, and only few have serine (e.g., squirrel, elephant; based on the Vertebrate Multiz Alignment track in the UCSC Genome Browser) [Citation77]. The expression of Hnf1a has a modest cis-eQTL that is associated with a positive additive effect for the B allele at meQTL.5a.

Many of these CpGs that are trans-modulated by meQTL.5a are characterized by a low methylation profile (‘hypomethylated’ with methylation beta-scores closer to 0) and an increase in methylation with ageing (illustrated by the Jarid2 CpG and the mean methylation of the trans-CpGs in ). Since binding by TFs generally result in lower methylation at the biding motifs [Citation22,Citation25], perhaps the D variant of HNF1A has a lower DNA binding affinity, and the BXD strains with DD at meQTL.5a could begin life with heightened methylation at the target sites. If we consider this in terms of epigenetic entropy, then a hypomethylated state presents a low-entropy landscape [Citation6]. For the DD genotype, however, the methylation beta-values at these CpGs will be closer to 0.5 compared to BB, and will approach a more random epigenetic state at an earlier age compared to strains that have a BB genotype at meQTL.5a. In other words, rather than differences in the age-dependent rate of change, we posit that these trans-CpGs have baseline variation in methylation that is dependent on the genotype at meQTL.5a. Although speculative at this point, the baseline variation in DNAm may be due to differential binding affinities of HNF1A variants at the downstream targets that overlap the trans-CpGs, many of which are in or near other developmental genes.

The meQTL.5a locus presents an interesting intersection in the genetic modulation of genes involved in early development, and genes that are epigenetically modified with ageing. The idea that early development is linked to ageing is nothing new (de Magalhães has written excellent reviews on this) [Citation118,Citation119]. According to the developmental theory of ageing, ageing is a consequence of developmental programming, and based on this, a partial reprogramming of the epigenome has been proposed as a viable way to reset the youthfulness of cells, and potentially of the whole organism [Citation120,Citation121]. Directly relevant to this, one of the trans-modulated CpGs is located within the Sox2 gene, and SOX2 is one of the Yamanaka TFs that have been used to reset the epigenetic clock of cells, and reverse age-related conditions in vivo. [Citation122,Citation123]

An interesting feature of the Hnf1a gene is that the promoter and first intron overlap the long non-coding RNAs (lncRNA), Hnf1aos1 and Hnf1aos2 [Citation124]. The cis-regulated CpG in Hnf1a (shown in ) is in this lncRNA, and the RNA products have been shown to have a cis-acting regulatory role and implicated in cell proliferation and tumour progression [Citation124,Citation125]. Furthermore, Hnf1aos1 interacts with EZH2, the catalytic subunit of PRC2, in liver tissue [Citation126]. Genes that are regulated by PRC2 and CpG sites that interact with EZH2 are known to be highly susceptible to age-dependent increases in methylation [Citation127–129], and the lncRNA is another plausible link between Hnf1a and the epigenome. Notably, one of the strongest trans-modulated CpGs is located in Jarid2, a member of the PRC2 complex [Citation78], and we can see that while the CpG in Jarid2 gains methylation with age, the DD strains start out with higher methylation compared to the BB strains (see ). This presents links between a development TF and the PRC2 complex that suggest deeper connections between epigenesis (i.e., embryonic development), and the ageing of the epigenome.

Pleiotropy on CpGs and physiological traits

In the BXDs, low body weight at a young age predicts longer lifespan and slower epigenetic ageing [Citation6,Citation37,Citation130]. However, the meQTL.5a interval has contrasting allelic effects on body weight and lifespan. Specifically, despite the higher body weight and higher liver lipid levels for the B allele in meQTL.5a, it is the D allele that is associated with a slightly shorter lifespan. This effect on weight (specifically, lower body weight) is also seen in Hnf1a-null mice, and HNF1A variants in humans. Generally, when downstream targets of Hnf1a are deleted, it results in smaller stature and longer lifespan in both humans and mice. For instance, deficiency in growth hormone or IGF1 confers longer lifespan and healthspan [Citation131,Citation132]. In some instances of Laron syndrome , individuals exhibit insulin resistance and hyperlipidaemia but still have long lives [Citation133,Citation134]. Mouse models of Laron dwarfism also age slower and have longer lifespan [Citation135,Citation136]. However, direct deletion of Hnf1a in mice or deleterious mutations in HNF1A in humans do not appear to confer any lifespan advantage despite the mice having a form of Laron dwarfism and humans have lower BMI [Citation111–115].

We suggest that HNF1A, in addition to its role as a TF for developmental and metabolic genes, also influences the epigenome early in life, and contributes to epigenomic maintenance in adulthood and ageing. We present a model in which the meQTL.5a locus exerts horizontal pleiotropic effects on physiological traits and CpG methylation. The pleiotropic influence results in the D allele increasing methylation at sites that typically have low methylation when young, and the B allele increasing body weight and lipid levels. The methylation of the target CpGs, which are also under convergent influence of ageing and diet, then contributes to variation in survival trait, with the D allele associated with slightly shorter lifespan. In this model, lifespan is a distal complex trait that shows only a modest linkage to meQTL.5a, while the intermediate traits (the CpGs) have a stronger linkage.

In conclusion, we have identified meQTL.5a as a trans-meQTL hotspot that modulates several CpGs in trans. The pleiotropic effect of meQTL.5a could contribute to variations in body size, metabolic traits, CpG methylation and lifespan. Hnf1a is a key candidate in this locus, and the potential influence of the HNFs on the epigenomic state during development could contribute to ageing and longevity.

Methods

Description of DNAm samples and data

The data we use in this study have been previously reported, and the full data are available from NCBI Gene Expression Omnibus (GEO accession ID GSE199979) [Citation6]. In brief, these are liver DNAm data generated on the HorvathMammalMethylChip40 array from 339 mice that belong to the BXD family. Information on each animal (strain, age, weight, diet, etc.) along with all relevant variables used in this study are provided in Data S1. Detailed annotations of the CpG probes including probe sequence, genomic coordinates and locations, gene annotations, and local CpG density (computed using the ‘Repitools’ R package) [Citation137] are provided in Data S2.

Methylation QTL mapping with R/QTL2

Each CpG was mapped against 7127 informative autosomal genotype markers distributed across the autosomal chromosomes using the R/qtl2 software [Citation48]. The full methylation data are available from the NCBI Gene Expression Omnibus database (GEO accession ID GSE199979), and the genotype data used for mapping is provided as Data S12. We performed QTL mapping using a univariate linear mixed model that accounts for genetic relatedness. We first computed genotype probabilities and employed that to obtain genetic relatedness matrices (GRM), or the kinship, using a Leave One Chromosome Out (LOCO) scheme. Genome scans included age, diet and the top PC as covariates (variables provided in Data S1) and were implemented using a ‘scan1’ function with genotype probabilities as input while adjusting for relatedness outside the chromosome of interest. We next estimated the genetic effects, and genetic directions between the two genotypes were computed as (DDBB). The R codes are provided as Supplemental information (Data S13).

CpG co-methylation networks

We used the WGCNA R package to cluster the CpGs into inter-correlated modules [Citation51]. The full set of CpGs (~28K) was used for network definition. Prior to WGCNA, we performed hierarchical clustering (hclust function in R with method = ‘average’) for outlier detection and excluded one sample (UT153). WGCNA first constructs a pair-wise correlation matrix, and this was converted to a scale-free adjacency matrix using default parameters, with a soft power threshold, β = 6. The β = 6 was associated with a mean connectivity of 168, and maximum connectivity of 1560. The adjacency matrix was converted to a topological overlap matrix (TOM), and the dissimilarity matrix (1 – TOM), and the hclust() function with the ‘average’ method was used to cluster the CpGs. To group the CpGs into modules, we applied the dynamic tree cutting method (cutreeDynamic), with minimum module size = 35, and deepSplit = 2. This resulted in the 14 CpG families (aka, modules) and the grey module, which had 1284 CpGs that did not fit into the other modules. The top principal component was derived from each module and taken as the representative ME. The R codes used are provided as supplementary information (Data S14).

QTL mapping using the genenetwork web tool

Aside from the main meQTL mapping that was done using R/qtl2, additional QTL analyses were done on the web platform, GeneNetwork, which provides interface to few different mapping algorithms [Citation36,Citation57]. We used the GEMMA algorithm, which adjusts for the BXD kinship structure using linear mixed modelling [Citation55,Citation56]. The MEs from the WGCA were uploaded to GeneNetwork, and QTL mapping for each ME was done with age, diet and body weight (weight at time of tissue collection) as cofactors. Instructions on how to retrieve the ME traits on GeneNetwork are provided in Data S6. The QTL mapping for the higher-order traits identified by the PheWAS was also done using GEMMA, and for these, the data are at the strain levels (i.e., strain means), and instructions on trait retrieval are provided in Data S6.

Enrichment analysis and other statistics

As previously described, we have annotated each CpG by genomic context (i.e., intergenic, 3’UTR, intron, exon and 5’UTR) and chromatin state [Citation6,Citation49,Citation50]. For enrichment analysis, we compared the frequency of these features among the cis- and trans-modulated CpGs relative to the array background (i.e., ~28K CpGs), and enrichment or depletion p-values were calculated using a hypergeometric test (formulae provided under Table S1). In addition to genomic locations, the CpGs have been classified as differentially methylated by age, diet (high fat vs. normal lab chow) and body weight based on a multivariable epigenome-wide association analysis [Citation6]. The frequency of these differentially methylated CpGs among the cis- and trans-modulated CpGs was also compared against the array background using the hypergeometric test (the R codes are provided under Table S2). All other statistical tests (Pearson correlations, linear regression modelling, and survival analyses) were done using JMP (version 16).

Bioinformatic resources

For the meQTL.5a trans-modulated CpGs, and CpGs in the Blue module, biological functions and transcription factor motif enrichment analysis were done using the R package for Genomic Regions Enrichment of Annotations Tool (rGREAT; version 3) [Citation58,Citation59]. The base coordinate for each CpG was provided (GRCm38/mm10 reference genome), and comparison was made against the array background. For the trans-modulated mRNAs and proteins, we used the gene symbol as the identifier, and enrichment analysis was done on DAVID [Citation138]. Another enrichment analysis to connect the trans-modulated genes with the positional candidates was based on protein–protein networks, and for this, a non-redundant list of the trans-modulated genes and candidate genes was uploaded to STRING (version 11.5) [Citation139,Citation140].

For candidate gene selection, we searched for cis-eQTL in the BXD liver RNA-seq data using the GeneNetwork search tool [Citation43]. To identify protein truncating and missense variants located in the positional candidate genes, we used the Ensembl Variant Table tool for the mouse gene (GRCm38) and the Mouse Genome Informatics variant database and selected the genes that had such variants between B6 and D2 [Citation141–144]. We used the integrated systems genetics web platform to perform a PheWAS for the Hnf1a locus in the BXDs [Citation81,Citation82]. For human PheWAS, we used HNF1A as the search term and retrieved GWAS hits from two databases: the GWAS Atlas, and the GWAS Catalog [Citation89,Citation90].

Supplemental material

Supplemental Material

Download Zip (32.6 MB)

Supplemental Material

Download Zip (35.5 MB)

Acknowledgments

The present study used data generated from the biospecimen resource created by the UTHSC BXD Aging Colony (PI: R Williams), and we have benefitted greatly from Dr. Robert Williams’ vision, leadership, and enthusiasm. We thank all members of the team, especially Dr. Lu Lu, Casey Chapman, and Dr. Suheeta Roy. We thank all members of the GeneNetwork Bioinformatics team. This work was funded by the NIA NIH grant R21AG055841. The BXD Aging Colony was funded by the NIA NIH grant R01AG043930.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data used in this study are available from NCBI Gene Expression Omnibus (GEO accession ID GSE199979). Genotype data are provided as Supplemental Data.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/15592294.2023.2252631

Additional information

Funding

The work was supported by the National Institutes of Health [R21AG055841and R01AG043930].

References

  • Ehrlich M, Lacey M. DNA methylation and differentiation: silencing, upregulation and modulation of gene expression. Epigenomics. 2013;5:553–26. doi: 10.2217/epi.13.43
  • Schuettengruber B, Bourbon HM, Di Croce L, et al. Genome regulation by Polycomb and trithorax: 70 years and counting. Cell. 2017;171:34–57. doi: 10.1016/j.cell.2017.08.002
  • Yang JH, Hayano M, Griffin PT, et al. Loss of epigenetic information as a cause of mammalian aging. Cell. 2023;186:305–326 e327. doi: 10.1016/j.cell.2022.12.027
  • Trevino LS, Dong J, Kaushal A, et al. Epigenome environment interactions accelerate epigenomic aging and unlock metabolically restricted epigenetic reprogramming in adulthood. Nat Commun. 2020;11:2316. doi: 10.1038/s41467-020-15847-z
  • Donohoe DR, Bultman SJ. Metaboloepigenetics: interrelationships between energy metabolism and epigenetic control of gene expression. J Cell Physiol. 2012;227:3169–3177. doi: 10.1002/jcp.24054
  • Mozhui K, Lu AT, Li CZ, et al. Genetic loci and metabolic states associated with murine epigenetic aging. Elife. 2022;11. doi: 10.7554/eLife.75244.
  • Field AE, Robertson NA, Wang T, et al. DNA methylation clocks in aging: categories, causes, and Consequences. Molecular Cell. 2018;71:882–895. doi: 10.1016/j.molcel.2018.08.008
  • Horvath S. DNA methylation age of human tissues and cell types. Genome Bio. 2013;14:R115. doi: 10.1186/gb-2013-14-10-r115
  • Lu AT, Quach A, Wilson JG, et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging. 2019;11:303–327. doi: 10.18632/aging.101684
  • Villicana S, Bell JT. Genetic impacts on DNA methylation: research findings and future perspectives. Genome Biol. 2021;22:127. doi: 10.1186/s13059-021-02347-6
  • Lin D, Chen J, Perrone-Bizzozero N, et al. Characterization of cross-tissue genetic-epigenetic effects and their patterns in schizophrenia. Genome Med. 2018;10(1). doi: 10.1186/s13073-018-0519-4
  • Huan T, Joehanes R, Song C, et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat Commun. 2019;10:4267. doi: 10.1038/s41467-019-12228-z
  • Schadt EE, Lamb J, Yang X, et al. An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet. 2005;37:710–717. doi: 10.1038/ng1589
  • Kraus WE, Muoio DM, Stevens R, et al. Metabolomic quantitative trait loci (mQTL) mapping Implicates the Ubiquitin Proteasome System in cardiovascular disease Pathogenesis. PLoS Genet. 2015;11:e1005553. doi: 10.1371/journal.pgen.1005553
  • Nath AP, Ritchie SC, Byars SG, et al. An interaction map of circulating metabolites, immune gene networks, and their genetic regulation. Genome Bio. 2017;18:146. doi: 10.1186/s13059-017-1279-y
  • Min JL, Hemani G, Hannon E, et al. Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet. 2021;53:1311–1321. doi: 10.1038/s41588-021-00923-x
  • Hawe JS, Wilson R, Schmid KT, et al. Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function. Nat Genet. 2022;54:18–29. doi: 10.1038/s41588-021-00969-x
  • Ma J, Joehanes R, Liu C, et al. Elucidating the genetic architecture of DNA methylation to identify promising molecular mechanisms of disease. Sci Rep. 2022;12:19564. doi: 10.1038/s41598-022-24100-0
  • Volkov P, Olsson AH, Gillberg L, et al. A Genome-wide mQTL Analysis in human adipose tissue identifies genetic variants associated with DNA methylation, gene expression and metabolic traits. PLoS One. 2016;11:e0157776. doi: 10.1371/journal.pone.0157776
  • McRae AF, Marioni RE, Shah S, et al. Identification of 55,000 replicated DNA methylation QTL. Sci Rep. 2018;8:17605. doi: 10.1038/s41598-018-35871-w
  • Mozhui K, Ciobanu DC, Schikorski T, et al. Dissection of a QTL hotspot on mouse distal chromosome 1 that modulates neurobehavioral phenotypes and gene expression. PLoS Genet. 2008;4:e1000260. doi: 10.1371/journal.pgen.1000260
  • Stadler MB, Murr R, Burger L, et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature. 2011;480:490–495. doi: 10.1038/nature10716
  • Hop PJ, Luijk R, Daxinger L, et al. Genome-wide identification of genes regulating DNA methylation using genetic anchors for causal inference. Genome Biol. 2020;21:220. doi: 10.1186/s13059-020-02114-z
  • Bonder MJ, Luijk R, Zhernakova DV, et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat Genet. 2017;49:131–138. doi: 10.1038/ng.3721
  • Suzuki T, Furuhata E, Maeda S, et al. GATA6 is predicted to regulate DNA methylation in an in vitro model of human hepatocyte differentiation. Commun Biol. 2022;5:414. doi: 10.1038/s42003-022-03365-1
  • Gibbs JR, van der Brug MP, Hernandez DG, et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 2010;6:e1000952. doi: 10.1371/journal.pgen.1000952
  • Bibikova M, Le J, Barnes B, et al. Genome-wide DNA methylation profiling using Infinium ® assay. Epigenomics. 2009;1:177–200. doi: 10.2217/epi.09.14
  • Wong NC, Ng J, Hall NE, et al. Exploring the utility of human DNA methylation arrays for profiling mouse genomic DNA. Genomics. 2013;102:38–46. doi: 10.1016/j.ygeno.2013.04.014
  • Gujar H, Liang JW, Wong NC, et al. Profiling DNA methylation differences between inbred mouse strains on the Illumina human Infinium MethylationEPIC microarray. PLoS One. 2018;13:e0193496. doi: 10.1371/journal.pone.0193496
  • Needhamsen M, Ewing E, Lund H, et al. Usability of human Infinium MethylationEPIC BeadChip for mouse DNA methylation studies. BMC Bioinf. 2017;18:486. doi: 10.1186/s12859-017-1870-y
  • Zhou W, Laird PW, Shen H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2017;45:e22. doi: 10.1093/nar/gkw967
  • Arneson A, Haghani A, Thompson MJ, et al. A mammalian methylation array for profiling methylation levels at conserved sequences. Nat Commun. 2022;13:783. doi: 10.1038/s41467-022-28355-z
  • Horvath S, Haghani A, Macoretta N, et al. DNA methylation clocks tick in naked mole rats but queens age more slowly than nonbreeders. Nat Aging. 2022;2:46–59. doi: 10.1038/s43587-021-00152-1
  • Horvath S, Haghani A, Peng S, et al. DNA methylation aging and transcriptomic studies in horses. Nat Commun. 2022;13:40. doi: 10.1038/s41467-021-27754-y
  • Ashbrook DG, Arends D, Prins P, et al. A platform for experimental precision medicine: the extended BXD mouse family. Cell Syst. 2021. doi: 10.1016/j.cels.2020.12.002.
  • Mulligan MK, Mozhui K, Prins P, et al. GeneNetwork: a toolbox for systems genetics. Methods Mol Biol. 2017;1488:75–120. doi: 10.1007/978-1-4939-6427-7_4
  • Roy S, Sleiman MB, Jha P, et al. Gene-by-environment modulation of lifespan and weight gain in the murine BXD family. Nat Metab. 2021;3:1217–1227. doi: 10.1038/s42255-021-00449-w
  • Wang X, Pandey AK, Mulligan MK, et al. Joint mouse–human phenome-wide association to test gene function and disease risk. Nat Commun. 2016;7:10464. doi: 10.1038/ncomms10464
  • Taylor BA, Heiniger HJ, Meier H. Genetic analysis of resistance to cadmium-induced testicular damage in mice. Proc Soc Exp Biol Med. 1973;143:629–633. doi: 10.3181/00379727-143-37380
  • Taylor BA, Wnek C, Kotlus BS, et al. Genotyping new BXD recombinant inbred mouse strains and comparison of BXD and consensus maps. Mamm Genome. 1999;10(4):335–348. doi: 10.1007/s003359900998
  • Peirce JL, Lu L, Gu J, et al. A new set of BXD recombinant inbred lines from advanced intercross populations in mice. BMC Genet. 2004;5(1):7. doi: 10.1186/1471-2156-5-7
  • Hook M, Roy S, Williams EG, et al. Genetic cartography of longevity in humans and mice: current landscape and horizons. Biochim Biophys Acta. 2018;1864:2718–2732. doi: 10.1016/j.bbadis.2018.01.026
  • Williams EG, Pfister N, Roy S, et al. Multiomic profiling of the liver across diets and age in a diverse mouse population. Cell Syst. 2022;13:43–57 e46. doi: 10.1016/j.cels.2021.09.005
  • Warner HR, Ingram D, Miller RA, et al. Program for testing biological interventions to promote healthy aging. Mech Ageing Dev. 2000;115:199–207. doi: 10.1016/S0047-6374(00)00118-4
  • Sandoval-Sierra JV, Helbing AHB, Williams EG, et al. Body weight and high-fat diet are associated with epigenetic aging in female members of the BXD murine family. Aging Cell. 2020;19:e13207. doi: 10.1111/acel.13207
  • Ashbrook DG, Arends D, Prins P, et al. A platform for experimental precision medicine: the extended BXD mouse family. Cell Syst. 2021;12:235–247 e239. doi: 10.1016/j.cels.2020.12.002
  • Maegawa S, Hinkal G, Kim HS, et al. Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res. 2010;20:332–340. doi: 10.1101/gr.096826.109
  • Broman KW, Gatti DM, Simecek P, et al. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics. 2019;211:495–502. doi: 10.1534/genetics.118.301595
  • Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012;9:215–216. doi: 10.1038/nmeth.1906
  • Gorkin DU, Barozzi I, Zhao Y, et al. An atlas of dynamic chromatin landscapes in mouse fetal development. Nature. 2020;583:744–751. doi: 10.1038/s41586-020-2093-3
  • Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9:559. doi: 10.1186/1471-2105-9-559
  • Langfelder P, Horvath S. Eigengene networks for studying the relationships between co-expression modules. BMC Syst Biol. 2007;1(54). doi: 10.1186/1752-0509-1-54
  • Mozhui K, Smith AK, Tylavsky FA, et al. Ancestry dependent DNA methylation and influence of maternal nutrition. PLoS One. 2015;10:e0118466. doi: 10.1371/journal.pone.0118466
  • Horvath S, Zhang Y, Langfelder P, et al. Aging effects on DNA methylation modules in human brain and blood tissue. Genome Bio. 2012;13:R97. doi: 10.1186/gb-2012-13-10-r97
  • Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nature Genet. 2012;44:821–824. doi: 10.1038/ng.2310
  • Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods. 2014;11:407–409. doi: 10.1038/nmeth.2848
  • http://genenetwork.org.
  • McLean CY, Bristor D, Hiller M, et al. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnol. 2010;28:495–501. doi: 10.1038/nbt.1630
  • Gu Z, Hubschmann D, Marschall T. rGREAT: an R/bioconductor package for functional enrichment on genomic regions. Bioinformatics. 2023;39: doi: 10.1093/bioinformatics/btac745
  • de Macedo MP, Glanzner WG, Gutierrez K, et al. Chromatin role in early programming of embryos. Anim Front. 2021;11:57–65. doi: 10.1093/af/vfab054
  • Pinheiro I, Margueron R, Shukeir N, et al. Prdm3 and Prdm16 are H3K9me1 methyltransferases required for mammalian heterochromatin integrity. Cell. 2012;150:948–960. doi: 10.1016/j.cell.2012.06.048
  • Chuikov S, Levi BP, Smith ML, et al. Prdm16 promotes stem cell maintenance in multiple tissues, partly by regulating oxidative stress. Nat Cell Biol. 2010;12:999–1006. doi: 10.1038/ncb2101
  • Ito S, D’Alessio AC, Taranova OV, et al. Role of tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature. 2010;466:1129–1133. doi: 10.1038/nature09303
  • Su YR, Gu S-M, Liu Y-R, et al. Partial cellular reprogramming stably restores the stemness of senescent epidermal stem cells. Eur Rev Med Pharmacol Sci. 2023;27:5397–5409. doi: 10.26355/eurrev_202306_32774
  • Bell E, Curry EW, Megchelenbrink W, et al. Dynamic CpG methylation delineates subregions within super-enhancers selectively decommissioned at the exit from naive pluripotency. Nat Commun. 2020;11:1112. doi: 10.1038/s41467-020-14916-7
  • Tsui D, Vessey JP, Tomita H, et al. FoxP2 regulates neurogenesis during embryonic cortical development. J Neurosci. 2013;33:244–258. doi: 10.1523/JNEUROSCI.1665-12.2013
  • Kaji K, Nichols J, Hendrich B. Mbd3, a component of the NuRD co-repressor complex, is required for development of pluripotent cells. Development. 2007;134:1123–1132. doi: 10.1242/dev.02802
  • Zhu JN, Jiang L, Jiang J-H, et al. Hepatocyte nuclear factor-1beta enhances the stemness of hepatocellular carcinoma cells through activation of the notch pathway. Sci Rep. 2017;7:4793. doi: 10.1038/s41598-017-04116-7
  • Verdin E, Dequiedt F, Kasler HG. Class II histone deacetylases: versatile regulators. Trends Genet. 2003;19:286–293. doi: 10.1016/S0168-9525(03)00073-8
  • Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001;11:863–874. doi: 10.1101/gr.176601
  • von Mering C, Jensen LJ, Snel B, et al. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005;33:D433–437. doi: 10.1093/nar/gki005
  • Dohda T, Kaneoka H, Inayoshi Y, et al. Transcriptional coactivators CBP and p300 cooperatively enhance HNF-1alpha-mediated expression of the albumin gene in hepatocytes. J Biochem. 2004;136:313–319. doi: 10.1093/jb/mvh123
  • Kumar S, Mitnik C, Valente G, et al. Expansion and molecular evolution of the interferon-induced 2’-5’ oligoadenylate synthetase gene family. Mol Biol Evol. 2000;17:738–750. doi: 10.1093/oxfordjournals.molbev.a026352
  • Rodriguez-Hidalgo M, Luna-Sánchez M, Hidalgo-Gutiérrez A, et al. Reduction in the levels of CoQ biosynthetic proteins is related to an increase in lifespan without evidence of hepatic mitohormesis. Sci Rep. 2018;8:14013. doi: 10.1038/s41598-018-32190-y
  • Ji Z, Liu GH, Qu J. Mitochondrial sirtuins, metabolism, and aging. J Genet Genomics. 2022;49:287–298. doi: 10.1016/j.jgg.2021.11.005
  • Morrow G, Tanguay RM. Drosophila melanogaster Hsp22: a mitochondrial small heat shock protein influencing the aging process. Front Genet. 2015;6:1026. doi: 10.3389/fgene.2015.00103
  • Blanchette M, Kent WJ, Riemer C, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715. doi: 10.1101/gr.1933104
  • Li G, Margueron R, Ku M, et al. Jarid2 and PRC2, partners in regulating gene expression. Genes Dev. 2010;24:368–380. doi: 10.1101/gad.1886410
  • Paaby AB, Rockman MV. The many faces of pleiotropy. Trends Genet. 2013;29:66–73. doi: 10.1016/j.tig.2012.10.010
  • Tyler AL, Asselbergs FW, Williams SM, et al. Shadows of complexity: what biological networks reveal about epistasis and pleiotropy. BioEssays. 2009;31:220–227. doi: 10.1002/bies.200800022
  • Li H, Wang X, Rukina D, et al. An integrated systems genetics and omics toolkit to probe gene function. Cell Syst. 2018;6:90–102 e104. doi: 10.1016/j.cels.2017.10.016
  • https://systems-genetics.org/phewas.
  • Groves MG, Rosenstreich DL, Taylor BA, et al. Host defenses in experimental scrub typhus: mapping the gene that controls natural resistance in mice. J Immunol. 1980;125(3):1395–1399. doi: 10.4049/jimmunol.125.3.1395
  • Ito J, Roy S, Liu Y, et al. Whisker barrel cortex delta oscillations and gamma power in the awake mouse are linked to respiration. Nat Commun. 2014;5:3572. doi: 10.1038/ncomms4572
  • Poon A, Goldowitz D. Identification of genetic loci that modulate cell proliferation in the adult rostral migratory stream using the expanded panel of BXD mice. BMC Genomics. 2014;15:206. doi: 10.1186/1471-2164-15-206
  • Dogan A, Lasch P, Neuschl C, et al. ATR-FTIR spectroscopy reveals genomic loci regulating the tissue response in high fat diet fed BXD recombinant inbred mouse strains. BMC Genomics. 2013;14:386. doi: 10.1186/1471-2164-14-386
  • Andreux PA, Williams EG, Koutnikova H, et al. Systems genetics of metabolism: the use of the BXD murine reference panel for multiscalar integration of traits. Cell. 2012;150:1287–1299. doi: 10.1016/j.cell.2012.08.012
  • Leandro J, Violante S, Argmann CA, et al. Mild inborn errors of metabolism in commonly used inbred mouse strains. Mol Genet Metab. 2019;126:388–396. doi: 10.1016/j.ymgme.2019.01.021
  • Watanabe K, Stringer S, Frei O, et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet. 2019;51:1339–1348. doi: 10.1038/s41588-019-0481-0
  • Sollis E, Mosaku A, Abid A, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 2022;51(D1):D977–D985. doi: 10.1093/nar/gkac1010
  • Wojcik GL, Graff M, Nishimura KK, et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4
  • Kanai M, Akiyama M, Takahashi A, et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat Genet. 2018;50:390–400. doi: 10.1038/s41588-018-0047-6
  • Han X, Ong J-S, An J, et al. Using Mendelian randomization to evaluate the causal relationship between serum C-reactive protein levels and age-related macular degeneration. Eur J Epidemiol. 2020;35:139–146. doi: 10.1007/s10654-019-00598-z
  • Sakaue S, Kanai M, Tanigawa Y, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 2021;53:1415–1424. doi: 10.1038/s41588-021-00931-x
  • Shin SY, Fauman EB, Petersen A-K, et al. An atlas of genetic influences on human blood metabolites. Nat Genet. 2014;46:543–550. doi: 10.1038/ng.2982
  • Siewert KM, Voight BF. Bivariate Genome-wide association scan identifies 6 novel loci associated with lipid levels and coronary artery disease. Circ Genom Precis Med. 2018;11:e002239. doi: 10.1161/CIRCGEN.118.002239
  • Graham SE, Clarke SL, Wu K-H-H, et al. The power of genetic diversity in genome-wide association studies of lipids. Nature. 2021;600:675–679. doi: 10.1038/s41586-021-04064-3
  • Kichaev G, Bhatia G, Loh P-R, et al. Leveraging polygenic functional enrichment to improve GWAS power. Am J Hum Genet. 2019;104:65–75. doi: 10.1016/j.ajhg.2018.11.008
  • Day FR, Thompson DJ, Helgason H, et al. Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk. Nat Genet. 2017;49:834–841. doi: 10.1038/ng.3841
  • Warrington NM, Beaumont RN, Horikoshi M, et al. Maternal and fetal genetic effects on birth weight and their relevance to cardio-metabolic risk factors. Nat Genet. 2019;51:804–814. doi: 10.1038/s41588-019-0403-1
  • Mahajan A, Taliun D, Thurner M, et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet. 2018;50:1505–1513. doi: 10.1038/s41588-018-0241-6
  • Parra EJ, Below JE, Krithika S, et al. Genome-wide association study of type 2 diabetes in a sample from mexico city and a meta-analysis of a Mexican-American sample from Starr County, Texas. Diabetologia. 2011;54:2038–2046. doi: 10.1007/s00125-011-2172-y
  • Yashin AI, Wu D, Arbeev KG, et al. Joint influence of small-effect genetic variants on human longevity. Aging. 2010;2:612–620. doi: 10.18632/aging.100191
  • Field AE, Adams PD. Targeting chromatin aging - the epigenetic impact of longevity-associated interventions. Exp Gerontol. 2017;94:29–33. doi: 10.1016/j.exger.2016.12.010
  • Riera CE, Merkwirth C, De Magalhaes Filho CD, et al. Signaling networks Determining life span. Annu Rev Biochem. 2016;85:35–64. doi: 10.1146/annurev-biochem-060815-014451
  • Lau HH, Ng NHJ, Loo LSW, et al. The molecular functions of hepatocyte nuclear factors - in and beyond the liver. J Hepatol. 2018;68:1033–1048. doi: 10.1016/j.jhep.2017.11.026
  • Duncan SA, Navas MA, Dufort D, et al. Regulation of a transcription factor network required for differentiation and metabolism. Science. 1998;281:692–695. doi: 10.1126/science.281.5377.692
  • Junnila RK, List EO, Berryman DE, et al. The GH/IGF-1 axis in ageing and longevity. Nat Rev Endocrinol. 2013;9:366–376. doi: 10.1038/nrendo.2013.67
  • Yamagata K, Oda N, Kaisaki PJ, et al. Mutations in the hepatocyte nuclear factor-1α gene in maturity-onset diabetes of the young (MODY3). Nature. 1996;384:455–458. doi: 10.1038/384455a0
  • Byrne MM, Sturis J, Menzel S, et al. Altered insulin secretory responses to glucose in diabetic and nondiabetic subjects with mutations in the diabetes susceptibility gene MODY3 on chromosome 12. Diabetes. 1996;45:1503–1510. doi: 10.2337/diab.45.11.1503
  • Li LM, Jiang BG, Sun LL. HNF1A: From Monogenic diabetes to type 2 diabetes and Gestational diabetes Mellitus. Front Endocrinol. 2022;13:829565. doi: 10.3389/fendo.2022.829565
  • Hegele RA, Cao H, Harris SB, et al. The hepatic nuclear factor-1alpha G319S variant is associated with early-onset type 2 diabetes in Canadian Oji-Cree. J Clin Endocrinol Metab. 1999;84:1077–1082. doi: 10.1210/jc.84.3.1077
  • Lee YH, Sauer B, Gonzalez FJ. Laron dwarfism and non-insulin-dependent diabetes mellitus in the Hnf-1alpha knockout mouse. Mol Cell Biol. 1998;18:3059–3068. doi: 10.1128/MCB.18.5.3059
  • Yang X, Song JH, Cheng Y, et al. Long non-coding RNA HNF1A-AS1 regulates proliferation and migration in oesophageal adenocarcinoma cells. Gut. 2014;63:881–890. doi: 10.1136/gutjnl-2013-305266
  • Liu Z, Wei X, Zhang A, et al. Long non-coding RNA HNF1A-AS1 functioned as an oncogene and autophagy promoter in hepatocellular carcinoma through sponging hsa-miR-30b-5p. Biochem Biophys Res Commun. 2016;473:1268–1275. doi: 10.1016/j.bbrc.2016.04.054
  • Luco RF, Maestro MA, Sadoni N, et al. Targeted deficiency of the transcriptional activator Hnf1α alters subnuclear positioning of its genomic targets. PLoS Genet. 2008;4:e1000079. doi: 10.1371/journal.pgen.1000079
  • Yagi S, Hirabayashi K, Sato S, et al. DNA methylation profile of tissue-dependent and differentially methylated regions (T-DMRs) in mouse promoter regions demonstrating tissue-specific gene expression. Genome Res. 2008;18:1969–1978. doi: 10.1101/gr.074070.107
  • de Magalhaes JP. Ageing as a software design flaw. Genome Biol. 2023;24:51. doi: 10.1186/s13059-023-02888-y
  • de Magalhaes JP, Church GM. Genomes optimize reproduction: aging as a consequence of the developmental program. Physiology. 2005;20:252–259. doi: 10.1152/physiol.00010.2005
  • Singh PB, Zhakupova A. Age reprogramming: cell rejuvenation by partial reprogramming. Development. 2022;149: doi: 10.1242/dev.200755
  • Zhang W, Qu J, Liu GH, et al. The ageing epigenome and its rejuvenation. Nat Rev Mol Cell Biol. 2020;21:137–150. doi: 10.1038/s41580-019-0204-5
  • Lu Y, Brommer B, Tian X, et al. Reprogramming to recover youthful epigenetic information and restore vision. Nature. 2020;588:124–129. doi: 10.1038/s41586-020-2975-4
  • Ocampo A, Reddy P, Martinez-Redondo P, et al. In vivo amelioration of age-associated hallmarks by partial reprogramming. Cell. 2016;167:1719–1733 e1712. doi: 10.1016/j.cell.2016.11.052
  • Beucher A, Miguel-Escalada I, Balboa D, et al. The HASTER lncRNA promoter is a cis-acting transcriptional stabilizer of HNF1A. Nat Cell Biol. 2022;24:1528–1540. doi: 10.1038/s41556-022-00996-8
  • Liu Y, Zhao F, Tan F, et al. HNF1A-AS1: a tumor-associated long non-coding RNA. Curr Pharm Des. 2022;28:1720–1729. doi: 10.2174/1381612828666220520113846
  • Wang Y, Xie Y, Li L, et al. EZH2 RIP-seq Identifies tissue-specific long non-coding RNAs. Curr Gene Ther. 2018;18:275–285. doi: 10.2174/1566523218666181008125010
  • Beerman I, Bock C, Garrison B, et al. Proliferation-dependent alterations of the DNA methylation landscape underlie hematopoietic stem cell aging. Cell Stem Cell. 2013;12:413–425. doi: 10.1016/j.stem.2013.01.017
  • Dozmorov MG. Polycomb repressive complex 2 epigenomic signature defines age-associated hypermethylation and gene expression changes. Epigenetics. 2015;10:484–495. doi: 10.1080/15592294.2015.1040619
  • Mozhui K, Pandey AK. Conserved effect of aging on DNA methylation and association with EZH2 polycomb protein in mice and humans. Mech Ageing Dev. 2017;162:27–37. doi: 10.1016/j.mad.2017.02.006
  • Sandoval-Sierra JV, Helbing AH, Williams EG, et al. Body weight and high-fat diet are associated with epigenetic aging in female members of the BXD murine family. Aging Cell. 2020;e13207. doi: 10.1111/acel.13207
  • Guevara-Aguirre J, Balasubramanian P, Guevara-Aguirre M, et al. Growth hormone receptor deficiency is associated with a major reduction in pro-aging signaling, cancer, and diabetes in humans. Sci Transl Med. 2011;3:70ra13. doi: 10.1126/scitranslmed.3001845
  • Aguiar-Oliveira MH, Bartke A. Growth hormone deficiency: health and longevity. Endocr Rev. 2019;40:575–601. doi: 10.1210/er.2018-00216
  • Laron Z. Do deficiencies in growth hormone and insulin-like growth factor-1 (IGF-1) shorten or prolong longevity? Mech Ageing Dev. 2005;126:305–307. doi: 10.1016/j.mad.2004.08.022
  • Laron Z, Kauli R, Lapkina L, et al. IGF-I deficiency, longevity and cancer protection of patients with Laron syndrome. Mutat Res Rev Mutat Res. 2017;772:123–133. doi: 10.1016/j.mrrev.2016.08.002
  • Petkova SB, Yuan R, Tsaih S-W, et al. Genetic influence on immune phenotype revealed strain-specific variations in peripheral blood lineages. Physiol Genomics. 2008;34:304–314. doi: 10.1152/physiolgenomics.00185.2007
  • Wang T, Tsui B, Kreisberg JF, et al. Epigenetic aging signatures in mice livers are slowed by dwarfism, calorie restriction and rapamycin treatment. Genome Bio. 2017;18(1). doi: 10.1186/s13059-017-1186-2
  • Statham AL, Strbenac D, Coolen MW, et al. Repitools: an R package for the analysis of enrichment-based epigenomic data. Bioinformatics. 2010;26:1662–1663. doi: 10.1093/bioinformatics/btq247
  • Dennis G Jr., Sherman BT, Hosack DA, et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 2003;4(5):3. doi: 10.1186/gb-2003-4-5-p3
  • Szklarczyk D, Gable AL, Nastou KC, et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49:D605–D612. doi: 10.1093/nar/gkaa1074
  • Szklarczyk D, Gable AL, Lyon D, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47:D607–D613. doi: 10.1093/nar/gky1131
  • Keane TM, Goodstadt L, Danecek P, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477:289–294. doi: 10.1038/nature10413
  • Yalcin B, Wong K, Agam A, et al. Sequence-based characterization of structural variation in the mouse genome. Nature. 2011;477:326–329. doi: 10.1038/nature10432
  • Blake JA, Baldarelli R, Kadin JA, et al. Mouse Genome database (MGD): knowledgebase for mouse–human comparative biology. Nucleic Acids Res. 2021;49:D981–D987. doi: 10.1093/nar/gkaa1083
  • Cunningham F, Allen JE, Allen J, et al. Ensembl 2022. Nucleic Acids Res. 2022;50:D988–D995. doi: 10.1093/nar/gkab1049