1,773
Views
0
CrossRef citations to date
0
Altmetric
Original Investigations

Rare variant association study of veteran twin whole-genomes links severe depression with a nonsynonymous change in the neuronal gene BHLHE22

, , , , , , , , , , ORCID Icon, , , , & show all
Pages 295-306 | Received 08 Aug 2021, Accepted 10 Sep 2021, Published online: 29 Nov 2021

Abstract

Objectives

Major Depressive Disorder (MDD) is a complex neuropsychiatric disease with known genetic associations, but without known links to rare variation in the human genome. Here we aim to identify rare genetic variants associated with MDD using deep whole-genome sequencing data in an independent population.

Methods

We report the sequencing of 1,688 whole genomes in a large sample of male-male Veteran twins. Depression status was classified based on a structured diagnostic interview according to DSM-III-R diagnostic criteria. Searching only rare variants in genomic regions from recent GWAS on MDD, we used the optimised sequence kernel association test and Fisher's Exact test to fine map loci associated with severe depression.

Results

Our analysis identified one gene associated with severe depression, basic helix loop helix e22 (PAdjusted = 0.03) via SKAT-O test between unrelated severely depressed cases compared to unrelated non-depressed controls. The same gene BHLHE22 had a non-silent variant rs13279074 (PAdjusted = 0.032) based on a single variant Fisher’s Exact test between unrelated severely depressed cases compared to unrelated non-depressed controls.

Conclusion

The gene BHLHE22 shows compelling genetic evidence of directly impacting the severe depression phenotype. Together these results advance understanding of the genetic contribution to major depressive disorder in a new cohort and link a rare variant to severe forms of the disorder.

Introduction

Major Depressive Disorder (MDD) is a psychiatric condition caused, in part, by a complex network of heritable risk factors that scale with severity (Sullivan et al. Citation2000; Levinson Citation2006). MDD is estimated to affect one in six adults in their lifetime (Otte et al. Citation2016). Recent genome-wide-association studies (GWAS) have identified genomic regions that may contain genetic variants that could functionally contribute to the heritability of this disorder (Hyde et al. Citation2016; Wray et al. Citation2018; Howard et al. Citation2019) These GWAS efforts rely on utilising array-based technology which samples only common variation, which are unlikely to have functional links to disease and fail to explain the full heritability of MDD (Auer and Lettre Citation2015). Similarly, commonly studied depression candidate genes have yet to be conclusively linked to functional changes that directly cause increased risk of MDD (Border et al. Citation2019). Expanding to a whole-genome-sequence association study provides an opportunity to identify rare causative variants, which are overlooked by common-variant array-based genetic association studies.

The Vietnam Era Twin (VET) Registry is a cohort of male-male twin pairs maintained since the 1980s, many of whom served during the Vietnam era, and includes physiologic and psychological measures as well as bio-specimens from its members (Eisen et al. Citation1987). In 2001 a special biorepository was created, comprising whole-blood samples collected from a subset of 1819 twins, mostly paired siblings. It was this special biorepository of twin-pairs that became the source of DNA samples used for the WGS study reported here. Previous VET Registry research has focussed on the heritability of MDD and a significant genetic influence was found for the most severe types of depression (Lyons et al. Citation1998). In that study, depression phenotype variables were collected using the Diagnostic Interview Schedule III Revised (DIS-III-R) which assesses Diagnostic and Statistical Manual-III-R (DSM-III-R) diagnostic criteria via telephone interview. This same data and samples are used for the current study and defined the scope of cases and controls for each phenotype.

One of the primary challenges of a genetic association study is achieving adequate statistical power necessary to identify associations with small effect sizes. Many current WGS cohorts, including the one presented here, lack sufficiently large sample sizes needed to identify low frequency genetic associations with genome-wide significance. However, many array-based studies which have identified regions associated with MDD using millions of individuals have narrowed the search space for potential casual variants. We hypothesised that rare, potentially causal variation for severe MDD could be discovered by analysing the genomic regions previously identified by GWAS. In contrast to correcting for multiple testing by using thresholds for genome-wide studies, controlling for the total number of tests performed by limiting to these regions (‘testing wide significance’) provides appropriate control for this study design and provides power for this small cohort. As hypothesised, this type of focussed approach successfully linked severe MDD to a non-silent variant on chromosome 8, in the single exon of the basic helix loop helix e22 (BHLHE22) gene.

Methods

Sample collection

The cohort for this investigation was drawn from the Vietnam-Era Twins (VET) Registry. The VET Registry contains 7369 male-male twin pairs (14,738 individuals) in which one or both brothers served in the United States military during the Vietnam Era (1965–1975). Informed consent was obtained from all individual participants involved in the study, and all experiments were performed in accordance with relevant guidelines and regulations. Full details about the construction of the VET Registry have been previously published (Eisen et al. Citation1987). The VET Registry biorepository is a subsample of 1,819 members whose blood was collected during previous in-person studies. The entire biospecimen repository was sequenced and samples in this cohort were sequenced without regard to depression status, twin status, or batching.

There are 4 depression phenotypes variables in the VET registry. These depression variables were derived from the Diagnostic Interview Schedule-III-R (DIS-III-R) depression module according to the Diagnostic and Statistical Manual—III-Revised (DSM-III-R) criteria. This structured telephone interview was administered between 1991 and 1994 as part of the Harvard Drug Study (Tsuang et al. Citation2001). Participants were not selected on the basis of any substance use or other diagnostic characteristic. Responses and diagnostic criteria from the DIS-III-R were used to define four different values that reflect the lifetime history of depression for each individual within the cohort up to 1994. The first variable is the lifetime history of MDD diagnosis. The second variable is the count of depression symptoms experienced during a member’s lifetime, ranging from 0 to 9 of 9 symptoms experienced. The third variable records the age at which the individual first experienced a depressive episode, which then informs a binary classification denoting early-onset (age < 30) or late onset (age > 30). The last variable is the DSM-III-R classification of the severity of the depressive episode experienced among those with a lifetime diagnosis of depression. This 3-level classification within the DIS-III-R is as follows: mild- meaning few, if any symptoms more than the minimum to make the diagnosis and only minor impairment in usual social activities or relationships; moderate-meaning impairment and symptoms between mild and severe; and severe/severe-psychotic-meaning several symptoms more than the minimum required for the diagnosis, marked by interference with occupational or social functioning with the potential for additional psychosis-related delusions or hallucinations. All individuals labelled as severely depressed within the VET Registry and the VET Registry Biorepository were without additional psychosis symptoms.

Sequencing and genotyping

Samples of DNA were housed at the Department of Veteran Affairs (VA) Massachusetts Veterans Epidemiology Research and Information Centre (MAVERIC) biorepository and were sent to The American Genome Centre at Uniformed Services University. They were then sequenced on an Illumina HiSeq X System using paired-end 151 bp read lengths targeting 400 million total reads. Sequenced reads were processed by the Illumina HiSeq Analysis Software version 2.0 (2.5.55.13). Variants across all samples in the cohort were merged and normalised. Population level autosomal variant sites were then filtered using two conditions: (1) the proportion of samples with non-reference alleles having a PASS filter is at least 90%; and (2) the proportion of samples having a minimum genotype quality (greater or equal to 20) of at least 90%. Variants within the cohort VCF were annotated using ANNOVAR (Wang et al. Citation2010). Reference population allele frequencies were obtained from TwinsUK, 1000 Genomes Project, GnomAD, and ExAC (1000 Genomes Project Consortium et al. Citation2015; Moayyeri et al. Citation2013; Lek et al. Citation2016). Non-silent mutations were assessed using data from SIFT, Polyphen2, MutationAssessor, PROVEAN, and fathmm (Adzhubei et al. Citation2010; Choi et al. Citation2012; Shihab et al. Citation2013; Vaser et al. Citation2016). Twin mate identities were verified by calculating identity by state (IBS) via rvtests (Zhan et al. Citation2016). To compare two samples' genome-wide genotypes including non-reference and reference-matching genotypes, genome variant call format (GVCF) files were merged and analysed for concordance.

Target regions for focussed analysis

We focussed on the most recent study of MDD with the largest sample size for the selection of our genome regions of interest (Howard et al. Citation2019). Using an initial list of 102 lead variants (substitutions and indels), we measured the size of the linkage disequilibrium (LD) block centred on each lead variant within the 1000 Genomes background populations of Americans of African Ancestry in SW USA (ASW) and Utah Residents (CEPH) with Northern and Western European Ancestry (CEU) to estimate a similar composition of ancestry compared to the known ancestry of our cohort using the LDLink tool (Machiela and Chanock Citation2015). Borders for each unique LD block were set as the breadth of the peak which was greater than 0.2 r2 for each locus of interest. This method follows a similar example in the literature of a haplotype-based search in psychiatric disorders (Budde et al. Citation2019; Howard et al. Citation2019). The borders of the LD region for each associated locus from the set of 102 sites are listed in Supplemental Table 1.

Table 1. Clinical data and sequencing status from a repository cohort of 1,688 individuals sequenced from the Vietnam Era Twins Registry Biorepository.

VET registry WGS cohort construction

This biorepository cohort and its resultant WGS data were conceived to serve a broad set of research goals, including but not limited to associations with MDD. It was therefore necessary to sequence the full biorepository which included monozygotic and dizygotic twins. To account for the presence of related data we therefore intended to balance the need to include the maximum amount of unique DNA and cases, with the need to exclude samples for familial relatedness. Within the literature there are multiple strategies for dealing with genomic data from related pairs in genetic association studies. These include genome-wide association study designs such as including all genomic data from all twin pairs (Valdes et al. Citation2008; Verweij et al. Citation2010; Miller et al. Citation2012), including both DZ twins and removing one MZ twin (Wallace et al. Citation2008; Loukola et al. Citation2014), and not including any genomic information from the paired twin when it was available (Luciano et al. Citation2011; Kettunen et al. Citation2012; Armour et al. Citation2014). The final cohort as defined by striking one individual from each MZ pair, including both individuals of a DZ pair, and including all singletons. In addition, the cohort was limited to European individuals. This resulted in a cohort of up to N = 1060 individuals. Samples with missing phenotype data were excluded. To account for the addition of relatedness by the inclusion of DZ pairs in the cohort, we constructed a more conservative follow up cohort which included only unrelated individuals of European ancestry, and controls with no history of depression (n = 676). Peddy was used to measure the coefficient of relatedness between unrelated individuals. No individuals with a coefficient of relatedness ≥0.1 were present in the unrelated cohort, as expected for a coefficient of relatedness between unrelated pair comparisons (Manichaikul et al. Citation2010; Pedersen and Quinlan Citation2017).

Genetic association testing

Association testing was conducted across each gene and non-silent variant within the 102 previously associated autosomal loci. Association at each gene was measured using the optimal sequence kernel association test, SKAT-O (Lee et al. Citation2012). Single-variant enrichment was evaluated using Fisher’s Exact Test. Statistical significance was determined after correction for multiple testing by using a Bonferroni adjusted p value based on the number of genes tested. Each of the four DSM-III-R MDD diagnostics variables was measured for enrichment in genes and single variants independently of one another, similar to previous VET Registry research (Lyons et al. Citation1998). Similar approaches in non-depression research have shown precedent for testing related but independent phenotypes across an exome (Haddad et al. Citation2016).

Prior to testing, variants were filtered to those with a MAF < 0.05, those that are non-silent, and those that have an allele count of at least three alternative alleles. ‘Non-silent’ was defined as the following classes of variant as annotated by ANNOVAR: nonsynonymous, frameshift, stopgain, and stoploss. Principal components 1–4 were used as covariates to account for population structure for the SKAT-O. Four MDD categories were tested for association. These variables included lifetime diagnosis of a major depressive episode, severity of depression, the number of symptoms experienced, and early onset of MDD. Plotting of variant association was performed using LDAssoc (Machiela and Chanock Citation2018). Track data for brain tissue expression was sourced from Genotype-Tissue Expression (GTEx) project (GTEx Consortium Citation2013).

Results

Sequencing the VET registry WGS cohort allows the inclusion of genomic data from related pairs

We sequenced blood-derived DNA from 1688 male VET Registry members with a resulting mean read depth of 43.7x. These high-quality whole-genome sequences were sourced from biorepository samples of 982 monozygotic (MZ) twins, 632 dizygotic (DZ) twins, and 74 unrelated individuals in the VET Registry, whose blood was collected between the years of 2002 and 2013 (, ). Three duplicate pairs of DNA samples were also sequenced as technical replicates. Phenotype data described self-reported ancestry and four MDD DSM-III-R variables.

Figure 1. Genome-wide identity by state (IBS) between twin pairs for a cohort of monozygotic, dizygotic, unrelated individuals, and technical replicate whole-genome sequences. Percent IBS for each twin pair, technical replicates, and randomly paired unrelated individuals is listed as a box plot. The centre and ends of the boxes represent the 25th, 50th and 75th percentiles of IBS values and the whiskers on each plot represent the range of IBS within 1.5 times the interquartile range of the lower and upper quartiles. Each individual IBS value is also plotted. Inset is a plot of genome-wide percent identity by state for monozygotic twins and technical replicates, respectively.

Figure 1. Genome-wide identity by state (IBS) between twin pairs for a cohort of monozygotic, dizygotic, unrelated individuals, and technical replicate whole-genome sequences. Percent IBS for each twin pair, technical replicates, and randomly paired unrelated individuals is listed as a box plot. The centre and ends of the boxes represent the 25th, 50th and 75th percentiles of IBS values and the whiskers on each plot represent the range of IBS within 1.5 times the interquartile range of the lower and upper quartiles. Each individual IBS value is also plotted. Inset is a plot of genome-wide percent identity by state for monozygotic twins and technical replicates, respectively.

To better understand the population structure of the genomes sourced from the VET Registry, we conducted a principal component (PC) analysis including the first 20 eigenvectors computed from cohort-wide variation data (). The plots of eigenvectors for the VET Registry cohort closely resemble genome-wide PC distributions from previous studies of global human populations (1000 Genomes Project Consortium et al. 2015), and revealed no biases due to reagent batch, collection date, or other potential confounding effects.

Figure 2. Regional displays of the BHLHE22 and SEZ6 loci and the single-variant associations at rs13279074 and rs117736100. (A) Variant associations to severe depression within the local region surrounding the BHLHE22 gene for rare non-silent variants (black border) as well common variants (no border). Shading of each circle is proportional to linkage disequilibrium, D‘ from the 1000 Genomes cohort. The size of the circle is proportional to Minor Allele Frequency. Recombination rate and overlapping gene models are also shown. Labeled loci are previously described common-variant associations to MDD and the variant of interest. (B) Gene structure display of BHLHE22 with protein domains and region annotation labelled. Two other previously-reported associated variants with phenotypes related to MDD are displayed. Expression data describing the tissue-specific expression profile of BHLHE22 is sourced from the Genotype-Tissue Expression project, most prevalent tissues are labelled. (C) Plot of variant associations to severe depression within the local region surrounding the SEZ6 gene. Labels and descriptions correspond to those seen in Figure 2A. (D) Gene structure display of SEZ6 with protein domains and region annotation labelled where available.

Figure 2. Regional displays of the BHLHE22 and SEZ6 loci and the single-variant associations at rs13279074 and rs117736100. (A) Variant associations to severe depression within the local region surrounding the BHLHE22 gene for rare non-silent variants (black border) as well common variants (no border). Shading of each circle is proportional to linkage disequilibrium, D‘ from the 1000 Genomes cohort. The size of the circle is proportional to Minor Allele Frequency. Recombination rate and overlapping gene models are also shown. Labeled loci are previously described common-variant associations to MDD and the variant of interest. (B) Gene structure display of BHLHE22 with protein domains and region annotation labelled. Two other previously-reported associated variants with phenotypes related to MDD are displayed. Expression data describing the tissue-specific expression profile of BHLHE22 is sourced from the Genotype-Tissue Expression project, most prevalent tissues are labelled. (C) Plot of variant associations to severe depression within the local region surrounding the SEZ6 gene. Labels and descriptions correspond to those seen in Figure 2A. (D) Gene structure display of SEZ6 with protein domains and region annotation labelled where available.

Twin genome variation reflects zygosity status

Having whole-genome sequence data of monozygotic twins created the opportunity to investigate the nature of twin genetics. Recent studies have suggested that monozygotic twins may harbour rare variation that is somatic in nature, and that sesquizygotic twinning can occur, albeit rarely (Nishioka et al. Citation2018; Gabbett et al. Citation2019). Given the large number of MZ twins within the VET Registry cohort, we explored the genome-wide concordance of alleles between all pairs of monozygotic, dizygotic, replicate and unrelated samples within the cohort (). On average we detected 3,094,980,951 co-genotyped sites between two individuals. Monozygotic twins showed nearly the same percentage of identical alleles (mean 99.99841%, n = 485 pairs) as technical replicates (mean 99.99849%, n = 3). Dizygotic twins also displayed high allelic similarity (mean 99.948%, n = 312 pairs) although that similarity was reduced compared with monozygotic twins. Both monozygotic and dizygotic twins had significantly greater similarity compared with paired unrelated individuals (mean 99.91%, n = 32 randomly assigned pairs). Among the population of monozygotic twins, we identified 596 homozygous non-silent variants discordant between pairs of MZ twins (n = 491 pairs); this included non-synonymous variants, frameshifts, stop-loss, and stop-gains. Of this set, 132 unique non-silent variants had a functional impact (CADD) score, of at least 20. Gene ontology analysis of these 132 disruptive sites between MZ twins did not reveal any remarkable gene sets. In summary, the cohort of twins matched expectations of relatedness and does not show any clusters of non-silent mutations.

Gene-collapsed rare variant analysis identifies non-silent burden associated with MDD

To determine if genes in the 102 regions previously associated with MDD might have an enrichment of non-silent rare variants, we utilised gene-collapsed SKAT-O testing (Lee et al. Citation2012). A total of 340 genes exist within the borders of the 102 regions, making them available for inclusion in a gene-wise SKAT-O analysis. The analysis was conducted using one test per gene for each of the four phenotype variables (lifetime depression diagnosis, early age-of-onset, symptom count, severe depression). Of these 102 regions known to be associated, 22 regions had no genes present. For the severe depression phenotype, the resulting testing space of 80 regions contained a total of 212 genes with at least 3 non-silent variants, upon which the tests were conducted. This test included n = 47 severely depressed individuals compared to n = 1007 non-severely-depressed European individuals. Full results are reported in Supplemental Table 2. We found that the developmental transcription factor BHLHE22 was significantly enriched for rare variation among individuals with severe depression (SKATO P = 0.000123, PAdjusted = 0.027, , Supplemental Table 2A). This basic helix loop helix transcription factor contained 21 non-silent variants across the gene on chromosome 8. No other significant associations were found (Supplemental Table 2C–H). By limiting the cohort to severely depressed unrelated individuals compared to unrelated individuals with no history of depression (n = 676), we found that BHLHE22 continued to have a significant enrichment of rare, non-silent variants (SKATO P = 0.000149, PAdjusted = 0.032) (Supplemental Table 3A).

Table 2. Gene burden test and single variant results for the 102 previously associated regions for four depression phenotype variables.

Single variant analysis identifies a non-silent rare variant associated with MDD

We also performed single variant analysis on all non-silent, rare (MAF ≤ 0.05) variants within the borders of the 102 regions for four different MDD-related variables (lifetime depression diagnosis, age-of-MDD onset before 30, symptom count 0–9, and depression severity). In the case of severe depression, 548 tests were performed, one for each non-silent variant. As utilised above, this test included n = 47 severely depressed individuals compared to n = 1007 non-severely-depressed European individuals. One non-silent variant, (rs13279074) BHLHE22, was significantly associated with severe depression (Fisher’s Exact P = 6.98 × 10−05, PAdjusted = 0.038, , Supplemental Table 2B). The allele frequency for rs13279074 among severe MDD cases was 0.08 compared to 0.009 for those without severe depression, odds ratio: 8.45 (95% CI 3.4614–20.6404), an odds ratio similar in size to other reported rare-variant associations (Fritsche et al. Citation2016; Pigors et al. Citation2018). This variant at rs13279074 encodes at G>T change in the first and only exon of the BHLHE22 gene, also known as BHLHB5. This translates to a glycine to valine change at position 106 of the protein. To rule out relatedness or other cryptic effects, we reduced the cohort to unrelated individuals of European descent, by selecting one twin from each pair of MZ and DZ samples with priority for inclusion given to individuals diagnosed with MDD, and all singleton samples. We found that rs13279074 remained enriched when comparing severely depressed unrelated individuals to unrelated individuals with no history of depression (n = 676) (Fisher’s Exact P = 8.01 × 10−05, PAdjusted = 0.032, Supplemental Table 3B). The rare variant in BHLHE22 was statistically significant after Bonferroni correction in the large (n = 1,054) and small cohorts (n = 696).

Regional context of rs13279074 in BHLHE22

To better understand the context of the association of rs13279074 with severe depression, we plotted variant associations for all rare and common variants across its parent genome region on chromosome 8 (). Among rare, non-silent variants in this region, only rs13279074 had significance. Linkage disequilibrium sourced from the 1,000 Genomes cohort showed an extended block of linkage and an elevated group of p-values centred on the previously reported lead MDD SNP – rs7837935. The novel rare variant present in BHLHE22 appears to be within the same block of inheritance.

Further context analysis of rs13279074 within the BHLHE22 gene supports a functional role in MDD (). Among the 35 human tissues assayed by the Genotype-Tissue Expression project (GTEx), BHLHE22 was found to be primarily expressed in brain tissue and research has shown it is involved in the development of the neocortex (Joshi et al. Citation2008; GTEx Consortium Citation2013). The gene has one exon with three annotated protein domains corresponding to the common basic helix loop helix protein structure (Xu et al. Citation2002). The associated rare variant rs13279074 is located outside the annotated BHLH domains. However, the associated change of a glycine to a valine found at position 106 is conserved across species and does not overlap with the known low complexity regions of the protein. Additionally, BHLHE22 forms a repressor complex with the PRDM8 protein, whose protein-protein interaction site is not yet understood and may overlap with this association with severe depression (Ross et al. Citation2012). Multiple studies have linked other variants in BHLHE22 with neurological phenotypes and disease, including increased risk taking, and early-onset familial MDD (Subaran et al. Citation2016; Clifton et al. Citation2018). Lastly, we find that rs13279074, through imputation, is associated with Schizophrenia (P = 0.0004 n = 96,499) within the Finnish genetics cohort (FinnGen Freeze 2) database (mFinnGen Citation2020), providing additional psychiatric disease-association evidence for this rare variant.

Across the VET Registry cohort, the variant (rs13279074) BHLHE22 was observed 41 times across 3376 total alleles, 37 of which were in a heterozygous context, and 4 alleles which were in two homozygous individuals. The two homozygous individuals are related monozygotic twins, one of whom was diagnosed with severe depression and the other was not clinically diagnosed with severe MDD but exhibited a high depression symptom count (8/9 symptoms and 7/9 symptoms respectively). In this cohort the BHLHE22 gene contained 20 other non-silent variants, four of which met the threshold for testing (minor allele count >3, minor allele frequency <0.05). Among that set of four, the severe depression associated variant rs13279074 had the highest CADD phred score, measuring 23.5. Classification of the amino acid change at position 106 was mixed, with Polyphen2 and fathmm predicting a deleterious classification and SIFT, MutationAssessor and PROVEAN predicting a tolerated effect. As previously stated, among cases of severe depression, the allele frequency for rs13279074 was found to be 0.08 compared to 0.009 for individuals without severe depression. Among all samples in the VET Registry cohort, the allele frequency was 0.012 for rs13279074. This is comparable to the allele frequency seen in large, unselected populations, which range between 0.002 and 0.02 with a mean of 0.01169 (Karczewski et al. Citation2020). Together, this evidence shows that the rare variant rs13279074 is significantly associated with severe depression, and imparts a protein change at a conserved site in a neurological gene that has been classified as disruptive.

Non-silent variant rs117736100 in SEZ6

The second most enriched single variant was the non-silent rare variant rs117736100, which was enriched for severe depression but not significant after correction (Fisher’s Exact P = 0.0003, PAdjusted = 0.18, , Supplemental Table 2B). The MAF for rs117736100 among severe MDD cases was 0.027 compared to 0.0004 in those without severe MDD, odds ratio: 66.3626 (95% CI 6.8359–644.2420). This variant at rs117736100 encodes a C > T nucleotide change in the gene, Seizure-Related-6 (SEZ6; rs117736100). The number of observations of rs117736100 in SEZ6 were too few to permit testing in an unrelated cohort. In the case of the novel rare variant observed in SEZ6, rs117736100 occurs in a region of elevated significance surrounding a block of inheritance previously associated with MDD (). As seen in , the rare variant rs117736100 in SEZ6 is expressed in the brain and present in a SUSHI/SCR domain.

Discussion

The primary finding in this paper links severe depression, a phenotype previously found to carry the high heritability of disease, to a rare non-silent change in the neuronal transcription factor BHLHE22. This non-silent variation at position 106 is predicted to cause a nonsynonymous change in a region of the protein predicted to be conserved and outside of known low complexity regions. The BHLHE22 rare variant rs13279074 has additional properties consistent with a possible effect on neuropsychiatric disorders. In brief, the rare variant is predicted to be disruptive: it occurs in a gene that is specific to neurons in the developing neocortex; it occurs at a higher rate than expected in severely depressed individuals; it occurs in a gene linked to multiple psychological diseases including familial MDD; and in an external cohort, the same variant rs13279074 was found to also be enriched among individuals diagnosed with Schizophrenia. Most importantly, this variant overlaps the same linkage block on chromosome 8 as those seen in the most recent MDD association studies. In addition to this site, we also found a promising SNP, rs117736100, in the SEZ6 gene, which shares many of the same lines of evidence as the BHLHE22 association but fell short of significance. Together these novel rare variant associations show that refinement of the current understanding of MDD genetics is possible and is guided by the previous literature on inheritance and association.

BHLHE22 is primarily expressed in the brain, yet its mechanism of influence on MDD is not fully understood. However, multiple lines of evidence relate the BHLHE22 gene to psychiatric disease. These include being a risk factor for schizophrenia (Hennig et al. Citation2017), PTSD (Gelernter et al. Citation2019), familial depression (Subaran et al. Citation2016) and risk-taking-behavior (Karlsson Linner et al. Citation2019). In particular, the imputed association of rs13279074 with schizophrenia within the Finngen2 database demonstrates the impact that changes in this gene and base pair can have on psychiatric disease. Functionally, BHLHE22 drives neocortical arealization, the process by which individual areas of the neocortex develop their own cytoarchitecture, connectivity and function (Alfano and Studer Citation2013). BHLHE22 also differs from other members of the class of Basic Helix-Loop-Helix genes in that it cannot bind directly to DNA and must have a binding partner that allows the complex to bind target regions. One such binding partner is the BHLH gene TCF4, and its paralog TCF3. The BHLHE22/TCF4 complex has been shown to drive oligodendrocyte differentiation and CNS myelinisation (Wedel et al. Citation2020). TCF4 has also recently been shown to be a schizophrenia risk gene on its own (Hennig et al. Citation2017). Thus the enhanced risk for schizophrenia associated with BHLHE22 could be related to the influence or interaction with TCF4, and its downstream targets. BHLHE22 also forms a transcriptional repressor complex with PRDM8, a member of a conserved family of methyltransferases that affects neuronal circuit assembly but whose influence on disease is unknown (Ross et al. Citation2012). This evidence suggests the mechanism of BHLHE22 in causing psychiatric disease is through transcriptional regulation via partner genes which alters neuronal architecture. The downstream effects of these transcriptional regulators are many, and provide a plausible mechanism by which BHLHE22 may contribute to a causal genetic determinate for severe MDD.

While not significantly enriched in this study, the variant rs117736100 in SEZ6 may be a promising candidate for future study as it has several intriguing features. Previous characterisation of this gene in mice has shown that SEZ6 acts to increase dendrite arborisation and excitability of neurons within the cortex (Gunnersen et al. Citation2007). There are multiple reported common variants in SEZ6 which have been linked to diseases such as febrile seizure, epilepsy, Alzheimer’s disease and schizophrenia (Yu et al. Citation2007; Jiang et al. Citation2012; Ambalavanan et al. Citation2016; Paracchini et al. Citation2018). Additional studies have also linked homologs of SEZ6, such as SEZ6L, to bipolar disorder (Xu et al. Citation2013).

We acknowledge some limitations to this work. Association studies comprised of whole-genome sequences are an emerging field. As a result, sample sizes are small while cohorts are sequenced and slowly aggregated. This includes this study, which is a first of its kind in sequencing a large cohort of WGS data from individuals with psychiatric disease. While the size of the cohort falls short of a study in a mature field, we feel that this initial result will point the way towards further investigation of the genes and variants identified when larger cohorts are available in the coming years. This WGS approach specifically uses regions with known associations to MDD from deeply sampled array-based studies. It is a reasonable assumption that those regions are enriched for variants associated with major depressive disorder. Our reported genes and variants meet significance thresholds for the number of tests performed, which is appropriate given our hypotheses for investigating these regions; however, this threshold is less than genome or exome-wide significance. Individuals diagnosed with severe depression have been shown to have the strongest heritability of MDD (Lyons et al. Citation1998). However, the most severe forms of depression may also have a diverse and unique biology that differentiates it from the MDD observed in this cohort. Additionally, the adult all-male cohort may limit whether these observations apply to a general population, as it is known that MDD has higher prevalence among women and lower prevalence among the elderly (Piccinelli and Wilkinson Citation2000; Kessler et al. Citation2010). Lastly, due to the inherent rarity of the variation being studied, we acknowledge that for the SEZ6 rs117736100 locus we report very low counts of severe depression cases and alternative alleles observed, resulting in the variant falling short of significance after correction for multiple testing. Despite these potential sources of bias, we feel that this investigation is a useful and meaningful step towards identifying the underlying genetics of MDD.

In conclusion, we have identified a rare variant in the BHLHE22 gene linked to the most severe forms of MDD. This rare disruptive change forms a functional link between phenotype and genotype. Also, the evidence that cases of severe depression are an important resource for finding genetic MDD associations gives insight for the future search for additional novel links to this disorder.

Supplemental material

Supplemental Table 3

Download MS Excel (43.6 KB)

Supplemental Table 2

Download MS Excel (153.3 KB)

Supplemental Table 1

Download MS Excel (14.6 KB)

Supplementary Figures

Download MS Word (152.2 KB)

Acknowlegements

We want to acknowledge the participants and investigators of FinnGen study. Most importantly, the authors gratefully acknowledge the cooperation and participation of the members of the VET Registry and their families. Without their contribution this research would not have been possible.

Disclosure statement

All statements and opinions are solely of the authors and do not necessarily reflect the position or policy of the VET Registry, the VA, or the United States Government. The opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the U.S. Army, Department of Defense, the U.S. Government or the Uniformed Services University of the Health Sciences. The use of trade names does not constitute an official endorsement or approval of the use of such reagents or commercial hardware or software. This document may not be cited for purposes of advertisement. The authors report no conflict of interest.

Data availability statement

Investigators interested in accessing phenotypic and genotypic data from the VET Registry (https://www.seattle.eric.research.va.gov/VETR/Home.asp) can apply for data access by contacting the VET Registry at [email protected].

Additional information

Funding

This work was supported by NHLBI/NIH under grant IAA-AA-HL-14-007 (Pollard); Cooperative Studies Program (CSP) of the United States Department of Veterans Affairs (VA) Office of Research & Development; R01DA004604 (Tsuang); R01AG050595 (Kremen, Lyons, Franz) and R01AG022381 (Kremen); and R01HL68630, R01AG026255, R01HL125246, and R01HL136205 (Vaccarino).

References

  • 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, et al. 2015. A global reference for human genetic variation. Nature. 526(7571):68–74.
  • Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. 2010. A method and server for predicting damaging missense mutations. Nat Methods. 7(4):248–249.
  • Alfano C, Studer M. 2013. Neocortical arealization: evolution, mechanisms, and open questions. Dev Neurobiol. 73(6):411–447.
  • Ambalavanan A, Girard SL, Ahn K, Zhou S, Dionne-Laporte A, Spiegelman D, Bourassa CV, Gauthier J, Hamdan FF, Xiong L, et al. 2016. De novo variants in sporadic cases of childhood onset schizophrenia. Eur J Hum Genet. 24(6):944–948.
  • Armour JA, Davison A, McManus IC. 2014. Genome-wide association study of handedness excludes simple genetic models. Heredity. 112(3):221–225.
  • Auer PL, Lettre G. 2015. Rare variant association studies: considerations, challenges and opportunities. Genome Med. 7(1):16.
  • Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, Keller MC. 2019. No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression across multiple large samples. Am J Psychiatry. 176(5):376–387.
  • Budde M, Friedrichs S, Alliey-Rodriguez N, Ament S, Badner JA, Berrettini WH, Bloss CS, Byerley W, Cichon S, Comes AL, et al. 2019. Efficient region-based test strategy uncovers genetic risk factors for functional outcome in bipolar disorder. Eur Neuropsychopharmacol. 29(1):156–170.
  • Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. 2012. Predicting the functional effect of amino acid substitutions and indels. PLOS One. 7(10):e46688.
  • Clifton EAD, Perry JRB, Imamura F, Lotta LA, Brage S, Forouhi NG, Griffin SJ, Wareham NJ, Ong KK, Day FR. 2018. Genome-wide association study for risk taking propensity indicates shared pathways with body mass index. Commun Biol. 1:36.
  • Eisen S, True W, Goldberg J, Henderson W, Robinette CD. 1987. The Vietnam Era Twin (VET) registry: method of construction. Acta Genet Med Gemellol. 36(1):61–66.
  • Fritsche LG, Igl W, Bailey JN, Grassmann F, Sengupta S, Bragg-Gresham JL, Burdon KP, Hebbring SJ, Wen C, Gorski M, et al. 2016. A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nat Genet. 48(2):134–143.
  • Gabbett MT, Laporte J, Sekar R, Nandini A, McGrath P, Sapkota Y, Jiang P, Zhang H, Burgess T, Montgomery GW, et al. 2019. Molecular support for heterogonesis resulting in sesquizygotic twinning. N Engl J Med. 380(9):842–849.
  • Gelernter J, Sun N, Polimanti R, Pietrzak R, Levey DF, Bryois J, Lu Q, Hu Y, Li B, Radhakrishnan K, et al. 2019. Genome-wide association study of post-traumatic stress disorder reexperiencing symptoms in >165,000 US veterans. Nat Neurosci. 22(9):1394–1401.
  • GTEx Consortium. 2013. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 45(6):580–585.
  • Gunnersen JM, Kim MH, Fuller SJ, De Silva M, Britto JM, Hammond VE, Davies PJ, Petrou S, Faber ES, Sah P, et al. 2007. Sez-6 proteins affect dendritic arborization patterns and excitability of cortical pyramidal neurons. Neuron. 56(4):621–639.
  • Haddad SA, Ruiz-Narvaez EA, Haiman CA, Sucheston-Campbell LE, Bensen JT, Zhu Q, Liu S, Yao S, Bandera EV, Rosenberg L, et al. 2016. An exome-wide analysis of low frequency and rare variants in relation to risk of breast cancer in African American Women: the AMBER Consortium. Carcinogenesis. 37(9):870–877.
  • Hennig KM, Fass DM, Zhao WN, Sheridan SD, Fu T, Erdin S, Stortchevoi A, Lucente D, Cody JD, Sweetser D, et al. 2017. WNT/β-catenin pathway and epigenetic mechanisms regulate the pitt-hopkins syndrome and schizophrenia risk gene TCF4. Mol Neuropsychiatry. 3(1):53–71.
  • Howard DM, Adams MJ, Clarke TK, Hafferty JD, Gibson J, Shirali M, Coleman JRI, Hagenaars SP, Ward J, Wigmore EM, et al. 2019. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci. 22(3):343–352.
  • Hyde CL, Nagle MW, Tian C, Chen X, Paciga SA, Wendland JR, Tung JY, Hinds DA, Perlis RH, Winslow AR. 2016. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat Genet. 48(9):1031–1036.
  • Jiang J, Chen X, Liu W, Zhao Y, Guan Y, Han Y, Wang F, Lu J, Yu Z, Du Z, et al. 2012. Correlation between human seizure-related gene 6 variants and idiopathic generalized epilepsy in a Southern Chinese Han population. Neural Regen Res. 7(2):96–100.
  • Joshi PS, Molyneaux BJ, Feng L, Xie X, Macklis JD, Gan L. 2008. Bhlhb5 regulates the postmitotic acquisition of area identities in layers II-V of the developing neocortex. Neuron. 60(2):258–272.
  • Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, et al. 2020. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 581(7809):434–443.
  • Karlsson Linner R, Biroli P, Kong E, Meddens SFW, Wedow R, Fontana MA, Lebreton M, Tino SP, Abdellaoui A, Hammerschlag AR, et al. 2019. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat Genet. 51(2):245–257.
  • Kessler RC, Birnbaum H, Bromet E, Hwang I, Sampson N, Shahly V. 2010. Age differences in major depression: results from the National Comorbidity Survey Replication (NCS-R). Psychol Med. 40(2):225–237.
  • Kettunen J, Tukiainen T, Sarin AP, Ortega-Alonso A, Tikkanen E, Lyytikainen LP, Kangas AJ, Soininen P, Wurtz P, Silander K, et al. 2012. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat Genet. 44(3):269–276.
  • Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, Team N-E, Christiani DC, Wurfel MM, Lin X. 2012. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet. 91(2):224–237.
  • Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, et al. 2016. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 536(7616):285–291.
  • Levinson DF. 2006. The genetics of depression: a review. Biol Psychiatry. 60(2):84–92.
  • Loukola A, Wedenoja J, Keskitalo-Vuokko K, Broms U, Korhonen T, Ripatti S, Sarin AP, Pitkaniemi J, He L, Happola A, et al. 2014. Genome-wide association study on detailed profiles of smoking behavior and nicotine dependence in a twin sample. Mol Psychiatry. 19(5):615–624.
  • Luciano M, Hansell NK, Lahti J, Davies G, Medland SE, Raikkonen K, Tenesa A, Widen E, McGhee KA, Palotie A, et al. 2011. Whole genome association scan for genetic polymorphisms influencing information processing speed. Biol Psychol. 86(3):193–202.
  • Lyons MJ, Eisen SA, Goldberg J, True W, Lin N, Meyer JM, Toomey R, Faraone SV, Merla-Ramos M, Tsuang MT. 1998. A registry-based twin study of depression in men. Arch Gen Psychiatry. 55(5):468–472.
  • Machiela MJ, Chanock SJ. 2015. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics. 31(21):3555–3557.
  • Machiela MJ, Chanock SJ. 2018. LDassoc: an online tool for interactively exploring genome-wide association study results and prioritizing variants for functional investigation. Bioinformatics. 34(5):887–889.
  • Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. 2010. Robust relationship inference in genome-wide association studies. Bioinformatics. 26(22):2867–2873.
  • mFinnGen. 2020. FinnGen Documentation of R3 release.
  • Miller MB, Basu S, Cunningham J, Eskin E, Malone SM, Oetting WS, Schork N, Sul JH, Iacono WG, McGue M. 2012. The Minnesota Center for Twin and Family Research genome-wide association study. Twin Res Hum Genet. 15(6):767–774.
  • Moayyeri A, Hammond CJ, Hart DJ, Spector TD. 2013. The UK Adult Twin Registry (TwinsUK Resource). Twin Res Hum Genet. 16(1):144–149.
  • Nishioka M, Bundo M, Ueda J, Yoshikawa A, Nishimura F, Sasaki T, Kakiuchi C, Kasai K, Kato T, Iwamoto K. 2018. Identification of somatic mutations in monozygotic twins discordant for psychiatric disorders. NPJ Schizophr. 4(1):7.
  • Otte C, Gold SM, Penninx BW, Pariante CM, Etkin A, Fava M, Mohr DC, Schatzberg AF. 2016. Major depressive disorder. Nat Rev Dis Primers. 2:16065.
  • Paracchini L, Beltrame L, Boeri L, Fusco F, Caffarra P, Marchini S, Albani D, Forloni G. 2018. Exome sequencing in an Italian family with Alzheimer's disease points to a role for seizure-related gene 6 (SEZ6) rare variant R615H. Alzheimers Res Ther. 10(1):106.
  • Pedersen BS, Quinlan AR. 2017. Who's who? Detecting and resolving sample anomalies in human DNA sequencing studies with peddy. Am J Hum Genet. 100(3):406–413.
  • Piccinelli M, Wilkinson G. 2000. Gender differences in depression. Critical review. Br J Psychiatry. 177:486–492.
  • Pigors M, Common JEA, Wong X, Malik S, Scott CA, Tabarra N, Liany H, Liu J, Limviphuvadh V, Maurer-Stroh S, et al. 2018. Exome sequencing and rare variant analysis reveals multiple filaggrin mutations in Bangladeshi families with atopic eczema and additional risk genes. J Invest Dermatol. 138(12):2674–2677.
  • Ross SE, McCord AE, Jung C, Atan D, Mok SI, Hemberg M, Kim TK, Salogiannis J, Hu L, Cohen S, et al. 2012. Bhlhb5 and Prdm8 form a repressor complex involved in neuronal circuit assembly. Neuron. 73(2):292–303.
  • Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GL, Edwards KJ, Day IN, Gaunt TR. 2013. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat. 34(1):57–65.
  • Subaran RL, Odgerel Z, Swaminathan R, Glatt CE, Weissman MM. 2016. Novel variants in ZNF34 and other brain-expressed transcription factors are shared among early-onset MDD relatives. Am J Med Genet B Neuropsychiatr Genet. 171B(3):333–341.
  • Sullivan PF, Neale MC, Kendler KS. 2000. Genetic epidemiology of major depression: review and meta-analysis. Am J Psychiatry. 157(10):1552–1562.
  • Tsuang MT, Bar JL, Harley RM, Lyons MJ. 2001. The Harvard twin study of substance abuse: what we have learned. Harv Rev Psychiatry. 9(6):267–279.
  • Valdes AM, Loughlin J, Timms KM, van Meurs JJ, Southam L, Wilson SG, Doherty S, Lories RJ, Luyten FP, Gutin A, et al. 2008. Genome-wide association scan identifies a prostaglandin-endoperoxide synthase 2 variant involved in risk of knee osteoarthritis. Am J Hum Genet. 82(6):1231–1240.
  • Vaser R, Adusumalli S, Leng SN, Sikic M, Ng PC. 2016. SIFT missense predictions for genomes. Nat Protoc. 11(1):1–9.
  • Verweij KJ, Zietsch BP, Medland SE, Gordon SD, Benyamin B, Nyholt DR, McEvoy BP, Sullivan PF, Heath AC, Madden PA, et al. 2010. A genome-wide association study of Cloninger's temperament scales: implications for the evolutionary genetics of personality. Biol Psychol. 85(2):306–317.
  • Wallace C, Newhouse SJ, Braund P, Zhang F, Tobin M, Falchi M, Ahmadi K, Dobson RJ, Marcano AC, Hajat C, et al. 2008. Genome-wide association study identifies genes for biomarkers of cardiovascular disease: serum urate and dyslipidemia. Am J Hum Genet. 82(1):139–149.
  • Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38(16):e164.
  • Wedel M, Frob F, Elsesser O, Wittmann MT, Lie DC, Reis A, Wegner M. 2020. Transcription factor Tcf4 is the preferred heterodimerization partner for Olig2 in oligodendrocytes and required for differentiation. Nucleic Acids Res. 48(9):4839–4857.
  • Wray NR, Ripke S, Mattheisen M, Trzaskowski M, Byrne EM, Abdellaoui A, Adams MJ, Agerbo E, Air TM, Andlauer TMF, et al. 2018. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet. 50(5):668–681.
  • Xu C, Mullersman JE, Wang L, Bin Su B, Mao C, Posada Y, Camarillo C, Mao Y, Escamilla MA, Wang KS. 2013. Polymorphisms in seizure 6-like gene are associated with bipolar disorder I: evidence of gene x gender interaction. J Affect Disord. 145(1):95–99.
  • Xu ZP, Dutra A, Stellrecht CM, Wu C, Piatigorsky J, Saunders GF. 2002. Functional and structural characterization of the human gene BHLHB5, encoding a basic helix-loop-helix transcription factor. Genomics. 80(3):311–318.
  • Yu ZL, Jiang JM, Wu DH, Xie HJ, Jiang JJ, Zhou L, Peng L, Bao GS. 2007. Febrile seizures are associated with mutation of seizure-related (SEZ) 6, a brain-specific gene. J Neurosci Res. 85(1):166–172.
  • Zhan X, Hu Y, Li B, Abecasis GR, Liu DJ. 2016. RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics. 32(9):1423–1426.