Review

The genetics of Crohn’s disease and ulcerative colitis – status quo and beyond

Pages 13-23 | Received 02 Nov 2014, Accepted 18 Nov 2014, Published online: 19 Dec 2014

Abstract

The two major subtypes of inflammatory bowel disease (IBD), ulcerative colitis (UC, MIM#191390) and Crohn’s disease (CD, MIM#266600), are chronic relapsing-remitting inflammatory disorders affecting primarily the gastrointestinal tract. Prevalence rates in North America and Europe range from 21 to 246 per 100,000 for UC and 8 to 214 per 100,000 for CD. Although CD and UC share some clinical and pathological features, they can be distinguished by localization, endoscopic appearance, histology and behavior, which suggest differences in the underlying pathophysiology. The importance of genetic risk factors in disease etiology is high and has been documented more clearly for CD than for UC (relative sibling risks λs: 15–35 for CD, 6–9 for UC). The most recent and largest genetic association study for IBD, which employed genome-wide association data for over 75,000 patients and controls, established the association of 163 susceptibility loci with IBD. Although the disease variance explained by the 163 loci only amounts to 13.6% for CD and 7.5% for UC, the identified loci and the candidate genes within yielded valuable insights into the pathogenesis of IBD and the relevant disease pathways. We here review the current research on the genetics of IBD, provide insights into current efforts and suggest topics for future research.

Inflammatory bowel disease as a figurehead for complex diseases: a plethora of genetic disease loci

Inflammatory bowel diseases (IBDs) are chronic inflammatory diseases, which affect different parts of the gastrointestinal tract. Susceptibility to IBD is determined mainly by environmental factors, which remain largely unknown. Genetic susceptibility plays an additional role in disease development and genetic variation can be studied at higher resolution and more accurately than environmental factors. Owing to technology developments and to systematic genome-wide and large-scale international studies, the genetic risk map of IBD has grown tremendously. Most identified and robustly replicated loci have been detected by means of genome-wide association studies (GWAS) [Citation1, Citation2, Citation3]. GWAS are largely hypothesis-free approaches that identify common risk alleles, which are significantly more frequent in patients compared to healthy individuals. The number of discovered risk loci that are statistically associated with the disease has risen drastically through several independent systematic studies and meta-analyses as summarized in Table I. We are aware of further studies that will report at least 40 additional novel loci, highlighting that mapping efforts continue and that more loci are likely to be identified in the future, in particular rare variant associations, given the growing amounts of whole-exome/-genome resequencing data.

Table I. Large-scale association studies performed in IBD until 11/2014. The table likely misses important studies (mostly after 2010) and solely serves to illustrate that a large number of studies have been performed by many international groups and that sample sizes increased over time.

As most Crohn’s disease (CD) and ulcerative colitis (UC) patients carry a large number of common risk variants, the effect size of most individual risk variants is low (odds ratio (OR) <1.3; Figure 1). GWAS hence require large case–control sample collections in order to identify the aforementioned small effect sizes and in order to reach sufficient statistical evidence for a robust variant versus disease association. The currently accepted significance threshold is a p-value smaller than 5×10−8, under the assumption that appropriate and adjusted (ancestry, gender, other covariates of importance) analyses have been carried out, a threshold that is also known as ‘genome-wide significance’ [Citation4]. It has been assumed that all risk loci with a minor allele frequency >5% in the general population and an OR >1.2 have been identified to date in IBD patients with European ancestry by means of GWAS [Citation5]. One clearly has to distinguish between common and rare susceptibility variants. Common variants are more frequent than 1% in the general population. Current GWAS are usually designed with enough statistical power to detect such variants [Citation6]. Variants with <1% frequency are generally referred to as rare variants and variants that are observed only within single pedigrees are termed private variants (see e.g. Cirulli and Goldstein for variant classification [Citation7]). If such private variants have arisen de novo, that is, the nonreference allele is absent in the parents but present in the child’s genome, this variant can be referred to as a true mutation. Since sequencing approaches identify common, rare, private and de novo variants at once (in contrast to microarrays where known single nucleotide polymorphisms [SNPs] are tested) these variants are collectively summarized as single nucleotide variants (SNVs). More than 163 independent autosomal genetic risk loci have been identified to date for IBD, specifically for the two major IBD subtypes CD and UC. 
The disease variances explained by these 163 loci, actually comprising 193 statistically independent SNPs, are 13.6% and 7.5% for CD and UC, respectively [Citation8]. Twin and family studies for IBD have shown that a child has a 26-fold increased risk for developing CD when another sibling already has CD and the risk is increased 9-fold in the case of UC [Citation9]. The aforementioned facts not only underline the importance of genetic factors but also illustrate that there are other contributing effects that need to be taken into account and that require future investigations.
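
To make the quantities discussed above more concrete, the following minimal Python sketch computes an allelic odds ratio with a Wald-type 95% confidence interval from a 2×2 allele-count table and checks a p-value against the genome-wide significance threshold. The function names and the allele counts are hypothetical. Reading 5×10−8 as a Bonferroni-style correction of a 0.05 level for roughly one million independent common-variant tests is a commonly cited rationale and is stated here as an assumption, not as the derivation used in the cited studies. Real GWAS analyses additionally adjust for ancestry and other covariates, typically within regression models, which this sketch does not attempt.

```python
import math

# Assumption: 0.05 family-wise error rate divided by ~1 million independent tests.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000  # approximately 5e-8


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Return (OR, lower, upper) for a 2x2 allele-count table.

    The interval is a Wald interval computed on the log(OR) scale;
    z=1.96 corresponds to a two-sided 95% confidence level.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if any(c <= 0 for c in counts):
        raise ValueError("all four allele counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    return (
        odds_ratio,
        math.exp(log_or - z * se_log_or),
        math.exp(log_or + z * se_log_or),
    )


def is_genome_wide_significant(p_value):
    """True if the p-value is below the conventional genome-wide threshold."""
    return p_value < GENOME_WIDE_ALPHA


if __name__ == "__main__":
    # Hypothetical allele counts: risk / other allele in cases and in controls.
    print(allelic_odds_ratio(600, 400, 500, 500))
    print(is_genome_wide_significant(4.96e-28))
```

With small per-variant effects (OR <1.3), the standard error term shrinks only with larger allele counts, which is one way to see why large case–control collections are needed to reach genome-wide significance.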

Figure 1. Risk allele frequency versus odds ratio plot. Data for 163 index SNPs were extracted from the latest meta-analysis on inflammatory bowel disease (IBD) [Citation2]. The risk allele frequency is plotted on the X-axis and shows that most identified alleles have a minor allele frequency >5%. Alleles can be protective (risk allele frequency >0.50) or exert risk (<0.50). The Y-axis shows the odds ratio and that almost all identified risk variants have an odds ratio <1.30 (corresponding to >0.77 for protective variants). Colors indicate whether the particular variant reached genome-wide significance for Crohn’s disease (CD) (red), ulcerative colitis (UC) (blue) or both diseases (green). By far the strongest UC-associated locus was the noncoding SNP rs6927022 (p = 4.71 × 10−133; OR [odds ratio] = 1.44; 95% CI = (1.39–1.50)) near the class II gene HLA-DQA1. However, more relevant UC-associated HLA-DRB1 alleles such as DRB1*15:01, DRB1*07:01 and DRB1*01:03 show much larger effect sizes [Citation20].


Disease-specific versus shared associations

On the genetic level, 110 out of 163 loci are shared between CD and UC [Citation8], even when taking effect directions into account and not only positional overlap (which is actually common practice in many studies to show a shared etiology). This molecular similarity between CD and UC perhaps reflects the clinical similarities observed by the gastroenterologist between both diseases. It further indicates a heterogeneous and continuous disease spectrum and may explain why clinicians often fail to make a definitive diagnosis and why for a large fraction of patients the diagnosis changes during disease course. Given that IBD is an immune-mediated disease, it is not surprising that several disease variants cluster in pathways relevant for the innate (e.g. NOD2, ATG16L1) and adaptive immune system (IL23R, human leukocyte antigen [HLA] locus). Moreover, >50% of the known IBD loci overlap with those of other immune-mediated diseases. The strongest overlaps to date have been observed between IBD and ankylosing spondylitis, psoriasis, and primary sclerosing cholangitis. The following review articles address the shared etiology of IBD and related diseases in detail [Citation10, Citation11]. The observation that distinct diseases affecting different organs (e.g. intestine vs. joints vs. skin) have similar genetic etiologies suggests two potentially fruitful future directions: first, clinicians working in different disciplines such as gastroenterology and dermatology should interact even more not only in research but also on the clinical level (e.g. treatments for disease A might also work in disease B, joint discussions about ‘complicated patients’ in analogy to cancerous diseases and comprehensive cancer centers). Second, systematic genetic cross-phenotype association studies should be carried out in order to dissect the disease associations even further and to identify additional associations, also of pleiotropic nature. 
For most joint loci it is not yet clear whether (a) the same variants are associated and (b) effect directions are similar. And while it is easier to identify joint effects, it remains a challenge to annotate a given variant as disease-specific because the confidence intervals (CI) of the ORs often overlap. Therefore, studies reporting disease-specific associations should be handled with caution, and replication studies or rigorous thresholds are required for claiming a disease-specific finding. Often, patients also have more than one disease and it will be a challenge to account for comorbidities in association studies, particularly as in some instances the secondary disease is not yet known because most cases are not followed up longitudinally.

Further fine mapping is needed

Although the costs for sequencing complete human exomes (the entirety of all known human exons) and even genomes have dropped significantly in recent years, large-scale resequencing studies for IBD, on the scale of those carried out using SNP arrays, are still scarce and work in progress. Only a few targeted sequencing efforts, most of them employing DNA pooling approaches, have been carried out for IBD [Citation12, Citation13, Citation14]. Therefore, a comprehensive list of rare and also private variants relevant for IBD will still require several years of research. To date, for most GWAS loci it cannot be concluded whether an associated SNP is actually related to the disease on a biological level, that is, is the actual causative variant, or whether the index SNP is just a marker for a neighboring causative variant. Functional experiments are required to ultimately prove the disease-relevance of a particular candidate gene or risk variant. Most of the associated genomic regions contain more than one gene (∼1444 candidate genes before and 300 after in silico prioritization in the study by Jostins et al. [Citation8]) and some do not even contain any known coding sequence. Notably, twice as many of the IBD susceptibility variants are predicted to impact gene expression (so-called expression quantitative trait loci [eQTL] associations) compared to index variants that are coding or in linkage disequilibrium with a known coding variant. Functional experiments are even more important, or rather essential, for private variants, as classical statistical approaches to prove disease-relevance are not applicable to singular observations. Besides, from a biological and also statistical point of view, more than one variant in a given risk locus can independently contribute to disease risk. However, many studies report only an ‘index SNP’ with the smallest p-value for the identified genetic loci. 
To address this shortcoming, concerted fine mapping efforts employing large and dense array data sets are under way for the known genetic risk loci. Similar studies have been completed for other diseases, such as psoriasis [Citation15], and have shown an additional layer of complexity and allelic heterogeneity at known risk loci. In addition, more transethnic genetic studies are needed for IBD in order to explore the genetic architecture of IBD risk across different populations [Citation16].

Role of genetic variants in the HLA complex

The highly polymorphic genomic region on chromosome 6p21 from megabases 25 to 34 is known as the major histocompatibility complex (MHC) or as the HLA region. It contains many genes that are relevant to immune system function. Pinpointing the causative variants and even the exact association signals in the HLA remains a challenge.

By far the strongest UC-associated locus in the largest genetic study for IBD to date [Citation8] was SNP rs6927022 (p = 4.71 × 10−133; OR = 1.44; 95% CI = (1.39–1.50)) near the class II gene HLA-DQA1. The association signal in the HLA region for CD proved to be strongest for SNP rs9264942 (p = 4.96 × 10−28; OR = 1.15; 95% CI = (1.11–1.18)) located in the HLA-B gene locus of the MHC class I region. Even linkage studies, which were carried out over 10 years ago, identified the HLA locus, formerly known as the IBD3 locus, as a genome-wide significant genetic risk locus for IBD [Citation17]. Much progress has been made since this linkage era in mapping the IBD-relevant signals within the HLA, one of the most polymorphic regions in the genome with extensive linkage disequilibrium and a high gene density. Furthermore, the HLA is an ancestry-informative locus and diverse across different ethnicities. Several association scenarios may exist and they may be different in CD and in UC: (1) certain class I alleles may be associated, (2) class II alleles may play the prominent role, (3) association signals may reside outside the classical class I and II loci, or (4) a combination of (1)–(3) may be true. Recent decreases in the cost of HLA typing – and more prominently the imputation of classical HLA alleles [Citation18, Citation19] – have facilitated large-scale in silico HLA fine mapping approaches using existing dense GWAS SNP array data. The latest fine mapping study of the IBD association signal on 6p21 was carried out by Duerr and colleagues [Citation20]. They analyzed 562 UC patients, 611 CD patients and 1428 control individuals for 10,347 SNPs within the HLA region (29.299–33.884 Mb; National Center for Biotechnology Information [NCBI] build hg18). 
While only suggestive association evidence was obtained for CD (rs17880124; p = 3.82 × 10−5; OR = 2.23; 95% CI = (1.52–3.27) for the G allele), the best UC association signal was narrowed down to the single amino acid position 11 of the HLA-DRB1 P6 antigen-binding pocket. The best SNP signal was observed for rs2647025 (p = 1.94 × 10−12; OR = 1.95 for the G allele; 95% CI = (1.62–2.35)), which was in moderate linkage disequilibrium (LD) (r2 = 0.63) with the previously reported SNP rs9268853 of the IIBDGC UC GWAS meta-analysis [Citation3]. Including classical HLA alleles, allele groups and binary HLA amino acid information in the analyses with the SNP data resulted in rs9269955 as the most significant association with UC in the HLA (p = 2.67 × 10−13; OR = 0.51; 95% CI = (0.43–0.61) for allele C). rs9269955 determines the codon for the aforementioned amino acid position 11 where six different amino acid alleles are observed in the population. The relevant UC-associated HLA-DRB1 alleles were DRB1*15:01, DRB1*07:01 and DRB1*01:03 with the latter having the largest effect size of 38.39 (95% CI: 7.50–169.60).
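
The LD measure r2 quoted above can be illustrated with a short Python sketch. Assuming that phased haplotype frequencies for two biallelic SNPs are available, r2 is the squared deviation of the observed haplotype frequency from the product of the two allele frequencies, scaled by the allele-frequency variances. The function name and the example frequencies are hypothetical and are not data from the cited study; dedicated genetics software would normally be used to estimate LD from genotype data.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Return r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator <= 0.0:
        raise ValueError("both SNPs must be polymorphic (0 < frequency < 1)")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return (d * d) / denominator


if __name__ == "__main__":
    # Hypothetical frequencies: two alleles that always occur together.
    print(ld_r_squared(0.3, 0.3, 0.3))
    # Hypothetical frequencies: two alleles that occur independently.
    print(ld_r_squared(0.06, 0.2, 0.3))
```

A value close to 1 means that one SNP is an almost perfect proxy for the other, whereas a moderate value such as the one reported above leaves room for the two SNPs to tag partly different signals.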

Given that the association of variants within the HLA region with both UC and CD is very strong with large effect sizes – and given the particular importance of this region in immunity in general – the understanding of IBD pathophysiology will only be accomplished by understanding the causative nature of the HLA association in more detail.

For CD, HLA associations have been reported in several studies (e.g. SNP rs1799964 in the CD meta-analysis by Franke et al. [Citation2]), and large coordinated fine mapping efforts employing more than 60,000 samples are under way to accurately map signals and to compare the association signal to UC in more detail (Goyette and Boucher et al., submitted). However, array-based mapping approaches have a limited resolution and will likely not provide answers to all open questions concerning the HLA association. The identification of a potentially existing antigen that contributes to the outbreak of IBD will ultimately require synergistic and complementary approaches such as, for example, T-/B-cell repertoire sequencing and peptide elutions with subsequent analyses [Citation21, Citation22].

Clinical relevance of genetic findings

One remarkable observation is that even though many individuals carry risk alleles, they do not develop one of the diseases (see Festen et al. [Citation23]). It may be of interest to stratify healthy individuals and patients by their cumulative genetic risk. For example, individuals from the extreme tails of the risk score distribution can be selected and analyzed in more detail (e.g. by running stratified expression analyses). Is a healthy person with an extremely high IBD risk score perhaps not healthy, that is, does this person have other symptoms? Does this group perhaps have subclinical disease symptoms? Is there a strong environmental factor, such as a chronic infection with a pathogen, in patients with an extremely low genetic risk (and without any ‘attractive’ rare/private variants)? However, it has to be kept in mind that most of these risk alleles only increase the total risk by a modest amount. Furthermore, even if an individual has a high cumulative risk score, the disease might not develop. Potential explanations are that it matters not only how many risk alleles an individual carries but also which combinations of risk alleles are present. Furthermore, lifestyle and exposure to other environmental factors seem to play the most important role.
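
The stratification idea described above can be sketched in a few lines of Python. One common way to express cumulative genetic risk is a weighted score in which the number of risk alleles carried at each locus (0, 1 or 2) is multiplied by the natural logarithm of that locus's odds ratio and summed over loci. This assumes additive, independent effects on the log-odds scale, which is a simplification; the function names, odds ratios and individual scores below are hypothetical and serve only as illustration.

```python
import math


def genetic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted risk score: sum over loci of (risk allele count) * ln(OR).

    Assumes additive, independent effects on the log-odds scale.
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("need exactly one odds ratio per locus")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, odds_ratios)
    )


def extreme_tails(scores_by_individual, fraction=0.05):
    """Return (lowest, highest) lists of individual IDs from the score tails.

    scores_by_individual maps an individual ID to its risk score;
    fraction is the share of individuals taken from each tail (at least one).
    """
    ranked = sorted(scores_by_individual, key=scores_by_individual.get)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


if __name__ == "__main__":
    # Hypothetical per-locus odds ratios and one individual's risk allele counts.
    print(genetic_risk_score([2, 0, 1], [1.2, 1.3, 1.1]))
    # Hypothetical scores for four individuals; take the top and bottom 25%.
    print(extreme_tails({"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.3}, fraction=0.25))
```

Because such a score only counts alleles, it cannot capture the role of specific allele combinations or of environmental exposures raised in the paragraph above.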

Figure 2. Inflammatory bowel disease (IBD) etiology as an example for a complex chronic inflammatory disease. In most patients (prevalence illustrated by dashed curve), IBD develops through (mostly unknown) environmental factors – also including the gut microbiome – that act on a genetically predisposed host. Yet, cases have been described where highly penetrant genetic mutations cause (monogenic) IBD. It is still discussed how relevant infections are in the onset of IBD. The gene–environment interactions (red arrows) are important in disease etiology and should be studied in more detail in the future. While genome-wide association studies (GWAS) are successful in identifying common disease loci, linkage studies are more powerful for identifying monogenic forms of IBD. Recently, exome sequencing has been employed for these early-onset forms since next-generation sequencing provides single base pair resolution for mutation detection and also allows for de novo mutation detection. Modified from Kaser et al. [Citation83].


On monogenic and oligogenic forms of IBD

While IBD is regarded as a genetically complex disease with a large number of variants, each contributing a small part of the total heritability, monogenic and oligogenic forms of IBD also exist (see left part of Figure 2). While the disease onset for CD as well as UC can occur at any age, the peak incidence is in early adulthood. Monogenic and oligogenic forms, however, usually manifest earlier in life. Early-onset cases of IBD, with a disease manifestation during the first 18 years of life, also often show a more severe disease course with a higher risk of complications; very-early-onset cases even develop the disease during the first 8 years of life. GWAS and candidate gene sequencing studies performed specifically for early-onset IBD have identified several genes and variants that seem to be exclusively associated with this form of IBD. Nevertheless, there is significant overlap between the genetic causes of early- and adult-onset IBD. A large spectrum of monogenic diseases can also present with IBD-like intestinal inflammation [Citation24]. The underlying causes include defects of the intestinal barrier function, immunodeficiencies with a disturbed granulocyte and phagocyte activity, hyper- as well as auto-inflammatory effects and disturbed B- and T-lymphocyte function.

Several studies have identified shared genetic factors underlying monogenic or early-onset and adult-onset IBD cases with rather oligogenic or polygenic causes. A genetic linkage analysis followed by candidate gene sequencing in two consanguineous families with children affected by early-onset IBD during the first years of life revealed homozygous mutations in genes for the IL10 receptor (IL10R) subunit proteins [Citation25] to be causative. In two other patients, who did not show variants in the IL10RA/B genes, distinct mutations in the IL10 gene itself were shown to be disease-causing [Citation26]. IL10 was also associated with adult-onset UC [Citation27] and CD [Citation2] in GWAS. An autosomal-recessive case of inflammatory skin and bowel disease in two siblings with consanguineous parents was shown to result from a homozygous deletion in ADAM17 (ADAM metallopeptidase domain 17) by using a combination of SNP-homozygosity mapping and targeted next-generation sequencing (NGS) of the linkage regions [Citation28]. Although the direct overlap between key genes that have been associated with IBD and IBD-like monogenic disorders is surprisingly low [Citation24], the affected proteins often interact directly or indirectly with each other and share common signaling cascades that contribute to IBD etiology. Exome sequencing of a male patient with a severe case of CD during the first year of life recently revealed a de novo nonsense variant in the X-linked inhibitor of apoptosis gene [Citation29]. Targeted sequencing in ∼1,300 CD patients then revealed two additional missense variants and one nonsense variant, each private to patients that showed an age of onset between 13 and 16. Variants in one gene can thereby have a varying impact on disease manifestation and the same gene can simultaneously be involved in monogenic early-onset as well as oligo- or polygenic adult-onset cases. 
The analyses of monogenic forms of IBD therefore have the potential to give important insights into further mechanisms contributing to the common forms of IBD. For more details on early-onset monogenic IBD, we recommend reading the reviews by Uhlig and colleagues [Citation24, Citation30].

From genetics to mechanistic insights

Current therapy of IBD symptoms cannot prevent the need for surgery in more than half of the patients. Therefore, novel treatment strategies and a detailed understanding of the disease causes are of high importance. Learning about the relevant biological pathways and proteins that are influenced by the genetic susceptibility variants may help in defining new targeted therapies and the pathways may also point at important environmental risk factors that should be studied. Several excellent review articles have summarized the current understanding of the pathophysiology of IBD and how the identified genetic susceptibility factors may contribute to disease etiology. Perhaps one of the most exhaustive review articles, which we recommend reading for understanding IBD pathophysiology, has been published by Kaser et al. [Citation31]. Several other recent reviews have elucidated the mechanistic insights and pathways gained through genetic studies [Citation6, Citation32]. One should be cautious though in interpreting genetic association loci, as most of these loci contain more than one candidate gene as mentioned previously (in contrast to the monogenic forms where interpretation of loci is much more straightforward). Therefore, well-designed mechanistic studies are needed to prove the disease-relevance of a particular candidate gene/variant in a given GWAS locus. Of course, in silico prioritization approaches may be useful to narrow down the list of potential candidate genes in a GWAS locus; however, although these approaches may be very accurate, results should be handled with care until the functional proof is ultimately available. Often, candidate genes are also (cherry-)picked based on the previous experience and knowledge of the interpreting scientist. Only a handful of candidate genes and pathways exist for IBD where there is no doubt that these are playing a causative role in disease etiology, for example NOD2, IL10, IL23R and ATG16L1. 
The latter is an excellent example of how genetic findings fuel subsequent functional studies. After our initial discovery of a disease-associated coding variant in ATG16L1 in 2007 [Citation33] – and hence the first implication of the autophagy pathway in IBD etiology – several high-impact articles were published that elucidated the role of ATG16L1 in IBD in more detail, also employing animal models [Citation34, Citation35, Citation36, Citation37, Citation38, Citation39, Citation40]. While no publication mentioned the gene ATG16L1 in its abstract before the initial discovery in 2007, 281 publications have listed it afterward (until November 2, 2014). Out of these 281 publications, 203 abstracts also mention the disease terms CD or IBD (PubMed query: ‘ATG16L1 AND [‘Crohn’ OR ‘Crohn’s’ OR ‘inflammatory bowel disease’ OR ‘inflammatory bowel diseases’]’). Another 177 independent manuscripts mentioned the word ‘autophagy’ together with CD/IBD. Therefore, we feel that genetic association studies were a worthwhile undertaking, in particular in IBD genetics research. GWAS point to regions, candidate genes, functional elements or pathways that should be studied in greater depth. In addition, we have learnt that the complex diseases CD and UC are much more than different diseases with an aberrant T-cell response (past view of IBD pathophysiology), but rather very related diseases with several disturbed cellular pathways (autophagy, ER stress response, innate immunity, epithelial barrier dysfunction, …; see [Citation6] for details). Exome sequencing for cases with monogenic IBD forms nicely complements the GWAS approach and points to additional or similar pathways as identified in GWAS.

Conclusion and outlook

Although the genetic risk map for IBD has grown tremendously in the last decade, the exact disease etiology and disease cause remain largely unknown. One of the great challenges of future IBD research is to understand how the disease-associated mutations actually contribute to disease etiology. When it comes to predicting disease or clinical subphenotypes, larger studies have been published showing that prediction is not sufficiently accurate for clinical purposes, as most of the identified variants are common and have low penetrance [Citation41, Citation42]. Therefore, and due to the large environmental component in disease development, it is likely that genetics alone will not succeed in predicting disease outbreak and course, but that the genetic risk maps will serve as an important puzzle piece in a statistical model that considers nongenetic data in parallel. For rare, severe early-onset forms of IBD the situation is different as compared to the more common and complex disease forms (Figure 2). In patients presenting with severe, early-onset IBD, a single rare/private variant or a mutation may be the only cause; however, incomplete penetrance and allelic heterogeneity may also blur predictions in these cases. Nevertheless, diagnostic candidate gene resequencing should be carried out in these early-onset patients in clinical practice.

The large number of genetic findings calls for large-scale functional genomics analyses, which cannot rely on mouse models alone. However, introducing specific and multiple mutations into the mouse genome has become easier and quicker with the advent of the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas engineering technique [Citation43].

According to the ENCODE project [Citation44] most disease loci overlap with regulatory regions [Citation45]. It is likely that these loci in noncoding regions play a regulatory role and influence the activation and concentration of proteins that are relevant to immune system function and behavior. It may be a greater challenge to characterize such regulatory variants; however, first attempts have been made, as in the case of PRDM1 [Citation46]. The common CD risk allele C at SNP rs7746082, identified in GWAS, correlates with reduced expression of PRDM1 in ileal biopsy specimens and peripheral blood mononuclear cells. Notably, exome sequencing may not identify the relevant susceptibility variants in most patients as >84% of the IBD-associated variants are not located in exons (∼16% are nonsynonymous variants) and as more than ∼36% are eQTLs, thus affecting gene expression [Citation5]. This fact, together with the observation that larger genetic variants – such as copy number variants, insertions/deletions, structural variation and mobile element insertions – are understudied in IBD genetics research (the large CD-associated deletion upstream of the IRGM gene being one of the few IBD CNV success stories [Citation47]), emphasizes that large-scale whole genome sequencing efforts are urgently needed. Moreover, larger gene expression data sets for the disease-relevant tissues and cell types should be generated to allow for more, and higher-resolution, eQTL studies since eQTLs are often cell-specific [Citation48].

The aforementioned ongoing and future genetic studies in IBD will further increase the resolution of the IBD genetic risk map. However, the known phenomenon of the ‘missing heritability’ will likely remain after the mapping efforts [Citation49]. Therefore, other opportunities to identify heritable components should be considered [Citation50]. One opportunity could be to study epigenetic factors that may be transmitted transgenerationally. A review from Jirtle and Skinner provides a good overview on this topic [Citation51]. More recently, several studies in mice presented evidence that different environmental exposures can lead to epigenetic changes in the next generations [Citation52]. Especially the exposed sperm epigenome of male mice has received a lot of attention recently [Citation53, Citation54, Citation55]; in accordance with a zebra fish study [Citation56]. Human studies exist that suggest an important role of the parental epigenome ‘recording’ environmental exposures and a subsequent inheritance of this ‘recorded information’ to the next generation (mostly in utero environmental exposure that affects F1 generation) [Citation53]. Only if an effect is seen in the F3 generation, since an exposure in a pregnant woman may directly affect the F1 and F2 generation, one can speak of a true transgenerational effect. In the paternal model, a phenotype change must be also seen in the F2 generation. Therefore, it may be fruitful to study whether certain environmental factors can induce epigenetic changes that are heritable and that could further explain the familial occurrence of IBD. Moreover, deleterious epigenetic changes may be solely responsible for disease onset in some patients. The picture becomes more complex if one considers that genetic susceptibility factors make a host susceptible to certain environmental exposures, which in turn may influence the epigenomes of cells and that these epigenetic changes could be inherited to offspring cells or children. 
Interestingly, a recent study found that exposure to DDT resulted in obesity in half of the rats in the F3 generation. No effect of the epimutations was observed in the F1 and F2 generations, showing that multigenerational studies are necessary to study transgenerational epigenetic inheritance [Citation57].
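
The generation criterion described above can be written down compactly. The following is only a minimal sketch: the function name and the two exposure labels are illustrative assumptions, and the thresholds simply encode the rule that a gestational exposure requires an effect in F3, whereas a paternal exposure requires an effect in F2.

```python
# Earliest generation that was never directly exposed, per exposure model.
# Labels and names are hypothetical; the thresholds follow the rule in the text.
FIRST_UNEXPOSED_GENERATION = {
    "gestational": 3,  # pregnant F0: F1 embryo and the F2-forming germline are exposed
    "paternal": 2,     # exposed F0 male: the F1-forming germ cells are exposed
}


def is_transgenerational(exposure_model: str, generation: int) -> bool:
    """Return True if a phenotype seen in generation F<generation> can count
    as a true transgenerational effect under the given exposure model."""
    if exposure_model not in FIRST_UNEXPOSED_GENERATION:
        raise ValueError(f"unknown exposure model: {exposure_model!r}")
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or higher")
    return generation >= FIRST_UNEXPOSED_GENERATION[exposure_model]


print(is_transgenerational("gestational", 2))  # False
print(is_transgenerational("gestational", 3))  # True
print(is_transgenerational("paternal", 2))     # True
```

Under this rule, the F3 obesity phenotype in the DDT study qualifies as transgenerational.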

It has been hypothesized that host–bacteria interactions in the gastrointestinal tract trigger chronic inflammation in the genetically susceptible host. In fact, several studies indicate that IBD patients have a dysbiosis of the intestinal microbiota [Citation58]. Protective and risk-associated bacterial genera have further been identified, and transplantation/co-housing experiments in mice even point towards a disease-causing (and not only disease-correlated) role of the microbiota [Citation59]. It has been speculated, but not yet convincingly demonstrated, that the genetic IBD association signals are enriched for genes that are relevant for host–microbiota interaction in the intestine [Citation60]. Jostins et al. have also carried out preliminary selection analyses for the 163 IBD loci, which show a remarkable enrichment of evolutionary signatures in the IBD loci. Since the host and the microbiota have coevolved, genetic data from patients should be jointly analyzed with gut microbiome data from the same patients, and proper association/interaction tests should be carried out. First candidate gene studies (e.g. for FUT2 and IRGM) in humans indicate that genome-wide host–microbiota association studies may be as fruitful as GWAS [Citation61, Citation62].
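
As a rough illustration of the kind of association test meant here, the following sketch tests a single host variant against a single bacterial taxon with a permutation test. The data are simulated, and all names (`dosage`, `abundance`, `permutation_pvalue`) are illustrative assumptions, not taken from the cited studies. A real analysis would additionally need covariates, compositional normalization of the microbiome data, multiple-testing correction and an explicit genotype-by-disease interaction term.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_pvalue(dosage, abundance, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the genotype-abundance correlation.

    Abundances are shuffled across individuals, which breaks any link to
    genotype while preserving both marginal distributions.
    """
    rng = random.Random(seed)
    observed = abs(pearson(dosage, abundance))
    shuffled = list(abundance)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(dosage, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Simulated toy cohort: 200 individuals, risk-allele dosage 0/1/2, and a
# log-scale taxon abundance that shifts by 0.5 per risk allele (assumed effect).
sim = random.Random(1)
dosage = [sim.choice([0, 1, 2]) for _ in range(200)]
abundance = [0.5 * d + sim.gauss(0, 1) for d in dosage]

p_value = permutation_pvalue(dosage, abundance)
print(f"permutation p-value: {p_value:.4f}")
```

A permutation test is used in this sketch because it makes no distributional assumption about the abundance values, which are typically skewed in microbiome data.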

In addition, more information about the lifestyle of patients could prove useful. More importantly, large longitudinal cohort studies are needed to ‘follow individuals into disease.’ Individuals with subclinical disease may be the ideal study cohort to identify predictive biomarkers and also the actual factors that drive the development of clinical symptoms. Once the disease is chronic, it may be too late to change certain triggers, leaving only the option to treat clinical symptoms, as is the current state of the art in clinical practice (anti-inflammatory drugs).

The volume of big, multilevel omics data, including large genomic resequencing and genetic data sets, is steadily increasing for IBD and its related diseases. Concerted and interdisciplinary efforts of, for example, analysts, clinicians and molecular biologists are needed to extract meaningful results from the data deluge. We encourage all researchers to share their data and methods as openly as possible in order to comprehensively understand this complex disease, and we look forward to (contributing to) the next 10 years of IBD (genetics) research.

Acknowledgement

The authors are supported by the German Ministry of Education and Research (BMBF), the Deutsche Forschungsgemeinschaft (DFG), the Cluster of Excellence ‘Inflammation at Interfaces’ and the DFG PhD Research Training Group ‘Genes, Environment and Inflammation’ No. 1743. Andre Franke receives an endowment professorship (Peter Hans Hofschneider Professorship) of the ‘Stiftung Experimentelle Biomedizin’ located in Zuerich, Switzerland.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

References

  • JC Barrett, S Hansoul, DL Nicolae, JH Cho, RH Duerr, JD Rioux, et al. Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease. Nat Genet 2008;40:955–62.
  • A Franke, DP McGovern, JC Barrett, K Wang, GL Radford-Smith, T Ahmad, et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat Genet 2010;42:1118–25.
  • CA Anderson, G Boucher, CW Lees, A Franke, M D’Amato, KD Taylor, et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat Genet 2011;43:246–52.
  • I Pe’er, R Yelensky, D Altshuler, MJ Daly. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol 2008;32:381–5.
  • JZ Liu, CA Anderson. Genetic studies of Crohn’s disease: past, present and future. Best Pract Res Clin Gastroenterol 2014;28:373–86.
  • B Khor, A Gardet, RJ Xavier. Genetics and pathogenesis of inflammatory bowel disease. Nature 2011;474:307–17.
  • ET Cirulli, DB Goldstein. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 2010;11:415–25.
  • L Jostins, S Ripke, RK Weersma, RH Duerr, DP McGovern, KY Hui, et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 2012;491:119–24.
  • MB Bengtson, C Solberg, G Aamodt, J Jahnsen, B Moum, J Sauar, et al. Clustering in time of familial IBD separates ulcerative colitis from Crohn’s disease. Inflamm Bowel Dis 2009;15:1867–74.
  • M Parkes, A Cortes, DA van Heel, MA Brown. Genetic insights into common pathways and complex relationships among immune-mediated diseases. Nat Rev Genet 2013;14:661–73.
  • CW Lees, JC Barrett, M Parkes, J Satsangi. New IBD genetics: common pathways with other diseases. Gut 2011;60:1739–53.
  • MA Rivas, M Beaudoin, A Gardet, C Stevens, Y Sharma, CK Zhang, et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat Genet 2011;43:1066–73.
  • Y Momozawa, M Mni, K Nakamura, W Coppieters, S Almer, L Amininejad, et al. Resequencing of positional candidates identifies low frequency IL23R coding variants protecting against inflammatory bowel disease. Nat Genet 2011;43:43–7.
  • M Beaudoin, P Goyette, G Boucher, et al. Deep resequencing of GWAS loci identifies rare variants in CARD9, IL23R and RNF186 that are associated with ulcerative colitis. PLoS Genet 2013;9:e1003723.
  • S Das, PE Stuart, J Ding, T Tejasvi, Y Li, LC Tsoi, et al. Fine mapping of eight psoriasis susceptibility loci. Eur J Hum Genet 2014. [Epub ahead of print] [Accessed 3 September 2014].
  • SC Ng, KK Tsoi, MA Kamm, B Xia, J Wu, FK Chan, et al. Genetics of inflammatory bowel disease in Asia: systematic review and meta-analysis. Inflamm Bowel Dis 2012;18:1164–76.
  • DA van Heel, SA Fisher, A Kirby, MJ Daly, JD Rioux, CM Lewis; Genome Scan Meta-Analysis Group of the IBD International Genetics Consortium. Inflammatory bowel disease susceptibility loci defined by genome scan meta-analysis of 1952 affected relative pairs. Hum Mol Genet 2004;13:763–70.
  • X Jia, B Han, S Onengut-Gumuscu, WM Chen, et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One 2013;8(6):e64683.
  • AT Dilthey, L Moutsianas, S Leslie, G McVean. HLA*IMP–an integrated framework for imputing classical HLA alleles from SNP genotypes. Bioinformatics 2011;27(7):968–72.
  • JP Achkar, L Klei, PI de Bakker, G Bellone, N Rebert, R Scott, et al. Amino acid position 11 of HLA-DRbeta1 is a major determinant of chromosome 6p association with ulcerative colitis. Genes Immun 2012;13:245–52.
  • DJ Woodsworth, M Castellarin, RA Holt. Sequence analysis of T-cell repertoires in health and disease. Genome Med 2013;5:98.
  • J Dengjel, P Decker, O Schoor, F Altenberend, T Weinschenk, HG Rammensee, et al. Identification of a naturally processed cyclin D1 T-helper epitope by a novel combination of HLA class II targeting and differential mass spectrometry. Eur J Immunol 2004;34:3644–51.
  • EA Festen, RK Weersma. How will insights from genetics translate to clinical practice in inflammatory bowel disease? Best Pract Res Clin Gastroenterol 2014;28:387–97.
  • HH Uhlig. Monogenic diseases associated with intestinal inflammation: implications for the understanding of inflammatory bowel disease. Gut 2013;62:1795–805.
  • EO Glocker, D Kotlarz, K Boztug, EM Gertz, AA Schäffer, F Noyan, et al. Inflammatory bowel disease and mutations affecting the interleukin-10 receptor. N Engl J Med 2009;361:2033–45.
  • EO Glocker, N Frede, M Perro, N Sebire, M Elawad, N Shah, et al. Infant colitis – it’s in the genes. Lancet 2010;376:1272.
  • A Franke, T Balschun, TH Karlsen, J Sventoraityte, S Nikolaus, G Mayr, et al. Sequence variants in IL10, ARPC2 and multiple other loci contribute to ulcerative colitis susceptibility. Nat Genet 2008;40:1319–23.
  • DC Blaydon, P Biancheri, WL Di, V Plagnol, RM Cabral, MA Brooke, et al. Inflammatory skin and bowel disease linked to ADAM17 deletion. N Engl J Med 2011;365:1502–8.
  • Y Zeissig, BS Petersen, S Milutinovic, E Bosse, G Mayr, K Peuker, et al. XIAP variants in male Crohn’s disease. Gut 2014. [Epub ahead of print] [Accessed 26 February 2014].
  • HH Uhlig, T Schwerd, S Koletzko, N Shah, J Kammermeier, A Elkadri, et al. The diagnostic approach to monogenic very early onset inflammatory bowel disease. Gastroenterology 2014;147(5):990–1007.
  • A Kaser, S Zeissig, RS Blumberg. Inflammatory bowel disease. Annu Rev Immunol 2010;28:573–621.
  • DB Graham, RJ Xavier. From genetics of inflammatory bowel disease towards mechanistic insights. Trends Immunol 2013;34:371–8.
  • J Hampe, A Franke, P Rosenstiel, A Till, M Teuber, K Huse, et al. A genome-wide association scan of nonsynonymous SNPs identifies a susceptibility variant for Crohn disease in ATG16L1. Nat Genet 2007;39:207–11.
  • K Cadwell, JY Liu, SL Brown, H Miyoshi, J Loh, JK Lennerz, et al. A key role for autophagy and the autophagy gene Atg16l1 in mouse and human intestinal Paneth cells. Nature 2008;456:259–63.
  • K Cadwell, KK Patel, NS Maloney, TC Liu, AC Ng, CE Storer, et al. Virus-plus-susceptibility gene interaction determines Crohn’s disease gene Atg16L1 phenotypes in intestine. Cell 2010;141:1135–45.
  • AM Marchiando, D Ramanan, Y Ding, LE Gomez, VM Hubbard-Lucey, K Maurer, et al. A deficiency in the autophagy gene Atg16L1 enhances resistance to enteric bacterial infection. Cell Host Microbe 2013;14:216–24.
  • VM Hubbard-Lucey, Y Shono, K Maurer, ML West, NV Singer, CG Ziegler, et al. Autophagy gene Atg16l1 prevents lethal T cell alloreactivity mediated by dendritic cells. Immunity 2014;41:579–91.
  • A Murthy, Y Li, I Peng, M Reichelt, AK Katakam, R Noubade, et al. A Crohn’s disease variant in Atg16l1 enhances its degradation by caspase 3. Nature 2014;506:456–62.
  • TE Adolph, MF Tomczak, L Niederreiter, HJ Ko, J Böck, E Martinez-Naves, et al. Paneth cells as a site of origin for intestinal inflammation. Nature 2013;503:272–6.
  • T Saitoh, N Fujita, MH Jang, S Uematsu, BG Yang, T Satoh, et al. Loss of the autophagy protein Atg16L1 enhances endotoxin-induced IL-1beta production. Nature 2008;456:264–8.
  • I Cleynen, JR Gonzalez, C Figueroa, A Franke, D McGovern, M Bortlík, et al. Genetic factors conferring an increased susceptibility to develop Crohn’s disease also influence disease phenotype: results from the IBDchip European Project. Gut 2013;62:1556–65.
  • L Jostins, JC Barrett. Genetic risk prediction in complex disease. Hum Mol Genet 2011;20:R182–8.
  • H Wang, H Yang, CS Shivalila, MM Dawlaty, AW Cheng, F Zhang, et al. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 2013;153:910–18.
  • ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012;489:57–74.
  • MT Maurano, R Humbert, E Rynes, RE Thurman, E Haugen, H Wang, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 2012;337:1190–5.
  • D Ellinghaus, H Zhang, S Zeissig, S Lipinski, A Till, T Jiang, et al. Association between variants of PRDM1 and NDP52 and Crohn’s disease, based on exome sequencing and functional studies. Gastroenterology 2013;145:339–47.
  • SA McCarroll, A Huett, P Kuballa, SD Chilewski, A Landry, P Goyette, et al. Deletion polymorphism upstream of IRGM associated with altered IRGM expression and Crohn’s disease. Nat Genet 2008;40:1107–12.
  • CD Brown, LM Mangravite, BE Engelhardt. Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs. PLoS Genet 2013;9:e1003649.
  • TA Manolio, FS Collins, NJ Cox, DB Goldstein, LA Hindorff, DJ Hunter, et al. Finding the missing heritability of complex diseases. Nature 2009;461:747–53.
  • EE Eichler, J Flint, G Gibson, A Kong, SM Leal, JH Moore, et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 2010;11:446–50.
  • RL Jirtle, MK Skinner. Environmental epigenomics and disease susceptibility. Nat Rev Genet 2007;8:253–62.
  • BR Carone, L Fauquier, N Habib, JM Shea, CE Hart, R Li, et al. Paternally induced transgenerational environmental reprogramming of metabolic gene expression in mammals. Cell 2010;143:1084–96.
  • V Hughes. Epigenetics: The sins of the father. Nature 2014;507:22–4.
  • BG Dias, KJ Ressler. Parental olfactory experience influences behavior and neural structure in subsequent generations. Nat Neurosci 2014;17:89–96.
  • EJ Radford, M Ito, H Shi, JA Corish, K Yamazawa, E Isganaitis, et al. In utero effects. In utero undernourishment perturbs the adult sperm methylome and intergenerational metabolism. Science 2014;345:1255903.
  • L Jiang, J Zhang, JJ Wang, L Wang, L Zhang, G Li, et al. Sperm, but not oocyte, DNA methylome is inherited by zebrafish early embryos. Cell 2013;153:773–84.
  • MK Skinner, M Manikkam, R Tracey, C Guerrero-Bosagna, M Haque, EE Nilsson. Ancestral dichlorodiphenyltrichloroethane (DDT) exposure promotes epigenetic transgenerational inheritance of obesity. BMC Med 2013;11:228.
  • D Gevers, S Kugathasan, LA Denson, Y Vázquez-Baeza, W Van Treuren, B Ren, et al. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe 2014;15:382–92.
  • A Couturier-Maillard, T Secher, A Rehman, S Normand, A De Arcangelis, R Haesler, et al. NOD2-mediated dysbiosis predisposes mice to transmissible colitis and colorectal cancer. J Clin Invest 2013;123:700–11.
  • P Rosenstiel, C Sina, A Franke, S Schreiber. Towards a molecular risk map – recent advances on the etiology of inflammatory bowel disease. Semin Immunol 2009;21:334–45.
  • P Rausch, A Rehman, S Kunzel, R Häsler, SJ Ott, S Schreiber, et al. Colonic mucosa-associated microbiota is influenced by an interaction of Crohn disease and FUT2 (Secretor) genotype. Proc Natl Acad Sci USA 2011;108:19030–5.
  • C Quince, EE Lundin, AN Andreasson, D Greco, J Rafter, NJ Talley, et al. The impact of Crohn’s disease genes on healthy human gut microbiota: a pilot study. Gut 2013;62:952–4.
  • K Yamazaki, D McGovern, J Ragoussis, M Paolucci, H Butler, D Jewell, et al. Single nucleotide polymorphisms in TNFSF15 confer susceptibility to Crohn’s disease. Hum Mol Genet 2005;14:3499–506.
  • RH Duerr, KD Taylor, SR Brant, JD Rioux, MS Silverberg, MJ Daly, et al. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 2006;314:1461–3.
  • A Franke, J Hampe, P Rosenstiel, C Becker, F Wagner, R Häsler, et al. Systematic association mapping identifies NELL1 as a novel IBD disease gene. PLoS One 2007;2:e691.
  • JV Raelson, RD Little, A Ruether, H Fournier, B Paquin, P Van Eerdewegh, et al. Genome-wide association study for Crohn’s disease in the Quebec Founder Population identifies multiple validated disease loci. Proc Natl Acad Sci USA 2007;104:14747–52.
  • C Libioulle, E Louis, S Hansoul, C Sandor, F Farnir, D Franchimont, et al. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet 2007;3:e58.
  • The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661–78.
  • SA Fisher, M Tremelling, CA Anderson, R Gwilliam, S Bumpstead, NJ Prescott, et al. Genetic determinants of ulcerative colitis include the ECM1 locus and five loci implicated in Crohn’s disease. Nat Genet 2008;40:710–12.
  • S Kugathasan, RN Baldassano, JP Bradfield, PM Sleiman, M Imielinski, SL Guthery, et al. Loci on 20q13 and 21q22 are associated with pediatric-onset inflammatory bowel disease. Nat Genet 2008;40:1211–15.
  • MS Silverberg, JH Cho, JD Rioux, DP McGovern, J Wu, V Annese, et al. Ulcerative colitis-risk loci on chromosomes 1p36 and 12q15 found by genome-wide association study. Nat Genet 2009;41:216–20.
  • K Asano, T Matsushita, J Umeno, N Hosono, A Takahashi, T Kawaguchi, et al. A genome-wide association study identifies three new susceptibility loci for ulcerative colitis in the Japanese population. Nat Genet 2009;41:1325–9.
  • UK IBD Genetics Consortium, JC Barrett, JC Lee, CW Lees, NJ Prescott, CA Anderson, A Phillips, et al. Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region. Nat Genet 2009;41:1330–4.
  • M Imielinski, RN Baldassano, A Griffiths, RK Russell, V Annese, M Dubinsky, et al. Common variants at five new loci associated with early-onset inflammatory bowel disease. Nat Genet 2009;41:1335–40.
  • A Franke, T Balschun, C Sina, D Ellinghaus, R Häsler, G Mayr, et al. Genome-wide association study for ulcerative colitis identifies risk loci at 7q22 and 22q13 (IL17REL). Nat Genet 2010;42:292–4.
  • K Wang, R Baldassano, H Zhang, HQ Qu, M Imielinski, S Kugathasan, et al. Comparative genetic analysis of inflammatory bowel disease and type 1 diabetes implicates multiple loci with opposite effects. Hum Mol Genet 2010;19:2059–67.
  • DP McGovern, A Gardet, L Torkvist, P Goyette, J Essers, KD Taylor, et al. Genome-wide association identifies multiple ulcerative colitis susceptibility loci. Nat Genet 2010;42:332–7.
  • EE Kenny, I Pe’er, A Karban, L Ozelius, AA Mitchell, SM Ng, et al. A genome-wide scan of Ashkenazi Jewish Crohn’s disease suggests novel susceptibility loci. PLoS Genet 2012;8:e1002559.
  • A Julia, E Domenech, E Ricart, R Tortosa, V García-Sánchez, JP Gisbert, et al. A genome-wide association study on a southern European population identifies a new Crohn’s disease susceptibility locus at RBX1-EP300. Gut 2013;62:1440–5.
  • K Yamazaki, J Umeno, A Takahashi, A Hirano, TA Johnson, N Kumasaka, et al. A genome-wide association study identifies 2 susceptibility loci for Crohn’s disease in a Japanese population. Gastroenterology 2013;144:781–8.
  • SK Yang, M Hong, W Zhao, Y Jung, J Baek, N Tayebi, et al. Genome-wide association study of Crohn’s disease in Koreans revealed three new susceptibility loci and common attributes of genetic susceptibility across ethnic populations. Gut 2014;63:80–7.
  • A Julia, E Domenech, M Chaparro, V García-Sánchez, F Gomollón, J Panés, et al. A genome-wide association study identifies a novel locus at 6q22.1 associated with ulcerative colitis. Hum Mol Genet 2014. [Epub ahead of print] [Accessed 29 July 2014].
  • A Kaser, S Zeissig, RS Blumberg. Genes and environment: how will our concepts on the pathophysiology of IBD develop in the future? Dig Dis 2010;28:395–405.