TRENDS IN MOLECULAR MEDICINE

Planning a genome-wide association study: Points to consider

Pages 451-460 | Received 01 Jun 2010, Accepted 09 Mar 2011, Published online: 19 May 2011

Abstract

It is well established that genetic diversity combined with specific environmental exposures contributes to disease susceptibility. However, it has turned out to be challenging to isolate the genes underlying the genetic component conferring susceptibility to most complex disorders. Traditional candidate gene and family-based linkage studies, which dominated gene discovery efforts for many years, were largely unsuccessful in unraveling the genetics of these traits due to the relatively limited information gained. Within the last 5 years, new advances in high-throughput methods have allowed for large volumes of single nucleotide polymorphisms (SNPs) throughout the genome to be genotyped across large and comprehensively phenotyped patient cohorts. Unlike previous approaches, these ‘genome-wide association studies’ (GWAS) have extensively delivered on the promise of uncovering genetic determinants of complex diseases, with hundreds of novel disease-associated variants being largely replicated by independent groups. This review provides an overview of these recent breakthroughs in the context of the pitfalls and challenges related to designing and carrying out a successful GWAS.

Key messages

  • Genome-wide association studies (GWAS) have delivered multiple novel genes associated with various complex diseases and traits.

  • Key considerations for performing a GWAS include having an estimate of the genetic component of the disease, ascertaining sufficient sample size, and adjusting for population stratification.

  • Next-generation sequencing will be leveraged to determine the remaining ‘missing heritability’ portion of a given complex trait that could not be uncovered by GWAS.

Introduction

The genetic underpinnings of complex traits, such as diabetes, autism, asthma, and inflammatory bowel disease, remained largely elusive until very recently, when the advent of new array-based technologies enabled investigators to leverage genetic variation across the entire genome, in a hypothesis-free manner, to pinpoint major contributing genetic factors.

The spirit that drove the genome sequencing projects at the turn of the twenty-first century has lived on in the era of ‘genome-wide association studies’ (GWAS), where intense efforts by numerous research groups, both competitively and collaboratively, to establish loci contributing to various complex traits have driven forward genetic research at an impressive pace in the last 5 years. This has delivered hundreds of genetic associations that have been largely replicated by others, leading to real consensus for the first time in the history of human complex genetics regarding what variants are actually central players in the pathogenesis of these phenotypes.

The determination of such loci through GWAS is highly dependent on multiple factors, including the degree of a genetic component to the trait, the power of the study, the sample ascertainment scheme or diagnostic criteria, plus the relative heterogeneity of the study population.

In this review we will outline such challenges and pitfalls for carrying out a successful GWAS and also highlight the multiple victories through the employment of this approach.

Evidence for a genetic component

The pathogenesis of complex traits is invariably a consequence of the interplay between behavioral, environmental, and genetic factors influencing individual outcome. Before one starts out on an often expensive endeavor to uncover the genetic components contributing to a given trait, it is important to establish how extensive the genetic contribution actually is. Fortunately, there is now clear evidence of a strong genetic component to most complex diseases.

Evidence can be gleaned from prevalence differences between ethnic groups, family studies to investigate risk ratios among siblings of patients, and concordance rate differences between monozygotic and dizygotic twin sets.

This can be exemplified by type 1 diabetes (T1D), which is more prevalent in populations of European ancestry than in other ethnic groups. The disease is also highly heritable, with first-degree relatives of cases being at approximately 10 times greater risk than the general population (Citation1) (in contrast to just 3.5 times for type 2 diabetes (Citation2)) and concordance among monozygotic twins being greater than 40%, with that among dizygotic twins being only about half as high (Citation3). This evidence strongly suggests a major genetic contribution to this particular trait, making it a prime candidate for the approaches we describe below.

Before GWAS

The vast majority of associations reported in the pre-GWAS era remain debatable, primarily because the only methodologies available at the time were much more limited than GWAS. Pre-2005, investigators generally worked with the restrictive candidate gene and family-based linkage approaches.

The ‘candidate gene’ approach was for many years the only option to approach the genetics of complex disease and was the most logical, as it was based on a particular biological hypothesis regarding the pathogenesis of a given trait. However, such studies have been plagued with the ‘winner's curse’ (Citation4), where an initial report of association does not hold up in subsequent replication attempts by other investigative groups.

Linkage studies became the first option to assess the genetics of a given complex disease genome-wide, albeit with relatively low resolution by post-2005 standards, through the systematic search for common regions of haplotype sharing at given chromosomal locations within and across multiple affected families. Such positional cloning approaches have been highly successful in uncovering variants that underlie monogenic disorders but very much less so in complex traits, mainly because this methodology is generally poor at identifying common genetic variants contributing to disease risk.

Although the outcomes of these classical approaches are far from satisfying in the complex disease setting, they have set the scene for the advent of GWAS, which has successfully leveraged many of the already existing DNA collections for key gene discoveries.

Build-up to GWAS

Before GWAS became a reality, certain key events had to occur. First up, the completion of the sequencing of the human genome in 2001 (Citation5,Citation6) laid out the canvas on which genetic variation, primarily SNPs, could be overlaid, as facilitated by the International HapMap Project (Citation7,Citation8). This resulted in crucial insights into human genetic diversity.

Advances in single-base extension biochemistry and hybridization/detection to synthetic oligonucleotides have made it possible to accurately and cost-effectively genotype hundreds of thousands of SNPs in parallel (Citation9). By taking advantage of the knowledge gained from the HapMap project, which also revealed that the genome is organized into discrete linkage disequilibrium blocks with limited haplotype diversity within each block, a minimal set of SNPs necessary for detecting all major haplotypes could be determined. As such, genome-wide genotyping of over 500,000 SNPs can now be readily achieved in an efficient and highly accurate manner to capture (‘tag’) the bulk of common diversity in genomes of European ancestry (Citation9); indeed, such a strategy has a high likelihood of tagging a disease-causing mutation if the study is sufficiently powered.
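
The notion of one SNP 'tagging' another can be made concrete with the standard pairwise linkage disequilibrium measures D, D′, and r². The following is a minimal sketch only: the function name and the haplotype frequencies are hypothetical, and the r² ≥ 0.8 cut-off mentioned in the comments is a commonly used convention rather than a value taken from this review.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> dict:
    """Pairwise LD between two biallelic SNPs.

    p_ab -- frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a  -- frequency of allele A at SNP 1
    p_b  -- frequency of allele B at SNP 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D: deviation of the observed haplotype frequency from independence
    d = p_ab - p_a * p_b
    # Largest magnitude D can take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the two alleles
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": d_prime, "r2": r2}


# Hypothetical frequencies, for illustration only.
perfect = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)  # the two alleles always travel together
partial = ld_stats(p_ab=0.2, p_a=0.3, p_b=0.4)  # weaker correlation

for name, stats in (("perfect", perfect), ("partial", partial)):
    # A genotyped SNP is often treated as a usable tag when r2 >= 0.8 (convention).
    print(name, {k: round(v, 3) for k, v in stats.items()}, "tags:", stats["r2"] >= 0.8)
```

Under this convention, a genotyped SNP with high r² to an ungenotyped variant serves as its proxy, which is why an array of a few hundred thousand well-chosen SNPs can cover much of the common variation.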

Genome-wide association

With the application of this recent revolution in SNP genotyping technology across large cohorts of patients and controls, GWAS of complex diseases serves the critical need for a more comprehensive and unbiased strategy to identify causal genes related to complex disease.

High accuracy and yield from the SNP genotyping platform of choice are crucial to decrease false-positive signals due to disparities in the quality and information content of the data generated on both patients and controls. Such a problem is much more acute in GWAS, where the number of statistical tests is three to four orders of magnitude greater than in most candidate gene studies.
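
The consequence of this scale for significance thresholds can be illustrated with a simple Bonferroni correction. This is a sketch under stated assumptions: the 50-SNP candidate gene study is a hypothetical example, and Bonferroni is only one of several possible corrections. The widely quoted genome-wide significance level of 5 × 10⁻⁸ is a field convention (equivalent to correcting for roughly one million independent tests) and is not a value given in this review.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical candidate gene study testing 50 SNPs.
candidate_gene = bonferroni_threshold(0.05, 50)
# Array-based GWAS testing 500,000 SNPs, as described in the text.
gwas = bonferroni_threshold(0.05, 500_000)

print(f"candidate gene study: p < {candidate_gene:.1e}")
print(f"GWAS (500,000 SNPs):  p < {gwas:.1e}")
```

Because the per-test threshold is so stringent, even a small rate of systematic genotyping error that differs between cases and controls can produce spurious signals that clear it.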

Currently there are two major vendors of this high-throughput genome-wide SNP genotyping technology, namely Affymetrix and Illumina. With both companies, their assays can scale to as many SNPs as can be represented on the array and are readily adaptable to automation. One challenge for investigators has been the large differences in the repertoire of SNPs selected for each platform, so direct comparisons or combination of data were initially challenging; however, by leveraging the HapMap data set, missing genotypes on either platform can be inferred by a statistical approach commonly known as ‘imputation’. This approach both allows for association testing of genetic markers that are not actually directly genotyped in a given study and enables the combination of genotype data from multiple studies that may have not been uniformly genotyped on the same platform in order to greatly increase the power of a GWAS in a ‘meta-analysis’ setting (Citation10).
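
Once genotypes have been imputed to a common marker set, per-study results for a given SNP can be pooled. A minimal sketch of one common way of doing this, a fixed-effect inverse-variance meta-analysis, is given below; the function name and the per-study estimates are hypothetical, and the sketch assumes every study reports its effect for the same allele on the log odds ratio scale.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas -- per-study effect estimates (e.g. log odds ratios) for the same allele
    ses   -- corresponding standard errors
    Returns (pooled_beta, pooled_se, z_score).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each study is weighted by the inverse of its variance
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical results for one SNP from two studies of equal precision.
pooled_beta, pooled_se, pooled_z = fixed_effect_meta([0.20, 0.30], [0.10, 0.10])
print(f"pooled log OR = {pooled_beta:.3f}, SE = {pooled_se:.3f}, z = {pooled_z:.2f}")
```

The pooled standard error is smaller than that of either contributing study, which is the source of the gain in power from combining cohorts.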

With this technology there is already increasing evidence that GWAS represents a powerful approach to identifying genes involved in common human diseases. Compelling evidence (including robust replication, which must feature in any GWAS design) has been reported, mostly in the top-impact scientific journals, for genetic variants involved in traits ranging from autism (Citation11), asthma (Citation12,Citation13), prostate cancer (Citation14–16), birth-weight (Citation17), inflammatory bowel disease (Citation18–24), type 1 diabetes (Citation25–27), type 2 diabetes (Citation27–35), obesity (Citation36–40), age-related macular degeneration (Citation41), and heart disease (Citation42,Citation43) to breast cancer (Citation44). A catalog of these studies is now available at the NHGRI web site (http://www.genome.gov/gwastudies).

Sample size

For a GWAS approach, cohort size is a vital prerequisite to ensure sufficient statistical power within any given study. Indeed, sample size has been a major limiting factor in uncovering the full repertoire of genes in any complex disease to date. Simply put, the larger the patient and control sets, the more likely it is that at least one variant conferring risk of the disease is uncovered. A key consideration in this context is that statistical power is a direct function of both effect size and sample size, and the typical disease-associated variant expected to be detected will be relatively common in the population (10%–30%) and confer modest risk (relative risk = 1.1–1.4). Experience from the last 5 years of GWAS has indicated that approximately 1,000 cases and a similar number of controls will uncover the low-hanging fruit of 1–5 variants associated with a given trait, with progressively larger sample sizes uncovering additional variants, but with typically diminishing contributions to the disease. Besides the discovery groups, an acceptable study design also requires access to independent groups of patients and controls for replication and validation efforts.
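
The dependence of power on effect size and sample size can be sketched with a normal approximation to a two-sided allelic test comparing risk-allele frequencies in cases and controls. Everything specific here is an assumption made for illustration: the function names, the control allele frequency of 0.2, the use of an allelic odds ratio of 1.3 as a stand-in for the relative risk, and the significance level of 5 × 10⁻⁸ (a field convention, not a value from this review). A dedicated power calculator should be preferred for real study planning.

```python
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def allelic_test_power(control_freq: float, odds_ratio: float,
                       n_cases: int, n_controls: int, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each subject contributes two alleles
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_effect = abs(p1 - p0) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Hypothetical variant: 20% frequency in controls, allelic odds ratio 1.3.
power_1000 = allelic_test_power(0.2, 1.3, n_cases=1000, n_controls=1000)
power_5000 = allelic_test_power(0.2, 1.3, n_cases=5000, n_controls=5000)
print(f"1,000 cases / 1,000 controls: power = {power_1000:.3f}")
print(f"5,000 cases / 5,000 controls: power = {power_5000:.3f}")
```

Under these assumptions a single variant of this size is unlikely to reach stringent significance with 1,000 cases and 1,000 controls, whereas power is high with 5,000 of each. This is in line with the observation that initial studies capture only the strongest signals and that larger cohorts progressively reveal more.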

The numbers stated above are a rough rule of thumb, but there are exceptions to these rules. The first published GWAS was for age-related macular degeneration (AMD), a leading cause of blindness in the developed world, where only 96 AMD cases and 50 controls were required to identify the first robustly associated variant at the complement factor H (CFH) locus (Citation41). The association was centered over a coding variant within the gene, where having at least one histidine at amino acid position 402 increased an individual's risk of developing AMD 2.7-fold and, staggeringly, was estimated to account for approximately 50% of the attributable risk of AMD.
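
How a 2.7-fold risk can translate into roughly half of the attributable risk can be checked with Levin's formula for the population attributable fraction, PAF = f(RR − 1) / (1 + f(RR − 1)), where f is the proportion of the population carrying the risk factor. The sketch below is illustrative only: the risk-allele frequency of 0.35 is an assumed value (not given in this review), carrier frequency is derived under Hardy-Weinberg equilibrium, and the original study may have computed attributable risk differently.

```python
def carrier_frequency(allele_freq: float) -> float:
    """Proportion of individuals carrying at least one copy, assuming Hardy-Weinberg equilibrium."""
    return 1 - (1 - allele_freq) ** 2


def population_attributable_fraction(exposed_freq: float, relative_risk: float) -> float:
    """Levin's formula for the population attributable fraction."""
    excess = exposed_freq * (relative_risk - 1)
    return excess / (1 + excess)


# Assumed risk-allele frequency (hypothetical); relative risk of 2.7 is taken from the text.
carriers = carrier_frequency(0.35)
paf = population_attributable_fraction(carriers, 2.7)
print(f"carrier frequency = {carriers:.3f}, attributable fraction = {paf:.3f}")
```

The point of the calculation is that a variant needs to be both common and of large effect to account for such a share of risk, a combination that is unusual among GWAS findings.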

But at the other end of the spectrum, substantially larger numbers of cases and controls are required to determine robust associations; for example, in order for us to uncover association with autism, samples from a full national repository were required (Citation11). In addition, in order to uncover a reasonable number of novel loci associated with adult body mass index (BMI), in excess of 30,000 subjects have been required (Citation40).

Population stratification

For all patient and control comparisons, especially in diverse communities such as the US, it is very important to match control groups to the patient groups in terms of genetic background as closely as possible to avoid false-positives due to variation in population substructure. Put simply, allele frequencies for many SNPs vary across ethnicities, and often even within racial groups, and this must be accounted for; otherwise premature excitement will ensue.

Fortunately when compared to candidate gene association studies, the great advantage of whole-genome SNP data is that inflation of test statistics due to population substructure can be readily identified and adjusted due to the wealth of data available. Populations do not just differ by one or two SNPs, they differ at many loci, so whole-genome data aids in identifying stratification, including even extremely fine-scale subpopulations within Europe (Citation45).

The GWAS community has established robust methods to deal with population stratification, and these methods effectively adjust for common variants. There are now standard practices to adjust for population stratification that are being applied in case/control GWAS, including genomic control (Citation46), EigenStrat (Citation47), and multidimensional scaling (MDS). Furthermore, family-based study designs have the advantage of protecting against stratification. There are still challenges with the analyses of rare variants, hyper-variable variants, or interrogation of recently admixed populations, but these are currently active research topics.
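
Of the methods listed, genomic control is the simplest to sketch. The inflation factor λ is estimated as the median of the observed 1-degree-of-freedom association statistics divided by the median expected under the null (about 0.455), and every statistic is then divided by λ. The code below is a minimal illustration with hypothetical function names and simulated statistics; the choice not to deflate when λ falls below 1 is a common convention and an assumption here, not something specified in this review.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (about 0.455), obtained as the
# square of the 75th percentile of the standard normal distribution.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median statistic over the null median."""
    if not chi2_stats:
        raise ValueError("at least one test statistic is required")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide every statistic by lambda; statistics are left unchanged if lambda < 1."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]


# Simulated example: evenly spaced null quantiles, uniformly inflated by 20%
# to mimic the effect of unaccounted population substructure.
null_stats = [NormalDist().inv_cdf((i + 0.5) / 1001) ** 2 for i in range(1001)]
inflated = [1.2 * s for s in null_stats]

print(f"lambda before correction: {genomic_inflation(inflated):.3f}")
print(f"lambda after correction:  {genomic_inflation(genomic_control(inflated)):.3f}")
```

Genomic control applies one uniform correction to all markers, whereas principal component or MDS-based approaches model ancestry explicitly and can therefore correct markers that differ unusually strongly between subpopulations.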

Another crucial factor that needs to be considered is that most of the disease-associated variants reported to date have been discovered utilizing cohorts of European ancestry, primarily because the SNP arrays are designed to capture optimally the haplotype diversity in this ethnicity and many more SNPs are required for the same degree of capture in populations of African ancestry, i.e. it is more feasible to carry out such studies in Caucasians. However, as more knowledge is generated on the genetics of complex disease, it is going to be vital to elucidate the full role of each locus in a worldwide context. Some progress has been achieved in this regard, in particular with global assessment of the role of the locus most strongly associated with type 2 diabetes (T2D), the transcription factor 7-like 2 (TCF7L2) gene (Citation48), which has turned out to be equally relevant in populations of African ancestry (Citation49) but substantially less impactful in East Asians (Citation34). This global elucidation will ultimately be vital for the new era of consumer-based genetics, where companies such as 23andMe and Navigenics have already started offering content to consumers of all racial backgrounds.

Primary focus

Type 2 diabetes (T2D) has been the focus of more GWAS than any other disorder studied to date. From the first batch of such studies, published in Nature (Citation27,Citation28) and Science (Citation29–31) in 2007, the strongest association by far was with TCF7L2; this is now considered the most significant genetic finding in T2D to date (Citation50) and represents one of the major findings in the arena of complex disease. The variant within TCF7L2 is very typical of the type of variant uncovered with GWAS, i.e. common but conferring very modest risk. It is approximately 1.5 times more common in patients than in controls; this corresponds to an approximately 50% increase in risk of T2D per copy carried.
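
Per-allele effects of this kind are usually reported as an allelic odds ratio with a confidence interval, computed from the counts of risk and non-risk alleles in cases and controls. The sketch below uses Woolf's method for the interval; the function name and the allele counts are hypothetical and are not data from any of the cited studies.

```python
import math


def allelic_odds_ratio(case_risk: int, case_other: int,
                       control_risk: int, control_other: int, z: float = 1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts (two alleles per subject); z = 1.96 gives a 95% interval.
    """
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of the log odds ratio
    se_log = math.sqrt(sum(1.0 / c for c in counts))
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles per group),
# with risk-allele frequencies of 38% in cases and 30% in controls.
odds_ratio, ci_low, ci_high = allelic_odds_ratio(760, 1240, 600, 1400)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Note that the odds ratio compares the odds of carrying the allele, so a modest difference in allele frequency between cases and controls can correspond to a noticeably larger odds ratio.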

In excess of 20 loci have now come out of various GWAS efforts on T2D (Citation27–35). This is comparable with other metabolic traits that have also had similar levels of attention including BMI (Citation36–40) and bone mineral density (Citation51–54). However, two inflammation-mediated disorders have led the way in terms of number of loci uncovered due to extensive collaborative efforts plus the nature of the genetics of the diseases, i.e. type 1 diabetes (Citation25,Citation27,Citation55–59) and inflammatory bowel disease (Citation18,Citation20–22,Citation24,Citation27,Citation60). Both these heavily investigated complex diseases have robustly yielded in excess of 40 loci each.

One could suggest that there has been a degree of luck in terms of which disease one elects to investigate, with some diseases throwing up tens of common variants conferring modest risk, while others, like hypertension, still have to deliver a totally convincing signal despite similar levels of effort.

Identifying the correct phenotype

Many phenotypes are related to others, or one disease confers risk of another, e.g. T2D puts one at greater risk of presenting with a cardiovascular-related event. As such, when carrying out a GWAS, one has to be sufficiently familiar with the trait to understand what the underlying association actually represents.

A case in point, in keeping with the subject of T2D, is FTO, a key locus that came out of the early GWAS efforts for this trait (Citation37). The Wellcome Trust Case Control Consortium (WTCCC) uncovered this variant in their GWAS of type 2 diabetes but were somewhat perplexed that it had not been reported by any other similar GWAS efforts of the same phenotype. It turned out that the other investigative groups had utilized leaner cases, while the British subjects were substantially heavier. As such, they had in fact mapped the first bona fide obesity locus, which in turn conferred risk of T2D. To date, FTO clearly remains the locus most strongly associated with obesity (Citation61–64). The authors of the initial report deserve full credit for correctly ascertaining its true association rather than incorrectly reporting it as a primary T2D locus.

Subsequently, other T2D loci have now been shown to be operating through pediatric traits, which then have a knock-on effect in adulthood and thus increase the likelihood of presenting with the disease; i.e. T2D is not the primary trait in these instances either. We and others have shown that CDKAL1 confers risk of lower birth-weight, a known risk factor for T2D in later life (Citation17,Citation65,Citation66). In addition, one of the T2D loci, IDE-HHEX, is associated with increased BMI in childhood, which again increases the risk of developing the disease later in life (Citation67).

Missing heritability

Apart from rare exceptions, GWAS findings have left the genetics community to ponder how to ultimately uncover the full repertoire of the genetic component of given traits in order to explain the ‘missing heritability’ (Citation68).

A good example of this issue is the first GWAS report of height in nearly 5,000 individuals (Citation69), where the investigators observed association to common variation in the HMGA2 oncogene. Follow-up analyses in approximately 19,000 more individuals (both adults and children) revealed compelling replication of this observation. Despite the magnitude of the study and the robustness of the association, the gene variant explained only a minute portion of the population variation in height, i.e. approximately 0.3%. However, many additional loci have been now uncovered for this trait through increasingly larger meta-analyses (Citation70–74), with substantial progress in explaining more and more of the genetic component of height. It has been proposed that increasing sample size will eventually lead to the plateauing of such capture, but in reality this does not seem to be the case; indeed the latest height GWAS meta-analysis estimated that an eventual sample size of 500,000 would detect 99.6% of contributing loci at the genome-wide significance level (Citation74).
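
Why a robustly associated common variant can explain so little of the trait variance follows from the standard additive-model expression: for a SNP with allele frequency p and per-allele effect β (in standard deviation units of the trait), the proportion of variance explained is 2p(1 − p)β², assuming Hardy-Weinberg equilibrium. The sketch below uses a hypothetical allele frequency and effect size chosen only to land in the region of the 0.3% quoted above; they are not the published HMGA2 estimates.

```python
def variance_explained(allele_freq: float, beta_sd: float) -> float:
    """Proportion of trait variance explained by one additive SNP.

    allele_freq -- effect-allele frequency (Hardy-Weinberg equilibrium assumed)
    beta_sd     -- per-allele effect in standard deviation units of the trait
    """
    if not 0 < allele_freq < 1:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    return 2 * allele_freq * (1 - allele_freq) * beta_sd ** 2


# Hypothetical variant: 50% allele frequency, 0.08 SD change in the trait per allele.
single_locus = variance_explained(0.5, 0.08)
print(f"variance explained by one locus: {single_locus:.2%}")

# If loci act independently and additively, their contributions simply add up.
forty_loci = 40 * single_locus
print(f"variance explained by 40 such loci: {forty_loci:.2%}")
```

Even 40 independent loci of this size would account for only a small fraction of the variance of a highly heritable trait, which illustrates why very large meta-analyses are needed to make inroads into the 'missing heritability'.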

The proportion of the estimated genetic contribution uncovered to date with GWAS in the various complex traits is still very much in the minority, with diseases such as autism, T2D, and obesity still in the single digits. These current outcomes have also thrown doubt on many direct-to-consumer disease prediction tests, as they may well only be useful once the bulk of the genetic contribution to a given disease is truly figured out.

There are exceptions to these observations, with the previously highlighted inflammatory bowel disease (IBD) and T1D revealing much more of the genetic component; in T1D in particular, the major histocompatibility complex (MHC) was already well established as contributing 50% of the genetic risk well before GWAS.

Additional study designs may also aid in uncovering some of the missing heritability, including leveraging factors such as families, population isolates, and population-specific samples, which could reveal more common variants along with rare variants and the contribution of imprinting. Two recent studies suggest that rare variants are in fact ‘loosely’ tagged by the respective common variant through synthetic (hitch-hiking) association, thereby potentially under-estimating the actual effect size at the associating loci (Citation75,Citation76). Thus, it is clear that in addition to larger and larger cohorts combined into meta-analyses, new whole-genome sequencing technologies will be a large part of the solution—both of which are discussed further below.

Functional context

GWAS is not designed to interrogate causal variants directly in the first instance. It is important to emphasize that ‘risk variants’ identified in GWAS are normally close to, but are not themselves, the true disease-causing variants; in other words, it is well established that GWAS does not attempt to identify functional SNPs but rather ‘tags’ the approximate location of disease variants (Citation77–79), typically down to a few hundred kb.

The discoveries of genetic factors involved in the pathogenesis of complex disease present the first step in a much longer process. We expect that genes uncovered using the GWAS approach are indeed fundamental to disease biology. Ultimately, one of the main challenges going forward is to determine how these recently uncovered variants affect the expression and function of the gene products through key molecular biology approaches. Only by uncovering the functional context of these genetic variants can these findings be translated into meaningful benefits for patient care. No doubt, the in-vitro and in-vivo biology of the genes identified by such studies will be fascinating areas of exploration for many scientists.

One of only a few examples of adding functional context to a locus uncovered by GWAS is FTO and obesity. This gene has been the intense focus of obesity genomics ever since it was discovered, but at the point of discovery no one really knew what it did. A British group reported in Science that it was in fact a 2-oxoglutarate-dependent nucleic acid demethylase (Citation80). Subsequently, a German group knocked out the gene in a mouse and reported their findings in Nature (Citation81). Fortunately the mouse was viable and appeared normal at birth. However, it soon became apparent that it consumed more food than its wild-type counterparts and was less active in its cage (detected via infra-red sensors). Despite the apparent ‘couch-potato’ life-style the mouse remained slim; it turned out the culprit mechanism was elevated levels of adrenaline. What is exciting about this finding is that the hypothesis-free GWAS approach has delivered a novel locus which indeed plays a central role in the trait of interest. In addition, although it is not particularly useful diagnostically because it explains such a tiny portion of the genetic contribution to obesity, it could certainly serve as a novel drug target for better treatment of obesity as a whole.

Curious patterns

As the body of literature grows describing outcomes of GWAS in various traits, unexpected patterns are now forming. Indeed it appears that there is some commonality to some disparate diseases. These are aspects that GWAS investigators must keep in mind when analyzing their data sets.

It is also becoming apparent that there is cross-talk between genes influencing autoimmune diseases. For instance it has been shown that our T1D locus, CLEC16A, has also yielded association to multiple sclerosis in a GWAS of that disease (Citation18) and that PTPN22 has been similarly implicated in Crohn's disease, rheumatoid arthritis, systemic lupus erythematosus, and autoimmune thyroiditis (Citation82,Citation83). In addition, from our comparative genetic analyses of IBD and T1D, we have intriguingly implicated multiple loci with opposite effects (Citation84).

The T2D locus, TCF7L2, has also been linked to cancer risk previously (Citation85,Citation86). Indeed, this connection intensified following reports that the 8q24 locus revealed by genome-wide association studies of a number of cancers, including colorectal carcinomas, was due to an extreme upstream TCF7L2-binding element driving the transcription of MYC (Citation87,Citation88). As such, TCF7L2 now appears to be a key player in both type 2 diabetes and cancer. Indeed, many of the T2D GWAS-derived risk-conferring alleles have now been shown to protect against prostate cancer (Citation89).

T1D and T2D appear to be due to distinct biological mechanisms, with none of the genes identified to date in each of these given disorders having an association with the other disease (Citation90,Citation91). For instance, no conclusive support for a role of the insulin gene (INS) has been reported in T2D (Citation92), and there is no evidence for association with the major T2D gene, TCF7L2 (Citation48), in T1D (Citation93,Citation94); however, and somewhat contrary to what one might expect, TCF7L2 variants are strongly associated with latent autoimmune diabetes in adults (LADA) (Citation95,Citation96) which is a disease thought to be closer to T1D in its etiology. This finding is making diabetes researchers rethink the nature of LADA as a consequence.

A region on chromosome 9p21, near the CDKN2A and CDKN2B genes, has been shown to be consistently associated with coronary heart disease (Citation42,Citation43,Citation97). Interestingly, an independent variant at the same locus is well established to be associated with T2D (Citation29–31), leading Francis Collins, then director of the National Human Genome Research Institute, to refer to the region as ‘like the seat of the soul of the genome’ (http://www.nytimes.com/2007/05/04/health/04heart.html). However, the relationship between these two variants has still to be fully elucidated.

As more loci associated with disease are established, one might predict that many more patterns and connections will emerge in a similar fashion to the examples highlighted above.

Pathway analyses

While the results from GWAS have not met expectations with respect to disease prediction, once a certain threshold is reached with respect to the proportion of the genetic component explained, then individualized disease risk assessment using whole-genome data may be successful. This is dependent on the heritability of the disease under study, the proportion of the genetic risk that is known, and on the right set of markers and right algorithms being used, as exemplified by our work with T1D (Citation98).

It is also now clear that when we combine our genome-wide genotype data for Crohn's disease with pathway analyses, additional loci can be uncovered that are subsequently readily replicated in independent cohorts. It turns out that these new loci make biological sense with respect to their role in the pathogenesis of the disease (Citation99).

Copy number variation

Current advances in single-base extension (SBE) biochemistry and hybridization/detection using synthetic oligonucleotides now make it possible accurately to genotype and quantitate allelic copy number variation (CNV) genome-wide (Citation9,Citation100,Citation101).

Neurological disorders have proven the most challenging complex diseases to address using genome-wide SNP approaches, primarily as a consequence of the need for strict, uniform phenotyping across very large, multi-center cohorts. However, they have led the way in the uncovering of CNVs in common disorders such as autism (Citation102–105), attention-deficit hyperactivity disorder (Citation106), and schizophrenia (Citation107–109).

Sequencing

GWAS only leverages a portion of the genome to make its inferences, and although successful in uncovering disease-associated genetic variants, as highlighted above, the majority of the genetic component of most disorders has still to be uncovered (Citation68). Larger and larger meta-analyses will probably reveal more common variants conferring modest risk for complex traits. However, it has been predicted that there are a myriad of rare variants contributing to disease that cannot be detected on current genotyping platforms.

It is now clear that one will need to pursue these rare variants, many of which will be highly penetrant, with high-throughput whole-genome sequencing methods. Indeed, to uncover the remaining ‘missing heritability’ in complex diseases (Citation68), investigators in the near future will need to work on large, high-throughput sequencing efforts involving thousands of DNA samples from affected subjects and a similar number of controls. Such approaches will initially complement previously executed GWAS reports, but very soon they will supersede this previous approach altogether, as sequencing generates both the same information as existing genotyping arrays and a much more comprehensive picture of genetic variation in the genome.

Sanger sequencing revolutionized the field of genetics by becoming the standard approach to appraise a given region of the genome at base-level resolution. However, the relatively recent need to sequence entire genomes has driven innovative developments within the market-place to allow for sequencing technology to be faster, cheaper, and more accurate. The first decade of the twenty-first century has seen an explosion of advances in the field of sequencing, with the currently available high-throughput technologies including Roche's 454, Illumina's Genome Analyzer, Applied Biosystems' SOLiD, Complete Genomics, Helicos, Pacific Biosciences, and Ion Torrent. These technologies offer unprecedented opportunities to increase our understanding of the functions and dynamics of the human genome in the near future.

Discussion

GWAS has clearly revolutionized the field of complex disease genetics. The advent of genome-wide platforms to capture SNP variation has enhanced our understanding of the genetic basis of complex disorders. For the first time there is real consensus on the role of specific genetic factors underpinning common traits.

It is clear now that many of these genes are novel and were mainly not on any investigator's radar when candidate gene studies were being designed in the past; in addition, most of these common variants do not coincide with previously established linkage peaks from family studies of given disorders.

While GWAS has importantly transformed the ability of investigators to discover key genetic factors involved in the pathogenesis of complex disease, this is only the first step in a much longer process. The key players of the moment are the geneticists who reported these signals, but the new stars will be the molecular and cell biologists who can determine how these recently uncovered variants affect the expression and function of the gene products through key molecular biology approaches. Only by uncovering the functional context of these genetic variants can these findings be translated into meaningful benefits for patient care such that therapeutic agents can be eventually raised to these targets that lead in turn to more efficacious treatments.

Ultimately, using GWAS to determine the variants that influence variable pharmacological response will revolutionize the drug market; the result may well be a more partitioned market, but one that offers more tailored drug therapy.

Advances in sequencing technology have made personal genomics a realistic goal for the near future, which will move health care towards personalized medicine in the years to come. Beyond whole-genome sequencing in clinics, the application of high-throughput sequencing in basic research, including transcriptomics, epigenetics, and variant discovery, offers great opportunities for understanding the function and dynamics of genomes. After all, unlike SNP chip-based technologies, sequencing yields full-genome data, and nothing can be more complete than that, so sequencing is here to stay.

Declaration of interest: The authors state no conflict of interest and have received no payment in preparation of this manuscript.
