Editorial

The 1000 Genomes Project: Paving the way for Personalized Genomic Medicine

Pages 321-324 | Published online: 06 Jun 2013

In recent years, the cost of DNA sequencing has fallen at a dramatic rate (by five orders of magnitude over the last decade; see [101]), and this trend shows no sign of slowing. Next-generation sequencing [1,2] has revolutionized biology and allowed ‘omics’ projects to be pursued with rapid turnaround and digital precision. In this editorial, we describe the ongoing International 1000 Genomes Project [102], offer some thoughts on its potential medical applications, explain why population sequencing has a direct and immediate impact on personalized medicine, and reflect on possible ethical concerns related to pervasive genome sequencing.

Immediately following the pivotal Human Genome Project [3,4], the International HapMap Project [5–7] characterized the segregation patterns of common human genomic variation, genotyping millions of SNPs in hundreds of individuals from a dozen populations. This project was instrumental both to our understanding of linkage disequilibrium patterns across the genome and to the development of the high-density SNP arrays that enabled the era of genome-wide association studies (GWAS). As of this writing, 1525 GWAS publications reporting associations at 8675 SNPs have been catalogued [103], covering many complex diseases, including many cancers, autoimmune diseases, diabetes, hypertension and autism. Nevertheless, the heyday of GWAS (which use common SNPs as landmarks) is now behind us. This is largely due to two realizations: first, each individual association only marginally elevates the risk of a particular trait, making it minimally informative in clinical settings; and second, the associations identified for a particular complex trait can, in aggregate, explain only a relatively small proportion of its heritability [8]. It is now broadly recognized that multiple additional factors contribute to heritability, including rare variants, gene–gene interactions, genomic structural variation, epigenetic effects and even interactions with the microbiome.
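To make the first point concrete, the sketch below converts a per-allele odds ratio into an absolute risk for a carrier. The baseline risk of 5% and odds ratio of 1.2 are illustrative values chosen for this example, not figures from any particular GWAS, and the conversion ignores confounding and any difference between heterozygous and homozygous carriers.

```python
def carrier_absolute_risk(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk for a carrier, given the non-carrier risk and an odds ratio.

    Converts risk to odds, scales the odds by the odds ratio, then converts back.
    """
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    carrier_odds = baseline_odds * odds_ratio
    return carrier_odds / (1.0 + carrier_odds)


if __name__ == "__main__":
    # Illustrative values only (hypothetical trait): 5% baseline risk, OR = 1.2.
    baseline = 0.05
    carrier = carrier_absolute_risk(baseline, 1.2)
    print(f"non-carrier risk {baseline:.1%}, carrier risk {carrier:.1%}")
```

With these inputs the carrier's absolute risk rises by less than one percentage point, which illustrates why a single common-variant association rarely changes clinical management on its own.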

The 1000 Genomes Project emerged from this background with a clear purpose: to “provide a deep characterization of human genome sequence variation for investigating the relationship between genotype and phenotype” [9]. The project emphasizes rare and low-frequency variants (population frequencies below 5%), as well as structural variants; for the first time, these can be surveyed effectively using the latest next-generation sequencing technologies. As of November 2012, with the completion of Phase I [10], the project had generated over 19 terabases of sequence data. The effort is highly collaborative: sequencing is carried out at centers in four countries, many more countries contribute to sample accrual, and many groups are actively leading the analysis. Along the way, the project has developed new bioinformatic techniques and validated variant-detection algorithms for two complementary strategies: low-coverage whole-genome sequencing, to detect genome-wide variation, and targeted sequencing, to genotype rare variation in functionally significant regions.

Phase I identified 38 million SNPs, 1.4 million short insertion-deletions (indels) and more than 14,000 larger deletions in 1092 individuals from 14 populations, with improved haplotype estimation and genotype likelihoods compared with the earlier pilot phase. These data inform our understanding of demographic history (by improving our ability to trace ancestry and population migration patterns), as well as of genome evolution and of both positive [11] and purifying selection. For instance, Phase I showed that each individual carries hundreds of putatively deleterious loss-of-function variants in annotated gene regions [10]. Moreover, by greatly extending our knowledge of genome-wide variation, the 1000 Genomes Project provides a high-quality baseline database against which candidate functional variants can be screened, making true positive signals easier to find. For example, it is now common practice in personal genomics to filter out variants already catalogued in the 1000 Genomes dataset; the remaining variants are substantially enriched for biological and medical significance.
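A minimal sketch of that filtering step, assuming variants are keyed by (chromosome, position, reference allele, alternate allele) and that population allele frequencies have already been loaded into a dictionary (in practice they would be parsed from the project's released VCF files, which this sketch does not do). The 1% cut-off and the example variants are hypothetical; whether to discard every catalogued variant or only those above a frequency threshold is an analysis choice.

```python
from typing import Dict, List, Tuple

# (chromosome, position, reference allele, alternate allele)
VariantKey = Tuple[str, int, str, str]


def filter_against_reference(
    personal_variants: List[VariantKey],
    reference_af: Dict[VariantKey, float],
    max_af: float = 0.01,
) -> List[VariantKey]:
    """Keep variants that are absent from the reference panel or rarer than max_af in it."""
    kept: List[VariantKey] = []
    for variant in personal_variants:
        af = reference_af.get(variant)
        # A variant never seen in the panel (af is None) is kept as potentially novel.
        if af is None or af < max_af:
            kept.append(variant)
    return kept


if __name__ == "__main__":
    # Hypothetical reference frequencies and personal calls, for illustration only.
    reference_af = {
        ("1", 1_000, "A", "G"): 0.32,   # common in the panel: filtered out
        ("1", 2_000, "C", "T"): 0.004,  # rare in the panel: kept
    }
    personal = [
        ("1", 1_000, "A", "G"),
        ("1", 2_000, "C", "T"),
        ("2", 500, "G", "A"),           # absent from the panel: kept
    ]
    for chrom, pos, ref, alt in filter_against_reference(personal, reference_af):
        print(f"{chrom}:{pos} {ref}>{alt}")
```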

The 1000 Genomes data also extend genotype imputation to rare SNPs, indels and structural variants [12], increasing our odds of finding meaningful associations with complex traits; they also aid the design of both genotyping arrays and sequencing-based disease studies. In a related application, Sudmant et al. estimated copy number from read-depth profiles, validating their estimates against genomic regions where the copy number was already known, and found a strong association between regions showing population-level stratification of copy number variants and disease (perhaps owing to genomic instability) [13].
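The read-depth idea can be sketched as follows: estimate how much depth one copy contributes by calibrating on regions whose copy number is already known, then divide each region's observed depth by that value. This is a deliberately simplified illustration with hypothetical depths; real read-depth methods also correct for biases such as GC content and mappability, and operate on many genomic windows rather than a handful of regions.

```python
from statistics import median
from typing import Dict, List, Tuple


def depth_per_copy(calibration: List[Tuple[float, int]]) -> float:
    """Median read depth contributed by a single copy.

    calibration: (mean read depth, known copy number) for reference regions.
    The median keeps the estimate robust to a few poorly behaved regions.
    """
    ratios = [depth / copies for depth, copies in calibration if copies > 0]
    if not ratios:
        raise ValueError("need at least one calibration region with copy number > 0")
    return median(ratios)


def estimate_copy_number(region_depths: Dict[str, float], per_copy: float) -> Dict[str, int]:
    """Round observed depth / per-copy depth to the nearest integer copy number."""
    return {name: round(depth / per_copy) for name, depth in region_depths.items()}


if __name__ == "__main__":
    # Hypothetical calibration regions of known copy number 1, 2 or 3.
    calibration = [(30.0, 2), (29.0, 2), (46.0, 3), (14.5, 1)]
    per_copy = depth_per_copy(calibration)
    # Hypothetical query regions with unknown copy number.
    query = {"region_A": 60.0, "region_B": 15.5, "region_C": 0.4}
    print(per_copy, estimate_copy_number(query, per_copy))
```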

Nonetheless, the most striking recent development in genomics has been the recognition of the importance of rare functional variation. Unlike common variants, which tend to be shared across the world, rare variants are recent in origin and often confined to small populations or subpopulations, making them a central target for the later phases of the 1000 Genomes Project. A recent study found that around three-quarters of coding variation across seven populations has an allele frequency below 1% [14], and much of this variation is novel. Despite the great progress already made, we have still only scratched the surface of the total variation present across the human population.
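The frequency bands used above can be made concrete with a small sketch that computes an alternate-allele frequency from diploid genotypes (coded as 0, 1 or 2 copies of the alternate allele) and assigns it to a class. The thresholds (rare below 1%, low-frequency from 1% to 5%, common at 5% and above) follow the cut-offs mentioned in this editorial; the population labels and genotype counts are hypothetical, chosen to show how one variant can be rare in one population and common in another.

```python
from typing import Dict, List


def allele_frequency(genotypes: List[int]) -> float:
    """Alternate-allele frequency from diploid genotypes coded 0, 1 or 2."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    return sum(genotypes) / (2 * len(genotypes))


def frequency_class(af: float) -> str:
    """Classify by minor allele frequency: rare (<1%), low-frequency (1-5%), common (>=5%)."""
    maf = min(af, 1.0 - af)
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"


def classify_by_population(genotypes_by_pop: Dict[str, List[int]]) -> Dict[str, str]:
    """Frequency class of the same variant in each population."""
    return {pop: frequency_class(allele_frequency(g)) for pop, g in genotypes_by_pop.items()}


if __name__ == "__main__":
    # Hypothetical genotypes for one variant in 100 individuals from each of two populations.
    data = {
        "population_1": [0] * 99 + [1],       # 1 alternate allele in 200 chromosomes
        "population_2": [0] * 80 + [1] * 20,  # 20 alternate alleles in 200 chromosomes
    }
    print(classify_by_population(data))
```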

In medical practice, whole-exome analysis has become a leading driver of the burgeoning era of personalized medicine and has been successfully incorporated into clinical diagnosis. Exome sequencing currently offers clinicians a cost-effective way to diagnose disease where single-gene analyses often fail (e.g., see [15]), and may increasingly enable personalized treatment by revealing specific mutations in individual patients. A canonical example of personalized medicine comes from the characterization of the HER2 locus in somatic tumor cells: HER2 amplification is present in about 20% of breast cancer patients and is associated with poor prognosis [16]. Accurate HER2 testing of breast cancer patients allows trastuzumab to be prescribed appropriately. Trastuzumab is an antibody that targets the HER2 receptor and significantly improves outcomes in HER2-positive patients (a 24% increase in life expectancy after 8 years of observation [104]). Because approximately one in eight women will develop invasive breast cancer over their lifetime, and trastuzumab is an expensive treatment with potentially severe side effects, limiting its use to patients in whom it is likely to be effective is of tremendous importance. Consequently, HER2 testing has become routine in patients with invasive breast cancer. Other examples include:

  • The leukocyte-surface protein CCR5 is a chemokine receptor that most forms of HIV use as a co-receptor to enter the host cell. The CCR5-Δ32 deletion, found in a small number of individuals (disproportionately those of Northern European ancestry), protects homozygous carriers against CCR5-tropic HIV variants because the truncated protein it encodes does not reach the cell surface. Antiretroviral drugs that mimic this mechanism by blocking CCR5, such as maraviroc, reduce viral load in patients with CCR5-tropic HIV [17];

  • Dosing of the anticoagulant warfarin can now be guided by genotyping (principally of CYP2C9 and VKORC1) to determine the optimal dose for a specific patient, which is much more satisfactory than the risky trial-and-error approach formerly employed. Genetics is also revealing new drug targets: PCSK9, which encodes a proprotein convertase that binds the LDL receptor and promotes its degradation, thereby reducing LDL cholesterol clearance, is under investigation as a target for treating hypercholesterolemia. Clinical trials of drugs targeting PCSK9 suggest that they may be a superior alternative to statins in specificity and efficacy [18];

  • Our own example, published immediately before the time of writing, used the entire Phase I dataset to characterize the highly polymorphic vWF gene across several populations [19]. Among the large number of novel variants discovered, of particular note were population-level differences in variants that previous studies had identified as von Willebrand disease (vWD) mutations. Specifically, three reported vWD mutations (R2185Q, H817Q and M740I) that are rare in Europeans segregate quite commonly in Africans (minor allele frequency >13%). Thus, when present in individuals of European descent, these mutations may be involved in vWD (a relatively common inherited bleeding disorder, affecting more than 1% of individuals), whereas this is not necessarily the case in other populations;

  • Secondary traumatic brain injury, resulting from hemorrhage, ischemia and free radical generation following the primary brain trauma, is another candidate for amelioration by targeted therapy. The Toll-like receptor-mediated pathway was recently shown to be significantly upregulated in mice after traumatic brain injury [20]. Personalized treatments could be developed by modulating this pathway, an effort that can draw on a growing body of genomic, proteomic and metabolomic data [21].
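The vWF example suggests a simple population-aware sanity check for reported disease variants: if a variant is far more common in some population than a fully penetrant cause of a rare disorder could plausibly be, its pathogenic annotation deserves re-examination in that population. The sketch below flags such populations. The frequency ceiling is a hypothetical parameter (in practice it would depend on disease prevalence, penetrance and genetic heterogeneity), and the population labels and frequencies are illustrative, not the published vWF values.

```python
from typing import Dict, List


def populations_to_review(population_af: Dict[str, float], max_credible_af: float) -> List[str]:
    """Populations in which a reported disease variant exceeds a plausible frequency ceiling."""
    return sorted(pop for pop, af in population_af.items() if af > max_credible_af)


def review_report(
    reported: Dict[str, Dict[str, float]], max_credible_af: float
) -> Dict[str, List[str]]:
    """For each reported variant, list the populations where its annotation needs review."""
    flagged: Dict[str, List[str]] = {}
    for variant, population_af in reported.items():
        pops = populations_to_review(population_af, max_credible_af)
        if pops:
            flagged[variant] = pops
    return flagged


if __name__ == "__main__":
    # Hypothetical per-population allele frequencies for two reported disease variants.
    reported = {
        "variant_X": {"population_1": 0.002, "population_2": 0.15, "population_3": 0.03},
        "variant_Y": {"population_1": 0.0005, "population_2": 0.001, "population_3": 0.0},
    }
    # Hypothetical ceiling; a real value would be derived from the disease's epidemiology.
    print(review_report(reported, max_credible_af=0.01))
```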

These examples, and many others like them, highlight the need to apply population-specific and individual genomic information to personalized medicine. Integrated with other epidemiological and ‘omics’ data, this information will guide diagnosis and treatment more precisely in future medical practice. As the examples above demonstrate, personalized medicine uses the genetic background of individuals to optimize medical treatment, both maximizing favorable patient outcomes and minimizing the economic and health burdens imposed on the general population.

As discussed above, the 1000 Genomes Project has been instrumental in clarifying the ubiquity of rare variants and their importance to the study of human disease (for review, see [22]), and these variants will become a primary focus of genomics over the next few years. Because a large fraction of variants are confined to small populations or extended family groups, making this variation genetically tractable will require a huge expansion in both sample sizes and depth of coverage. Although the final phase of the 1000 Genomes Project will increase the diversity of individuals studied (sequencing a further 1500 people from 12 additional populations), future large-scale genomics projects will need to build on the 1000 Genomes data, each likely focusing on a single specific group of interest. A good example is the work of the CHARGE consortium [23], which has sequenced large, well-phenotyped cohorts to study complex trait associations.
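The sample-size problem can be quantified with a short calculation. Assuming that each sampled chromosome independently carries a variant with allele frequency p (a simplification that ignores population structure and genotyping error), the probability of observing the variant at least once among n chromosomes is 1 - (1 - p)^n. The sketch below finds the smallest n that reaches a chosen detection probability and converts it to diploid individuals. Merely observing a variant once is, of course, far short of having the power to associate it with disease.

```python
import math


def chromosomes_needed(allele_freq: float, detection_prob: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p)**n >= detection_prob.

    Assumes each sampled chromosome independently carries the allele with probability p.
    """
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    if not 0.0 < detection_prob < 1.0:
        raise ValueError("detection_prob must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - detection_prob) / math.log(1.0 - allele_freq))


def individuals_needed(allele_freq: float, detection_prob: float = 0.95) -> int:
    """Diploid individuals needed; each contributes two chromosomes."""
    return math.ceil(chromosomes_needed(allele_freq, detection_prob) / 2)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001, 0.0001):
        print(f"allele frequency {p:g}: {individuals_needed(p)} individuals")
```

Because n is roughly proportional to 1/p for small p, each tenfold drop in allele frequency raises the required sample roughly tenfold, which is why variants confined to small populations or families call for much larger, and often population-specific, studies.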

Several other related projects will also be of key importance for personalized medicine. The NIH Human Microbiome Project [24] and similar research projects have already begun to reveal the extent to which our health is influenced by the microorganisms living in and on our bodies. Everything from normal metabolic function, through digestive and dental problems, to pathogenic infection and response to antibiotics can be affected by an individual’s unique microbiome. Other current work is examining genome-wide epigenetic modifications (histone modifications and DNA methylation, among others). Epigenetic mechanisms are broadly relevant to areas such as common diseases, pharmacology and cancer [25]; thus epigenomics, alongside microbiome and human genomics projects, will also play a large role in personalized medicine.

We can now anticipate a time in the near future when many individuals’ genomes will be sequenced to high depth shortly after birth. This will have a large impact on medicine, both for research into disease etiology and for clinicians trying to understand disease in specific patients. Inevitably, it will also have profound and lasting effects, both positive and negative, beyond medicine. As recently demonstrated in Science, purportedly anonymous genomes can readily be re-identified by profiling short tandem repeats on the Y chromosome, matching them against publicly available genetic genealogy databases to infer a surname, and combining this with other available information such as age and state of residence [26]. Public attitudes to universal genomic awareness will vary across the world, perhaps depending on systems of healthcare provision and cultural attitudes to individual freedom and privacy. In any case, these ethical concerns need to be seriously debated, preferably before they become a matter of widespread public concern.
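The core of the surname-inference attack can be sketched as a nearest-match search: compare a target's Y-chromosome STR profile (repeat counts per marker) with surname-labelled profiles in a genealogy database. The distance used here, a simple count of mismatching markers over loci typed in both profiles, is a simplification chosen for illustration rather than the scoring used in the Science study, and the database entries and surnames are hypothetical; only the DYS-style marker names follow the standard Y-STR naming convention.

```python
from typing import Dict, List, Tuple

# Y-STR profile: marker name -> repeat count.
YStrProfile = Dict[str, int]


def mismatch_count(a: YStrProfile, b: YStrProfile) -> Tuple[int, int]:
    """Return (mismatching markers, markers compared) over loci typed in both profiles."""
    shared = set(a) & set(b)
    return sum(1 for marker in shared if a[marker] != b[marker]), len(shared)


def rank_surnames(
    target: YStrProfile,
    database: List[Tuple[str, YStrProfile]],
    min_shared: int = 5,
) -> List[Tuple[str, int]]:
    """Rank database surnames by mismatches, ignoring entries sharing too few markers."""
    ranked = []
    for surname, profile in database:
        mismatches, shared = mismatch_count(target, profile)
        if shared >= min_shared:
            ranked.append((surname, mismatches))
    # Fewest mismatches first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    target = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS392": 13, "DYS393": 13}
    database = [
        ("Surname_A", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS392": 13, "DYS393": 13}),
        ("Surname_B", {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS392": 11, "DYS393": 13}),
        ("Surname_C", {"DYS19": 14, "DYS390": 24}),  # too few shared markers: skipped
    ]
    print(rank_surnames(target, database))
```

A top-ranked surname identifies no one by itself; it is the combination with metadata such as age and state that narrows the candidates to a few individuals.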

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

References

  • Shendure J, Ji H. Next-generation DNA sequencing. Nat. Biotechnol. 26(10), 1135–1145 (2008).
  • Metzker ML. Sequencing technologies – the next generation. Nat. Rev. Genet. 11(1), 31–46 (2010).
  • International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409(6822), 860–921 (2001).
  • Venter JC, Adams MD, Myers EW et al. The sequence of the human genome. Science 291(5507), 1304–1351 (2001).
  • The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
  • The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–862 (2007).
  • The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
  • Manolio TA, Collins FS, Cox NJ et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
  • The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
  • The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
  • Grossman SR, Andersen KG, Shlyakhter I et al. Identifying recent adaptations in large-scale genomic data. Cell 152(4), 703–713 (2013).
  • Lu JT, Wang Y, Gibbs RA, Yu F. Characterizing linkage disequilibrium and evaluating imputation power of human genomic insertion-deletion polymorphisms. Genome Biol. 13, R15 (2012).
  • Sudmant PH, Kitzman JO, Antonacci F et al. Diversity of human copy number variation and multicopy genes. Science 330(6004), 641–646 (2010).
  • Marth GT, Yu F, Indap AR et al. The functional spectrum of low-frequency coding variation. Genome Biol. 12(9), R84 (2011).
  • Choi M, Scholl UI, Ji W et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc. Natl Acad. Sci. USA 106(45), 19096–19101 (2009).
  • Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, McGuire WL. Human breast cancer: correlation of relapse and survival with amplification of the HER-2/neu oncogene. Science 235(4785), 177–182 (1987).
  • Emmelkamp JM, Rockstroh JK. CCR5 antagonists: comparison of efficacy, side effects, pharmacokinetics and interactions – review of the literature. Eur. J. Med. Res. 12(9), 409–417 (2007).
  • Crunkhorn S. Trial watch: PCSK9 antibody reduces LDL cholesterol. Nat. Rev. Drug Discov. 11(1), 11 (2012).
  • Wang QY, Song J, Gibbs RA, Boerwinkle E, Dong JF, Yu F. Characterizing polymorphisms and allelic diversity of von Willebrand factor gene in the 1000 Genomes. J. Thromb. Haemost. 11, 261–269 (2013).
  • Hua F, Wang J, Ishrat T, Wei W, Atif F, Sayeed I, Stein DG. Genomic profile of Toll-like receptor pathways in traumatically brain-injured mice: effect of exogenous progesterone. J. Neuroinflammation 8, 42 (2011).
  • Manley GT, Diaz-Arrastia R, Brophy M et al. Common data elements for traumatic brain injury: recommendations from the biospecimens and biomarkers working group. Arch. Phys. Med. Rehabil. 91(11), 1667–1672 (2010).
  • Lupski JR, Belmont JW, Boerwinkle E, Gibbs RA. Clan genomics and the complex architecture of human disease. Cell 147(1), 32–43 (2011).
  • Psaty BM, O’Donnell CJ, Gudnason V et al. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium: design of prospective meta-analyses of genome-wide association studies from five cohorts. Circ. Cardiovasc. Genet. 2, 73–80 (2009).
  • The NIH HMP Working Group. The NIH Human Microbiome Project. Genome Res. 19(12), 2317–2323 (2009).
  • Portela A, Esteller M. Epigenetic modifications and human disease. Nat. Biotechnol. 28, 1057–1068 (2010).
  • Gymrek M, McGuire AL, Golan D, Halperin E, Erlich Y. Identifying personal genomes by surname inference. Science 339(6117), 321–324 (2013).

Websites
