Editorial

Great expectations: using massively parallel sequencing to solve inherited disorders

Pages 833-836 | Published online: 09 Jan 2014

The mutation-discovery process for monogenic disorders is undergoing an exciting revolution. When it depended on linkage mapping and carefully selected candidate-gene approaches, identifying disease-causing mutations was both time-consuming and frustrating. The advent of the new and emerging massively parallel sequencing technologies is driving unbiased discovery of novel disease-causing mutations at an unprecedented rate. How long will it be before the process is fully automated and the job of the molecular geneticist is changed forever? We comment on the mutation-discovery strategies compatible with this technology and demonstrate that while the modern gene jockey might need to trade in the horse for something a bit faster, the finish line for interpreting human genetic variation is still a long way ahead.

A recent comprehensive and highly recommended review covers the technical capabilities of massively parallel (also called next-generation) sequencing for the field of genetics Citation[1]. Mutation-discovery applications using massively parallel sequencing survey whole-genome, whole-exome, region- or candidate-gene group-specific DNA templates. Each strategy has strengths and weaknesses that must be considered and matched with the genetics and biology of the clinical phenotype to be studied.

Whole-exome sequencing is already the method of choice for most mutation-discovery efforts. Commercially available chip-based and in-solution array enrichment platforms make this protocol amenable to high-throughput automation and standardization. Some companies offer this protocol on a fee-for-service basis, including downstream sequence analysis Citation[101]. Currently, 86% of the 100,000 mutations cataloged in the Human Gene Mutation Database affect coding regions Citation[2], which are captured by the whole-exome arrays. Even allowing for the biases that have driven the discovery of these mutations so far, it is predicted that exome sequencing will be successful most of the time. Exome sequencing is already having an impact on patient care; in one study, mutations in SLC26A3 that are normally associated with congenital chloride-losing diarrhea (Online Mendelian Inheritance in Man [OMIM] 214700) were identified in a patient with a diagnosis of an atypical Bartter syndrome (OMIM 607364) Citation[3]. These findings led to the re-examination of the patient to confirm the diagnosis of chloride-losing diarrhea and the subsequent identification of five additional mutations in SLC26A3 in a cohort of 39 individuals diagnosed with Bartter syndrome. In a similar study, 11 novel mutations in DHODH associated with the recessive disorder Miller syndrome (OMIM 263750) were identified by initially sequencing the exomes of a brother pair and two affected individuals from unrelated pedigrees Citation[4]. This study also identified a ‘second-hit’ mutation in the brother pair in DNAH5, accounting for additional clinical observations consistent with primary ciliary dyskinesia (OMIM 608644). An innovative extension to this study used additional whole-exome sequencing data from the parents of the brother pair to perform linkage mapping Citation[5]. This highlights how the technology can be used to go from identifying an affected family to identifying the disease-causing mutation in one experiment. For most exome-capture projects published so far, copy-number variants have been separately assessed by array comparative genomic hybridization. At least one massively parallel sequencing study has demonstrated that it is possible to determine copy-number variants based on sequence depth using array-captured material Citation[6].
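The idea of inferring copy number from sequence depth can be sketched in a few lines. The following is a minimal illustration only, not the method of the cited study: it assumes a mean read depth is already available for each captured target in both a test sample and a copy-neutral reference, and the function names, thresholds and depth values are all hypothetical.

```python
import math
from statistics import median


def normalise(depths):
    """Divide each target's mean read depth by the sample-wide median depth."""
    centre = median(depths.values())
    return {target: depth / centre for target, depth in depths.items()}


def call_copy_number(sample, reference, loss=-0.4, gain=0.4):
    """Label each captured target as 'loss', 'gain' or 'neutral'.

    The thresholds are illustrative assumptions on the log2 depth ratio,
    not values taken from any published pipeline.
    """
    sample_norm = normalise(sample)
    reference_norm = normalise(reference)
    calls = {}
    for target, depth in sample_norm.items():
        ref_depth = reference_norm.get(target, 0)
        if ref_depth == 0:
            continue  # target not covered in the reference; no call possible
        if depth == 0:
            calls[target] = "loss"  # no reads at all; log2 is undefined
            continue
        log_ratio = math.log2(depth / ref_depth)
        if log_ratio <= loss:
            calls[target] = "loss"
        elif log_ratio >= gain:
            calls[target] = "gain"
        else:
            calls[target] = "neutral"
    return calls


# Hypothetical mean depths per captured exon (made-up numbers).
sample = {"exon1": 100, "exon2": 50, "exon3": 104, "exon4": 150, "exon5": 98}
reference = {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100, "exon5": 100}
calls = call_copy_number(sample, reference)
```

Normalising by the median keeps the comparison independent of how deeply each sample was sequenced overall. A real analysis would also need to account for capture efficiency, GC content and noise between runs, which this sketch ignores.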

There are occasions when the exome-capture strategy will fail to detect a mutation. Using this technology alone, the mutations that account for the lion’s share of causes for X-linked intellectual disability at the FRAXA (OMIM 309550) and FRAXE (OMIM 300806) loci and the polyalanine tract mutations in ARX (OMIM 300382) are unlikely to be identified, since these sequences are not surveyed by exon-array designs. A recent series of articles in Cell identified novel insertions of L1 and Alu transposable elements associated with cancer Citation[7] and also intellectual disability Citation[8]. The contribution of this type of mutation to human disease is yet to be quantified. These insertions locate to intronic and intergenic sequences and would be missed by an exome-specific capture strategy.

Whole-genome sequencing can create a nearly complete map of DNA variation in a single individual, including copy-number variants and other more complex genomic rearrangements. The ability to detect balanced translocations and inversions, and insertions of novel sequences using paired-end sequencing, gives whole-genome sequencing a significant advantage over array comparative genomic hybridization Citation[9]. However, to detect a single pathogenic variant there is a lot of noise to screen out. Whole-genome sequences of ‘healthy’ individuals tell us that each human genome has approximately 3–4 × 10⁶ single nucleotide variations, of which between 2 and 7 × 10⁵ are not present in the Single Nucleotide Polymorphism Database (dbSNP) and a little under 2 × 10⁴ are located in a coding sequence Citation[10–13]. Thus far, projects employing whole-genome sequencing have discovered mutations in exons of known genes Citation[10,14]. This suggests that alternative methods that enrich for the sequences that are most likely to harbor the disease-causing mutation (such as exome resequencing) should be the first choice for a mutation scan until whole-genome sequencing costs fall and analysis of (particularly noncoding) DNA variation improves.
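To put the scale of that noise in proportion, the counts quoted above can be turned into fractions of a genome's single nucleotide variations. This is a back-of-the-envelope sketch using only the figures in the text; the helper name is hypothetical and "a little under 2 × 10⁴" is treated as an upper bound.

```python
def fraction_range(part_low, part_high, total_low, total_high):
    """Smallest and largest fraction possible given ranges for part and total."""
    return part_low / total_high, part_high / total_low


total = (3e6, 4e6)   # single nucleotide variations per genome
novel = (2e5, 7e5)   # variations absent from dbSNP
coding = 2e4         # upper bound for variants in coding sequence

novel_fraction = fraction_range(*novel, *total)
coding_fraction = fraction_range(coding, coding, *total)

print(f"not in dbSNP: {novel_fraction[0]:.1%} to {novel_fraction[1]:.1%}")
print(f"coding:       at most {coding_fraction[0]:.2%} to {coding_fraction[1]:.2%}")
```

On these figures, roughly 5–23% of the variations in a genome are absent from dbSNP, while well under 1% fall in coding sequence, which is the subset that exome enrichment concentrates on.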

Region-specific genome enrichment targets either a broad selection of candidate gene sequences dispersed throughout the genome Citation[15] or entire linkage intervals Citation[16–18]. DNA templates can be derived from array-captured Citation[19,20] or PCR-amplified material Citation[21]. In situations of high genetic confidence (based on logarithm of odds scores), the targeted approach is recommended because it has the potential to reveal both coding and noncoding disease-relevant variants. The first example of region-specific genome enrichment was targeted to a linkage interval and identified mutations in PYCR1 as the cause of a rare Mendelian disorder, cutis laxa (OMIM 612940) Citation[16]. There are additional success stories of using this method for mutation-discovery efforts; one of these mapped the causative mutation for Clericuzio-type poikiloderma with neutropenia (OMIM 604173) to an uncharacterized gene, C16orf57. This was achieved by massively parallel sequencing of only the 3.4-Mb linkage interval of a single affected individual and led to the discovery of another mutation in the same gene in an unrelated family Citation[17]. Region-specific captures have also been employed for identifying somatic mutations on the X chromosome in childhood T-cell acute lymphoblastic leukemia (OMIM 613065), where there is an increased incidence of affected males Citation[22], refining the breakpoints of genomic rearrangements Citation[23] and searching for second-hit recessive mutations within heterozygous deletions Citation[24]. Epigenetic variation has also been mapped and even quantified by specifically targeting CpG islands and performing bisulfite sequencing Citation[25], although interpretation of these data must take into account the tissue from which the DNA was extracted and whether that tissue is relevant to the disease phenotype.

All of the mutation discoveries described above employed a filtering strategy to separate benign from pathogenic variants. The most common method is to assume that a disease-causing variant of large effect is rare and unlikely to have been surveyed before. Typically, variants in dbSNP and in the collections accumulating from whole-genome and whole-exome sequencing projects are assumed to be benign and are therefore excluded. With the inclusion of data from the 1000 Genomes Project, dbSNP now harbors variants that have allele frequencies of less than 1%. Unfortunately, this database also contains known pathogenic variants and errors due to poor sequence data or inaccurate alignment to the reference sequence Citation[26,27,102]. Thus, unsupervised filtering based on this database alone carries some risk that a true pathogenic variant may be missed. Yet to be reported is the number of studies that have attempted to use any of the three main sequencing strategies and have thus far failed to identify a plausible mutation. A recent study of three pairs of monozygotic twins discordant for multiple sclerosis examined genome-wide variation at the genetic, epigenetic and transcriptome levels, but failed to explain the cause of the discordance Citation[28]. In our experience, high-throughput Sanger sequencing of the X chromosome exome immediately resolved only 25% of cases (53 out of 208) Citation[29]. It is likely that the exome-capture strategies being employed now will have a similar success rate.
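The filtering logic, and the risk it carries, can be illustrated with a small sketch. It assumes variants are identified by chromosome, position and alleles, and that previously catalogued variants (dbSNP plus other sequencing projects) are available as a set of such keys. The class, the `known_pathogenic` annotation and the coordinates are hypothetical and serve only to show the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    known_pathogenic: bool = False  # hypothetical curated annotation

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def filter_candidates(variants, catalogued, rescue_known_pathogenic=False):
    """Keep variants absent from the catalogue of previously seen variants.

    With rescue_known_pathogenic=True, catalogued variants that carry a
    pathogenic annotation are kept rather than silently discarded.
    """
    candidates = []
    for variant in variants:
        if variant.key not in catalogued:
            candidates.append(variant)
        elif rescue_known_pathogenic and variant.known_pathogenic:
            candidates.append(variant)
    return candidates


# Made-up coordinates for illustration only.
observed = [
    Variant("X", 1000, "A", "G"),                         # common polymorphism
    Variant("X", 2000, "C", "T"),                         # novel variant
    Variant("X", 3000, "G", "A", known_pathogenic=True),  # pathogenic but catalogued
]
catalogued = {("X", 1000, "A", "G"), ("X", 3000, "G", "A")}

unsupervised = filter_candidates(observed, catalogued)
supervised = filter_candidates(observed, catalogued, rescue_known_pathogenic=True)
```

In this toy example the unsupervised filter returns only the novel variant at position 2000 and discards the catalogued pathogenic variant at position 3000, which is the failure mode described above. The supervised call keeps both, at the cost of requiring a curated annotation that may not exist for every variant.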

This is a revolutionary time for molecular genetics but the sequence result is just the beginning. There are bridges to cross before massively parallel sequencing can be contemplated as an application for routine diagnostic testing Citation[30]. Perhaps the largest barrier is the interpretation of the biological outcome of rare variation. For mutations occurring in functional domains of well-characterized genes that are known to be associated with disease, the task is not too difficult. When the variant occurs in a gene that is not well characterized (from our experience this is most of the time), the task becomes more difficult. For each individual there may exist more than one plausible mutation for which the pathogenicity must be established Citation[4]. Bioinformatic tools such as PMUT Citation[31], Sorting Intolerant From Tolerant (SIFT) Citation[32] and Polymorphism Phenotyping (PolyPhen) Citation[33], which assess the probability that a variant might have a damaging outcome on an expressed protein, are a good starting point but do not provide conclusive proof. When a plausible mutation is not found in a translated sequence or a splice site, how do you examine the multitude of intronic and intergenic variants you will discover in an entire genome? At some point, the pathogenicity of novel variants must be established using a combination of molecular biology, cell culture and animal models where appropriate. It is inevitable that comprehensive sequencing and assembly of individual genomes will be at the front line of genetic diagnosis and will inform treatment strategy. The question is: how long will it take?

Acknowledgements

The authors wish to thank Alison Gardner and John Mulley for critical reading of the manuscript.

Financial & competing interests disclosure

This work was supported by program grant 400121, and a principal research fellowship (to Jozef Gecz) awarded by the NH and MRC. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

No writing assistance was utilized in the production of this manuscript.

References

  1. Metzker ML. Sequencing technologies – the next generation. Nat. Rev. Genet. 11, 31–46 (2010).
  2. Cooper DN, Chen J, Ball EV et al. Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum. Mutat. 31, 631–655 (2010).
  3. Choi M, Scholl UI, Ji W et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc. Natl Acad. Sci. USA 106, 19096–19101 (2009).
  4. Ng SB, Buckingham KJ, Lee C et al. Exome sequencing identifies the cause of a Mendelian disorder. Nat. Genet. 42, 30–35 (2010).
  5. Roach JC, Glusman G, Smit AFA et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
  6. Herman DS, Hovingh GK, Iartchouk O et al. Filter-based hybridization capture of subgenomes enables resequencing and copy-number detection. Nat. Methods 6, 507–510 (2009).
  7. Iskow RC, McCabe MT, Mills RE et al. Natural mutagenesis of human genomes by endogenous retrotransposons. Cell 141, 1253–1261 (2010).
  8. Huang CRL, Schneider AM, Lu Y et al. Mobile interspersed repeats are major structural variants in the human genome. Cell 141, 1171–1182 (2010).
  9. Alkan C, Kidd JM, Marques-Bonet T et al. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat. Genet. 41, 1061–1067 (2009).
  10. Lupski JR, Reid JG, Gonzaga-Jauregui C et al. Whole-genome sequencing in a patient with Charcot–Marie–Tooth neuropathy. N. Engl. J. Med. 362, 1181–1191 (2010).
  11. Ng SB, Turner EH, Robertson PD et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009).
  12. Schuster SC, Miller W, Ratan A et al. Complete Khoisan and Bantu genomes from southern Africa. Nature 463, 943–947 (2010).
  13. Bentley DR, Balasubramanian S, Swerdlow HP et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
  14. Sobreira NLM, Cirulli ET, Avramopoulos D et al. Whole-genome sequencing of a single proband together with linkage analysis identifies a Mendelian disease gene. PLoS Genet. 6, e1000991 (2010).
  15. Hoischen A, Gilissen C, Arts P et al. Massively parallel sequencing of ataxia genes after array-based enrichment. Hum. Mutat. 31, 494–499 (2010).
  16. Reversade B, Escande-Beillard N, Dimopoulou A et al. Mutations in PYCR1 cause cutis laxa with progeroid features. Nat. Genet. 41, 1016–1021 (2009).
  17. Volpi L, Roversi G, Colombo EA et al. Targeted next-generation sequencing appoints C16orf57 as Clericuzio-type poikiloderma with neutropenia gene. Am. J. Hum. Genet. 86, 72–76 (2010).
  18. Corbett MA, Bahlo M, Jolly L et al. A focal epilepsy and intellectual disability syndrome is due to a mutation in TBC1D24. Am. J. Hum. Genet. 87, 371–375 (2010).
  19. Albert TJ, Molla MN, Muzny DM et al. Direct selection of human genomic loci by microarray hybridization. Nat. Methods 4, 903–905 (2007).
  20. Gnirke A, Melnikov A, Maguire J et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 27, 182–189 (2009).
  21. Tewhey R, Warner JB, Nakano M et al. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat. Biotechnol. 27, 1025–1031 (2009).
  22. Van Vlierberghe P, Palomero T, Khiabanian H et al. PHF6 mutations in T-cell acute lymphoblastic leukemia. Nat. Genet. 42, 338–342 (2010).
  23. Conrad DF, Bird C, Blackburne B et al. Mutation spectrum revealed by breakpoint sequencing of human germline CNVs. Nat. Genet. 42, 385–391 (2010).
  24. Heinzen EL, Radtke RA, Urban TJ et al. Rare deletions at 16p13.11 predispose to a diverse spectrum of sporadic epilepsy syndromes. Am. J. Hum. Genet. 86, 707–718 (2010).
  25. Hodges E, Smith AD, Kendall J et al. High definition profiling of mammalian DNA methylation by array capture and single molecule bisulfite sequencing. Genome Res. 19, 1593–1605 (2009).
  26. Mitchell AA, Zwick ME, Chakravarti A, Cutler DJ. Discrepancies in dbSNP confirmation rates and allele frequency distributions from varying genotyping error rates and patterns. Bioinformatics 20, 1022–1032 (2004).
  27. Musumeci L, Arthur JW, Cheung FSG et al. Single nucleotide differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies. Hum. Mutat. 31, 67–73 (2010).
  28. Baranzini SE, Mudge J, Van Velkinburgh JC et al. Genome, epigenome and RNA sequences of monozygotic twins discordant for multiple sclerosis. Nature 464, 1351–1356 (2010).
  29. Tarpey P, Smith R, Pleasance E et al. A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation. Nat. Genet. 41, 535–543 (2009).
  30. Tucker T, Marra M, Friedman JM. Massively parallel sequencing: the next big thing in genetic medicine. Am. J. Hum. Genet. 85, 142–154 (2009).
  31. Ferrer-Costa C, Gelpi JL, Zamakola L et al. PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics 21, 3176–3178 (2005).
  32. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
  33. Sunyaev S, Ramensky V, Koch I et al. Prediction of deleterious human alleles. Hum. Mol. Genet. 10, 591–597 (2001).
