Editorial

The role of clinical genomic testing in diagnosis and discovery of pathogenic mutations


Abstract

Next-generation sequencing in clinical practice allows for a critical review of the literature to evaluate the disease relatedness of specific genes and the pathogenicity of individual mutations, while providing an important discovery tool for new disease genes and disease-causing mutations. Data obtained from large panels, whole exome or whole genome sequencing, performed for constitutional or cancer cases, need to be managed in a transparent, yet powerful analytical framework. Assessment of the reported pathogenic potential of a variant or the disease association of a gene requires careful consideration of population allele frequency, variant data from parents, and a precise, yet concise phenotypic description of the entire family and of other individuals or families that carry the same variant. The full potential for discovery can only be realized if there is data sharing between the clinicians performing the interpretation worldwide and the structural biologists, analytical chemists and cell biologists interested in and knowledgeable about the structure and function of the genes involved.

Review of the current variant detection & interpretation pipelines: their strengths & their limitations

Hardware

The currently dominant platforms for next-generation sequencing are manufactured by Illumina and use a modified and highly multiplexed variant of Sanger sequencing. Sequential incorporation of fluorescently labeled nucleotides into a templated synthesis process is detected using an optical system. This technology is robust, relatively fast and inexpensive. Artifacts generated by this methodology make detection of variants present in less than 5% of the DNA alleles analyzed cumbersome, and detection of those present in less than 1% practically impossible, without some additional indexing approach Citation[1]. Other competing platforms include Life Technologies’ Proton machine, which uses pH measurements to detect incorporation of the four unlabeled nucleotide species (G, A, T, C) injected one at a time in successive waves. The increased efficiency of using a chemical detection system is offset by a loss of accuracy around homopolymer regions Citation[2]. For targeted, well-characterized regions, this approach works well and is indeed used as the primary tool in many institutions, but its use for discovery of novel variants is limited. The other currently available, but clinically seldom used, platform is from Pacific Biosciences. It allows single-molecule sequencing using real-time detection of labeled nucleotide incorporation into long template molecules (up to 10 kb) by a single, highly processive DNA polymerase Citation[3]. Use of this platform has been limited to de novo sequencing of microorganisms and some special cases of human samples where long repeat-containing regions need to be accurately sized. Indeed, there is great expectation that as the technology matures, it will allow routine detection of disease-causing nucleotide repeat expansions and help with the discovery of thus far unrecognized ones. Our experience is with the Illumina technology, specifically the MiSeq and HiSeq machines, which are the workhorses of most clinical next-generation sequencing laboratories. We find that constitutional variants present above an allele ratio of 0.25 are always confirmed by an orthogonal method such as Sanger sequencing, but below this cutoff, Sanger sequencing is required before a variant can be reported.
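As a minimal illustration of how such a cutoff could be applied in practice, the following sketch flags low allele-ratio calls for orthogonal confirmation before reporting. The VariantCall record and its fields are hypothetical stand-ins for a laboratory's own variant representation; only the 0.25 threshold comes from the text above.

```python
# Minimal sketch: flag constitutional variants whose allele ratio falls below
# the confirmation threshold (0.25 in the text above) so they are queued for
# orthogonal (Sanger) confirmation before reporting. Illustrative only.
from dataclasses import dataclass

SANGER_CONFIRMATION_CUTOFF = 0.25  # allele ratio below which Sanger is required


@dataclass
class VariantCall:
    chrom: str
    pos: int
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele


def allele_ratio(call: VariantCall) -> float:
    """Fraction of reads supporting the alternate allele."""
    total = call.ref_reads + call.alt_reads
    return call.alt_reads / total if total else 0.0


def needs_sanger_confirmation(call: VariantCall) -> bool:
    """True if the variant falls below the confirmation cutoff."""
    return allele_ratio(call) < SANGER_CONFIRMATION_CUTOFF


if __name__ == "__main__":
    # Example: 12 alternate reads out of 80 total -> ratio 0.15, confirm first
    call = VariantCall("chr7", 117559590, ref_reads=68, alt_reads=12)
    print(round(allele_ratio(call), 2), needs_sanger_confirmation(call))
```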

DNA capture

Until the cost of whole genome sequencing and the ability to manage the resulting data and variants become manageable, most clinical testing will utilize some methodology to enrich regions of special interest. There are three, maybe four, major capture methods in current use. Selection of the most suitable method is based on the level of certainty relating to the predicted cause of the disorder in constitutional genetics, or the number of actionable results in cancer genetics, as well as the number of cases to be tested and other considerations relating to workflow, cost and reimbursement. Long-range PCR amplification is predominantly used for regions with a high number of very closely related sequences scattered throughout the genome. In this situation, it is important to capture the target of interest with high specificity, because mutation detection would otherwise be unreliable due to the large number of variants in highly similar, but irrelevant, genomic regions. At the Columbia Laboratory of Personalized Genomic Medicine (PGM), we use this approach for detecting mutations in the mitochondrial genome, thus excluding the possibility of interpreting variation in the many mitochondrial pseudogenes as true mitochondrial mutations. Another example of this type of approach is molecular testing of PKD1, the polycystic kidney disease gene with multiple, highly similar pseudogenes in the genome. Long-range PCR may also be used for confirmation of variants detected on exome or large targeted panel sequencing, or to supplement hybridization-based panels Citation[4]. The limitations of this approach are the size of the region that can be interrogated in this way and the increased hands-on time for library preparation; however, the variants obtained from such a test are highly reliable. Multiplexed PCR approaches are used for amplifying relatively small regions (less than a megabase) where availability of the starting material is limited. Examples include the TruSeq amplicon reagents of Illumina, the HaloPlex method of Agilent, the Ion AmpliSeq panels of Life Technologies and the compartmentalized PCR reaction approach represented by RainDance. Various laboratories choose between these methods based on local needs, hardware and workflow issues. We found TruSeq amplicon to be most suited to our workflow needs and use this target capture where the sample size is limited. Results from these capture methods have an elevated level of background noise, and it is important that variants generated by these methods are sequenced at great depth, several hundred-fold, so that allele ratios can approach the theoretically predicted value. It is important to be aware that duplicate reads are harder to identify because of the defined beginning and endpoint of the amplified fragments, and forward and reverse allele ratios are sometimes biased for the same reason. Hybridization-based selection of target regions is the mainstay for large constitutional and cancer panels, including whole exome sequencing, but this approach requires at least 100 ng of DNA. We use Agilent SureSelect reagents both for our custom constitutional and cancer panels (covering 1300 and 500 genes, respectively) and for whole exome testing (v5 with UTR). There are other comparable reagents on the market from NimbleGen and other manufacturers. Great effort is spent by these companies to increase the performance of these reagents in hard-to-capture, high-GC-content areas of the coding genome Citation[5].
Since the input into these assays is randomly fragmented DNA, duplicate reads are easier to identify and forward/reverse allele ratios can be used more reliably for quality control purposes. There are, however, some important limitations to the hybridization-based capture method. It is vulnerable to midsize (from 20 to hundreds of nucleotides) deletions and duplications. Since many exons are less than 100 bp long, a deletion of 40–50 nucleotides will result in an inability to capture the mutant allele, or in a significant bias in its representation in the library generated from the captured material. With low representation, the chances that the mutation is filtered out as noise increase. This problem is compounded by gaps and inaccuracies in the available reference genome, as well as by incomplete representation of human diversity in the reference genomes used to generate the capture reagents. Improvements in attainable read length would alleviate this problem to some extent, but we will have to switch to whole genome sequencing to eliminate it fully. A parallel way of obtaining sequence information from the most important segment of the genome in any cell type is sequencing the transcriptome. This is essential for cancer diagnosis, since it provides an integrated readout of the actual living state of the cell and of the transcripts present at any given point, which would be very difficult and expensive to establish even from extensive analysis of whole genome sequencing. For this reason, and for the additional benefit of identifying fusion transcripts, tumor transcriptome sequencing is an integral part of our Cancer Whole Exome testing, while for constitutional disorders this approach is limited by the lack of availability of RNA from the tissue predicted to be affected (e.g., brain) Citation[6]. Since transcriptome data carry transcript-level information in addition to variant information, much further work needs to be done to unlock all the information they contain. The most immediate and interesting challenge is to develop methodologies to interpret the clinical significance of variants in untranslated regions.
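The forward/reverse allele-balance check mentioned above lends itself to simple automation. The sketch below, with hypothetical read counts and thresholds, shows one way a reviewer might flag variants whose alternate-allele support is almost entirely one-stranded, a pattern that is suspicious in randomly fragmented capture libraries but tolerated by design in amplicon data.

```python
# Illustrative strand-balance check for hybridization-capture libraries.
# The 0.1/0.9 bounds are assumptions chosen for demonstration, not a
# validated laboratory threshold.

def strand_fraction(alt_fwd: int, alt_rev: int) -> float:
    """Fraction of alternate-allele reads observed on the forward strand."""
    total = alt_fwd + alt_rev
    return alt_fwd / total if total else 0.0


def flag_strand_bias(alt_fwd: int, alt_rev: int,
                     low: float = 0.1, high: float = 0.9) -> bool:
    """Flag variants whose alt-allele support is almost entirely one-stranded."""
    fraction = strand_fraction(alt_fwd, alt_rev)
    return fraction < low or fraction > high


# Example: 45 forward vs 2 reverse alt reads -> flagged for manual review;
# 23 vs 19 is balanced and passes.
print(flag_strand_bias(45, 2))   # True
print(flag_strand_bias(23, 19))  # False
```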

Mutation calling & interpretation

Sequence alignment and mutation calling algorithms have matured and are standardized to a significant degree. Most laboratories use the Burrows-Wheeler Aligner (BWA) for alignment and the Genome Analysis Toolkit (GATK) pipeline, as described in its best practices guidelines, for mutation calling. These programs transform FASTQ files (individual reads with quality scores for each base called) generated by the sequencer into VCF files describing variants identified at specific chromosomal positions. The real difference between laboratories in the specificity and sensitivity of disease-causing variant identification comes from their ability to generate a database informing mutation selection. The many off-the-shelf software packages available to assist with this process are often limited by the lack of clinical thinking shaping their design and of clinical validation verifying their effectiveness. Most large centers have developed their own analysis pipeline and database, because that allows a higher level of understanding of, and control over, the filtering process. Generally, there is a trade-off between sensitivity in identifying disease-causing mutations and the length of the list generated by the automated filtering of the pipeline. A highly trained human reviewer cannot yet be replaced by any computer algorithm. There are, however, some general guidelines based on medical knowledge about disease frequencies and modes of inheritance. Filtering the variants based on their established pathogenicity (ClinVar, HGMD, OMIM, COSMIC) and their frequency in the general population (1000 Genomes Project, Exome Variant Server, ExAC and internal databases) is the first step Citation[7]. Subsequently, variants are stratified based on zygosity (homozygous, compound heterozygous), disruptive nature (nonsense, frameshift, splice site, predicted damaging/disruptive) and presence or absence in affected or healthy parents (whether inherited or de novo). It cannot be emphasized enough that any genetic testing center is only as good as the database it draws its information from. Over-interpretation of variants is a grave problem and is mostly due to the literature and reference datasets carrying a great deal of incorrect information. Many rare ethnic variants appear in databases as pathogenic simply because assumed ethnicity-matched controls could not, and cannot, be accurately chosen for outbred populations. Even having tens of thousands of datasets with precise, well-curated phenotype information is not sufficient to address the level of human genetic diversity. This makes it imperative that centers share data to allow for optimal assignment of significance and, thus, accelerate discovery. There have been many efforts to this end, but much more needs to be done Citation[8]. One limitation that is almost universal to current ‘matchmaking’ efforts is the limit on the number of variants that one can submit for evaluation. Sharing large sets of variants of unknown significance, especially in non-disease-associated genes, is discouraged. This is an important limitation, and while there are various approaches to overcome it, many formidable regulatory and logistic barriers must still be overcome to unlock the full potential of data sharing or combined analysis.
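The first-pass filtering and stratification described above can be expressed as a small set of rules. The sketch below is a hedged illustration only: the Variant fields stand in for annotations that a tool such as ANNOVAR Citation[7] might supply, and the 1% frequency threshold, field names and ordering criteria are assumptions, not the laboratory's actual pipeline.

```python
# Sketch of first-pass variant filtering and stratification, as outlined above.
# Thresholds and fields are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

DISRUPTIVE = {"nonsense", "frameshift", "splice_site", "predicted_damaging"}


@dataclass
class Variant:
    gene: str
    pop_allele_freq: float          # e.g., 1000 Genomes / ExAC / internal DB
    clinvar_pathogenic: bool        # asserted pathogenic in ClinVar/HGMD
    consequence: str                # "missense", "nonsense", "frameshift", ...
    zygosity: str                   # "het", "hom", "comp_het"
    de_novo: Optional[bool] = None  # None when parental data are unavailable


def passes_first_pass(v: Variant, max_freq: float = 0.01) -> bool:
    """Keep known pathogenic variants and rare variants worth manual review."""
    if v.clinvar_pathogenic:
        return True
    if v.pop_allele_freq > max_freq:
        return False  # too common to explain a rare disorder
    return (v.consequence in DISRUPTIVE
            or v.zygosity in {"hom", "comp_het"}
            or v.de_novo is True)


def prioritize(variants: list[Variant]) -> list[Variant]:
    """Order surviving variants: known pathogenic first, then de novo, then rarest."""
    kept = [v for v in variants if passes_first_pass(v)]
    return sorted(kept, key=lambda v: (not v.clinvar_pathogenic,
                                       not (v.de_novo is True),
                                       v.pop_allele_freq))
```

The output of such a filter is only a starting point; as noted above, the ranked list still requires review by a highly trained human interpreter against curated internal and shared databases.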

How to maximize short-term & long-term benefit from clinical next-generation sequencing in patient care

The short-term benefit from clinical genomic analysis is the establishment of a clinical diagnosis and identification of actionable mutations. The long-term benefit would be discoveries that allow for development of novel diagnostic, prognostic and therapeutic modalities. It is important to find the right balance and approach to optimize short-term and long-term benefits.

Retrospective analyses have estimated that whole exome sequencing of trios can result in significant cost savings when used to test individuals with putative genetic syndromes who do not receive a clinical diagnosis following the first clinic visit Citation[9]. Once the syndrome is known, proper preventive care can be administered for known complications of the disorder, and interventions that are futile or harmful can be stopped. This is especially important for syndromes with variable effects on immune competency and on the risk for various malignancies. Indeed, in our experience, the greatest impact of constitutional genetic testing is in families with young children with malignancies. For example, identification of a homozygous mutation in a child with autosomal recessive constitutional mismatch repair deficiency identifies the parents as carriers of Lynch syndrome mutations, with implications for cancer screening, and identifies their future children as being at risk for both Lynch syndrome (heterozygous mutations) and constitutional mismatch repair deficiency (homozygous mutations). Identification of an inherited cancer-predisposing mutation in an affected individual can inform testing of potential donor siblings who may be at risk for the same condition. Rarely, constitutional exome testing can establish a diagnosis where a therapeutic intervention is available, for example, correction of an enzyme deficiency by increasing the substrate for the reaction, or provision of an alternative substrate. We are able to provide a definitive molecular diagnosis in about one-third of clinical constitutional cases. In about 3% of cases, an important secondary finding is discovered and reported based on the recommendations of the American College of Medical Genetics and Genomics. The rest of the variants detected go unreported and often fail to make it into a data pool where they could be reinterpreted and used for discovery purposes. It is imperative that, when consent is obtained for genetic testing, patients are presented with the opportunity to allow their variant and phenotype information to be used to help with the diagnosis of others. On the other hand, it is also very important that research interpretation of their data is handled with special care and not returned to them until its clinical significance is clearly established.

Identification of somatic mutations in cancer samples has a more direct impact. Although the number of cancer types with established targeted therapies is currently below 10, one study found that with whole exome or large targeted panels, up to 75% of tumors show somatic variants in genes that have a potential targeted treatment Citation[10]. In addition, there are numerous clinical trials that require molecular characterization for enrollment, and such trials are expected to increase in number. In turn, the results of such trials will increase the number of mutations that dictate or inform the medical management of cancers. Another important aspect of somatic mutation mapping is the potential for using genomic and transcriptome information together to guide newly emerging approaches to immunotherapy Citation[11]. An integrated database of the genetic makeup of tumors linked to response to various therapies would be of great benefit both to patients and to our understanding of the biology of malignancies.

Opportunities for collaborative innovation

Next-generation sequencing has allowed a previously unimaginable level of insight into the genetic causes of congenital defects and cancer. Mutational data from patients have provided, and will continue to provide, a wealth of information about the biological function of proteins and the role of specific protein domains and of conserved residues within these domains. The most important and easiest way to improve diagnostic and discovery yield is simply to combine more well-curated datasets, increasing the power to detect shared mutations in people with similar phenotypes. Mapping human phenotype-associated mutations onto functional models of individual genes and proteins, as well as of multi-molecular complexes, will provide an additional level of insight Citation[12]. Instead of looking for mutations shared between individuals in specific proteins, we will look for shared disruption of molecular assemblies and pathways. Correct assessment of transcript levels and transcript heterogeneity, used in combination with measurements of protein levels and posttranslational protein modifications by mass spectrometry, will allow a better overall understanding of the metabolic homeostasis of cells and tissues Citation[13]. The development and incorporation into data analysis pipelines of a structured vocabulary for patient signs and symptoms, such as the Human Phenotype Ontology Citation[14], of structured patient histories using this vocabulary, and of laboratory data organized in a ‘molecular systems’-based manner will greatly enhance the correct identification of disease-causing mutations. This has been achieved with some success for known disease-causing genes Citation[15]. Work needs to be done to adapt such approaches to extended disease phenotypes, and even to genes not previously linked to disease. Confirmatory functional assessment using the most appropriate model organisms in a timely manner will be of crucial importance. Ongoing collaboration between clinicians and diagnostic laboratories is needed for updating the existing standards Citation[16]; reporting of sequence variants, especially variants of uncertain significance Citation[17]; reporting of secondary or incidental findings Citation[18,19]; and proficiency testing and inter-laboratory comparisons Citation[20,21]. In addition to improving local analytical and interpretive capabilities, there is an urgent need to develop centralized interpretation and functional and experimental follow-up capabilities. Much more work needs to be done to create the infrastructure to support such a concerted effort.
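As a simplified illustration of phenotype-driven prioritization with a structured vocabulary such as the Human Phenotype Ontology, the sketch below ranks candidate genes by the overlap between a patient's HPO terms and the terms annotated to each gene. Published tools of the kind cited above use semantic similarity over the ontology graph; the plain set overlap, gene names and example annotations here are illustrative assumptions only.

```python
# Toy phenotype-driven ranking of candidate genes using HPO term overlap.
# Real implementations use ontology-aware semantic similarity; this sketch
# uses simple Jaccard overlap for illustration.

def phenotype_overlap_score(patient_terms: set[str], gene_terms: set[str]) -> float:
    """Jaccard overlap between patient HPO terms and gene-associated HPO terms."""
    if not patient_terms or not gene_terms:
        return 0.0
    return len(patient_terms & gene_terms) / len(patient_terms | gene_terms)


def rank_candidate_genes(patient_terms: set[str],
                         gene_annotations: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return candidate genes sorted by descending phenotype overlap."""
    scored = [(gene, phenotype_overlap_score(patient_terms, terms))
              for gene, terms in gene_annotations.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical example: HP:0001250 (seizure) and HP:0001263 (global
# developmental delay) in the patient; gene annotations are made up.
patient = {"HP:0001250", "HP:0001263"}
annotations = {"GENE_A": {"HP:0001250", "HP:0001263", "HP:0002376"},
               "GENE_B": {"HP:0000077"}}
print(rank_candidate_genes(patient, annotations))
```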

Conclusion

The future is already here. Many thousands of genomes are sequenced daily, and this number will continue to grow. Within 5 years, our understanding of the human genome will be deep enough that the reference human genome and the variation databases will have been cleaned of clerical errors and misannotations. For most genomic coordinates, variants compatible with an individual remaining ‘healthy’ to a reasonably advanced age will have been described across various populations. This will make identification of disease-causing single gene mutations automatable, rendering interpretation nearly instantaneous and decreasing its cost. Cracking the problem of multigenic disorders will require a much more extensive understanding of the molecular systems of cells, tissues and the body, so that mutations in various genes can be properly weighted in their contribution to specific phenotypes. A wonderfully encouraging example of this type of approach is the molecular study of blood pressure abnormalities. Similar success stories will emerge, and common diseases will be reclassified and managed based on the relative contributions of mutations in the genes involved in their pathogenesis. For cancer and other somatic mutations, the cataloguing of cancer-causing or cancer-associated mutations will be close to complete. This will allow simpler, faster, non-sequencing-based, point-of-care-type diagnostic tests to enter the clinic and inform at least the initial diagnosis of patients. As for treatment, because of the stochastic element in cancer cell evolution, there may always be a need for sequencing entire tumor genomes to guide therapy. The heterogeneity of environmentally induced malignancies will probably also support this trend. In summary, for the next 5 years, clinical genomics will be largely a discovery and mapping effort, and the greatest benefits will materialize in the coming decade.

Financial & competing interests disclosure

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

References

  • Kennedy SR, Schmitt MW, Fox EJ, et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat Protoc 2014;9(11):2586-606
  • Boland JF, Chung CC, Roberson D, et al. The new sequencer on the block: comparison of Life Technology’s Proton sequencer to an Illumina HiSeq for whole-exome sequencing. Hum Genet 2013;132(10):1153-63
  • English AC, Salerno WJ, Hampton OA, et al. Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics 2015;16(1):286
  • Mandelker D, Amr SS, Pugh T, et al. Comprehensive diagnostic testing for stereocilin: an approach for analyzing medically important genes with high homology. J Mol Diagn 2014;16(6):639-47
  • Bodi K, Perera AG, Adams PS, et al. Comparison of commercially available target enrichment methods for next-generation sequencing. J Biomol Tech 2013;24(2):73-86
  • Hofvander J, Tayebwa J, Nilsson J, et al. RNA sequencing of sarcomas with simple karyotypes: identification and enrichment of fusion transcripts. Lab Invest 2015;95(6):603-9
  • Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38(16):e164
  • Topol EJ. The big medical data miss: challenges in establishing an open medical resource. Nat Rev Genet 2015;16(5):253-4
  • Shashi V, McConkie-Rosell A, Rosell B, et al. The utility of the traditional medical genetics diagnostic evaluation in the context of next-generation sequencing for undiagnosed genetic disorders. Genet Med 2014;16(2):176-82
  • Jones S, Anagnostou V, Lytle K, et al. Personalized genomic analyses for cancer mutation discovery and interpretation. Sci Transl Med 2015;7(283):283ra53
  • Bluestone JA, Tang Q. Immunotherapy: Making the case for precision medicine. Sci Transl Med 2015;7(280):280ed3
  • Luu TD, Rusu AM, Walter V, et al. MSV3d: database of human MisSense Variants mapped to 3D protein structure. Database (Oxford) 2012;2012:bas018
  • Alfaro JA, Sinha A, Kislinger T, Boutros PC. Onco-proteogenomics: cancer proteomics joins forces with genomics. Nat Methods 2014;11(11):1107-13
  • Kohler S, Doelken SC, Mungall CJ, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res 2014;42(Database issue):D966-74
  • Zemojtel T, Kohler S, Mackenroth L, et al. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci Transl Med 2014;6(252):252ra123
  • Aziz N, Zhao Q, Bry L, et al. College of American Pathologists’ laboratory standards for next-generation sequencing clinical tests. Arch Pathol Lab Med 2015;139(4):481-93
  • Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17(5):405-24
  • Green RC, Berg JS, Grody WW, et al. ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med 2013;15(7):565-74
  • Hegde M, Bale S, Bayrak-Toydemir P, et al. Reporting incidental findings in genomic scale clinical sequencing–a clinical laboratory perspective: a report of the Association for Molecular Pathology. J Mol Diagn 2015;17(2):107-17
  • Schrijver I, Aziz N, Jennings LJ, et al. Methods-based proficiency testing in molecular genetic pathology. J Mol Diagn 2014;16(3):283-7
  • Vrijenhoek T, Kraaijeveld K, Elferink M, et al. Next-generation sequencing-based genome diagnostics across clinical genetics centers: implementation choices and their effects. Eur J Hum Genet 2015. [Epub ahead of print]
