Editorial

How can mate-pair sequencing be utilized for cancer patients?

Pages 1-3 | Received 31 Aug 2016, Accepted 08 Nov 2016, Published online: 24 Nov 2016

1. Next generation sequencing (NGS)

NGS utilizes the power of massively parallel sequencing to generate large numbers of sequences simultaneously. The first NGS platform (454) could generate 20 million base pairs of sequence from a single run [Citation1], which was several orders of magnitude greater than what could be obtained with Sanger sequencing using capillary electrophoresis [Citation2]. Improvements to this platform increased its output so that it could eventually produce 500 Mb of sequence data per run. This platform decreased the cost of sequencing a single human genome from several hundred million dollars to just one million dollars.

The second NGS platform to be developed came from Illumina, and initially it could produce 48 million short sequence reads for a total output of about 1 gigabase (Gb) [Citation3]. This platform had much greater potential for improvement, and over the next 10 years its output increased by over three orders of magnitude. The current high-output Illumina machines can generate over 1.5 billion independent sequence reads for a total output of 1.8 terabases (Tb) per run. On these machines the cost of generating sufficient sequence data for whole genome sequencing (WGS) is now less than $1000.

A number of other NGS platforms that use different sequencing technologies are now commercially available, including ones from Ion Torrent [Citation4], Pacific Biosciences [Citation5], and Oxford Nanopore [Citation6]. Each of these platforms offers different strengths and weaknesses, but none of them currently generates sufficient sequence for WGS.

2. Whole genome sequencing

Generating the full genome sequence of a human requires about 100 Gb of sequencing. WGS is a comprehensive sequencing tool that can be utilized to characterize alterations that occur during the development of cancer, including point mutations, insertions, deletions, translocations and amplifications. There are, however, a number of severe limitations to utilizing WGS in routine cancer care. First, the full cost of WGS is considerably greater than the cost of sequencing alone, as it includes library preparation, post-sequencing analysis and storage of the resulting sequence data. In addition, assembling short 150 base pair reads into a comprehensive genome sequence is challenging for a genome in which 45% of the sequences are highly repetitive. Furthermore, since only 2% of the genome actually codes for protein, it is very difficult to determine the significance of alterations that are detected outside of the coding sequences.
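The 100 Gb figure can be read as a depth-of-coverage target. The sketch below works through that arithmetic. The haploid genome size of roughly 3.1 Gb is an assumption that does not appear in the text, and the function names are hypothetical.

```python
# Rough depth-of-coverage arithmetic for whole genome sequencing.
# ASSUMPTION: haploid human genome size of ~3.1 Gb (not stated in the text).
HUMAN_GENOME_BP = 3.1e9


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return total_bases / genome_size


def reads_required(total_bases: float, read_length: int) -> float:
    """Number of reads of a given length needed to reach a total base count."""
    return total_bases / read_length


wgs_depth = mean_coverage(100e9, HUMAN_GENOME_BP)   # 100 Gb, as in the text
wgs_reads = reads_required(100e9, 150)              # 150 bp reads, as in the text

print(f"Mean depth: {wgs_depth:.1f}x")
print(f"Reads needed: {wgs_reads / 1e6:.0f} million")
```

Under these assumptions, 100 Gb corresponds to roughly 30-fold average depth, which is why the data volume, and with it the analysis and storage burden, is so large.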

3. Mate-pair next generation sequencing (MP-Seq)

Traditional genomic libraries for WGS are based upon fragmentation of the genome into 300–500 base pair pieces that are then amplified on the Illumina slide surface with bridge amplification. Fragment ends are then sequenced with paired-end sequencing and the resulting sequences are assembled into an integrated genome sequence. In contrast, mate-pair libraries are produced by generating much larger, kilobase-sized fragments and adding selectable markers to the ends (the Illumina Nextera MP library kits utilize biotin). These fragments are circularized and ligated so that the ends of the fragments are immediately adjacent. Non-circularized DNA molecules are removed with exonucleases. Circularized molecules are then sheared into ~500 bp fragments, and the fragments containing the two ligated ends are purified with streptavidin-coated magnetic beads. The resulting library of purified end fragments is an ideal substrate for paired-end NGS. The 100–150 base pairs of sequence determined for each of the two ends can then be aligned to a reference genome [Citation7]. Even though only 200–300 base pairs of sequence are generated per fragment, the mate-pairs provide contextual information about the entire region between the two reads; this is known as bridged coverage.

There are also several other commercially available mate-pair library kits that have their own unique features. For example, the NxSeq Long Mate kit from Lucigen has been shown to improve assembly quality [Citation8]. Sage Science has improved the MP method using the SageELF, which lowers the required sample input [Citation9]. The RIKEN Center for Life Science Technologies in Japan modified the Illumina Nextera MP protocol to reduce the total cost of library construction [Citation10].

MP-Seq offers two powerful advantages that facilitate genome characterization. The first is that, because of the bridged coverage, one can obtain detailed information about genome organization with considerably less overall sequencing. With MP-Seq one obtains a short sequence from each of the two original ends of a fragment of 5 kb or larger. Thus, with just 5 Gb of sequencing (1/20–1/10 of what is needed for whole genome sequencing) one can ascertain insertions, deletions, amplifications and translocations in the genome in question. MP-Seq is well suited to detecting large chromosomal rearrangements, which play important roles in cancer development. While structural variants smaller than the MP insert size might be missed, this problem can be resolved by increasing sequencing depth to ultimately obtain nucleotide-level resolution. MP-Seq is a much more powerful tool for genome characterization than array comparative genomic hybridization (aCGH) [Citation11] because no prior information about the genome sequence is required and because it can detect balanced reciprocal translocations, which aCGH cannot. The second important advantage of MP-Seq is that the mate-pairs also facilitate the mapping of genomes that are rife with repetitive sequences, as the mate-pairs can effectively jump over shorter stretches of repetitive sequence. Thus MP-Seq, especially with considerably longer MP-Seq libraries, has proven very powerful for finishing genome sequences by linking together disconnected sequences obtained with WGS [Citation8,Citation12].
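The way bridged coverage exposes rearrangements can be illustrated with a toy classifier for a single aligned mate-pair. This is a simplified sketch, not the method used in any of the cited studies: `MatePair`, `classify`, the 5 kb expected insert and the 2 kb tolerance are all hypothetical. Read orientation is ignored, so inversions are not handled, and a real caller would require clusters of supporting pairs before reporting an event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatePair:
    """Reference positions of the two reads from one mate-pair fragment."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify(pair: MatePair, insert_size: int = 5000, tolerance: int = 2000) -> str:
    """Label one mate-pair by comparing its mapped span with the library insert size.

    ASSUMPTIONS: a ~5 kb library and a +/- 2 kb tolerance, chosen for illustration.
    """
    if pair.chrom1 != pair.chrom2:
        # Mates map to different chromosomes: candidate translocation junction.
        return "translocation"
    span = abs(pair.pos2 - pair.pos1)
    if span > insert_size + tolerance:
        # Mates are further apart on the reference than in the sample:
        # sequence between them may have been deleted in the sample.
        return "deletion"
    if span < insert_size - tolerance:
        # Mates are closer on the reference than in the sample:
        # the sample may carry inserted sequence between them.
        return "insertion"
    return "concordant"


examples = [
    MatePair("chr1", 10_000, "chr1", 15_200),
    MatePair("chr1", 10_000, "chr1", 90_000),
    MatePair("chr1", 10_000, "chr1", 10_800),
    MatePair("chr6", 400_000, "chr7", 128_000_000),
]
for p in examples:
    print(p, "->", classify(p))
```

Because each pair reports on the whole interval between its two reads, relatively few pairs are needed to flag a breakpoint, which is the intuition behind the low sequencing requirement.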

4. MP-Seq to characterize cancer genomes

MP-Seq of tumors is a powerful tool for characterizing the alterations that have occurred within a tumor genome. Most cancer genomes contain multiple alterations that can be easily detected with just 5 Gb of sequencing of an MP-Seq library. For tumor samples that have a low percentage/purity of tumor cells, or for highly heterogeneous samples, one can still obtain useful information, but additional sequencing is required. Alternatively, it has been shown that MP-Seq can be applied to the limited number of cells obtained from laser capture microdissection to characterize structural alterations [Citation13].

Feldman et al. used MP-Seq to identify recurrent t(6;7)(p25.3;q32.3) translocations in ALK-negative anaplastic large cell lymphomas [Citation7]. We previously utilized MP-Seq to study HPV integrations in oropharyngeal squamous cell carcinomas and found that HPV integrations into the human genome occur much less frequently than they do in cervical cancer [Citation14]. In our analysis of a number of oropharyngeal squamous cell carcinomas, we found multiple cancer-specific alterations (insertions, deletions and/or translocations) in each of these cancers. MP-Seq also revealed that genomic rearrangement events are significant drivers of pancreatic ductal adenocarcinoma [Citation15]. Integrated analysis of MP-Seq and RNA-seq identified VAV1 fusions as well as ITK–FER and IKZF2–ERBB4 fusions in peripheral T-cell lymphoma [Citation16]. Together, these studies highlight the potential of MP-Seq to guide individualized therapy for cancer patients.

5. Monitoring cancer patients with a liquid biopsy

Metastatic cancers shed cancer cells into the blood, and when these cells lyse they release circulating tumor-free DNA (ctfDNA) [Citation17]. The half-life of ctfDNA is quite short [Citation18]; hence the concentration of ctfDNA is directly related to the amount of invasive tumor present in a patient. CtfDNA has been reported as a promising marker in multiple types of cancer that could be used to monitor disease progression [Citation19,Citation20]. We propose that the cancer-specific alterations identified by MP-Seq of the primary tumor can be utilized as biomarkers in ctfDNA to monitor cancer progression during the course of clinical treatment for each individual patient. In a recent report, Harris et al. used this strategy in patients with ovarian tumors [Citation21]. They identified novel junctions from MP-Seq data on primary tumors and then demonstrated that they could detect and quantify those cancer-specific junctions with real-time PCR in pre-surgical blood. The presence of ctfDNA in post-surgery blood was also consistent with documented tumor recurrence [Citation21], indicating the prognostic value of these biomarkers. Ovarian tumors are quite large and can generate considerable amounts of ctfDNA. For considerably smaller cancers, a more sensitive option is digital droplet PCR (ddPCR), which partitions a sample into 50,000 (on the BioRad platform) to 10 million (on the RainDance platform) droplets [Citation22] in a lipid interface. With ddPCR it is thus possible to detect and digitally quantify the proportion of tumor DNA to normal DNA within the blood. Previously, picodroplet digital PCR has been successfully utilized to detect KRAS mutations in circulating DNA from the plasma of colorectal cancer patients [Citation23].
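Digital quantification with ddPCR rests on counting positive droplets and applying a Poisson correction, since a droplet can contain more than one template. The sketch below shows that standard correction. It is an illustration rather than the analysis used in the cited studies: the function names, the droplet counts and the choice of a reference locus as the denominator are all assumptions.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean number of template copies per droplet.

    If a fraction p of droplets is positive, the mean occupancy is -ln(1 - p).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def junction_to_reference_ratio(junction_positive: int,
                                reference_positive: int,
                                total: int) -> float:
    """Ratio of tumor-specific junction copies to copies of a reference locus.

    ASSUMPTION: both assays are read out on the same set of droplets.
    """
    junction = copies_per_droplet(junction_positive, total)
    reference = copies_per_droplet(reference_positive, total)
    return junction / reference


# Hypothetical run: 50,000 droplets, 50 positive for the tumor junction,
# 10,000 positive for a reference locus present in all genomes.
ratio = junction_to_reference_ratio(50, 10_000, 50_000)
print(f"Junction copies per reference copy: {ratio:.4f}")
```

The more droplets a platform generates, the smaller the tumor fraction that can be distinguished from zero, which is why droplet number matters for small tumors.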

6. MP-Seq as a powerful clinical tool in oncology

MP-Seq is therefore a very powerful technique that could very quickly be applied as a clinical tool in oncology. An MP-Seq library is prepared from the primary tumor. With just 5 Gb of sequence data, the effective bridged coverage of that cancer genome exceeds 10×, which is sufficient to detect most insertions, deletions, translocations and regions of amplification. For most cancers the MP-Seq data would identify novel tumor-specific junctions, and these could be used in a liquid biopsy to monitor that cancer's response to clinical regimens. MP-Seq solves many of the problems currently plaguing traditional WGS and could help in the eventual adoption of WGS as the primary tool for characterizing cancers and determining optimal personalized treatment options.
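The relationship between 5 Gb of sequence and bridged coverage above 10× can be checked with simple arithmetic. The sketch below gives an idealized estimate. The 3.1 Gb genome size, the 150 bp read length and the 5 kb insert are assumed values, and the function names are hypothetical; duplicate, chimeric and unmappable pairs are ignored, so real libraries would fall below this estimate.

```python
# Idealized base coverage versus bridged coverage for a mate-pair library.
# ASSUMPTION: haploid human genome size of ~3.1 Gb (not stated in the text).
HUMAN_GENOME_BP = 3.1e9


def base_coverage(total_bases: float, genome_size: float) -> float:
    """Depth from the sequenced bases themselves."""
    return total_bases / genome_size


def bridged_coverage(total_bases: float, read_length: int,
                     insert_size: int, genome_size: float) -> float:
    """Depth counting the full span between the two reads of every pair."""
    pairs = total_bases / (2 * read_length)
    return pairs * insert_size / genome_size


base = base_coverage(5e9, HUMAN_GENOME_BP)
bridged = bridged_coverage(5e9, read_length=150, insert_size=5000,
                           genome_size=HUMAN_GENOME_BP)

print(f"Base coverage:    {base:.1f}x")
print(f"Bridged coverage: {bridged:.1f}x (idealized upper bound)")
```

Under these assumptions the bases themselves cover the genome less than twice, while the spans between mates cover it many times over, leaving room for losses to duplicates and unmappable pairs while remaining above 10×.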

Declaration of interest

G. Gao and D.I. Smith have received financial support from the Experimental Pathology Development Fund of the Department of Laboratory Medicine and Pathology at Mayo Clinic. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Additional information

Funding

This paper was not funded.

References

  • Rothberg JM, Leamon JH. The development and impact of 454 sequencing. Nat Biotechnol. 2008;26(10):1117–1124.
  • Shendure JA, Porreca GJ, Church GM, et al. Overview of DNA sequencing strategies. Curr Protoc Mol Biol. 2011;96:7.1.1–7.1.23.
  • Blow N. Genomics: the personal side of genomics. Nature. 2007;449(7162):627–630.
  • Rothberg JM, Hinz W, Rearick TM, et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011;475(7356):348–352.
  • Eid J, Fehr A, Gray J, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323(5910):133–138.
  • Mikheyev AS, Tin MM. A first look at the Oxford Nanopore MinION sequencer. Mol Ecol Resour. 2014;14(6):1097–1102.
  • Feldman AL, Dogan A, Smith DI, et al. Discovery of recurrent t(6;7)(p25.3;q32.3) translocations in ALK-negative anaplastic large cell lymphomas by massively parallel genomic sequencing. Blood. 2011;117(7):915–919.
  • Brumm PJ, Monsma S, Keough B, et al. Complete genome sequence of Thermus aquaticus Y51MC23. PLoS One. 2015;10(10):e0138674.
  • Heavens D, Accinelli GG, Clavijo B, et al. A method to simultaneously construct up to 12 differently sized Illumina Nextera long mate pair libraries with reduced DNA input, time, and cost. BioTechniques. 2015;59:42–45.
  • Tatsumi K, Nishimura O, Itomi K, et al. Optimization and cost-saving in tagmentation-based mate-pair library preparation and sequencing. BioTechniques. 2015;58(5):253–257.
  • Pinkel D, Segraves R, Sudar D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998;20(2):207–211.
  • Davis RW, Brannen AD, Hossain MJ, et al. Complete genome of Staphylococcus aureus Tager 104 provides evidence of its relation to modern systemic hospital-acquired strains. BMC Genomics. 2016;17:179–194.
  • Murphy SJ, Cheville JC, Zarei S, et al. Mate pair sequencing of whole-genome-amplified DNA following laser capture microdissection of prostate cancer. DNA Res. 2012;19:395–406.
  • Gao G, Johnson SH, Kasperbauer JL, et al. Mate pair sequencing of oropharyngeal squamous cell carcinomas reveals that HPV integration occurs much less frequently than in cervical cancer. J Clin Virol. 2014;59(3):195–200.
  • Murphy SJ, Hart SN, Halling GC, et al. Integrated genomic analysis of pancreatic ductal adenocarcinomas reveals genomic rearrangement events as significant drivers of disease. Cancer Res. 2016;76:749–761.
  • Boddicker RL, Razidlo GL, Dasari S, et al. Integrated mate-pair and RNA sequencing identifies novel, targetable gene fusions in peripheral T-cell lymphoma. Blood. 2016;128(9):1234–1245.
  • Aarthy R, Mani S, Velusami S, et al. Role of circulating cell-free DNA in cancers. Mol Diagn Ther. 2015;19(6):339–350.
  • Diehl F, Schmidt K, Choti MA, et al. Circulating mutant DNA to assess tumor dynamics. Nat Med. 2008;14:985–990.
  • Bettegowda C, Sausen M, Leary RJ, et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci Transl Med. 2014;6(224):224ra24.
  • Agostini M, Enzo MV, Bedin C, et al. Circulating cell-free DNA: a promising marker of regional lymphonode metastasis in breast cancer patients. Cancer Biomark. 2012;11(2–3):89–98.
  • Harris FR, Kovtun IV, Smadbeck J, et al. Quantification of somatic chromosomal rearrangements in circulating cell-free DNA from ovarian cancers. Sci Rep. 2016;6:29831.
  • Brouzes E, Medkova M, Savenelli N, et al. Droplet microfluidic technology for single-cell high-throughput screening. Proc Natl Acad Sci USA. 2009;106(34):14195–14200.
  • Taly V, Pekin D, Benhaim L, et al. Multiplex picodroplet digital PCR to detect KRAS mutations in circulating DNA from the plasma of colorectal cancer patients. Clin Chem. 2013;59:815–823.
