Editorial

High-Throughput Long Noncoding RNA Profiling for Diagnostic and Prognostic Markers in Cancer: Opportunities and Challenges

Pages 1075-1078 | Published online: 06 Nov 2015

Long noncoding RNAs (lncRNAs) are a large class of gene transcripts that have been discovered or better characterized in recent years, mostly as a result of high-throughput technologies such as RNA sequencing [Citation1–3]. They account for roughly 68% of transcribed genes, which arise from the approximately 70% of the human genome that is transcriptionally active [Citation3,Citation4]. This largely unexplored territory has attracted great interest in the research community, and publications on the topic have grown dramatically over the past few years, particularly in cancer. The growing number of discoveries will help us to better understand this deadly disease and to find more promising diagnostic and prognostic markers. On the other hand, because lncRNAs are among the most poorly understood and annotated classes of gene transcripts, many challenges need to be adequately addressed before novel, clinically translatable discoveries can be made.

The unique features of lncRNAs make them attractive as potential diagnostic and prognostic markers (or treatment targets); however, they are more complex and more difficult to study than protein-coding genes. lncRNAs do not encode proteins but play regulatory roles. This regulatory function is often executed through sequence matching with their targets, an interaction that can be blocked for therapeutic purposes. lncRNAs are usually shorter than protein-coding genes, and most have two to three exons [Citation2]. Most lncRNAs are polyadenylated and can therefore be profiled by the common polyA enrichment-based RNA sequencing methods, but those lacking polyadenylation require non-polyA methods. Unlike mRNAs, which are transported to the cytoplasm for translation, most lncRNAs are localized to the nucleus, where they act [Citation2]. The RNA extraction method, and the cellular compartment (cytoplasm vs nucleus) from which RNA is isolated, can therefore make a significant difference in which lncRNAs are quantified and studied [Citation5]. Finally, lncRNA expression appears to be more tissue specific, and at lower levels, than that of protein-coding genes [Citation2], which makes lncRNAs excellent candidates for differential diagnostic markers. However, many lncRNAs expressed at very low levels are difficult to distinguish from detection noise on gene-expression microarrays, or require deeper sequencing, and may not be good candidates for clinical testing.

According to their genomic position relative to adjacent protein-coding genes, lncRNAs are classified as antisense RNAs, long intergenic noncoding RNAs (lincRNAs), sense overlapping intronic, sense nonoverlapping intronic and processed transcripts without an ORF [Citation2]. Except for lincRNAs, which do not overlap protein-coding genes, these classes are intertwined with protein-coding genes and are difficult to distinguish from the overlapping genes without strand-specific RNA sequencing. For this reason, lncRNA studies mostly focus on lincRNAs. The latest ENSEMBL (Version 80) and GENCODE (Version 22) annotations define 14,863 and 15,900 lncRNAs, respectively; however, the vast majority are still putative and lack a standard name approved by the HUGO Gene Nomenclature Committee (only 191 lncRNAs have one). With defined guidelines [Citation6,Citation7] and more functional annotation, this situation is expected to improve.
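The positional classification can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the GENCODE procedure: the `Interval` type and `classify_lncrna` function are hypothetical names, genes are reduced to single intervals with no exon structure, and the two sense classes are lumped into one "sense" bucket. It also shows why strand matters: without the strand of the lncRNA, an overlapping transcript cannot be assigned to the antisense or sense class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_lncrna(lnc, coding_genes):
    """Assign a coarse positional class relative to protein-coding genes."""
    hits = [g for g in coding_genes if overlaps(lnc, g)]
    if not hits:
        return "lincRNA"    # intergenic: no overlap with any coding gene
    if all(g.strand != lnc.strand for g in hits):
        return "antisense"  # overlaps coding genes only on the opposite strand
    return "sense"          # at least one same-strand overlap (classes lumped)


# Toy annotation: one protein-coding gene on the plus strand of chr1.
coding = [Interval("chr1", 1000, 5000, "+")]
print(classify_lncrna(Interval("chr1", 6000, 7000, "+"), coding))  # lincRNA
print(classify_lncrna(Interval("chr1", 4000, 6000, "-"), coding))  # antisense
print(classify_lncrna(Interval("chr1", 2000, 3000, "+"), coding))  # sense
```

A real pipeline would work from exon-level annotation and would separate intronic from exonic overlap; the sketch only conveys the role of strand and overlap.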

The genome-wide search for cancer lncRNA biomarkers is mainly carried out through gene-expression microarrays, using either lncRNA-specific arrays [Citation8] or remapping of existing microarrays to a limited number of noncoding genes [Citation9,Citation10], and through RNA sequencing. While microarrays profile known lncRNAs, RNA sequencing also identifies many novel ones. The largest study so far mined over 7000 RNA sequencing libraries covering various tissue types and cancers and discovered over 7000 lineage- or cancer-associated lncRNAs, with PCA3 and SChLAP1, for example, as specific markers for prostate cancer [Citation4]. Furthermore, SChLAP1 was shown to be a consistent predictor of metastasis or recurrence [Citation11]. Detection of this lncRNA in urine could potentially make it a noninvasive biomarker for clinical practice. Promising findings have also been reported in other cancer types. Analysis of 567 lung adeno- and squamous-cell carcinomas identified about 100 novel lincRNAs that were differentially regulated between tumors and normal lung tissues, and ‘LCAL1’ was found to be associated with tumor growth and to have prognostic value [Citation12]. In a profiling study of over 200 patients with acute myeloid leukemia, lncRNAs were found to be associated with gene mutation status and to predict treatment response and patient survival [Citation13]. A six-lncRNA panel in glioblastoma multiforme was reported to provide an independent survival prediction benefit in addition to age and MGMT promoter methylation status [Citation10]. Several studies in colorectal cancer, using lncRNA-specific microarrays or mining existing data for the small number of lncRNAs present on the array platforms, found that a subset of lncRNAs may not only classify colorectal cancer into clinically relevant subtypes [Citation14] but also improve survival prediction [Citation9].

These impressive advances within a short time are a big step forward for potential lncRNA biomarkers. However, we need to recognize the significant challenges in this evolving and fast-growing field. The tissue-specific expression and incomplete annotation of lncRNAs often justify the prediction of novel lncRNAs in a study. The process involves many complex bioinformatics steps, and there is no standard way to discover and nominate novel lncRNAs. Transcriptome assembly is required, and various programs such as Cufflinks, Scripture or StringTie can be used, but their performance varies [Citation15]. The defining feature of lncRNAs is the lack of protein-coding capability, which is currently predicted mostly through machine learning or examination of sequence features using programs such as the Coding Potential Assessment Tool [Citation16], iSeeRNA [Citation17] and the Coding Potential Calculator [Citation18], or by checking for the presence of a known Pfam domain. Even among well-characterized protein-coding RNAs and lncRNAs, a small percentage (up to 5%) of transcripts is misclassified [Citation16,Citation17]. Whether the lncRNA expression level is taken into account, and whether low-confidence or false mappings are filtered out, when nominating a novel lncRNA also makes a difference [Citation19]. Furthermore, most RNA sequencing data are non-strand-specific and polyA selected. lncRNAs that overlap protein-coding genes are therefore not easily investigated, and lncRNAs that are not polyadenylated need to be captured by total RNA protocols with rRNA removal. RNA extraction by the TRIzol method tends to capture more nuclear unprocessed transcripts and lncRNAs than the Qiagen RNeasy Mini Kit method [Citation5]. These variations in sequencing protocols and lncRNA prediction algorithms need to be factored in when comparing and judging the significance of results across studies. Independent validation of results becomes increasingly important in such a heterogeneous environment.
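The nomination step described above can be sketched as a chain of filters over assembled transcripts. The Python example below is illustrative only and assumes each candidate already carries a coding-potential score from some external tool. The `Candidate` type, the `nominate` function and every threshold are hypothetical; the 200-nt floor follows the commonly used length definition of lncRNAs, while the exon, coding-probability and expression cutoffs are placeholders that a real study would have to justify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    transcript_id: str
    length_nt: int       # spliced transcript length
    n_exons: int
    coding_prob: float   # hypothetical score in [0, 1] from a coding-potential tool
    expression: float    # abundance estimate; the unit (e.g. FPKM) is an assumption


def nominate(candidates, min_length_nt=200, min_exons=2,
             max_coding_prob=0.5, min_expression=1.0):
    """Return {transcript_id: reason}; reason is 'pass' or the first failed filter."""
    decisions = {}
    for c in candidates:
        if c.length_nt < min_length_nt:
            reason = "too_short"
        elif c.n_exons < min_exons:
            reason = "too_few_exons"
        elif c.coding_prob > max_coding_prob:
            reason = "likely_coding"
        elif c.expression < min_expression:
            reason = "low_expression"
        else:
            reason = "pass"
        decisions[c.transcript_id] = reason
    return decisions


# Made-up candidates, for illustration only.
toy = [
    Candidate("T1", 1500, 3, 0.05, 4.2),
    Candidate("T2", 150, 2, 0.02, 9.0),
    Candidate("T3", 2200, 4, 0.91, 12.0),
    Candidate("T4", 900, 2, 0.10, 0.3),
]
print(nominate(toy))
# Relaxing the expression cutoff changes which candidates are nominated.
print(nominate(toy, min_expression=0.1))
```

With the default thresholds only T1 passes; lowering `min_expression` to 0.1 also admits T4. This is the kind of study-to-study difference the paragraph above warns about.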

The functions of most lncRNAs are unknown, and interpreting the results of lncRNA studies is very challenging. lncRNAs may interact with mRNAs and modulate their translation either positively or negatively [Citation20]. The expression of lncRNAs appears to be more positively correlated with that of their antisense protein-coding genes [Citation2]. LincRNAs may act on nearby protein-coding genes or on remote genes. Such interactions may be predicted computationally [Citation21,Citation22] or inferred from experimental data on binding or correlation between lncRNAs and proteins [Citation23]. When interaction data are lacking, a common approach is to conduct a genome-wide correlation analysis against protein-coding genes and then use the correlated protein-coding genes for pathway or network-enrichment analysis, where canonical pathways or protein–protein interaction networks are well known. A caveat is that, with so many possible pairings between lncRNAs and protein-coding genes, random correlations will inevitably be included, and true interactions need to be further validated.
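The scale of the random-correlation problem can be made concrete with a small Python sketch. It assumes the 15,900 GENCODE lncRNAs cited earlier and a round figure of 20,000 protein-coding genes, which is an assumption rather than a number from this editorial. The Benjamini-Hochberg adjustment shown is one common way to control the false discovery rate; it is offered here as an illustration, not as the method used by any study cited above, and it does not replace experimental validation. The function names and the example p-values are hypothetical.

```python
def expected_false_positives(n_lncrnas, n_coding_genes, alpha):
    """Expected number of pairs passing p < alpha if no pair is truly correlated."""
    return n_lncrnas * n_coding_genes * alpha


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# About 3.2e8 lncRNA-gene pairs; at alpha = 0.05 roughly 15.9 million
# pairs would pass by chance alone under the global null.
print(expected_false_positives(15_900, 20_000, 0.05))

# Made-up p-values for four lncRNA-gene pairs.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Pairs that survive such a correction are still only candidates: correlation across samples can reflect shared tissue composition or co-regulation rather than a direct interaction.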

A good alternative biomarker needs to pass several tests: it performs better than existing markers; it can be easily tested in the clinic; and it can be measured reliably. As clinical biomarkers, lncRNAs are essentially no different from protein-coding mRNAs. A decade of experience with gene-expression microarrays, and with the markers derived from them, should provide good guidance for finding more promising markers [Citation24,Citation25]. It is no surprise that lncRNA expression patterns in tumors are distinct from those in their normal counterpart tissues, as tumors change gene-expression programs dramatically. That such patterns may define molecular subtypes of tumors is likewise commonly seen with mRNA expression. The harder question is whether the new classification is better than one based on protein-coding expression patterns or on existing classification methods. mRNAs and lncRNAs are likely to be correlated with each other, and both may be further correlated with histologic features or subtypes. Evaluation of molecular signatures needs to take those known prognostic factors into consideration so that the molecular predictors can be shown to provide added clinical value [Citation26,Citation27]. Few RNA-based biomarkers or panels have moved into clinical practice. Oncotype DX®, PAM50 and MammaPrint are the most promising gene signatures for predicting breast cancer recurrence and guiding the selection of adjuvant chemotherapy [Citation28]. However, their clinical benefits are still being evaluated in large clinical trials.

Financial constraints limit most studies to a small scale that lacks sufficient power for an outcome association study. Incomplete clinical phenotype information in some large public data repositories may hinder the identification of good prognostic markers or treatment targets. Rigorous evaluation of such markers is needed before candidates can be moved on to further development as clinical markers. With more attention and resources invested, promising findings from lncRNA research are only a matter of time.

Financial & competing interests disclosure

This work is partially supported by the independent research funds from the Mayo Clinic Center for Individualized Medicine. The author has no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

No writing assistance was utilized in the production of this manuscript.

References

  • Jia H, Osak M, Bogu GK, Stanton LW, Johnson R, Lipovich L. Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA 16(8), 1478–1487 (2010).
  • Derrien T, Johnson R, Bussotti G et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22(9), 1775–1789 (2012).
  • Djebali S, Davis CA, Merkel A et al. Landscape of transcription in human cells. Nature 489(7414), 101–108 (2012).
  • Iyer MK, Niknafs YS, Malik R et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 47(3), 199–208 (2015).
  • Sultan M, Amstislavskiy V, Risch T et al. Influence of RNA extraction methods and library selection schemes on RNA-seq data. BMC Genomics 15, 675 (2014).
  • Wright MW. A short guide to long non-coding RNA gene nomenclature. Hum. Genomics 8, 7 (2014).
  • Mattick JS, Rinn JL. Discovery and annotation of long noncoding RNAs. Nat. Struct. Mol. Biol. 22(1), 5–7 (2015).
  • Xue Y, Ma G, Gu D et al. Genome-wide analysis of long noncoding RNA signature in human colorectal cancer. Gene 556(2), 227–234 (2015).
  • Hu Y, Chen HY, Yu CY et al. A long non-coding RNA signature to improve prognosis prediction of colorectal cancer. Oncotarget 5(8), 2230–2242 (2014).
  • Zhang XQ, Sun S, Lam KF et al. A long non-coding RNA signature in glioblastoma multiforme predicts survival. Neurobiol. Dis. 58, 123–131 (2013).
  • Prensner JR, Zhao S, Erho N et al. RNA biomarkers associated with metastatic progression in prostate cancer: a multi-institutional high-throughput analysis of SChLAP1. Lancet Oncol. 15(13), 1469–1480 (2014).
  • White NM, Cabanski CR, Silva-Fisher JM, Dang HX, Govindan R, Maher CA. Transcriptome sequencing reveals altered long intergenic non-coding RNAs in lung cancer. Genome Biol. 15(8), 429 (2014).
  • Garzon R, Volinia S, Papaioannou D et al. Expression and prognostic impact of lncRNAs in acute myeloid leukemia. Proc. Natl Acad. Sci. USA 111(52), 18679–18684 (2014).
  • Chen H, Xu J, Hong J, Tang R, Zhang X, Fang JY. Long noncoding RNA profiles identify five distinct molecular subtypes of colorectal cancer with clinical relevance. Mol. Oncol. 8(8), 1393–1403 (2014).
  • Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33(3), 290–295 (2015).
  • Wang L, Park HJ, Dasari S, Wang S, Kocher JP, Li W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 41(6), e74 (2013).
  • Sun K, Chen X, Jiang P, Song X, Wang H, Sun H. iSeeRNA: identification of long intergenic non-coding RNA transcripts from transcriptome sequencing data. BMC Genomics 14(Suppl. 2), S7 (2013).
  • Kong L, Zhang Y, Ye ZQ et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345–W349 (2007).
  • Sun K, Zhao Y, Wang H, Sun H. Sebnif: an integrated bioinformatics pipeline for the identification of novel large intergenic noncoding RNAs (lincRNAs) – application in human skeletal muscle cells. PLoS ONE 9(1), e84500 (2014).
  • Gong C, Maquat LE. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470(7333), 284–288 (2011).
  • Lu Q, Ren S, Lu M et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics 14, 651 (2013).
  • Bellucci M, Agostini F, Masin M, Tartaglia GG. Predicting protein associations with long noncoding RNAs. Nat. Methods 8(6), 444–445 (2011).
  • Li JH, Liu S, Zheng LL et al. Discovery of protein-lncRNA interactions by integrating large-scale CLIP-Seq and RNA-Seq datasets. Front. Bioeng. Biotechnol. 2, 88 (2014).
  • Simon R. Analysis of DNA microarray expression data. Best Pract. Res. Clin. Haematol. 22(2), 271–282 (2009).
  • Cheng J, Greshock J, Shi L, Zheng S, Menius A, Lee K. Good practice guidelines for biomarker discovery from array data: a case study for breast cancer prognosis. BMC Syst. Biol. 7(Suppl. 4), S2 (2013).
  • Sun Z, Yang P. Gene expression profiling on lung cancer outcome prediction: present clinical value and future premise. Cancer Epidemiol. Biomarkers Prev. 15(11), 2063–2068 (2006).
  • Yang P, Sun Z. Gene-expression profiling in lung cancer: still early days. Pharmacogenomics 8(2), 129–132 (2007).
  • Goncalves R, Bose R. Using multigene tests to select treatment for early-stage breast cancer. J. Natl Compr. Canc. Netw. 11(2), 174–182 (2013).
