Abstract
Purpose
Colorectal cancer (CRC) is one of the most common malignant tumors worldwide. This study aimed to explore the prognostic value of lncRNAs in CRC.
Material and methods
We performed gene expression profiling to identify differentially expressed lncRNAs between 51 normal and 646 tumor tissues from The Cancer Genome Atlas database. Cox regression and robust likelihood-based survival models were used to find prognosis-related lncRNAs. A lncRNA signature was developed to predict the overall survival of patients with CRC. In addition, a receiver operating characteristic curve analysis was performed to identify the optimal cutoff with the best Youden index to divide patients into different groups based on risk level.
Results
Eighty survival-related lncRNAs were identified and a 15-lncRNA signature was developed on the basis of a risk score to comprehensively predict the overall survival of patients with CRC. The prognostic value of the 15-lncRNA risk score was validated using the internal testing set and total set. The risk indicator was shown to be an independent prognostic factor (hazard ratio =2.92; 95% CI: 1.73–4.94; P<0.001). Notably, all 15 lncRNAs (AC024581.1, FOXD3-AS1, AC012531.1, AC003101.2, LINC01219, AC083967.1, AL590483.1, AC105118.1, AC010789.1, AC067930.5, AC105219.2, LINC01354, LINC02474, LINC02257, and AC079612.1) were newly found to correlate with the prognosis of patients with CRC. Furthermore, the function of 15 lncRNAs was explored through the ceRNA network. These lncRNAs regulated coding genes that were involved in many key cancer pathways.
Conclusion
A 15-lncRNA expression signature was discovered as a prognostic indicator for patients with CRC, which may act as competing endogenous RNA (ceRNAs) to play a crucial role in the modulation of cancer-related pathways. These findings may allow a better understanding of the prognostic value of lncRNAs.
Introduction
Colorectal cancer (CRC) is one of the most common malignant tumors of the gastrointestinal tract worldwide, as well as the fourth leading cause of cancer-related death owing to its prevalence and mortality.Citation1 Studies have shown that CRC is caused by several genetic factors, including changes in chromosomal copy number, aberrant gene methylation, and dysregulated gene expression.Citation2,Citation3 Considerable progress has been made in the diagnosis and treatment of CRC in the last several decades. However, the current prognostic factors for patients with CRC do not meet clinical needs, making it necessary to identify novel biomarkers in a sensitive and accurate way to better predict overall survival.
lncRNAs, usually >200 nucleotides in length, are a class of RNAs that do not code for proteins.Citation4 lncRNAs used to be considered “transcript junk,” but have recently emerged as key molecules in multiple complex biological processes (BP),Citation4,Citation5 including proliferation, cell cycle progression, and survival.Citation6 Several reports have shown that lncRNAs serve as modulators of carcinogenesis and affect the rates of invasion and metastasis in several types of cancer.Citation6 However, the biological function and prognostic value of many lncRNAs remain unknown. Interestingly, it has been shown that numerous lncRNAs can act as competing endogenous RNAs (ceRNAs) to regulate the expression of coding genesCitation7 that have common miRNA response elements (MREs). In this study, the predictive value of lncRNAs in patients with CRC was explored. Furthermore, the function of these lncRNAs was investigated using the ceRNA network.
Materials and methods
Data processing and computational analysis
shows the overall workflow of this study. The data of 697 RNA expression profiles (level 3), including 51 normal tissues and 646 tumor tissues, were downloaded from The Cancer Genome Atlas (TCGA) data portal (dated to September 18, 2017). This study met the publication guidelines provided by TCGA (http://cancergenome.nih.gov/publications/publicationguidelines). According to TCGA guidelines, RNA expression profiles can be studied in three forms: HT-seq raw read count, Fragments per Kilobase of transcript per Million mapped reads (FPKM), and FPKM-UQ (upper quartile normalization). Here, HT-seq raw read count was chosen. lncRNAs general feature format file (Gencode.v27) was used as the lncRNA annotation reference.Citation8 The expression profiles of lncRNAs were analyzed by edgeR.Citation9,Citation10 Differentially expressed lncRNAs were selected according to P-value (≤0.01) and absolute fold change (≥2).
Identification of lncRNAs related to patient prognosis
Samples were filtered by removing cases without complete survival data to yield 616 samples that were included in our analysis. All samples were randomly divided into either training set (308 samples) or validation set (308 samples) groups. The clinical and demographic characteristics of the study population are shown in . There was no statistical difference between the two sets. To determine the feasibility and reliability of survival-associated lncRNAs as prognostic markers in CRC, univariate Cox proportional hazards regression was applied to identify overall survival-related lncRNAs. The robust likelihood-based survival model, using the R package analysis method (Rbsurv), was then applied to further identify prognosis-related lncRNAs.Citation11 The protocol of this method was as follows: first, the model randomly put N(1 − p) samples into the training set and Np cases into the validation set. Here, we chose p=1/3. Second, the model added a gene to the training set and obtained the parameter for the gene. The loglik was evaluated for each parameter and validated within the internal validation samples. The procedure was repeated 1,000 times to select the best prognosis-related lncRNAs with the smallest mean negative loglik. Next, the Akaike information criterion (AIC) was computed and used as an estimator of the relative quality of statistical models for a given set of data, and the optimal model was chosen with the smallest AIC. P<0.05 was considered statistically significant.
Establishment and validation of the risk formula
lncRNAs chosen from the previous step were inserted into the multiply Cox proportional model to calculate the coefficients in the training set, thereby establishing the risk formula. Risk scores for each sample were calculated using this formula. All patients were classified into either the high-risk or the low-risk group on the basis of the median of their risk score. The Kaplan–Meier method and the log-rank test were applied to analyze the overall survival of the two groups using the R package survival analysis.Citation12,Citation13 A time-dependent receiver operating characteristic curve (ROC) was constructed to evaluate the prediction value of the model (version 1.0.3),Citation14 and the figures were plotted by ggplot2 (version 2.2.1)Citation15 and ggfortify (version 0.4.1).Citation16,Citation17 All data were processed and analyzed by perl 5 version 24, excel 2010, and R (version 3.4.1).
Determination of lncRNA function
The function of the lncRNAs was explored using the triple ceRNA (lncRNA–miRNA–mRNA) network. The sequences of the identified lncRNAs were obtained from EnsemblCitation18 and inputted into the miRDBCitation19,Citation20 database to predict their miRNA targets. The corresponding coding genes were then identified using miRDB,Citation19,Citation20 miRTarBase,Citation21 and TargetScan.Citation22 The triple ceRNA network was visualized and constructed by Cytoscape v3.5.1.Citation23 The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis coding genes were annotated by the R package of clusterProfiler.Citation24 The cutoff P-value was 0.05.
Results
Differential expression of lncRNAs
A total of 1,103 differentially expressed lncRNAs were identified in patients with CRC. These lncRNAs are listed in Table S1. Eighty lncRNAs that were associated with overall survival were identified through our univariate Cox regression analysis in the total set (Table S2).
Identification of a 15-lncRNA signature
The 20 lncRNAs with the lowest P-value were selected () and analyzed with the robust likelihood-based survival model. Fifteen lncRNAs were selected with the lowest AIC values. The risk coefficients for these lncRNAs were calculated using the multivariable Cox proportional hazards model. The risk formula used to calculate the risk score was as follows: (0.238*AC024581.1)+(0.053*FOXD3-AS1)+(0.067*AC0 12531.1)+(0.221*AC003101.2)+(0.357*LINC01219)+(0.0 82*AC083967.1)+(−0.113*AL590483.1)+(0.060*AC1051-18.1)+(0.031*AC010789.1)+(0.126*AC067930.5)+(0.161*AC105219.2)+(0.317*LINC01354)+(0.139*L INC02474)+(−0.131*LINC02257)+(−0.269*AC079612.1). Additionally, the risk scores were calculated for each patient in the training set. The patients were divided into two groups on the basis of the median of the risk scores (). shows the distribution of patient survival status and survival time. Survival, assessed with the Kaplan–Meier method and log-rank test, indicated that patients with a high-risk score had a shorter survival time (P<0.001) (). In our analysis, survival time was negatively correlated with risk score.
Validation of the prognostic value of the lncRNAs
To assess prognostic value, ROC was conducted for the 15-lncRNA signature (). For our analysis, the area under curve was 0.708. 2.027 was chosen as the best optimal cutoff, taking into account the maximal sensitivity and specificity of our survival prediction. Patients from the data sets (total set and validating set) were further divided into high-risk or low-risk groups. shows the Kaplan–Meier survival curves for the testing set and the total set, respectively, where the results were all consistent with our model.
Determination of lncRNA function
The 15 lncRNAs identified in our study were inputted into the miRDB database to predict their miRNA targets (yielding a total of 222 miRNAs), and the coding genes for these miRNAs were then predicted (yielding 1,179 genes). shows an overview of the triple ceRNA (lncRNA–miRNA–mRNA) network. The detailed interactions of the ceRNA network are shown in Table S3. The functional enrichment assay identified 691 GO terms in BP, 46 GO terms in cellular components, 81 GO terms in molecular function (Table S4), and 46 pathways (Table S5). It also showed that these genes are involved in multiple BP, such as regulation of cell morphogenesis, and Wnt-mediated cell signaling. The top ten GO results are shown in . The top 20 KEGG pathways are shown in . KEGG was enriched in several cancer-related pathways, including the p53 and Wnt signaling pathways. lncRNA AC012531.1 was not only related to the mTOR signal pathway by regulating hsa-mir-424-5p, and hsa-mir-16-5p, has-mir-410-3p, which targeted ATK3, SEH1L, and GSK3B, respectively, but also took part in the MAPK signal pathway. lncRNA LINC01354 participated in the TP53 signal pathway by hsa-mir-107 and hsa-mir-497-5 p, which regulated CDK6 and CCNE1, respectively. lncRNA LINC02257, indirectly regulating ROCK2 through hsa-mir-138-5p, played an important role in the Wnt signal pathway. lncRNA AC079612.1 interacted with hsa-mir-760 targeting PPIP5K1 to involve in the phosphatidylinositol signal. Furthermore, these four lncRNAs were also involved in other pathways. However, the rest of the lncRNAs in this study have not been found involved in pathways through interaction with miRNAs.
Discussion
Recently, much attention has been given to the clinical significance of lncRNAs, which account for the majority of transcriptional products in the cell.Citation25,Citation26 Many lncRNAs have tissue-specific expression patterns and play crucial roles in the progression of diseases,Citation27 such as gastric cancerCitation28 and breast cancer.Citation29
Those lncRNAs expressed in CRC were comprehensively analyzed, and 1,103 differentially expressed lncRNAs were identified. Then, 80 lncRNAs that were correlated with the overall survival of patients with CRC were selected using the univariate Cox regression model. The robust likelihood-based survival model was then applied, and the 20 lncRNAs with the lowest P-value selected to identify a 15-lncRNA signature that predicts the 5-year overall survival of patients with CRC. This model showed excellent performance and consistency throughout the training set, testing set, and total set. These results imply that the 15-lncRNA signature identified in our study may be used as a biomarker to predict patient prognosis in clinical practice. A literature search in PubMed and Google Scholar indicates this is the first time these 15 lncRNAs are reported to be correlated with CRC.
Previous studies have shown that there is signaling “crosstalk” between different transcriptional products.Citation30,Citation31 Many cancer-related phenotypes are driven by lncRNAs,Citation25 either directly or indirectly, by modulating the stability of various molecules, including DNA, proteins, and miRNAs. The hypothesis of ceRNA is that transcriptional products that share common MREs with target genes communicate with different genes through miRNAs.Citation7 Furthermore, any transcriptional product that has MREs can act as a ceRNA. These transcriptional products, which share common MREs, including lncRNAs, circular RNAs, and pseudogenes, regulate corresponding genes through miRNAs that function in RNA posttranscriptional silencing by binding the 3′-untranslated region to influence transcript stability. Thus, lncRNAs may act as ceRNAs to indirectly regulate coding genes through miRNAs. It is therefore necessary to explore the role of lncRNAs as ceRNAs.
In this study, a triple ceRNA (lncRNA–miRNA–mRNA) network was constructed. Bioinformatics analyses of this ceRNA network revealed that 15 lncRNAs may function as ceRNAs to regulate genes that participate in cancer-associated signaling, including p53 and Wnt signaling.Citation32 Furthermore, this ceRNA network may be involved in other types of cancer because KEGG analysis results of ceRNA showed this network was associated with many cancer-related pathways. For example, the TP53 signaling pathway participates in multiple tumor genesis.Citation33–Citation35
Taken together, these results suggest that 15 differentially expressed lncRNAs play an important role in oncogenesis and may be used as a prognostic biomarker in clinical practice. However, there were still some limits to our study. Our results are based on a bioinformatics analysis and were validated using in vitro or in vivo experimentation. In addition, as the binding affinity between miRNAs and their RNA targets is influenced by the matching between MRE and the seeds regions (as well as other factors), we could not adequately assess the exact function of each ceRNA. Future studies will assess the biological functions of these lncRNAs by measuring their effects on cell proliferation and apoptosis and will further evaluate these lncRNAs as prognostic biomarkers.
Conclusion
In summary, we identified 1,103 lncRNAs that were differentially expressed in CRC.
A 15-lncRNAs’ risk formula was developed that correlated with the overall survival of patients with CRC using a robust likelihood-based survival model, and the function of these newly identified survival-associated lncRNAs was explored. Our results justify further study of the transcriptional regulatory network of lncRNAs in CRC and provide a new resource to discover novel prognostic biomarkers.
Acknowledgments
We thank everyone who supported this study. We also thank Dr Yuan Tang for giving us important advice pertaining to this article.
Disclosure
The authors report no conflicts of interest in this work.
References
- BrennerHKloorMPoxCPColorectal cancerLancet201438399271490150224225001
- Cancer Genome Atlas NetworkComprehensive molecular characterization of human colon and rectal cancerNature2012487740733033722810696
- WoodLDParsonsDWJonesSThe genomic landscapes of human breast and colorectal cancersScience200731858531108111317932254
- BatistaPJChangHYLong noncoding RNAs: cellular address codes in development and diseaseCell201315261298130723498938
- BeermannJPiccoliMTViereckJThumTNon-coding RNAs in development and disease: background, mechanisms, and therapeutic approachesPhysiol Rev20169641297132527535639
- LiangWCFuWMWongCWThe lncRNA H19 promotes epithelial to mesenchymal transition by functioning as miRNA sponges in colorectal cancerOncotarget2015626225132252526068968
- SalmenaLPolisenoLTayYKatsLPandolfiPPA ceRNA hypothesis: the Rosetta Stone of a hidden RNA language?Cell2011146335335821802130
- AkenBLAylingSBarrellDThe Ensembl gene annotation systemDatabase20162016baw09327337980
- RobinsonMDMccarthyDJSmythGKedgeR: a Bioconductor package for differential expression analysis of digital gene expression dataBioinformatics201026113914019910308
- MccarthyDJChenYSmythGKDifferential expression analysis of multifactor RNA-Seq experiments with respect to biological variationNucleic Acids Res201240104288429722287627
- ChoHJYuAKimSKangJRobust likelihood-based survival modeling with microarray dataJ Stat Software2008291116
- TherneauTMA Package for Survival Analysis in S. version 238201538 Available from: https://CRAN.R-project.org/package=survivalAccessed October 29, 2018
- TherneauTMGrambschPMModeling survival data: extending the Cox modelTechnometrics20004418586
- HeagertyPJLumleyTPepeMSTime-dependent ROC curves for censored survival data and a diagnostic markerBiometrics200056233734410877287
- WickhamHggplot2: Elegant Graphics for Data AnalysisNew York, NYSpringer Publishing Company Incorporated2009
- TangYHorikoshiMLiWGgfortify: Unified interface to visualize statistical results of popular r packages2016 Available from: https://journal.r-project.org/archive/2016/RJ-2016-060/RJ-2016-060.pdfAccessed October 16, 2018
- YuanTGgfortify: Data Visualization Tools for Statistical Analysis Results2017 Available from: https://rdrr.io/cran/ggfortify/Accessed October 16, 2018
- ZerbinoDRAchuthanPAkanniWEnsembl 2018Nucleic Acids Res201846D1D754D76129155950
- WangXImproving microRNA target prediction by modeling with unambiguously identified microRNA-target pairs from CLIP-ligation studiesBioinformatics20163291316132226743510
- WongNWangXmiRDB: an online resource for microRNA target prediction and functional annotationsNucleic Acids Res201543D1D146D15225378301
- ChouCHChangNWShresthaSmiRTarBase 2016: updates to the experimentally validated miRNA-target interactions databaseNucleic Acids Res201644D1D239D24726590260
- AgarwalVBellGWNamJWBartelDPPredicting effective microRNA target sites in mammalian mRNAsElife20154
- ShannonPMarkielAOzierOCytoscape: a software environment for integrated models of biomolecular interaction networksGenome Res200313112498250414597658
- YuGWangLGHanYHeQYQyHclusterProfiler: an R package for comparing biological themes among gene clustersOMICS201216528428722455463
- SchmittAMChangHYLong noncoding RNAs in cancer pathwaysCancer Cell201629445246327070700
- LuCYangMLuoFPrediction of lncRNA-disease associations based on inductive matrix completionBioinformatics201834193357336429718113
- WangZYangBZhangMlncRNA epigenetic landscape analysis identifies EPIC1 as an oncogenic lncRNA that interacts with MYC and promotes cell-cycle progression in cancerCancer Cell201833470672029622465
- SongYXSunJXZhaoJHNon-coding RNAs participate in the regulatory network of CLDN4 via ceRNA mediated miRNA evasionNat Commun20178128928819095
- BergerACKorkutAKanchiRSA comprehensive pan-cancer molecular study of gynecologic and breast cancersCancer Cell201833469070529622464
- LiMJZhangJLiangQExploring genetic associations with ceRNA regulation in the human genomeNucleic Acids Res201745105653566528472449
- KarrethFAPandolfiPPceRNA cross-talk in cancer: when ce-bling rivalries go awryCancer Discov20133101113112124072616
- CarethersJMJungBHGenetics and genetic biomarkers in sporadic colorectal cancerGastroenterology2015149511771190.e117326216840
- BykovVJNErikssonSEBianchiJWimanKGTargeting mutant p53 for efficient cancer therapyNat Rev Cancer20181828910229242642
- KastenhuberERLoweSWPutting p53 in contextCell201717061062107828886379
- Zucman-RossiJVillanuevaANaultJCLlovetJMGenetic landscape and biomarkers of hepatocellular carcinomaGastroenterology2015149512261239.e426099527