Original Research

A ten-long non-coding RNA signature for predicting prognosis of patients with cervical cancer

Pages 6317-6326 | Published online: 28 Sep 2018

Abstract

Purpose

The aim of the present study was to construct a novel long non-coding RNA (lncRNA) signature to predict the prognosis of patients with cervical cancer (CC).

Materials and methods

We downloaded lncRNA expression profiles and clinical characteristics from The Cancer Genome Atlas database and randomly divided them into a training dataset (n=200) and a testing dataset (n=87). Using a Cox-based iterative sure independence screening procedure combined with a resampling technique, a lncRNA signature was calculated from prognostic lncRNAs in the training dataset and was independently verified in the testing and the entire datasets. In addition, multivariate Cox regression and further stratified analyses were performed, taking into consideration the lncRNA signature as well as other clinical characteristics. Finally, we predicted the underlying functional effects of the prognostic lncRNAs by using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses.

Results

We constructed a promising ten-lncRNA signature that was significantly associated with the prognosis of CC on the basis of a risk score formula. The risk score was used to classify patients into high-risk and low-risk groups with different overall survival in the training dataset, and was confirmed in the testing and entire datasets. Compared with the clinical factors, the ten-lncRNA signature was found to be an independent prognostic indicator and displayed robust prognostic performance. A functional analysis indicated that these ten lncRNAs were enriched in immune response, cell adhesion molecules and nuclear factor kappa B signaling.

Conclusion

Our results demonstrated that this ten-lncRNA signature may serve as a prognostic biomarker for patients with CC.

Introduction

Cervical cancer (CC) remains the fourth leading cause of cancer-related mortality among women worldwide.Citation1 Approximately 527,600 cases of CC were diagnosed globally in 2012, resulting in over 265,700 deaths.Citation2 Although the 5-year overall survival (OS) rate for patients with early-stage CC is ~80%, the 5-year OS for stages IIIA, IIIB and IVA disease is 40%, 42% and 22%, respectively.Citation3 Postoperative adjuvant chemoradiotherapy may improve local control, reduce distant metastasis and prolong OS in high-risk patients.Citation4 However, adjuvant chemoradiotherapy may also cause side effects that adversely affect patients’ quality of life. Hence, prognostic markers able to predict survival in patients with CC may prove valuable for individualized treatment.

Long non-coding RNAs (lncRNAs) are a type of non-coding transcript with a length of more than 200 nucleotides.Citation5 Accumulating evidence indicates that lncRNAs greatly affect gene expression through chromatin modification and through transcriptional and post-transcriptional regulation.Citation6,Citation7 Aberrant expression of lncRNAs has been extensively demonstrated in several types of cancer.Citation8,Citation9 LncRNAs have attracted attention over the past decade and have prompted a series of studies because they regulate multiple CC-related cellular processes, including proliferation, invasion, apoptosis, metastasis and radioresistance.Citation10–Citation14 Several researchers have focused on detecting prognostic lncRNAs in CC,Citation15–Citation17 and a number of studies have described individual prognostic lncRNAs in CC, such as GAS5,Citation18 PANDAR,Citation19 TUG1,Citation20 MEG3Citation21 and MALAT1.Citation22 However, the comprehensive strength of a potential lncRNA signature for predicting the prognosis of CC has not been clearly determined. Although a 15-lncRNA signature has been developed to predict the prognosis of patients with CC, that study focused only on cervical squamous cell carcinoma and did not include cervical adenocarcinoma, which accounts for ~25% of CC cases.Citation23

In the present study, we screened out the prognostic lncRNAs through investigating lncRNA expression profiles in 287 CC patients from The Cancer Genome Atlas database (TCGA),Citation33 and constructed a ten-lncRNA signature to effectively predict survival in CC.

Materials and methods

Patient datasets

Clinical information for 287 patients with CC was obtained from TCGA on December 10, 2017. The TCGA CC patients were randomly divided into a 200-sample training dataset and an 87-sample testing dataset. The training samples were used to identify lncRNAs whose expression levels were significantly associated with patients’ survival and to construct a prognostic signature (risk score), while the testing samples were used to verify the performance of the constructed signature. The detailed clinical information for the CC patients is listed in Table 1.
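The random partition described above can be sketched as follows. This is a minimal illustration, not the authors' script: the patient identifiers, the seed and the function name `split_train_test` are hypothetical, and only the sizes (287 samples, 200 for training) come from the text.

```python
import random

def split_train_test(sample_ids, n_train, seed=0):
    """Randomly partition sample IDs into a training list and a testing list."""
    if not 0 < n_train < len(sample_ids):
        raise ValueError("n_train must be between 1 and len(sample_ids) - 1")
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 287 TCGA samples.
patients = [f"patient_{i:03d}" for i in range(287)]
train_ids, test_ids = split_train_test(patients, n_train=200, seed=42)
print(len(train_ids), len(test_ids))  # 200 87
```

Fixing the seed makes a particular split reproducible, but a different seed gives a different training set, which is the source of variability that the resampling step in the signature generation is meant to dampen.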

Table 1 Clinical information of patients with cervical cancer

LncRNA profile mining

The lncRNAs extracted from TCGA and GENCODE were cross-referenced with Ensembl IDs in order to refine the number of lncRNAs. We then normalized the lncRNA expression profiles by log2 transformation. Finally, the expression profiles of 7,923 lncRNAs were obtained.
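As a rough illustration of the log2 normalization step, the sketch below transforms a list of raw expression values. The pseudocount of 1 is an assumption made here so that zero counts are defined; the text does not state whether or which offset was used, and `log2_transform` is a hypothetical helper name.

```python
import math

def log2_transform(values, pseudocount=1.0):
    """Apply log2(value + pseudocount) to each raw expression value.

    The pseudocount is an assumption of this sketch, not a detail from the study.
    """
    if any(v + pseudocount <= 0 for v in values):
        raise ValueError("value + pseudocount must be positive")
    return [math.log2(v + pseudocount) for v in values]

raw = [0.0, 1.0, 3.0, 7.0]      # made-up expression values
print(log2_transform(raw))      # [0.0, 1.0, 2.0, 3.0]
```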

Generation of a prognostic lncRNA signature

Cox’s proportional hazards model, which is commonly employed in survival analysis, was used to model the dependence of survival time on lncRNA expression. Since the number of lncRNAs (7,923) was notably higher than the sample size (200), we adopted the iterative sure independence screening (ISIS) procedure for Cox’s modelCitation24 to detect the most significant lncRNAs. ISIS starts with ranking covariates by the absolute value of their marginal correlation with the response variable and selecting the top ranked covariates, and then it adjusts the selected covariates according to the regression residual iteratively. This is a very efficient variable selection method in an ultra-high dimensional scenario.Citation24 In this study, the ISIS procedure was implemented using R package “SIS”.Citation25
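The first ISIS step, ranking covariates by the absolute value of a marginal association and keeping the top ones, can be sketched in a simplified form. The example below uses a Pearson correlation against an uncensored numeric response purely for illustration; it is not the Cox-based marginal utility, does not perform the iterative residual adjustment, and is not the implementation in the R package "SIS". All names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def marginal_screen(features, response, top_k):
    """Rank features by |marginal correlation| with the response; keep top_k names."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], response)),
        reverse=True,
    )
    return ranked[:top_k]

# Toy data: three hypothetical lncRNAs measured in five samples.
response = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "lnc_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # correlation  1.0
    "lnc_b": [5.0, 3.0, 4.0, 2.0, 1.0],   # correlation -0.9
    "lnc_c": [1.0, 3.0, 2.0, 3.0, 1.0],   # correlation  0.0
}
print(marginal_screen(features, response, top_k=2))  # ['lnc_a', 'lnc_b']
```

Screening on each covariate separately is what makes the approach tractable when the number of lncRNAs (7,923) far exceeds the number of samples (200).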

It should be emphasized that only the training dataset was used to screen the survival-associated lncRNAs and to construct the expression-based lncRNA signature. Because 200 samples of the entire dataset are assigned to the training dataset and the remaining 87 constitute the hold-out sample, the particular random split affects the results of lncRNA selection. Reducing the effect of this random ‘hold-out’ on the final lncRNA selection is challenging and has been rather overlooked. In the present study, we adopted a sample partitioning strategy inspired by the Jackknife method,Citation26 which was proposed to estimate the properties of an estimator derived from a full sample by systematic partitions of the dataset; our aim here, by contrast, was to obtain a relatively robust selection result. Accordingly, we repeated the Cox-based ISIS procedure described above 100 times, using only a random subset (n=150) of the training data in each repeat. After 100 repeats, we obtained 100 groups of significant lncRNAs. The lncRNAs that appeared in at least one of the groups were listed and then sorted by their frequencies of appearance in the 100 groups. Candidate prognostic lncRNAs were selected by requiring a minimum frequency (count of appearance in the 100 groups ≥3) and a maximum P-value (P<0.1) in Cox’s univariate regression. Then, with the entire training dataset (n=200), the association of the expression level of these lncRNAs with patient survival was further analyzed through a stepwise multivariate Cox regression, which yielded the final prognostic lncRNAs.
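The resampling scheme can be sketched as a frequency count over repeated subsets. In this illustration the selector is passed in as a function; `toy_select` is a hypothetical stand-in for the Cox-based ISIS step, and the univariate P<0.1 filter and the stepwise multivariate regression are not shown. Only the numbers 100, 150, 200 and the count threshold of 3 come from the text.

```python
import random
from collections import Counter

def selection_frequencies(sample_ids, select_fn, n_repeats=100, subset_size=150, seed=0):
    """Run select_fn on random subsets and count how often each feature is chosen."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_repeats):
        subset = rng.sample(sample_ids, subset_size)  # sampling without replacement
        counts.update(set(select_fn(subset)))         # count each feature once per repeat
    return counts

def frequent_candidates(counts, min_count=3):
    """Keep features selected in at least min_count repeats, most frequent first."""
    return [name for name, c in counts.most_common() if c >= min_count]

def toy_select(subset):
    """Hypothetical selector: one stable feature, one that depends on the subset."""
    selected = ["lnc_stable"]
    if "s0" in subset and "s1" in subset:
        selected.append("lnc_pair")
    return selected

training_ids = [f"s{i}" for i in range(200)]
counts = selection_frequencies(training_ids, toy_select)
candidates = frequent_candidates(counts, min_count=3)
print(counts["lnc_stable"], candidates)
```

Features that are selected only when a few particular samples happen to be present receive low counts, so thresholding on frequency favors lncRNAs whose selection does not hinge on one specific partition.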

Finally, we constructed a prognostic lncRNA signature (risk score) by a linear combination of the expression levels of the prognostic lncRNAs with the multivariate Cox regression coefficients estimated previously as the weights:

\[ \text{Risk score} = \sum_{i=1}^{N} \text{expr}_i \times \text{coef}_i \]
where N stands for the number of selected prognostic lncRNAs, expr_i stands for the expression level of the ith prognostic lncRNA in each patient, and coef_i is the corresponding regression coefficient estimated by the multivariate Cox regression using the training data. With these fixed coef_i (i=1, …, N), risk scores could be calculated for all patients. From the form of the Cox proportional hazards model, it follows that patients with a higher risk score have a higher hazard and are therefore more likely to have poor survival. Using the median risk score in the training dataset as a threshold, the patients were classified into low-risk and high-risk groups. A flow chart of this procedure is shown in Figure 1.
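The link between a higher risk score and poorer survival can be made explicit. The following is a sketch assuming the standard Cox form with an unspecified baseline hazard, written here as h_0(t), a symbol that does not appear in the original text; a and b denote two arbitrary patients.

```latex
h(t \mid \text{expr}_1, \dots, \text{expr}_N)
  = h_0(t) \exp\!\Big( \sum_{i=1}^{N} \text{expr}_i \times \text{coef}_i \Big)
  = h_0(t) \exp(\text{Risk score}),
\qquad
\frac{h_a(t)}{h_b(t)} = \exp\big( \text{Risk score}_a - \text{Risk score}_b \big).
```

Under this form the baseline hazard cancels in the ratio, so the patient with the larger risk score has the larger hazard at every time point, whatever the shape of h_0(t).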

Figure 1 Flow chart of the prognostic lncRNA signature generation.

Notes: N stands for the number of selected prognostic lncRNAs, expr_i stands for the expression level of the ith prognostic lncRNA in each patient, and coef_i is the corresponding regression coefficient estimated by the multivariate Cox regression using training data. With these fixed coef_i (i=1, …, N), risk scores could be calculated for all patients.
Abbreviations: ISIS, iterative sure independence screening; lncRNAs, long non-coding RNAs.

Statistical analysis

Kaplan-Meier survival curves and two-sided log-rank tests were employed to compare survival between the high-risk and low-risk groups by using the R package ‘survival’. To further investigate whether the lncRNA signature predicts the OS of CC independently of other clinical factors, multivariate Cox regression and stratified analysis were performed. HRs and 95% CIs were computed. Receiver operating characteristic (ROC) curve analysis was performed to assess the predictive power of the lncRNA signature using the R package ‘survivalROC’.Citation27 The statistical analyses were conducted using R (version 3.4.3).
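For readers unfamiliar with the Kaplan-Meier estimator, the product-limit calculation can be sketched in a few lines. This is an illustrative re-implementation on made-up follow-up times, not the R ‘survival’ code used in the study, and it omits the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events uses 1 for an observed death and 0 for a censored observation.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve

# Made-up follow-up times (months) and event indicators for five patients.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why the estimate uses all 200 (or 87, or 287) patients rather than only those with an observed death.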

Functional enrichment analysis

The Spearman’s rank correlation coefficient between the expression value of each prognostic lncRNA and that of protein-coding genes was computed to infer the potential functional characteristics of the prognostic lncRNAs. Functional enrichment analysis was conducted using DAVID Bioinformatics Resources (version 6.8).Citation28 Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were set on the cut-off criteria of P<0.05 and enrichment score >1.0.
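The co-expression step can be illustrated with a small Spearman implementation (Pearson correlation of average ranks). The expression vectors and gene names below are made up; the |R| > 0.4 cut-off is the one reported in the Results, and the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Made-up expression of one lncRNA and two protein-coding genes in five samples.
lnc = [1.2, 3.4, 2.2, 5.0, 4.1]
genes = {
    "gene_up": [10.0, 30.0, 20.0, 50.0, 40.0],    # same ordering as lnc
    "gene_flat": [3.0, 3.0, 3.0, 3.0, 3.0],       # constant, no association
}
correlated = [g for g, expr in genes.items() if abs(spearman(lnc, expr)) > 0.4]
print(correlated)  # ['gene_up']
```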

Results

Identification of prognostic lncRNAs associated with OS of patients with CC in the training dataset

The 287 samples were randomly divided into a training dataset (n=200) and a testing dataset (n=87) (Table 1). In the training dataset, we used the Cox-based ISIS procedure described in “Materials and methods” to identify the lncRNAs most significantly associated with the OS of CC patients.Citation24 By requiring a minimum frequency of appearance (count ≥3) in the 100 selected groups and a maximum P-value (P<0.1) in Cox’s univariate regression, a total of 23 lncRNAs were identified as candidates, including three with a positive coefficient (coefficient >0) and 20 with a negative coefficient (coefficient <0). After the stepwise multivariate Cox regression, ten lncRNAs were selected; among those, eight had negative coefficients, indicating that their high expression was correlated with better survival, while the remaining two had positive coefficients, suggesting that their high expression was correlated with poor survival. The univariate Cox regression results (P-value and HR) and multivariate Cox regression results (coefficient) of these ten prognostic lncRNAs using the whole training set are listed in Table 2.

Table 2 Detailed information of ten prognostic lncRNAs significantly associated with overall survival in patients with cervical cancer

Derivation of a ten-lncRNA signature for predicting survival from the training dataset

We constructed a novel lncRNA signature for survival prediction by using the expression value of each lncRNA weighted by its estimated regression coefficient as follows: risk score = (−0.285 × expression level of AC005906.2) + (−0.290 × expression level of LINC01727) + (−0.255 × expression level of AC108868.1) + (−0.547 × expression level of AC020978.4) + (0.160 × expression level of GNAS-AS1) + (0.570 × expression level of AC012306.2) + (−0.351 × expression level of AC024270.5) + (−0.410 × expression level of AC122694.1) + (−0.356 × expression level of AC015819.2) + (−0.716 × expression level of AL590068.1). Risk scores were calculated and ranked for the 200 samples in the training dataset. The median of the risk scores (0.447957) was used as a threshold to stratify the 200 training samples into a low-risk group (n=100) and a high-risk group (n=100). The Kaplan-Meier analysis revealed that survival of patients in the low-risk group was longer than that of patients in the high-risk group (31.82 vs 17.62 months, respectively; P=1.71E-10; Figure 2A). Moreover, the ROC curve analysis yielded an area under the curve (AUC) of 0.852 at 5 years, indicating the strong prognostic performance of the signature (Figure 2B). The distribution of the risk scores, survival status and expression profiles of the prognostic lncRNAs in the training dataset are presented in Figure 2C. Patients with high risk scores had a poorer prognosis than patients with low risk scores. The results of the univariate Cox regression analysis revealed that the ten-lncRNA risk score was significantly associated with patients’ OS in the training dataset (P=1.74E-08, HR =7.149, 95% CI =3.607–14.170; Table 3).
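The reported formula translates directly into a small scoring function. The coefficients and the training-set median (0.447957) are taken from the paragraph above; the expression profiles are made up, the function names are hypothetical, and assigning a score exactly equal to the threshold to the low-risk group is an assumption, since the text does not say how ties were handled.

```python
# Multivariate Cox coefficients of the ten lncRNAs as reported in the text.
COEFFICIENTS = {
    "AC005906.2": -0.285,
    "LINC01727": -0.290,
    "AC108868.1": -0.255,
    "AC020978.4": -0.547,
    "GNAS-AS1": 0.160,
    "AC012306.2": 0.570,
    "AC024270.5": -0.351,
    "AC122694.1": -0.410,
    "AC015819.2": -0.356,
    "AL590068.1": -0.716,
}
TRAINING_MEDIAN = 0.447957  # threshold derived from the training dataset

def risk_score(expression):
    """Sum of coefficient * expression level over the ten signature lncRNAs."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())

def risk_group(expression, threshold=TRAINING_MEDIAN):
    """Classify a patient as 'high' or 'low' risk (ties go to 'low': an assumption)."""
    return "high" if risk_score(expression) > threshold else "low"

# Two made-up expression profiles, for illustration only.
patient_a = {name: 1.0 for name in COEFFICIENTS}
patient_b = {name: 0.0 for name in COEFFICIENTS}
patient_b["AC012306.2"] = 2.0

print(round(risk_score(patient_a), 3), risk_group(patient_a))  # -2.48 low
print(round(risk_score(patient_b), 3), risk_group(patient_b))  # 1.14 high
```

Because the coefficients and the threshold are frozen after training, the same function can be applied unchanged to the testing samples, which is what makes the later validation independent of the model fitting.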

Table 3 Univariate and multivariate Cox regression analyses in each dataset

Figure 2 Association between the ten-lncRNA expression signature and overall survival of patients with CC in the training dataset.

Notes: (A) Kaplan-Meier curve of the survival of the CC patients within high-risk and low-risk groups. (B) ROC analysis for the performance of the risk score in survival prediction in the training dataset. The AUC was calculated for the ROC curve. (C) The ten-lncRNA based risk score distribution, patients’ survival status and heatmap of the ten lncRNA expression profiles.
Abbreviations: AUC, area under the curve; CC, cervical cancer; lncRNA, long non-coding RNA; ROC, receiver operating characteristic.

Further validation of the ten-lncRNA signature for survival prediction in the testing and the entire datasets

To validate our findings, risk scores were calculated for each of the 87 patients in the testing dataset, and the patients were divided into a high-risk group (n=56) and a low-risk group (n=31) according to the risk score model and the threshold derived from the training dataset. In accordance with the findings in the training dataset, patients with high risk scores had markedly worse OS than those with low risk scores (P=1.06E-02, median 20.63 vs 38.97 months, respectively; Figure 3A). As shown in Figure 3C, the ROC curve analysis yielded an AUC of 0.743 at 5 years in the testing dataset. The HR of the high-risk vs low-risk group for OS was 3.820 (P=1.68E-02, 95% CI =1.273–11.460; Table 3), indicating that the association between the ten-lncRNA risk score and OS was also significant in this dataset. Similar findings were observed in the entire dataset, which consisted of 156 high-risk patients with a median OS of 18.5 months and 131 low-risk patients with a median OS of 32.87 months (P=1.13E-10; Figure 3B). The ROC analysis for the ten-lncRNA signature achieved an AUC of 0.837 (Figure 3D). As shown in Table 3, the ten-lncRNA signature was significantly correlated with patients’ survival in the entire dataset (P=5.54E-09, HR =5.448, 95% CI =3.081–9.633).
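The reported HRs and 95% CIs can be checked for internal consistency. The sketch below assumes Wald-type intervals on the log scale, exp(coef ± 1.96 × SE), which is a common default but is not stated in the text; under that assumption the HR should sit at the geometric center of its interval. The numeric triples are the univariate values quoted for the three datasets, and the function names are hypothetical.

```python
import math

def hazard_ratio_ci(coef, se, z=1.96):
    """HR and Wald-type confidence limits from a Cox coefficient and its SE."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

def se_from_ci(lower, upper, z=1.96):
    """Back out the standard error of the log-HR from reported CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# (HR, lower, upper) as reported for the ten-lncRNA risk score.
reported = {
    "training": (7.149, 3.607, 14.170),
    "testing": (3.820, 1.273, 11.460),
    "entire": (5.448, 3.081, 9.633),
}

rebuilt = {}
for name, (hr, lower, upper) in reported.items():
    se = se_from_ci(lower, upper)
    rebuilt[name] = hazard_ratio_ci(math.log(hr), se)
    _, lo_back, hi_back = rebuilt[name]
    print(f"{name}: SE(log HR)={se:.3f}, rebuilt CI={lo_back:.3f}-{hi_back:.3f}")
```

The wider interval in the testing dataset corresponds to a larger standard error, as expected for the smaller sample of 87 patients.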

Figure 3 The ten-lncRNA related risk score predicts overall survival of patients with CC in the testing dataset and the entire dataset.

Notes: (A) Kaplan-Meier curve of the survival of the CC patients using the ten-lncRNA signature in the testing dataset (n=87). (B) Kaplan-Meier curve of the survival of the CC patients using the ten-lncRNA-signature in the entire dataset (n=287). (C) The ROC analysis for the performance of the risk score in survival prediction in the testing dataset. The AUC was calculated for ROC curves. (D) The ROC analysis for the performance of the risk score in survival prediction in the entire dataset. The AUC was calculated for ROC curves.
Abbreviations: AUC, area under the curve; CC, cervical cancer; lncRNA, long non-coding RNA; ROC, receiver operating characteristic.

Independence of the prognostic power of the ten-lncRNA signature from other clinical factors

As shown in Table 3, the risk score obtained from the ten-lncRNA signature maintained a significant association with OS when the other three clinical factors were included as covariates in each dataset. However, stage was also significantly associated with OS in all three datasets. Therefore, a stratification analysis was performed to determine the prognostic power of the ten-lncRNA signature within each CC stage group; the entire dataset was stratified into an early-stage group (I and II, n=223) and a late-stage group (III and IV, n=64), and the ten-lncRNA signature subdivided the patients in each group into high-risk and low-risk subgroups. This analysis demonstrated that patients in the high-risk subgroups had significantly shorter OS than those in the low-risk subgroups for both early-stage (P=1.31E-09) and late-stage disease (P=1.25E-03) (Figure 4).

Figure 4 Survival prediction, stratified by stage, of the lncRNA signature in patients.

Note: Kaplan-Meier estimators of the overall survival for patients in two groups with early stage (A) and late stage (B) cervical cancer.

Functional roles of the ten prognostic lncRNAs

To understand the functional implications of the ten prognostic lncRNAs in the development of CC, we carried out a functional enrichment analysis. We identified 347 protein-coding genes that were significantly correlated with at least one of the ten prognostic lncRNAs (Spearman |R| >0.4). GO and KEGG pathway enrichment analyses were performed with these genes to identify their associated GO annotations and KEGG pathways. The GO analysis covered three domains: biological process, molecular function and cellular component. These genes were mainly associated with immune response in biological process, receptor activity and binding in molecular function, and plasma membrane in cellular component (Figure 5A–C). These genes were also significantly enriched for cell adhesion molecules (CAMs) and nuclear factor kappa B (NF-κB) signaling (Figure 5D).

Figure 5 Gene Ontology and pathway analysis of protein-coding genes correlated with prognostic lncRNAs in the signature.

Notes: (A–C) Top 10 Gene Ontology terms in three domains of the protein-coding genes. The pie plot presents the number of genes in each term. (D) Top 20 pathways of protein-coding genes. Risk factors = enrichment levels. The magnitude of the dots = numbers of genes. The color classification = P-value.
Abbreviation: lncRNAs, long non-coding RNAs.

Discussion

In this study, we identified ten lncRNAs that were significantly associated with OS in CC patients. The ten-lncRNA risk score signature demonstrated superior ability to divide CC patients into high-risk and low-risk groups with significantly different OS in each dataset. Further studies indicated that the ten-lncRNA signature is an independent predictor of OS with other clinical factors, including age, stage and histology, taken into account simultaneously. Therefore, we demonstrated that this ten-lncRNA signature is a promising prognostic biomarker in the progression of CC.

To the best of our knowledge, none of the ten prognostic lncRNAs have been reported in the literature to date. Therefore, we screened for protein-coding genes that were strongly correlated with the ten lncRNAs (Spearman |R| >0.4) in the entire dataset. We conducted an integrated analysis to predict the potential biological roles of the ten lncRNAs in CC through those correlated genes. The results suggested that the lncRNAs may exert their effects through several known GO annotations and KEGG pathways. The biological processes of the genes were mainly associated with the immune response. Infection by human papillomavirus (HPV), a causative factor of CC, induces a cellular immune response with regulatory T cells and maintains local immune suppression in HPV-associated CC.Citation29 These genes were also significantly enriched in CAMs and the NF-κB signaling pathway. Recent studies have found that CAMs are associated with the adhesion or signaling status of tumor cells, promoting acquisition of a more invasive phenotype. It has been reported that L1CAM may be a helpful prognostic marker to predict locoregional recurrences in CC.Citation30 The NF-κB signaling pathway has been found to play a critical role in the pathogenesis and progression of CC.Citation31 It has also been demonstrated that MAFIP may act as a suppressor in CC by restraining the activation of the NF-κB pathway.Citation32

There were several limitations to this study. First, metastasis was not included as a factor in predicting survival, as this information was not available. Second, the data from TCGA were generated with the RNA-Seq technique, and further experimental methods are needed to confirm the results. Third, the exact mechanisms of action of the ten lncRNAs in CC remain to be fully elucidated; further experiments should be designed to explore these mechanisms in the future.

Conclusion

In the present study, we identified a ten-lncRNA signature that may prove to be a critical prognostic tool for patients with CC. These lncRNAs may modulate genes associated with immune response, CAMs and NF-κB signaling, which have previously been associated with CC tumorigenesis. Ultimately, we expect this lncRNA signature to be helpful for predicting prognosis and uncovering the mechanisms underlying CC development.

Acknowledgments

This study was supported in part by the Natural Science Foundation of Shandong Province (grant nos ZR2013HM060 and ZR2015AL014) and the Fundamental Research Funds for the Central Universities (grant no 15CX02064A).

Disclosure

The authors report no conflicts of interest in this work.

References

  • Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018;68(1):7–30. PMID: 29313949.
  • Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A. Global cancer statistics, 2012. CA Cancer J Clin. 2015;65(2):87–108. PMID: 25651787.
  • Quinn MA, Benedet JL, Odicino F. Carcinoma of the cervix uteri. FIGO 26th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynaecol Obstet. 2006;95(Suppl 1):S43–S103.
  • Chen K, Ge JJ, Yan SX, Ke SB. Microarray gene expression profiling for identifying different responses to radiotherapy and chemoradiotherapy in patients with cervical cancer. Eur J Gynaecol Oncol. 2017;38(1):106–112. PMID: 29767875.
  • Kong D, Wang Y. Knockdown of lncRNA HULC inhibits proliferation, migration, invasion, and promotes apoptosis by sponging miR-122 in osteosarcoma. J Cell Biochem. 2018;119(1):1050–1061. PMID: 28688193.
  • Liang WC, Ren JL, Wong CW. LncRNA-NEF antagonized epithelial to mesenchymal transition and cancer metastasis via cis-regulating FOXA2 and inactivating Wnt/β-catenin signaling. Oncogene. 2018;37(11):1445–1456. PMID: 29311643.
  • Zhang Y, Lun L, Li H. The value of lncRNA NEAT1 as a prognostic factor for survival of cancer outcome: a meta-analysis. Sci Rep. 2017;7(1):13080. PMID: 29026116.
  • Bian Z, Jin L, Zhang J. LncRNA-UCA1 enhances cell proliferation and 5-fluorouracil resistance in colorectal cancer by inhibiting miR-204-5p. Sci Rep. 2016;6:23892. PMID: 27046651.
  • Gooding AJ, Zhang B, Jahanbani FK. The lncRNA BORG drives breast cancer metastasis and disease recurrence. Sci Rep. 2017;7(1):12698. PMID: 28983112.
  • Yan Q, Tian Y, Hao F. Downregulation of lncRNA UCA1 inhibits proliferation and invasion of cervical cancer cells through miR-206 expression. Oncol Res. Epub 2018 3 9. PMID: 28409548.
  • Shan D, Shang Y, Hu T. Long noncoding RNA BLACAT1 promotes cell proliferation and invasion in human cervical cancer. Oncol Lett. 2018;15(3):3490–3495. PMID: 29456724.
  • Zhang M, Song Y, Zhai F. ARFHPV E7 oncogene, lncRNA HOTAIR, miR-331-3p and its target, NRP2, form a negative feedback loop to regulate the apoptosis in the tumorigenesis in HPV positive cervical cancer. J Cell Biochem. 2018;119(6):4397–4407. PMID: 29130509.
  • Li Q, Feng Y, Chao X. HOTAIR contributes to cell proliferation and metastasis of cervical cancer via targetting miR-23b/MAPK1 axis. Biosci Rep. 2018;38(1):BSR20171563. PMID: 29335299.
  • Han D, Wang J, Cheng G. LncRNA NEAT1 enhances the radio-resistance of cervical cancer via miR-193b-3p/CCND1 axis. Oncotarget. 2018;9(2):2395–2409. PMID: 29416780.
  • Li Y, Wan YP, Bai Y. Correlation between long strand non-coding RNA GAS5 expression and prognosis of cervical cancer patients. Eur Rev Med Pharmacol Sci. 2018;22(4):943–949. PMID: 29509242.
  • Wang L, Zhu H. Long non-coding nuclear paraspeckle assembly transcript 1 acts as prognosis biomarker and increases cell growth and invasion in cervical cancer by sequestering microRNA-101. Mol Med Rep. 2018;17(2):2771–2777. PMID: 29207151.
  • Yang JP, Yang XJ, Xiao L, Wang Y. Long noncoding RNA PVT1 as a novel serum biomarker for detection of cervical cancer. Eur Rev Med Pharmacol Sci. 2016;20(19):3980–3986. PMID: 27775803.
  • Cao S, Liu W, Li F, Zhao W, Qin C. Decreased expression of lncRNA GAS5 predicts a poor prognosis in cervical cancer. Int J Clin Exp Pathol. 2014;7(10):6776–6783. PMID: 25400758.
  • Huang HW, Xie H, Ma X, Zhao F, Gao Y. Upregulation of LncRNA PANDAR predicts poor prognosis and promotes cell proliferation in cervical cancer. Eur Rev Med Pharmacol Sci. 2017;21(20):4529–4535. PMID: 29131264.
  • Hu Y, Sun X, Mao C. Upregulation of long noncoding RNA TUG1 promotes cervical cancer cell proliferation and migration. Cancer Med. 2017;6(2):471–482. PMID: 28088836.
  • Zhang J, Lin Z, Gao Y, Yao T. Downregulation of long noncoding RNA MEG3 is associated with poor prognosis and promoter hypermethylation in cervical cancer. J Exp Clin Cancer Res. 2017;36(1):5. PMID: 28057015.
  • Yang L, Bai HS, Deng Y, Fan L. High MALAT1 expression predicts a poor prognosis of cervical cancer and promotes cancer cell growth and invasion. Eur Rev Med Pharmacol Sci. 2015;19(17):3187–3193. PMID: 26400521.
  • Mao X, Qin X, Li L. A 15-long non-coding RNA signature to improve prognosis prediction of cervical squamous cell carcinoma. Gynecol Oncol. 2018;149(1):181–187. PMID: 29525275.
  • Fan J, Feng Y, Wu Y. High-dimensional variable selection for Cox’s proportional hazards model. Statistics. 2010;6:70–86.
  • Saldana DF, Feng Y. SIS: An R package for sure independence screening in ultrahigh-dimensional statistical models. J Stat Softw. 2018;83(2).
  • Gentle JE. Elements of computational statistics. Publications of the American Statistical Association. NY: Springer-Verlag New York, Inc; 2002.
  • Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56(2):337–344. PMID: 10877287.
  • Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57. PMID: 19131956.
  • van Hede D, Langers I, Delvenne P, Jacobs N. Origin and immunoescape of uterine cervical cancer. Presse Med. 2014;43(12 Pt 2):e413–e421. PMID: 25448124.
  • Schrevel M, Corver WE, Vegter ME. L1 cell adhesion molecule (L1CAM) is a strong predictor for locoregional recurrences in cervical cancer. Oncotarget. 2017;8(50):87568–87581. PMID: 29152102.
  • Tilborghs S, Corthouts J, Verhoeven Y. The role of nuclear factor-kappa B signaling in human cervical cancer. Crit Rev Oncol Hematol. 2017;120:141–150. PMID: 29198328.
  • Li Y, Yu Y, Zhang Y. MAFIP is a tumor suppressor in cervical cancer that inhibits activation of the nuclear factor-kappa B pathway. Cancer Sci. 2011;102(11):2043–2050. PMID: 21834855.
  • National Cancer Institute [home page on the Internet]. The Cancer Genome Atlas. Available from: https://cancergenome.nih.gov/. Accessed September 17, 2018.