Research Article

Predicting locus-specific DNA methylation levels in cancer and paracancer tissues

Pages 549-570 | Received 03 Apr 2023, Accepted 20 Feb 2024, Published online: 13 Mar 2024
 

Abstract

Aim: To predict base-resolution DNA methylation in cancerous and paracancerous tissues. Material & methods: We collected six cancer DNA methylation datasets from The Cancer Genome Atlas and five from the Gene Expression Omnibus, and built machine learning models on paired cancerous and paracancerous tissues. Tenfold cross-validation and independent validation were performed to assess the effectiveness of the proposed method. Results: The cross-tissue prediction models can substantially increase prediction accuracy at more than 68% of CpG sites and help enhance the statistical power of differential methylation analyses. An XGBoost model leveraging multiple correlated CpGs may further improve prediction accuracy. Conclusion: This study provides a powerful tool for DNA methylation analysis and may yield new epigenetic insights for cancer research.

Summary points
  • The authors employed machine learning models to predict genome-wide DNA methylation (DNAm) levels in cancerous tissues (CTs) and paracancerous tissues (PTs) when one of them is difficult to obtain.

  • The proposed single-CpG-based model improves the mean absolute error at more than 68% of CpG sites.

  • A multiple-CpG-based XGBoost model can further improve the predictive performance when there is considerable variability between individuals.

  • Combining measured and predicted PTs to enlarge the sample size makes the CpG sites detected in differential methylation analysis statistically more significant.

  • The prediction models perform better when CTs, rather than PTs, are used as predictors.

  • The aggressiveness of cancers and patient outcome may be predictable using well-predicted DNAm profiles in CT/PT.

  • Functional enrichment analysis based on highly correlated CpG sites identified important pathways involved in cancer progression.

  • The cross-tumor DNAm prediction model has the potential to be applied to an external cancer dataset for a subset of probes with high correlation in both cancers.
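
As a rough illustration of the single-CpG idea in the summary points, the sketch below fits a one-predictor linear model that estimates the PT methylation level at one CpG site from the paired CT level, and scores it by tenfold cross-validated mean absolute error. It is not the authors' implementation (their code is in the repository listed in the data sharing statement). The function name `cv_mae_single_cpg`, the synthetic beta values, the choice of ordinary least squares and the mean-only baseline used for comparison are all assumptions made for this example.

```python
import numpy as np


def cv_mae_single_cpg(x, y, n_folds=10, seed=0):
    """Cross-validated MAE for y ~ slope * x + intercept at one CpG site.

    x: methylation (beta values) in the predictor tissue, one per patient.
    y: methylation in the paired target tissue.
    Returns (model MAE, MAE of a mean-only baseline). The baseline is an
    assumption of this sketch, not something taken from the article.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)

    model_err, base_err = [], []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Least-squares line fitted on the training patients only.
        slope, intercept = np.polyfit(x[train], y[train], 1)
        # Beta values are bounded, so clip predictions to [0, 1].
        pred = np.clip(slope * x[test] + intercept, 0.0, 1.0)
        model_err.append(np.abs(pred - y[test]))
        base_err.append(np.abs(y[train].mean() - y[test]))

    return (float(np.concatenate(model_err).mean()),
            float(np.concatenate(base_err).mean()))


# Synthetic paired samples (hypothetical data): PT tracks CT with small noise.
demo_rng = np.random.default_rng(1)
ct = demo_rng.uniform(0.1, 0.9, size=100)
pt = np.clip(0.8 * ct + 0.05 + demo_rng.normal(0.0, 0.03, size=100), 0.0, 1.0)

model_mae, baseline_mae = cv_mae_single_cpg(ct, pt)
print(f"model MAE: {model_mae:.3f}  baseline MAE: {baseline_mae:.3f}")
```

Running the same function over every CpG site and counting the sites where the model error is lower than the baseline error would give a per-site improvement rate of the kind reported above, under whatever baseline is chosen.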

Author contributions

Conceptualization: B Ma, S Liu, F Song and S Zhang; methodology: B Ma and S Zhang; investigation: B Ma, F Song, S Zhang and Y Liu; visualization: S Zhang; supervision: B Ma, S Liu and F Song; writing – original draft: S Zhang; writing – review and editing: B Ma, S Liu, F Song, S Zhang, Y Liu, Y Shen and D Li. All authors read and approved the final manuscript.

Financial disclosure

This work was supported by the Chinese National Key Research and Development Project (no. 2021YFC2500400) and the National Natural Science Foundation of China (no. 61471078). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Competing interests disclosure

The authors have no competing interests or relevant affiliations with any organization or entity with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending or royalties.

Writing disclosure

No writing assistance was utilized in the production of this manuscript.

Data sharing statement

The source code and demo data have been deposited at: https://github.com/lab319/DNAm_prediction_CT_PT.

