Research Paper

CUE: CpG impUtation ensemble for DNA methylation levels across the human methylation450 (HM450) and EPIC (HM850) BeadChip platforms

Pages 851-861 | Received 20 May 2020, Accepted 09 Sep 2020, Published online: 04 Oct 2020
 

ABSTRACT

DNA methylation at CpG dinucleotides is one of the most extensively studied epigenetic marks. With technological advancements, geneticists can profile DNA methylation with multiple reliable approaches. However, profiling platforms can differ substantially in the CpGs they assess, consequently hindering integrated analysis across platforms. Here, we present CpG impUtation Ensemble (CUE), which leverages multiple classical statistical and modern machine learning methods, to impute from the Illumina HumanMethylation450 (HM450) BeadChip to the Illumina HumanMethylationEPIC (HM850) BeadChip. Data were analysed from two population cohorts with methylation measured both by HM450 and HM850: the Extremely Low Gestational Age Newborns (ELGAN) study (n = 127, placenta) and the VA Boston Posttraumatic Stress Disorder (PTSD) genetics repository (n = 144, whole blood). Cross-validation results show that CUE achieves the lowest predicted root-mean-square error (RMSE) (0.026 in PTSD) and the highest accuracy (99.97% in PTSD) compared with five individual methods tested, including k-nearest-neighbours, logistic regression, penalized functional regression, random forest, and XGBoost. Finally, among all 339,033 HM850-only CpG sites shared between ELGAN and PTSD, CUE successfully imputed 289,604 (85.4%) sites, where success was defined as RMSE < 0.05 and accuracy >95% in PTSD. In summary, CUE is a valuable tool for imputing CpG methylation from the HM450 to HM850 platform.
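The success criterion stated above (RMSE < 0.05 and accuracy > 95%) can be illustrated with a minimal sketch. The abstract does not say how accuracy is computed or how the ensemble combines its methods, so two assumptions are made here and should not be read as the authors' implementation: accuracy is taken to be the fraction of samples whose imputed beta value lies within a hypothetical tolerance of 0.1 of the measured value, and the ensemble is approximated by picking, for each CpG site, the method with the lowest RMSE. All function names and the toy data are illustrative.

```python
import math

RMSE_MAX = 0.05      # success threshold stated in the abstract
ACCURACY_MIN = 0.95  # success threshold stated in the abstract
TOLERANCE = 0.1      # ASSUMPTION: the abstract does not define accuracy


def rmse(observed, imputed):
    """Root-mean-square error between measured and imputed beta values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, imputed)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(observed, imputed, tolerance=TOLERANCE):
    """ASSUMED definition: share of samples imputed within `tolerance`."""
    hits = sum(abs(o - p) <= tolerance for o, p in zip(observed, imputed))
    return hits / len(observed)


def is_successful(observed, imputed):
    """Apply both thresholds from the abstract to one CpG site."""
    return (rmse(observed, imputed) < RMSE_MAX
            and accuracy(observed, imputed) > ACCURACY_MIN)


def pick_best_method(observed, predictions_by_method):
    """ASSUMED ensemble rule: keep the method with the lowest RMSE."""
    return min(predictions_by_method,
               key=lambda name: rmse(observed, predictions_by_method[name]))


# Toy held-out data for a single HM850-only CpG site (made-up values).
observed = [0.10, 0.50, 0.90, 0.30]
predictions = {
    "knn": [0.12, 0.47, 0.93, 0.28],
    "random_forest": [0.10, 0.51, 0.89, 0.30],
}

best = pick_best_method(observed, predictions)
site_ok = is_successful(observed, predictions[best])

# Share of successfully imputed sites, from the counts in the abstract.
success_rate = 289_604 / 339_033
```

With these toy values, `best` is the random forest entry and `site_ok` is true; `success_rate` reproduces the 85.4% reported in the abstract.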

Disclosure statement

No potential conflict of interest was reported by the authors.

Supplemental material

Supplemental data for this article can be accessed here.

Additional information

Funding

This work was supported by the National Institutes of Health under grants including 5U01NS040069-05 (AL), 2R01NS040069-06A2 (KCK), UH3OD023348 (TMO and RCF), R01HD092374 (TMO and RCF), K23NR017898 (HS), T32HL129982 (LMR), and 5R01HL129132-04 (YL), and by VA BLR&D Merit Award I01BX003477 (MWL).
