ABSTRACT
Background: With the improved life expectancy of people living with HIV (PLWH), identifying vulnerable subpopulations at high mortality risk is important. Evidence shows that DNA methylation (DNAm) is associated with mortality in non-HIV populations. Here, we established a panel of DNAm biomarkers that can predict mortality risk among PLWH.
Methods: A total of 1,081 HIV-positive participants from the Veterans Ageing Cohort Study (VACS) were divided into training (N = 460), validation (N = 114), and testing (N = 507) sets. The VACS index was used as a measure of mortality risk among PLWH. Model training and fine-tuning were conducted using an ensemble method in the training and validation sets, and prediction performance was assessed in the testing set. Survival analysis comparing the predicted high and low mortality risk groups and Gene Ontology enrichment analysis of the predictive CpG sites were also performed.
Results: We selected a panel of 393 CpGs for the ensemble prediction model, which showed excellent performance in predicting high mortality risk, with an auROC of 0.809 (95% CI: 0.767, 0.851) and a balanced accuracy of 0.653 (95% CI: 0.611, 0.693) in the testing set. The predicted high mortality risk group was significantly associated with 10-year mortality (hazard ratio = 1.79, p = 4E-05) compared with the low mortality risk group. These 393 CpGs were located in 280 genes enriched in immune and inflammatory response pathways.
Conclusions: We identified a panel of DNAm features associated with mortality risk in PLWH. These DNAm features may serve as predictive biomarkers for mortality risk among PLWH.
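For readers less familiar with the two performance metrics reported in the Results, the following minimal Python sketch shows how an auROC and a balanced accuracy can be computed for a binary high-risk versus low-risk classifier. It is an illustration only, not the study's analysis code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for this example and are not study data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = high risk)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, NOT study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35]

# Assumed decision threshold of 0.5 for turning scores into class calls.
toy_calls = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_bacc = balanced_accuracy(toy_labels, toy_calls)
toy_auc = auroc(toy_labels, toy_scores)
print(f"balanced accuracy = {toy_bacc:.3f}, auROC = {toy_auc:.3f}")
```

Balanced accuracy depends on the chosen threshold, whereas auROC summarises ranking quality across all thresholds, which is one reason the two figures for the same model can differ noticeably.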
Abbreviations: AUC: Area Under Curve; CI: Confidence interval; DMR: differentially methylated region; DNA: Deoxyribonucleic acid; DNAm: DNA methylation; DAVID: Database for Annotation, Visualization, and Integrated Discovery; EWA: epigenome-wide association; FDR: False discovery rate; FWER: Family-wise error rate; GLMNET: elastic-net-regularized generalized linear models; GO: Gene ontology; HIV: Human immunodeficiency virus; HM450K: Human Methylation 450 K BeadChip; k-NN: k-nearest neighbours; NK: Natural killer; PC: Principal component; PLWH: people living with HIV; QC: Quality control; SVM: Support Vector Machines; VACS: Veterans Ageing Cohort Study; XGBoost: Extreme Gradient Boosting Tree
Acknowledgments
The authors appreciate the support of the Veterans Ageing Cohort Study Biomarker Core and the Yale Center of Genomic Analysis.
Authors’ contributions
CS was responsible for data analysis and manuscript preparation. ACJ provided DNA samples and clinical data and contributed to manuscript preparation. XZ was responsible for the bioinformatics data processing. VM was involved in clinical data collection and manuscript preparation. DH and EJ contributed to the analytical approach and to manuscript preparation. KX was responsible for the study design, study protocol, sample preparation, data analysis, interpretation of findings, and manuscript preparation.
Availability of data and materials
Demographic and clinical variables and DNAm data for the VACS samples were submitted to the GEO database (accession GSE117861) and are available to the public. All analysis code is also available upon request from the corresponding author.
Disclosure statement
The authors declare that they have no competing interests.
Ethics approval and consent to participate
The study was approved by the Committee of the Human Research Subject Protection at Yale University and the Institutional Research Board Committee of the Connecticut Veteran Healthcare System. All subjects provided written consent.
Supplementary material
Supplemental data for this article can be accessed here.