Research Paper

Performance of Gut Microbiome as an Independent Diagnostic Tool for 20 Diseases: Cross-Cohort Validation of Machine-Learning Classifiers

Article: 2205386 | Received 10 Nov 2022, Accepted 17 Apr 2023, Published online: 04 May 2023
 

ABSTRACT

Cross-cohort validation is essential for gut-microbiome-based disease stratification but has been performed for only a limited number of diseases. Here, we systematically evaluated the cross-cohort performance of gut-microbiome-based machine-learning classifiers for 20 diseases. Using single-cohort classifiers, we obtained high predictive accuracies in intra-cohort validation (~0.77 AUC) but low accuracies in cross-cohort validation, except for intestinal diseases (~0.73 AUC). We then built combined-cohort classifiers, trained on samples pooled from multiple cohorts, to improve the validation of non-intestinal diseases, and estimated the sample size required to achieve validation accuracies of >0.7 AUC. In addition, for intestinal diseases we observed higher validation performance for classifiers using metagenomic data than for those using 16S amplicon data. We further quantified cross-cohort marker consistency using a Marker Similarity Index and observed similar trends. Together, our results support the gut microbiome as an independent diagnostic tool for intestinal diseases and reveal strategies to improve cross-cohort performance based on the identified determinants of consistent cross-cohort gut microbiome alterations.
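The distinction between intra-cohort and cross-cohort validation can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the nearest-centroid scorer is a stand-in for whatever classifier the study used, and all function names (`auc`, `make_cohort`, `fit_scorer`, `intra_cohort_auc`, `cross_cohort_auc`) are hypothetical. Cohort B is deliberately constructed so that one feature shifts in the opposite direction from cohort A, mimicking a marker that is not consistent across cohorts.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of case/control pairs where the case scores higher (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_cohort(rng, n, shift):
    """Synthetic cohort (assumption): cases differ from controls by shift[j] on feature j."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate controls (0) and cases (1)
        X.append([rng.gauss(s * label, 1.0) for s in shift])
        y.append(label)
    return X, y

def fit_scorer(X, y):
    """Stand-in classifier: project a sample onto (case centroid - control centroid)."""
    d = len(X[0])
    def centroid(label):
        rows = [x for x, lab in zip(X, y) if lab == label]
        return [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    w = [c - k for c, k in zip(centroid(1), centroid(0))]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def intra_cohort_auc(X, y, k=5):
    """Mean AUC over k folds, training and testing within the same cohort."""
    fold_aucs = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        score = fit_scorer([X[i] for i in train], [y[i] for i in train])
        fold_aucs.append(auc([y[i] for i in test], [score(X[i]) for i in test]))
    return sum(fold_aucs) / k

def cross_cohort_auc(X_train, y_train, X_test, y_test):
    """AUC when the classifier is trained on one cohort and tested on another."""
    score = fit_scorer(X_train, y_train)
    return auc(y_test, [score(x) for x in X_test])

rng = random.Random(0)
# Cohort A and cohort B share the first marker but disagree on the second.
X_a, y_a = make_cohort(rng, 200, shift=[1.0, 1.0, 0.0, 0.0, 0.0])
X_b, y_b = make_cohort(rng, 200, shift=[1.0, -1.0, 0.0, 0.0, 0.0])

intra = intra_cohort_auc(X_a, y_a)
cross = cross_cohort_auc(X_a, y_a, X_b, y_b)
print(f"intra-cohort AUC (A, 5-fold): {intra:.2f}")
print(f"cross-cohort AUC (train A, test B): {cross:.2f}")
```

In this toy setting the cross-cohort AUC falls well below the intra-cohort AUC because the scorer learned a marker direction that does not hold in the second cohort. A combined-cohort classifier in the same spirit would pass the concatenation of several cohorts to `fit_scorer` and test on a held-out cohort.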

Disclosure statement

No potential conflict of interest was reported by the authors.

Author contributions

W.H.C. and X.M.Z. designed and directed the research. J.Z., H.W., C.S. and N.L.G. helped with the sample collection. M.L. and J.L. analyzed the data, performed the modeling, and wrote the paper with results from all authors. W.H.C. and X.M.Z. polished the manuscript through multiple iterations of discussion with all authors. All authors read and approved the final manuscript.

Data availability statement

The processed data and code that support the findings of this study are available in the GitHub repository at https://github.com/whchenlab/GMModels. These data were derived from the following resources available in the public domain: NCBI (https://www.ncbi.nlm.nih.gov/sra), ENA (https://www.ebi.ac.uk/ena/browser/), MGnify (https://www.ebi.ac.uk/metagenomics/), and GMrepo v2 (https://gmrepo.humangut.info). The accession codes are listed in Table S1.

Ethics approval

This study neither received nor required ethics approval, as it reused publicly available data.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/19490976.2023.2205386.

Additional information

Funding

This research was supported by the National Key Research and Development Program of China (2019YFA0905600 to W.H.C., 2020YFA0712403 to X.M.Z.), the National Natural Science Foundation of China (32070660 to W.H.C.; T2225015, 61932008 to X.M.Z.), the NNSF-VR Sino-Swedish Joint Research Programme (82161138017), the Greater Bay Area Institute of Precision Medicine (Guangzhou) (Grant No. IPM21C008), the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), the Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (LCNBI), and ZJLab.