Research Paper

Microbial genes outperform species and SNVs as diagnostic markers for Crohn’s disease on multicohort fecal metagenomes empowered by artificial intelligence

Article: 2221428 | Received 08 Feb 2023, Accepted 16 May 2023, Published online: 06 Jun 2023
 

ABSTRACT

Dysbiosis of the gut microbial community is associated with the pathogenesis of Crohn’s disease (CD) and may provide promising noninvasive diagnostic markers. We aimed to compare the performance of microbial markers at different biological levels by conducting a multidimensional analysis of fecal metagenomes from CD patients. We collected fecal metagenomic datasets from eight cohorts comprising 870 CD patients and 548 healthy controls. Microbial alterations in CD patients were assessed at the species, gene, and single nucleotide variant (SNV) levels, and diagnostic models were then constructed using an artificial intelligence algorithm. A total of 227 species, 1,047 microbial genes, and 21,877 microbial SNVs differed between CD patients and controls. The species, gene, and SNV models achieved an average AUC of 0.97, 0.95, and 0.77, respectively. Notably, the gene model exhibited superior diagnostic capability, achieving an average AUC of 0.89 and 0.91 for internal and external validations, respectively. Moreover, the gene model was specific for CD against other microbiome-related diseases. Furthermore, we found that the phosphotransferase system (PTS) contributed substantially to the diagnostic capability of the gene model. The outstanding performance of PTS was mainly explained by the genes celB and manY, which showed high predictive power for CD in the metagenomic datasets and were validated in an independent cohort by qRT-PCR analysis. Our global metagenomic analysis unravels the multidimensional alterations of the microbial communities in CD and identifies microbial genes as robust diagnostic biomarkers across geographically and culturally distinct cohorts.
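The AUC values and cross-cohort validation reported above can be illustrated with a minimal sketch. This is not the authors' pipeline, which is available in their GitHub repository; it assumes a leave-one-cohort-out (LOCO) scheme as listed in the abbreviations, a toy one-feature centroid scorer in place of the neural network, and invented abundance values. The names `auc`, `loco_auc`, `fit_centroid`, `predict_centroid`, and `samples` are all hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """Rank-based AUC: probability that a random case (label 1) scores
    higher than a random control (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loco_auc(samples, fit, predict):
    """Hold out each cohort in turn, train on the rest, and return
    {cohort: AUC on the held-out cohort}.
    samples: list of (cohort, feature, label) tuples."""
    by_cohort = defaultdict(list)
    for cohort, x, y in samples:
        by_cohort[cohort].append((x, y))
    results = {}
    for held_out, test in by_cohort.items():
        train = [(x, y) for c, x, y in samples if c != held_out]
        model = fit(train)
        scores = [predict(model, x) for x, _ in test]
        results[held_out] = auc([y for _, y in test], scores)
    return results


def fit_centroid(train):
    """Toy stand-in model: midpoint between class means plus direction."""
    cases = [x for x, y in train if y == 1]
    ctrls = [x for x, y in train if y == 0]
    mu1 = sum(cases) / len(cases)
    mu0 = sum(ctrls) / len(ctrls)
    return (mu0 + mu1) / 2, (1.0 if mu1 >= mu0 else -1.0)


def predict_centroid(model, x):
    """Score is the signed distance from the midpoint toward the case mean."""
    mid, sign = model
    return sign * (x - mid)


# Invented relative abundances of a single marker gene in two cohorts.
samples = [
    ("A", 0.9, 1), ("A", 0.2, 0), ("A", 0.7, 1), ("A", 0.4, 0),
    ("B", 0.8, 1), ("B", 0.3, 0), ("B", 0.5, 1), ("B", 0.6, 0),
]
results = loco_auc(samples, fit_centroid, predict_centroid)
mean_auc = sum(results.values()) / len(results)
print(results, mean_auc)
```

Averaging the per-cohort AUCs, as in the last lines, is one plausible reading of "average AUC"; the article itself should be consulted for the exact validation protocol.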

Acknowledgments

This work was supported by the National Natural Science Foundation of China (82170542 to RZ, 92251307 to RZ, 32200529 to DW, 82000536 to NJ), the National Key Research and Development Program of China (2021YFF0703700/2021YFF0703702 to RZ), Guangdong Province “Pearl River Talent Plan” Innovation and Entrepreneurship Team Project (2019ZT08Y464 to LZ), the program of Guangdong Provincial Clinical Research Center for Digestive Diseases (2020B1111170004), and National Key Clinical Discipline. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Abbreviations

ABC.PE.P: peptide/nickel transport system permease protein; agaF: N-acetylgalactosamine PTS system EIIA component; AKR1A1: alcohol dehydrogenase (NADP+); ALDH: aldehyde dehydrogenase (NAD+); allA: ureidoglycolate lyase; AUC: area under the ROC curve; CD: Crohn’s disease; celB: cellobiose PTS system EIIC component; CRC: colorectal cancer; EIIC: enzyme IIC component; ENA: European Nucleotide Archive; fliC: flagellin; FNN: Feedforward neural network; GSEA: gene set enrichment analysis; IBD: inflammatory bowel disease; impB: type VI secretion system protein ImpB; KO: KEGG Orthology; LC: liver cirrhosis; LOCO: leave-one-cohort-out; maeB: malate dehydrogenase (oxaloacetate-decarboxylating) (NADP+); manY: mannose PTS system EIIC component; nirK: nitrite reductase (NO-forming); pckA: phosphoenolpyruvate carboxykinase (GTP); PD: Parkinson’s disease; PTS: phosphotransferase system; ReLU: rectified linear unit; rfbJ: CDP-abequose synthase; ROC: receiver operating characteristic; SHAP: SHapley Additive exPlanations; SNVs: single nucleotide variants; sucD: succinyl-CoA synthetase alpha subunit; T2D: type-2 diabetes; tcPp: toxin coregulated pilus biosynthesis protein P; tmoC: toluene monooxygenase system ferredoxin subunit; trbJ: type IV secretion system protein TrbJ; ttuC: tartrate dehydrogenase/decarboxylase/D-malate dehydrogenase; UC: ulcerative colitis; WMS: whole-metagenome sequencing

Authors’ contributions

LZ, ZL, QH, and RZ conceived and designed the project. SG performed the public data collection, microbiome analysis, AI modeling, and bioinformatics analysis. XG recruited the participants, collected the fecal samples, and performed the qRT-PCR analysis. SG and XG drafted the manuscript. RZ, DW, ZF, NJ, RS, WG, QH, ZL, and LZ revised the manuscript. All authors read and approved the final manuscript.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

All processed data in this study have been uploaded to the National Omics Data Encyclopedia under accession no. OEP003761. The raw metagenomic data are available in the European Nucleotide Archive (https://www.ebi.ac.uk/ena/) under accession nos. PRJNA398089, PRJNA385949, PRJNA400072, SRP057027, PRJEB15371, PRJNA389280, PRJEB1220, PRJEB27928, PRJEB17784, PRJEB1786, and PRJEB6337. The data relevant to the study are included in the article or uploaded as supplementary information. The code and scripts are available on GitHub (https://github.com/tjcadd2020/Diagnosis-for-CD).

Ethics approval and consent

All participants provided written informed consent prior to data collection. The study was approved by the Institutional Review Board at the Shanghai Tenth People’s Hospital, Tongji University, Shanghai (No. 20KT863).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/19490976.2023.2221428

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

The work was supported by the National Natural Science Foundation of China [82170542, 92251307, 32200529, 82000536]; the National Key Research and Development Program of China [2021YFF0703700/2021YFF0703702]; the Guangdong Province “Pearl River Talent Plan” Innovation and Entrepreneurship Team Project [2019ZT08Y464]; the Program of Guangdong Provincial Clinical Research Center for Digestive Diseases [2020B1111170004]; and National Key Clinical Discipline.