Research Paper

Ordering taxa in image convolution networks improves microbiome-based machine learning accuracy

Article: 2224474 | Received 15 Dec 2022, Accepted 08 Jun 2023, Published online: 21 Jun 2023
 

ABSTRACT

The human gut microbiome is associated with a large number of disease etiologies. As such, it is a natural candidate for machine-learning-based biomarker development for multiple diseases and conditions. The microbiome is often analyzed using 16S rRNA gene sequencing or shotgun metagenomics. However, several properties of microbial sequence-based studies hinder machine learning (ML), including non-uniform representation, a small number of samples compared with the dimension of each sample, and sparsity of the data, with the majority of taxa present in only a small subset of samples. Using a graph representation, we show that the cladogram structure is as informative as the taxa frequencies. We then suggest a novel method that uses microbial taxonomy to combine information from different taxa and improve data representation for ML. iMic (image microbiome) translates the microbiome to images through an iterative ordering scheme and applies convolutional neural networks to the resulting images. We show that iMic has a higher precision in static microbiome gene sequence-based ML than state-of-the-art methods. iMic also facilitates interpretation of the classifiers: an explainable artificial intelligence (AI) algorithm applied to iMic detects the taxa relevant to each condition. iMic is then extended to dynamic microbiome samples by translating them to movies.
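
To make the idea of translating a microbiome sample to an image more concrete, the following is a minimal sketch rather than the iMic implementation. The abstract does not specify the image layout, so everything here is an assumption made for illustration: each row is one taxonomic level, each column is one leaf taxon, an ancestor's pixel value is the mean abundance of its descendant leaves, and a plain lexicographic sort of the lineages stands in for the iterative ordering scheme. The function name `taxa_to_image` is hypothetical.

```python
import numpy as np


def taxa_to_image(abundances):
    """Map leaf-taxon abundances onto a 2D array (illustrative layout only).

    abundances: dict mapping a lineage tuple, e.g. ("Firmicutes", "Clostridia"),
    to a relative abundance. All lineages are assumed to have the same depth.
    Returns the ordered leaves and an array of shape (depth, n_leaves).
    """
    # Lexicographic order keeps leaves that share an ancestor in adjacent columns.
    # This is a simple stand-in, not the ordering scheme used by iMic.
    leaves = sorted(abundances)
    depth = len(leaves[0])
    if any(len(leaf) != depth for leaf in leaves):
        raise ValueError("all lineages must have the same depth")

    image = np.zeros((depth, len(leaves)))
    for level in range(depth):
        # Group leaf columns by their ancestor at this taxonomic level.
        groups = {}
        for col, leaf in enumerate(leaves):
            groups.setdefault(leaf[: level + 1], []).append(col)
        # Assumption: an ancestor's value is the mean of its descendant leaves.
        for cols in groups.values():
            image[level, cols] = np.mean([abundances[leaves[c]] for c in cols])
    return leaves, image


# Toy sample with two taxonomic levels and three leaf taxa (made-up values).
sample = {("A", "a1"): 0.2, ("A", "a2"): 0.4, ("B", "b1"): 0.4}
leaves, image = taxa_to_image(sample)
```

In this layout, neighboring pixels correspond to taxonomically related taxa, which is the kind of local structure a convolutional network can exploit. A real pipeline would also need to handle lineages of unequal depth and normalization of the abundances.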

Acknowledgments

We thank Miriam Beller for the English editing. OK is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement ERC-2020-COG No. 101001355). We thank Maayan Harel (Maayan Visuals) for her graphical contribution.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

All datasets are available at https://github.com/oshritshtossel/iMic/tree/master/Raw_data.

Contribution

OS developed the methods, implemented them, ran the code, created all the figures, and prepared an initial draft of the text. HI implemented gMic and helped write parts of the text. YL and OK supervised the work and wrote parts of the text. OK and ST interpreted the biological relevance of the results. ST wrote parts of the text.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/19490976.2023.2224474

Additional information

Funding

OS was supported by the DSI-BIU grant for outstanding students in data science. YL was supported by ISF 870/20 and the Ministry of Health Preventive medicine 1/20. OK was supported by the European Union’s Horizon 2020 research and innovation program (Grant agreement ERC-2020-COG No. 101001355).