Research Paper

Incorporating metabolic activity, taxonomy and community structure to improve microbiome-based predictive models for host phenotype prediction

Article: 2302076 | Received 17 May 2023, Accepted 02 Jan 2024, Published online: 12 Jan 2024

Figures & data

Table 1. Summary of human gut microbiome datasets used for disease state prediction.

Figure 1. Impacts of the hyperparameters on the MicroKPNN performance for the EW-T2D dataset.

(a) Comparison of the performance of models built using different numbers of fully connected hidden nodes (with the taxonomic rank set to “order”). Each bar represents the performance of a model built using the number of fully connected hidden nodes shown below it. (b) Comparison of the performance of models built using different taxonomic ranks (with the number of fully connected hidden nodes set to 10); the ranks are shown below the bars. Error bars show the standard deviation across five runs.

Table 2. Summary of the best-performing neural network architecture for each dataset and its average AUC.

Table 3. Comparison of MicroKPNN with other methods, including fully connected NNs (fc-NN), by average AUC with standard deviation in parentheses.

Table 4. Comparison of MicroKPNN with DeepMicro in additional metrics (MCC and AUC-PR).

Figure 2. Impacts of the downsampling of samples on the different approaches for selected datasets.

(a) Cirrhosis; (b) colorectal cancer. We tested three downsampling levels (75%, 50%, and 25% of the samples); the plots show the resulting AUCs and their standard deviations alongside the performance on the entire dataset. Stratified sampling was used to preserve the balance between control and disease samples when downsampling.
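
The stratified downsampling mentioned in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stratified_downsample`, the 0/1 label encoding, the toy class sizes, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_downsample(labels, fraction, seed=0):
    """Return sorted sample indices that keep about `fraction` of each class.

    Hypothetical helper: sampling is done separately within each class so
    that the control/disease ratio is preserved in the subset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        # Keep at least one sample per class so no class disappears.
        n_keep = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Toy labels (assumed encoding): 0 = control, 1 = disease.
labels = [0] * 40 + [1] * 60
subset = stratified_downsample(labels, 0.5)
n_disease = sum(labels[i] for i in subset)
```

With these toy labels, a 50% downsample keeps 20 control and 30 disease samples, so the 40:60 ratio of the full set is unchanged.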

Figure 3. Contributions of the different groups of hidden nodes to the prediction as measured by importance scores. (a) IBD; (b) EW-T2D; (c) C-T2D; (d) obesity; (e) cirrhosis; (f) colorectal cancer. The boxes in different colors with whiskers show the distribution of the importance scores of the hidden nodes in different groups.


Table 5. Ranks of taxonomic groups (orders) and metabolic activities that are potentially important for microbiome-based obesity prediction.

Figure 4. The neural network structure used in MicroKPNN. It is composed of three layers (shown on the left). In the input layer, each node is a species, and the hidden layer includes nodes of four different groups: metabolites (red), taxa (blue), communities (green), and fully connected hidden nodes (gray). The links between the input nodes and the nodes in the hidden layer represent different biological meanings (shown on the right).
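
One way to read the partially connected hidden layer described in this caption is as a dense layer whose weights are multiplied by a binary mask, so that a species contributes only to the hidden nodes it is linked to. The sketch below is an assumption-laden illustration, not the MicroKPNN code: the helper names `build_mask` and `masked_forward`, the toy species-to-node links, the random weights, and the `tanh` activation are all hypothetical.

```python
import math
import random


def build_mask(n_inputs, groups, n_fully_connected):
    """Build one mask row per hidden node (1 = link allowed, 0 = no link).

    `groups` holds, for each knowledge-based hidden node (e.g. a metabolite,
    taxon, or community), the set of input species indices linked to it.
    Fully connected hidden nodes get a row of all ones.
    """
    mask = [[1 if i in members else 0 for i in range(n_inputs)]
            for members in groups]
    mask.extend([1] * n_inputs for _ in range(n_fully_connected))
    return mask


def masked_forward(x, weights, mask, bias):
    """Compute hidden activations, zeroing weights on disallowed links."""
    hidden = []
    for w_row, m_row, b in zip(weights, mask, bias):
        z = b + sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        hidden.append(math.tanh(z))  # activation choice is an assumption
    return hidden


# Toy example: 4 species, 3 knowledge-based nodes, 1 fully connected node.
species_abundance = [0.1, 0.4, 0.3, 0.2]
groups = [{0, 1}, {1, 2}, {2, 3}]  # hypothetical species-to-node links
mask = build_mask(4, groups, n_fully_connected=1)

rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in mask]
bias = [0.0] * len(mask)
hidden = masked_forward(species_abundance, weights, mask, bias)
```

Because the mask removes all other links, changing the abundance of a species that is not linked to a given knowledge-based node leaves that node's activation unchanged, which is what makes the node interpretable as that metabolite, taxon, or community.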

Supplemental material

Data availability statement

MicroKPNN is available as an open-source repository at https://github.com/mgtools/MicroKPNN.