Research Paper

A metatranscriptomics strategy for efficient characterization of the microbiome in human tissues with low microbial biomass

Article: 2323235 | Received 22 Sep 2023, Accepted 21 Feb 2024, Published online: 29 Feb 2024

Figures & data

Figure 1. Metatranscriptomics workflow for synthetic host-microbiome samples.

(a) To establish and optimize our metatranscriptomics strategy, four synthetic samples mimicking human samples with distinct host-to-bacterial cell ratios were created to contain 10%, 70%, 90%, and 97% host cells (SS10, SS70, SS90, and SS97, respectively) by spiking the mock microbial community (100% bacterial cells; Mock) with a human cell line (100% host cells; AGS). To guarantee high RNA yield and quality, the freshly prepared synthetic samples were processed immediately. Total RNA was isolated and treated with DNase I to remove contaminating DNA. For metatranscriptomics, because rRNAs predominate in total RNA, a double depletion of prokaryotic and eukaryotic rRNA was performed. All samples were sequenced on the Illumina NovaSeq 6000 platform at high depth (15 Gbp). In parallel, next-generation sequencing of the 16S rRNA transcript was performed from total RNA samples. (b) The raw metatranscriptome sequencing data were pre-processed to trim sequencing adapters and to remove low-quality reads as well as rRNA, viral, and human sequences. Then, mapping-based methods were used for taxonomic and functional analysis of the metatranscriptomes. First, taxonomic profiling was performed using different computational tools, each run with multiple settings to improve species identification. After in silico removal of potential contaminant taxa, the decontaminated taxonomic profiles from each classifier were integrated into the functional analysis performed by HUMAnN 3, which stratifies community functional profiles according to contributing species.

Table 1. Parameter settings used with each taxonomic classifier to improve species identification.

Figure 2. Kraken 2/Bracken with optimized settings accurately profiles the metatranscriptome of synthetic samples with low microbial biomass.

Taxonomic analysis at the species-level of the metatranscriptome of synthetic samples (SS) using MetaPhlAn 4, mOTUs3, Centrifuge, and Kraken 2/Bracken with different parameter settings. Upper color-coded bar plot showing the relative abundances of the 20 bacterial species of the mock community and of other microbial species that are not present in the mock community (Others) identified by each classifier. Middle bar plots showing the number of bacterial species of the mock community each tool was able to identify and the number of false-positive species detected (Others). Dotted line shows the expected number of species of the mock community in the samples. Spearman’s rank correlation between the number of species identified in the synthetic samples and their percentage of host cells was determined for each classifier. The performance metrics precision, recall, and F1 score for the classifications given by each classifier in the synthetic samples are shown in the bottom panel. The complete names of the species of the mock community are shown in Supplementary Table S1. * indicates statistical significance at p < .05.
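The precision, recall, and F1 score reported in the bottom panel can be illustrated with a minimal sketch. It assumes the metrics are computed on species presence/absence against the known mock community composition, which the caption does not state explicitly; the function name and the toy species labels are hypothetical.

```python
def classification_metrics(detected, expected):
    """Return (precision, recall, f1) for detected species versus the
    species expected in the mock community (presence/absence only)."""
    detected, expected = set(detected), set(expected)
    tp = len(detected & expected)  # mock species that were identified
    fp = len(detected - expected)  # reported species not in the mock ("Others")
    fn = len(expected - detected)  # mock species that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical example: a 20-species mock community, of which 18 species
# are recovered alongside 2 false-positive species.
expected = {f"species_{i}" for i in range(20)}
detected = {f"species_{i}" for i in range(18)} | {"other_a", "other_b"}
precision, recall, f1 = classification_metrics(detected, expected)
```

Under this definition, every false-positive species lowers precision and every missed mock species lowers recall, so F1 penalizes a classifier that gains sensitivity only by reporting many spurious taxa.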

Figure 3. 16S rRNA transcript sequencing validates the metatranscriptome profiles of Kraken 2/Bracken with optimized settings.

(a) Taxonomic analysis at the genus-level of synthetic samples (SS) using metatranscriptomics with optimized Kraken 2/Bracken (left panel) and 16S rRNA transcript sequencing (right panel). Upper color-coded bar plots showing the relative abundances of the 18 bacterial genera of the mock community and of other microbial genera that are not present in the mock community (Others) identified by each method. Bar plots showing the number of bacterial genera of the mock community each method was able to identify and the number of false-positive genera detected (Others). Dotted line shows the expected number of genera of the mock community in the samples. (b) Spearman’s correlation matrix between the taxonomic profiles of metatranscriptomics with optimized Kraken 2/Bracken and 16S rRNA transcript sequencing from synthetic host-microbiome samples. Correlations were considered significant at p < .05 (Supplementary Table S4).
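As an illustration of how two genus-level profiles can be compared by Spearman's rank correlation, the sketch below ranks each profile (assigning tied values their average rank) and computes the Pearson correlation of the ranks. It uses only the Python standard library; the abundance values are invented for illustration and are not data from the study, and the p-value calculation is omitted.

```python
def average_ranks(values):
    """Return 1-based ranks of the values, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant profile")
    return cov / (sx * sy)


# Hypothetical relative abundances (%) of four genera under each method.
metatranscriptomics = [40.0, 30.0, 20.0, 10.0]
amplicon_16s = [35.0, 33.0, 22.0, 10.0]
rho = spearman(metatranscriptomics, amplicon_16s)
```

Because only ranks enter the calculation, two methods that order the genera identically reach rho = 1 even when their absolute relative abundances differ.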

Figure 4. Kraken 2/Bracken with optimized settings has high sensitivity to profile the metatranscriptome in low microbial biomass simulated datasets.

(a) Schematic representation of the experimental design to generate simulated datasets (SD) of the metatranscriptomes of synthetic samples with low microbial biomass. Microbial and host single-end reads were randomly selected from the mock community (Mock) and from the SS97 pre-processed datasets, and were combined in different proportions, at a fixed sequencing depth of 100 million reads, to generate five simulated datasets with progressively higher proportions of host sequences (90%, 97%, 98%, 99%, and 100%). Then, the taxonomic profiling of the simulated datasets was determined using Kraken 2/Bracken with optimized settings. (b) Taxonomic analysis at the species-level of the metatranscriptomes of synthetic samples (SS; left panel) and simulated datasets (SD; right panel) using Kraken 2/Bracken with optimized settings. Heat map showing the relative abundances of the 20 bacterial species of the mock community and of other microbial species that are not present in the mock community (Others). Data from simulated datasets are shown as the mean relative abundance of species from 3 independent experiments. Bar plots showing the number of bacterial species of the mock community that Kraken 2/Bracken with optimized settings was able to identify and the number of false-positive species detected (Others) on synthetic samples (left panel) and simulated datasets (right panel). Dotted line shows the expected number of species of the mock community in the samples.
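The mixing scheme in panel (a) can be sketched as follows. This is a toy illustration, not the authors' implementation: it assumes reads are drawn without replacement, uses a depth of 1,000 reads instead of 100 million so that it runs instantly, and represents reads by hypothetical string identifiers.

```python
import random


def simulate_dataset(host_reads, microbial_reads, host_fraction, depth, seed=0):
    """Combine randomly selected host and microbial reads at a fixed depth.

    Assumption: sampling is without replacement within each read pool.
    """
    rng = random.Random(seed)
    n_host = round(depth * host_fraction)
    n_microbial = depth - n_host
    mixed = rng.sample(host_reads, n_host) + rng.sample(microbial_reads, n_microbial)
    rng.shuffle(mixed)
    return mixed


# Toy read pools standing in for the host and mock community reads.
host_pool = [f"host_{i}" for i in range(5000)]
microbial_pool = [f"microbe_{i}" for i in range(5000)]

depth = 1000  # the study used a fixed depth of 100 million reads
host_fractions = [0.90, 0.97, 0.98, 0.99, 1.00]

# Three replicates per host fraction, each with a different random seed.
simulated = {
    frac: [simulate_dataset(host_pool, microbial_pool, frac, depth, seed=rep)
           for rep in range(3)]
    for frac in host_fractions
}
```

Fixing the depth while raising the host fraction shrinks the microbial read count from 10% of the reads down to zero, which is what lets the simulation probe the detection limit of the classifier.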

Figure 5. Kraken 2/Bracken improves HUMAnN 3 functional analysis of the metatranscriptome of synthetic samples with low microbial biomass.

Functional analysis of the metatranscriptomes of synthetic samples performed using HUMAnN 3 in combination with the taxonomic classifiers MetaPhlAn 4, mOTUs3, Centrifuge, and Kraken 2/Bracken. Upper lollipop plots showing the number of gene families identified in each synthetic sample by HUMAnN 3 combined with different classifiers. Middle bar plots showing the abundance of mapped and unmapped functions as copies per million (CPM) in each synthetic sample. Unmapped represents the reads that failed to map after both HUMAnN 3 alignment steps (nucleotide and translated searches). Bottom heat maps showing the contribution of each search tier for functional analysis as CPM in each synthetic sample.
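For readers unfamiliar with the CPM unit used in the middle and bottom panels, the sketch below shows sum-normalization to copies per million. The feature names and values are hypothetical, and including the unmapped fraction in the total is an assumption made here for illustration.

```python
def to_cpm(abundances):
    """Scale feature abundances so that they sum to one million."""
    total = sum(abundances.values())
    if total == 0:
        return {feature: 0.0 for feature in abundances}
    return {feature: value / total * 1_000_000
            for feature, value in abundances.items()}


# Hypothetical abundances for one sample; UNMAPPED is kept in the total
# (an assumption) so that mapped and unmapped fractions stay comparable.
toy_sample = {"UNMAPPED": 600.0, "gene_family_A": 300.0, "gene_family_B": 100.0}
cpm = to_cpm(toy_sample)
```

Because each sample is rescaled to the same total, CPM values can be compared between samples of different sequencing depth, and a larger unmapped share directly reduces the CPM available to mapped functions.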

Figure 6. Effective application of the metatranscriptomics method in human clinical samples.

(a) Taxonomic analysis at the species-level of the metatranscriptomes of clinical tissue specimens (TS) using MetaPhlAn 4, mOTUs3, Centrifuge, and Kraken 2/Bracken with different parameter settings. Upper color-coded bar plot showing the relative abundances of the top 20 most abundant species in tissue specimens identified by each classifier. Others represents identified species outside the top 20, which are not shown individually in the graph. Unclassified indicates that no taxa were identified. The number of species each tool/setting was able to identify is shown per tissue specimen (middle bar plot) and as the median across all tissue specimens (bottom box plot). Significance was assessed using the non-parametric Kruskal-Wallis test followed by Dunn’s test for multiple comparisons. (b) Functional analysis of the metatranscriptomes of clinical tissue specimens using HUMAnN 3 in combination with MetaPhlAn 4, mOTUs3, Centrifuge, and Kraken 2/Bracken with different parameter settings. Box plot showing the number of gene families that HUMAnN 3 was able to detect in tissue specimens using each tool/setting. Significance was assessed using one-way ANOVA followed by Dunnett’s multiple comparisons test. Color-coded bar plot showing the contribution of each search tier for functional analysis of tissue specimens, represented as the proportion of gene families. * indicates a significant difference from Kraken 2/Bracken (Settings 6) at p < .05.
Supplemental material


Data availability statement

The raw sequence data generated in the current study are available at NCBI BioProject under accession number PRJNA1003801.