Figures & data
Table 1. Studies from which we collected sRNA-mRNA interacting pairs
Table 2. Training and benchmarking data characteristics
Table 3. Parameters per ML method used for grid-search CV
Table 4. Final benchmarking dataset used for all three programs. The table lists the genome accession used, the number of sRNAs, the number of mRNAs, the number of confirmed interacting pairs (P), and the number of pairs considered non-interacting (N) per bacterial species (from top to bottom: E. coli, Synechocystis and P. multocida)
Table 5. 10-fold CV AUROC for the best model per classifier trained on sequence-derived features (trinucleotide frequency difference and tetra-nucleotide frequency difference) of 1490 sRNA-mRNA pairs
Figure 1. ROC curve for the three programs on Escherichia coli data. The plot shows the sensitivity (also called recall or true positive rate) as a function of the false-positive rate (FPR). The dash line indicates random classifier performance.
![Figure 1. ROC curve for the three programs on Escherichia coli data. The plot shows the sensitivity (also called recall or true positive rate) as a function of the false-positive rate (FPR). The dash line indicates random classifier performance.](/cms/asset/caa204a5-4356-444b-8156-d59b77791eb5/krnb_a_2012058_f0001_oc.jpg)
Figure 2. ROC curve for the three programs on Synechocystis data. The plot shows the sensitivity (also called recall or true positive rate) as a function of the false-positive rate (FPR). The dash line indicates random classifier performance.
![Figure 2. ROC curve for the three programs on Synechocystis data. The plot shows the sensitivity (also called recall or true positive rate) as a function of the false-positive rate (FPR). The dash line indicates random classifier performance.](/cms/asset/2a61167c-0dec-43dc-b062-22c6b67a6804/krnb_a_2012058_f0002_oc.jpg)
Figure 3. ROC curve for the three programs on Pasteurella multocida data. The plot shows the sensitivity (also called recall or true positive rate) as a function of the false-positive rate (FPR). The dash line indicates random classifier performance.
![Figure 3. ROC curve for the three programs on Pasteurella multocida data. The plot shows the sensitivity (also called recall or true positive rate) as a function of the false-positive rate (FPR). The dash line indicates random classifier performance.](/cms/asset/c423f025-d2c8-4da2-9254-1a94a260831a/krnb_a_2012058_f0003_oc.jpg)
Table 6. AUROC obtained on each bacterial species included in the benchmark for all three programs assessed
Figure 4. Rank (lower = better) distribution of 102 Escherichia coli confirmed interacting pairs. The violin plot for each program shows the data density for different rank values and the horizontal line inside each box indicates the median rank of confirmed interacting pairs.
![Figure 4. Rank (lower = better) distribution of 102 Escherichia coli confirmed interacting pairs. The violin plot for each program shows the data density for different rank values and the horizontal line inside each box indicates the median rank of confirmed interacting pairs.](/cms/asset/cd9b4bb4-b076-4b9f-92be-c6da08306808/krnb_a_2012058_f0004_oc.jpg)
Figure 5. Rank (lower = better) distribution of 22 Synechocystis confirmed interacting pairs. The violin plot for each program shows the data density for different rank values and the horizontal line inside each box indicates the median rank of confirmed interacting pairs.
![Figure 5. Rank (lower = better) distribution of 22 Synechocystis confirmed interacting pairs. The violin plot for each program shows the data density for different rank values and the horizontal line inside each box indicates the median rank of confirmed interacting pairs.](/cms/asset/9a72bc4d-1d34-4ed0-b42b-380407ac61e3/krnb_a_2012058_f0005_oc.jpg)
Figure 6. Rank (lower = better) distribution of 20 Pasteurella multocida confirmed interacting pairs. The violin plot for each program shows the data density for different rank values and the horizontal line inside each box indicates the median rank of confirmed interacting pairs.
![Figure 6. Rank (lower = better) distribution of 20 Pasteurella multocida confirmed interacting pairs. The violin plot for each program shows the data density for different rank values and the horizontal line inside each box indicates the median rank of confirmed interacting pairs.](/cms/asset/a6ba4f26-f71f-4a31-929d-415a80a3dd27/krnb_a_2012058_f0006_oc.jpg)
Figure 7. Percentage of Escherichia coli confirmed interacting sRNA-mRNA pairs (recall) as a function of percentage top predicted interacting pairs.
![Figure 7. Percentage of Escherichia coli confirmed interacting sRNA-mRNA pairs (recall) as a function of percentage top predicted interacting pairs.](/cms/asset/283e7827-05c5-40fc-90eb-0ac90eaa2bdd/krnb_a_2012058_f0007_oc.jpg)
Figure 8. Percentage of Synechocystis confirmed interacting sRNA-mRNA pairs (recall) as a function of percentage top predicted interacting pairs.
![Figure 8. Percentage of Synechocystis confirmed interacting sRNA-mRNA pairs (recall) as a function of percentage top predicted interacting pairs.](/cms/asset/2e374c39-0227-44a9-a2aa-77e7a3020114/krnb_a_2012058_f0008_oc.jpg)
Figure 9. Percentage of Pasteurella multocida confirmed interacting sRNA-mRNA pairs (recall) as a function of percentage top predicted interacting pairs.
![Figure 9. Percentage of Pasteurella multocida confirmed interacting sRNA-mRNA pairs (recall) as a function of percentage top predicted interacting pairs.](/cms/asset/954f4174-ff25-4bb3-b4a7-df563366f728/krnb_a_2012058_f0009_oc.jpg)
Figure 10. ROC curve for sRNARFTarget and IntaRNA on E. coli and Salmonella data. The plot shows the sensitivity (also called recall or true positive rate) as a function of the false-positive rate (FPR). The dash line indicates random classifier performance.
![Figure 10. ROC curve for sRNARFTarget and IntaRNA on E. coli and Salmonella data. The plot shows the sensitivity (also called recall or true positive rate) as a function of the false-positive rate (FPR). The dash line indicates random classifier performance.](/cms/asset/ff6d4112-a477-4443-9930-996c2832ae98/krnb_a_2012058_f0010_oc.jpg)
Figure 11. Rank (lower = better) distribution of 119 E. coli and Salmonella confirmed interacting pairs. The violin plot for each program shows the data density for different rank values and the horizontal line inside each box indicates the median rank of confirmed interacting pairs.
![Figure 11. Rank (lower = better) distribution of 119 E. coli and Salmonella confirmed interacting pairs. The violin plot for each program shows the data density for different rank values and the horizontal line inside each box indicates the median rank of confirmed interacting pairs.](/cms/asset/4e1d14f3-2380-4caa-b2ee-971e33718679/krnb_a_2012058_f0011_oc.jpg)
Table 7. Execution time for sRNARFTarget and IntaRNA on benchmarking data. Both programs were run on an Intel Core i7 (2.2 GHz) with 4 cores and 16 GB of RAM computer
Table 8. CopraRNA web server job execution time on selected sRNA for each bacterium on the benchmark data
Table 9. Sporulation-associated genes in sRNARFTarget top 10% predicted RCd1 targets. Smaller ranks indicate higher confidence of sRNARFTarget in the corresponding target prediction