Figures & data
Figure 2. Two structures belong to the same Architecture (3-Layer (aba) Sandwich) with different Topologies (Left-to-right: (1) Rossmann fold, PDBID: 1A2B and (2) Ribosomal Protein L9, PDBID: 2HBA). Although the number and length of the secondary structures is quite different, they both share the same overall shape: beta-sheets surrounded by alpha helices. Images created using Mol* (Sehnal et al., Citation2021).
![Figure 2. Two structures belong to the same Architecture (3-Layer (aba) Sandwich) with different Topologies (Left-to-right: (1) Rossmann fold, PDBID: 1A2B and (2) Ribosomal Protein L9, PDBID: 2HBA). Although the number and length of the secondary structures is quite different, they both share the same overall shape: beta-sheets surrounded by alpha helices. Images created using Mol* (Sehnal et al., Citation2021).](/cms/asset/69ed1cac-8b66-4f30-a40e-e99593cb00eb/tbsd_a_2328736_f0002_c.jpg)
Figure 5. Distribution of 28 excluded CATH Architectures amongst the 12,799 domains – This represents Dataset #1 subtracted by Dataset #2.
![Figure 5. Distribution of 28 excluded CATH Architectures amongst the 12,799 domains – This represents Dataset #1 subtracted by Dataset #2.](/cms/asset/27ca7e24-2eda-4f94-875d-2f7464f05c5d/tbsd_a_2328736_f0005_c.jpg)
Figure 6. Flowchart of the Process to create both Datasets: 1 & 2 that contain in each row (domain), the Amino Acid Sequence, Protein Block Sequence and the Architecture ID.
![Figure 6. Flowchart of the Process to create both Datasets: 1 & 2 that contain in each row (domain), the Amino Acid Sequence, Protein Block Sequence and the Architecture ID.](/cms/asset/d3cf82ac-d326-4329-a03e-42e72858bbc3/tbsd_a_2328736_f0006_c.jpg)
Figure 7. F1 Score of three algorithms at three experiments for both datasets. The first column is for Dataset #2 (12 classes) and the second column is for Dataset #1 (40 classes).
![Figure 7. F1 Score of three algorithms at three experiments for both datasets. The first column is for Dataset #2 (12 classes) and the second column is for Dataset #1 (40 classes).](/cms/asset/db4876e3-0ba9-407e-9fdd-cd373ef050d0/tbsd_a_2328736_f0007_c.jpg)
Table 1. Results (F1 Score) of Experiments 1 (using Sequence of Amino Acids), Experiments 2 (using Sequence of Protein Blocks) and Experiments 3 (Using Both Features) for the main Classes (Dataset 2 – 12 Architectures).
Table 2. Results (F1 Score) of Experiments 1 (using Sequence of Amino Acids), Experiments 2 (using Sequence of Protein Blocks) and Experiments 3 (Using Both Features) for All Classes (Dataset 1–40 Architectures).
Table 3. Comparison between averages of with .
Table 4. Results (F1 Score) of Extended Experiments 3 (using both features, Sequence of Amino Acids and Sequence of Protein Blocks) for both the main and 40 Classes (Datasets 1 & 2).
Figure 8. From Left to Right, structures of three proteins that belong to architectures 6.10 and 6.20: 2LH0 (6.10), 4GIP (6.10), 2CON (6.20). Images created using Mol* (Sehnal et al., Citation2021).
![Figure 8. From Left to Right, structures of three proteins that belong to architectures 6.10 and 6.20: 2LH0 (6.10), 4GIP (6.10), 2CON (6.20). Images created using Mol* (Sehnal et al., Citation2021).](/cms/asset/f6182d2f-4db7-4ef4-a049-e2ac35468957/tbsd_a_2328736_f0008_c.jpg)
Table 5. Detailed F1 Score for SVM – 12 Classes – k = 5 (AA) K = 13 (PB).
Table 6. Detailed F1 Score for SVM – 12 Classes – k = 5 (AA) K = 13 (PB).
Table 7. Holdout dataset’s results.
Table 8. Final conclusion that includes the best 4 algorithms out of the study.
Data availability statement
The code and datasets developed in this study can be accessed at https://zenodo.org/records/10203431.