Search in:

Journal of Biomolecular Structure and Dynamics Latest Articles

Journal homepage

Open access

822

Views

CrossRef citations to date

Altmetric

Research Article

Machine learning-based prediction of proteins’ architecture using sequences of amino acids and structural alphabets

Jad Abbassa School of Computer Science and Mathematics, Kingston University, London, UKCorrespondence[email protected]

https://orcid.org/0000-0002-3491-8237 View further author information

Charles Parisia School of Computer Science and Mathematics, Kingston University, London, UK;b Telecom Physique Strasbourg, Strasbourg University, Strasbourg, FranceView further author information

Received 28 Nov 2023, Accepted 05 Mar 2024, Published online: 20 Mar 2024

Cite this article
https://doi.org/10.1080/07391102.2024.2328736
CrossMark

Full Article
Figures & data
References
Citations
Metrics
Licensing
Reprints & Permissions
View PDF PDF View EPUB EPUB

Figures & data

Figure 1. Schematic comparison between SCOP and CATH.

Figure 2. Two structures belong to the same Architecture (3-Layer (aba) Sandwich) with different Topologies (Left-to-right: (1) Rossmann fold, PDBID: 1A2B and (2) Ribosomal Protein L9, PDBID: 2HBA). Although the number and length of the secondary structures is quite different, they both share the same overall shape: beta-sheets surrounded by alpha helices. Images created using Mol* (Sehnal et al., Citation2021).

Figure 3. Distribution of 40 CATH Architectures amongst the 169,843 domains – Dataset #1.

Figure 4. Distribution of 12 CATH Architectures amongst the 157,044 domains – Dataset #2.

Figure 5. Distribution of 28 excluded CATH Architectures amongst the 12,799 domains – This represents Dataset #1 subtracted by Dataset #2.

Figure 6. Flowchart of the Process to create both Datasets: 1 & 2 that contain in each row (domain), the Amino Acid Sequence, Protein Block Sequence and the Architecture ID.

Figure 7. F1 Score of three algorithms at three experiments for both datasets. The first column is for Dataset #2 (12 classes) and the second column is for Dataset #1 (40 classes).

Table 1. Results (F1 Score) of Experiments 1 (using Sequence of Amino Acids), Experiments 2 (using Sequence of Protein Blocks) and Experiments 3 (Using Both Features) for the main Classes (Dataset 2 – 12 Architectures).

Download CSV Display Table

Table 2. Results (F1 Score) of Experiments 1 (using Sequence of Amino Acids), Experiments 2 (using Sequence of Protein Blocks) and Experiments 3 (Using Both Features) for All Classes (Dataset 1–40 Architectures).

Download CSV Display Table

Table 3. Comparison between averages of with .

Download CSV Display Table

Table 4. Results (F1 Score) of Extended Experiments 3 (using both features, Sequence of Amino Acids and Sequence of Protein Blocks) for both the main and 40 Classes (Datasets 1 & 2).

Download CSV Display Table

Figure 8. From Left to Right, structures of three proteins that belong to architectures 6.10 and 6.20: 2LH0 (6.10), 4GIP (6.10), 2CON (6.20). Images created using Mol* (Sehnal et al., Citation2021).

Table 5. Detailed F1 Score for SVM – 12 Classes – k = 5 (AA) K = 13 (PB).

Download CSV Display Table

Table 6. Detailed F1 Score for SVM – 12 Classes – k = 5 (AA) K = 13 (PB).

Download CSV Display Table

Figure 9. F1 Scores of 40-Classes SVM 80 experiments for each Architecture.

Table 7. Holdout dataset’s results.

Download CSV Display Table

Table 8. Final conclusion that includes the best 4 algorithms out of the study.

Download CSV Display Table

Sehnal, D., Bittrich, S., Deshpande, M., Svobodová, R., Berka, K., Bazgier, V., Velankar, S., Burley, S. K., Koča, J., & Rose, A. S. (2021). Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures. Nucleic Acids Research, 49(W1), W431–W437. https://doi.org/10.1093/nar/gkab314

PubMed Web of Science ®Google Scholar

Data availability statement

The code and datasets developed in this study can be accessed at https://zenodo.org/records/10203431.

Related research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.

People also read
Recommended articles
Cited by

To cite this article:

Reference style: APA Chicago Harvard

Citation copied to clipboard

Reference styles above use APA (6th edition), Chicago (16th edition) & Harvard (10th edition)

Download citation

Download a citation file in RIS format that can be imported by citation management software including EndNote, ProCite, RefWorks and Reference Manager.

Choose format: RIS BibTex RefWorks Direct Export

Choose options: Citation Citation & abstract Citation & references

Machine learning-based prediction of proteins’ architecture using sequences of amino acids and structural alphabets

Table 1. Results (F1 Score) of Experiments 1 (using Sequence of Amino Acids), Experiments 2 (using Sequence of Protein Blocks) and Experiments 3 (Using Both Features) for the main Classes (Dataset 2 – 12 Architectures).

Table 2. Results (F1 Score) of Experiments 1 (using Sequence of Amino Acids), Experiments 2 (using Sequence of Protein Blocks) and Experiments 3 (Using Both Features) for All Classes (Dataset 1–40 Architectures).

Table 3. Comparison between averages of with .

Table 4. Results (F1 Score) of Extended Experiments 3 (using both features, Sequence of Amino Acids and Sequence of Protein Blocks) for both the main and 40 Classes (Datasets 1 & 2).

Table 5. Detailed F1 Score for SVM – 12 Classes – k = 5 (AA) K = 13 (PB).

Table 6. Detailed F1 Score for SVM – 12 Classes – k = 5 (AA) K = 13 (PB).

Table 7. Holdout dataset’s results.

Table 8. Final conclusion that includes the best 4 algorithms out of the study.

Information for

Open access

Opportunities

Help and information

Your download is now in progress and you may close this window

Login or register to access this feature

Machine learning-based prediction of proteins’ architecture using sequences of amino acids and structural alphabets

Figures & data

Table 1. Results (F1 Score) of Experiments 1 (using Sequence of Amino Acids), Experiments 2 (using Sequence of Protein Blocks) and Experiments 3 (Using Both Features) for the main Classes (Dataset 2 – 12 Architectures).

Table 2. Results (F1 Score) of Experiments 1 (using Sequence of Amino Acids), Experiments 2 (using Sequence of Protein Blocks) and Experiments 3 (Using Both Features) for All Classes (Dataset 1–40 Architectures).

Table 3. Comparison between averages of Table 1 with Table 2.

Table 4. Results (F1 Score) of Extended Experiments 3 (using both features, Sequence of Amino Acids and Sequence of Protein Blocks) for both the main and 40 Classes (Datasets 1 & 2).

Table 5. Detailed F1 Score for SVM – 12 Classes – k = 5 (AA) K = 13 (PB).

Table 6. Detailed F1 Score for SVM – 12 Classes – k = 5 (AA) K = 13 (PB).

Table 7. Holdout dataset’s results.

Table 8. Final conclusion that includes the best 4 algorithms out of the study.

Data availability statement

Related research

To cite this article:

Download citation

Information for

Open access

Opportunities

Help and information

Keep up to date

Table 3. Comparison between averages of with .