Research Article

Machine learning-based prediction of proteins’ architecture using sequences of amino acids and structural alphabets

Received 28 Nov 2023, Accepted 05 Mar 2024, Published online: 20 Mar 2024

References

  • Abbass, J., & Nebel, J.-C. (2015). Customised fragments libraries for protein structure prediction based on structural class annotations. BMC Bioinformatics, 16(1), 136. https://doi.org/10.1186/s12859-015-0576-2
  • Abdennaji, I., Zaied, M., & Girault, J. M. (2021). Prediction of protein structural class based on symmetrical recurrence quantification analysis. Computational Biology and Chemistry, 92, 107450. https://doi.org/10.1016/J.COMPBIOLCHEM.2021.107450
  • Allam, I., Flatters, D., Caumes, G., Regad, L., Delos, V., Nuel, G., & Camproux, A.-C. (2018). SAFlex: A structural alphabet extension to integrate protein structural flexibility and missing data information. PLoS One, 13(7), e0198854. https://doi.org/10.1371/journal.pone.0198854
  • Andreeva, A. (2016). Lessons from making the Structural Classification of Proteins (SCOP) and their implications for protein structure modelling. Biochemical Society Transactions, 44(3), 937–943. https://doi.org/10.1042/BST20160053
  • Andreeva, A., & Murzin, A. G. (2010). Structural classification of proteins and structural genomics: New insights into protein folding and evolution. Acta Crystallographica. Section F, Structural Biology and Crystallization Communications, 66(Pt 10), 1190–1197. https://doi.org/10.1107/S1744309110007177
  • Andreeva, A., Howorth, D., Chothia, C., Kulesha, E., & Murzin, A. G. (2014). SCOP2 prototype: A new approach to protein structure mining. Nucleic Acids Research, 42(D1), D310–D314.
  • Andreeva, A., Kulesha, E., Gough, J., & Murzin, A. G. (2020). The SCOP database in 2020: Expanded classification of representative family and superfamily domains of known protein structures. Nucleic Acids Research, 48(D1), D376–D382. https://doi.org/10.1093/NAR/GKZ1064
  • Antipov, D., Raiko, M., Lapidus, A., & Pevzner, P. A. (2020). MetaviralSPAdes: Assembly of viruses from metagenomic data. Bioinformatics, 36(14), 4126–4129. https://doi.org/10.1093/BIOINFORMATICS/BTAA490
  • Barnoud, J., Santuz, H., Craveur, P., Joseph, A. P., Jallu, V., de Brevern, A. G., & Poulain, P. (2017). PBxplore: A tool to analyze local protein structure and deformability with Protein Blocks. PeerJ, 5(11), e4013. https://doi.org/10.7717/peerj.4013
  • Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Ahmad, S., Alpi, E., Bowler-Barnett, E. H., Britto, R., Bye-A-Jee, H., Cukura, A., Denny, P., Dogan, T., Ebenezer, T. G., Fan, J., Garmiri, P., da Costa Gonzales, L. J., Hatton-Ellis, E., Hussein, A., Ignatchenko, A., Insana, G., Ishtiaq, R., & Zhang, J. (2023). UniProt: The universal protein knowledgebase in 2023. Nucleic Acids Research, 51(D1), D523–D531. https://doi.org/10.1093/nar/gkac1052
  • Bepler, T., & Berger, B. (2021). Learning the protein language: Evolution, structure, and function. Cell Systems, 12(6), 654–669.e3. https://doi.org/10.1016/J.CELS.2021.05.017
  • Berman, H. M. (2000). The Protein Data Bank. Nucleic Acids Research, 28(1), 235–242. https://doi.org/10.1093/nar/28.1.235
  • Bileschi, M. L., Belanger, D., Bryant, D. H., Sanderson, T., Carter, B., Sculley, D., Bateman, A., DePristo, M. A., & Colwell, L. J. (2022). Using deep learning to annotate the protein universe. Nature Biotechnology, 40(6), 932–937. https://doi.org/10.1038/s41587-021-01179-w
  • Bonidia, R. P., Domingues, D. S., Sanches, D. S., & De Carvalho, A. C. P. L. F. (2022). MathFeature: Feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors. Briefings in Bioinformatics, 23(1), 1–10. https://doi.org/10.1093/BIB/BBAB434
  • Bonidia, R. P., Sampaio, L. D. H., Domingues, D. S., Paschoal, A. R., Lopes, F. M., de Carvalho, A. C. P. L. F., & Sanches, D. S. (2021). Feature extraction approaches for biological sequences: A comparative study of mathematical features. Briefings in Bioinformatics, 22(5), 1–20. https://doi.org/10.1093/BIB/BBAB011
  • Burley, S. K., Bhikadiya, C., Bi, C., Bittrich, S., Chen, L., Crichlow, G. V., Christie, C. H., Dalenberg, K., Di Costanzo, L., Duarte, J. M., Dutta, S., Feng, Z., Ganesan, S., Goodsell, D. S., Ghosh, S., Green, R. K., Guranović, V., Guzenko, D., Hudson, B. P., … Zhuravleva, M. (2021). RCSB Protein Data Bank: Powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Research, 49(D1), D437–D451. https://doi.org/10.1093/NAR/GKAA1038
  • Cai, Y. D., Feng, K. Y., Lu, W. C., & Chou, K.-C. (2006). Using LogitBoost classifier to predict protein structural classes. Journal of Theoretical Biology, 238(1), 172–176. https://doi.org/10.1016/j.jtbi.2005.05.034
  • Camproux, A. C., & Tufféry, P. (2005). Hidden Markov Model-derived structural alphabet for proteins: The learning of protein local shapes captures sequence specificity. Biochimica et Biophysica Acta, 1724(3), 394–403. https://doi.org/10.1016/J.BBAGEN.2005.05.019
  • Camproux, A. C., Gautier, R., & Tufféry, P. (2004). A hidden Markov model derived structural alphabet for proteins. Journal of Molecular Biology, 339(3), 591–605. https://doi.org/10.1016/j.jmb.2004.04.005
  • Camproux, A. C., Tuffery, P., Chevrolat, J. P., Boisvieux, J. F., & Hazout, S. (1999). Hidden Markov model approach for identifying the modular framework of the protein backbone. Protein Engineering, 12(12), 1063–1073. https://doi.org/10.1093/protein/12.12.1063
  • Chen, Z., Zhao, P., Li, F., Marquez-Lago, T. T., Leier, A., Revote, J., Zhu, Y., Powell, D. R., Akutsu, T., Webb, G. I., Chou, K.-C., Smith, A. I., Daly, R. J., Li, J., & Song, J. (2020). iLearn: An integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Briefings in Bioinformatics, 21(3), 1047–1057. https://doi.org/10.1093/BIB/BBZ041
  • Chothia, C., Gough, J., Vogel, C., & Teichmann, S. A. (2003). Evolution of the protein repertoire. Science, 300(5626), 1701–1703. https://doi.org/10.1126/SCIENCE.1085371
  • Chou, K.-C. (2005). Progress in protein structural class prediction and its impact to bioinformatics and proteomics. Current Protein & Peptide Science, 6(5), 423–436. https://doi.org/10.2174/138920305774329368
  • Chou, K.-C., & Zhang, C. T. (1995). Prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology, 30(4), 275–349. https://doi.org/10.3109/10409239509083488
  • Dawson, N. L., Lewis, T. E., Das, S., Lees, J. G., Lee, D., Ashford, P., Orengo, C. A., & Sillitoe, I. (2016). CATH: An expanded resource to predict protein function through structure and sequence. Nucleic Acids Research, 45(D1), D289–D295. https://doi.org/10.1093/nar/gkw1098
  • de Brevern, A. G. (2005). New assessment of a structural alphabet. In Silico Biology, 5(3), 283–289.
  • De Brevern, A. G., Etchebest, C., & Hazout, S. (2000). Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins: Structure, Function, and Genetics, 41(3), 271–287. https://doi.org/10.1002/1097-0134(20001115)41:3<271::AID-PROT10>3.0.CO;2-Z
  • Dehzangi, A., Paliwal, K., Lyons, J., Sharma, A., & Sattar, A. (2014). Proposing a highly accurate protein structural class predictor using segmentation-based features. BMC Genomics, 15(Suppl 1), S2. https://doi.org/10.1186/1471-2164-15-S1-S2
  • Dehzangi, A., Paliwal, K., Sharma, A., Dehzangi, O., & Sattar, A. (2013). A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(3), 564–575. https://doi.org/10.1109/TCBB.2013.65
  • Déraspe, M., Boisvert, S., Laviolette, F., Roy, P. H., & Corbeil, J. (2022). Flexible protein database based on amino acid k-mers. Scientific Reports, 12(1), 9101. https://doi.org/10.1038/s41598-022-12843-9
  • Ding, S., Li, Y., Shi, Z., & Yan, S. (2014). A protein structural classes prediction method based on predicted secondary structure and PSI-BLAST profile. Biochimie, 97(1), 60–65. https://doi.org/10.1016/j.biochi.2013.09.013
  • Ding, Y.-S., Zhang, T.-L., & Chou, K.-C. (2007). Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein and Peptide Letters, 14(8), 811–815. https://doi.org/10.2174/092986607781483778
  • Dong, Q. W., Wang, X. L., & Lin, L. (2007). Methods for optimizing the structure alphabet sequences of proteins. Computers in Biology and Medicine, 37(11), 1610–1616. https://doi.org/10.1016/j.compbiomed.2007.03.002
  • Dubinkina, V. B., Ischenko, D. S., Ulyantsev, V. I., Tyakht, A. V., & Alexeev, D. G. (2016). Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis. BMC Bioinformatics, 17(1), 38. https://doi.org/10.1186/S12859-015-0875-7
  • Dudev, M., & Lim, C. (2007). Discovering structural motifs using a structural alphabet: Application to magnesium-binding sites. BMC Bioinformatics, 8(1), 1–12. https://doi.org/10.1186/1471-2105-8-106
  • Elnaggar, A., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Y., Jones, L., Gibbs, T., Feher, T., Angerer, C., Steinegger, M., Bhowmik, D., & Rost, B. (2022). ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 7112–7127. https://doi.org/10.1109/TPAMI.2021.3095381
  • Etchebest, C., Benros, C., Bornot, A., Camproux, A. C., & De Brevern, A. G. (2007). A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. European Biophysics Journal, 36(8), 1059–1069. https://doi.org/10.1007/S00249-007-0188-5
  • Faure, G., Bornot, A., & de Brevern, A. G. (2008). Protein contacts, inter-residue interactions and side-chain modelling. Biochimie, 90(4), 626–639. https://doi.org/10.1016/j.biochi.2007.11.007
  • Faure, G., Joseph, A. P., Craveur, P., Narwani, T. J., Srinivasan, N., Gelly, J.-C., Rebehmed, J., & de Brevern, A. G. (2019). IPBAvizu: A PyMOL plugin for an efficient 3D protein structure superimposition approach. Source Code for Biology and Medicine, 14(1), 5. https://doi.org/10.1186/s13029-019-0075-3
  • Fetrow, J. S., Palumbo, M. J., & Berg, G. (1997). Patterns, structures, and amino acid frequencies in structural building blocks, a protein secondary structure classification scheme. Proteins, 27(2), 249–271. https://doi.org/10.1002/(SICI)1097-0134(199702)27:2<249::AID-PROT11>3.0.CO;2-M
  • Fox, N. K., Brenner, S. E., & Chandonia, J. M. (2015). The value of protein structure classification information – Surveying the scientific literature. Proteins, 83(11), 2025–2038. https://doi.org/10.1002/PROT.24915
  • Gelly, J. C., Joseph, A. P., Srinivasan, N., & De Brevern, A. G. (2011). IPBA: A tool for protein structure comparison using sequence alignment strategies. Nucleic Acids Research, 39(Web Server issue), W18–W23. https://doi.org/10.1093/nar/gkr333
  • Ghiurcuta, C. G., & Moret, B. M. E. (2014). Evaluating synteny for improved comparative studies. Bioinformatics, 30(12), i9–18. https://doi.org/10.1093/BIOINFORMATICS/BTU259
  • Ghouzam, Y., Postic, G., De Brevern, A. G., & Gelly, J. C. (2015). Improving protein fold recognition with hybrid profiles combining sequence and structure evolution. Bioinformatics, 31(23), 3782–3789. https://doi.org/10.1093/BIOINFORMATICS/BTV462
  • Ghouzam, Y., Postic, G., Guerin, P. E., De Brevern, A. G., & Gelly, J. C. (2016). ORION: A web server for protein fold recognition and structure prediction using evolutionary hybrid profiles. Scientific Reports, 6(1), 28268. https://doi.org/10.1038/srep28268
  • Gribskov, M., McLachlan, A. D., & Eisenberg, D. (1987). Profile analysis: Detection of distantly related proteins. Proceedings of the National Academy of Sciences of the United States of America, 84(13), 4355–4358. https://doi.org/10.1073/pnas.84.13.4355
  • Guyon, F., Camproux, A. C., Hochez, J., & Tufféry, P. (2004). SA-Search: A web tool for protein structure mining based on a Structural Alphabet. Nucleic Acids Research, 32(Web Server issue), W545–W548. https://doi.org/10.1093/NAR/GKH467
  • Harris, R. S., & Medvedev, P. (2020). Improved representation of sequence bloom trees. Bioinformatics, 36(3), 721–727. https://doi.org/10.1093/BIOINFORMATICS/BTZ662
  • Hayat, M., & Khan, A. (2012). MemHyb: Predicting membrane protein types by hybridizing SAAC and PSSM. Journal of Theoretical Biology, 292, 93–102. https://doi.org/10.1016/j.jtbi.2011.09.026
  • Hayat, M., Khan, A., & Yeasin, M. (2012). Prediction of membrane proteins using split amino acid and ensemble classification. Amino Acids, 42(6), 2447–2460. https://doi.org/10.1007/s00726-011-1053-5
  • Iqbal, Z., Caccamo, M., Turner, I., Flicek, P., & McVean, G. (2012). De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics, 44(2), 226–232. https://doi.org/10.1038/ng.1028
  • Jahandideh, S., Abdolmaleki, P., Jahandideh, M., & Asadabadi, E. B. (2007). Novel two-stage hybrid neural discriminant model for predicting proteins structural classes. Biophysical Chemistry, 128(1), 87–93. https://doi.org/10.1016/j.bpc.2007.03.006
  • Joseph, A. P., Srinivasan, N., & De Brevern, A. G. (2012). Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies. Biochimie, 94(9), 2025–2034. https://doi.org/10.1016/J.BIOCHI.2012.05.028
  • Kelley, D. R., Schatz, M. C., & Salzberg, S. L. (2010). Quake: Quality-aware detection and correction of sequencing errors. Genome Biology, 11(11), R116. https://doi.org/10.1186/GB-2010-11-11-R116
  • Khorsand, P., & Hormozdiari, F. (2021). Nebula: Ultra-efficient mapping-free structural variant genotyper. Nucleic Acids Research, 49(8), e47–e47. https://doi.org/10.1093/NAR/GKAB025
  • Ku, S. Y., & Hu, Y. J. (2008). Protein structure search and local structure characterization. BMC Bioinformatics, 9(1), 349. https://doi.org/10.1186/1471-2105-9-349
  • Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., … Szustakowki, J. (2001). Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
  • Léonard, S., Joseph, A. P., Srinivasan, N., Gelly, J. C., & De Brevern, A. G. (2014). MulPBA: An efficient multiple protein structure alignment method based on a structural alphabet. Journal of Biomolecular Structure & Dynamics, 32(4), 661–668. https://doi.org/10.1080/07391102.2013.787026
  • Levitt, M., & Chothia, C. (1976). Structural patterns in globular proteins. Nature, 261(5561), 552–558. https://doi.org/10.1038/261552a0
  • Lewis, T. E., Sillitoe, I., Andreeva, A., Blundell, T. L., Buchan, D. W. A., Chothia, C., Cuff, A., Dana, J. M., Filippis, I., Gough, J., Hunter, S., Jones, D. T., Kelley, L. A., Kleywegt, G. J., Minneci, F., Mitchell, A., Murzin, A. G., Ochoa-Montaño, B., Rackham, O. J. L., … Orengo, C. (2013). Genome3D: A UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucleic Acids Research, 41(Database issue), D499–D507. https://doi.org/10.1093/NAR/GKS1266
  • Lewis, T. E., Sillitoe, I., Dawson, N., Lam, S. D., Clarke, T., Lee, D., Orengo, C., & Lees, J. (2018). Gene3D: Extensive prediction of globular domains in proteins. Nucleic Acids Research, 46(D1), D435–D439. https://doi.org/10.1093/NAR/GKX1069
  • Li, Q., Zhou, C., & Liu, H. (2009). Fragment-based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities. Proteins, 74(4), 820–836. https://doi.org/10.1002/PROT.22191
  • Li, Z.-C., Zhou, X.-B., Lin, Y.-R., & Zou, X.-Y. (2008). Prediction of protein structure class by coupling improved genetic algorithm and support vector machine. Amino Acids, 35(3), 581–590. https://doi.org/10.1007/s00726-008-0084-z
  • Liang, Y., & Zhang, S. (2017). Predict protein structural class by incorporating two different modes of evolutionary information into Chou’s general pseudo amino acid composition. Journal of Molecular Graphics & Modelling, 78, 110–117. https://doi.org/10.1016/J.JMGM.2017.10.003
  • Liu, B., Gao, X., & Zhang, H. (2019). BioSeq-Analysis2.0: An updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Research, 47(20), e127–e127. https://doi.org/10.1093/NAR/GKZ740
  • Liu, B., Liu, F., Wang, X., Chen, J., Fang, L., & Chou, K. C. (2015). Pse-in-One: A web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Research, 43(W1), W65–W71. https://doi.org/10.1093/NAR/GKV458
  • Liu, T., & Jia, C. (2010). A high-accuracy protein structural class prediction algorithm using predicted secondary structural information. Journal of Theoretical Biology, 267(3), 272–275. https://doi.org/10.1016/j.jtbi.2010.09.007
  • Lo, C. C., & Chain, P. S. G. (2014). Rapid evaluation and quality control of next generation sequencing data with FaQCs. BMC Bioinformatics, 15(1), 366. https://doi.org/10.1186/S12859-014-0366-2
  • Lu, J., Rincon, N., Wood, D. E., Breitwieser, F. P., Pockrandt, C., Langmead, B., Salzberg, S. L., & Steinegger, M. (2022). Metagenome analysis using the Kraken software suite. Nature Protocols, 17(12), 2815–2839. https://doi.org/10.1038/s41596-022-00738-y
  • Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., Tang, J., Wu, G., Zhang, H., Shi, Y., Liu, Y., Yu, C., Wang, B., Lu, Y., Han, C., … Wang, J. (2012). SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. Gigascience, 1(1), 18.
  • Madera, M., & Bateman, A. (2008). Profile Comparer: A program for scoring and aligning profile hidden Markov models. Bioinformatics, 24(22), 2630–2631. https://doi.org/10.1093/BIOINFORMATICS/BTN504
  • Marçais, G., & Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27(6), 764–770. https://doi.org/10.1093/BIOINFORMATICS/BTR011
  • Martin, F. J., Amode, M. R., Aneja, A., Austine-Orimoloye, O., Azov, A. G., Barnes, I., Becker, A., Bennett, R., Berry, A., Bhai, J., Bhurji, S. K., Bignell, A., Boddu, S., Branco Lins, P. R., Brooks, L., Ramaraju, S. B., Charkhchi, M., Cockburn, A., Da Rin Fiorretto, L., … Flicek, P. (2023). Ensembl 2023. Nucleic Acids Research, 51(D1), D933–D941. https://doi.org/10.1093/NAR/GKAC958
  • Mistry, J., Finn, R. D., Eddy, S. R., Bateman, A., & Punta, M. (2013). Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Research, 41(12), e121–e121. https://doi.org/10.1093/NAR/GKT263
  • Mizianty, M. J., & Kurgan, L. A. (2009). Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences. BMC Bioinformatics, 10(1), 414. https://doi.org/10.1186/1471-2105-10-414
  • Murzin, A. G., Brenner, S. E., Hubbard, T., & Chothia, C. (1995). SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 247(4), 536–540. https://doi.org/10.1006/jmbi.1995.0159
  • Nallapareddy, V., Bordin, N., Sillitoe, I., Heinzinger, M., Littmann, M., Waman, V. P., Sen, N., Rost, B., & Orengo, C. (2023). CATHe: Detection of remote homologues for CATH superfamilies using embeddings from protein language models. Bioinformatics, 39(1), 29. https://doi.org/10.1093/BIOINFORMATICS/BTAD029
  • Narwani, T. J., Etchebest, C., Craveur, P., Léonard, S., Rebehmed, J., Srinivasan, N., Bornot, A., Gelly, J.-C., & de Brevern, A. G. (2019). In silico prediction of protein flexibility with local structure approach. Biochimie, 165, 150–155. https://doi.org/10.1016/J.BIOCHI.2019.07.025
  • Nguyen, L. A. T., Dang, X. T., Le, T. K. T., Saethang, T., Tran, V. A., Ngo, D. L., Gavrilov, S., Nguyen, N. G., Kubo, M., Yamada, Y., & Satou, K. (2014). Predicting β-turns and β-turn types using a novel over-sampling approach. Journal of Biomedical Science and Engineering, 07(11), 927–940. https://doi.org/10.4236/jbise.2014.711090
  • Oates, M. E., Stahlhacke, J., Vavoulis, D. V., Smithers, B., Rackham, O. J. L., Sardar, A. J., Zaucha, J., Thurlby, N., Fang, H., & Gough, J. (2015). The SUPERFAMILY 1.75 database in 2014: A doubling of data. Nucleic Acids Research, 43(Database issue), D227–D233. https://doi.org/10.1093/NAR/GKU1041
  • Ondov, B. D., Treangen, T. J., Melsted, P., Mallonee, A. B., Bergman, N. H., Koren, S., & Phillippy, A. M. (2016). Mash: Fast genome and metagenome distance estimation using MinHash. Genome Biology, 17(1), 132. https://doi.org/10.1186/S13059-016-0997-X
  • Orengo, C. A., & Taylor, W. R. (1996). SSAP: Sequential structure alignment program for protein structure comparison. Methods in Enzymology, 266, 617–635. https://doi.org/10.1016/S0076-6879(96)66038-8
  • Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B., & Thornton, J. M. (1997). CATH–a hierarchic classification of protein domain structures. Structure, 5(8), 1093–1108. https://doi.org/10.1016/S0969-2126(97)00260-8
  • Pandini, A., Fornili, A., & Kleinjung, J. (2010). Structural alphabets derived from attractors in conformational space. BMC Bioinformatics, 11(1), 97. https://doi.org/10.1186/1471-2105-11-97
  • Pevzner, P. A., Tang, H., & Waterman, M. S. (2001). An Eulerian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences of the United States of America, 98(17), 9748–9753. https://doi.org/10.1073/PNAS.171285098
  • Rangavittal, S., Stopa, N., Tomaszkiewicz, M., Sahlin, K., Makova, K. D., & Medvedev, P. (2019). DiscoverY: A classifier for identifying Y chromosome sequences in male assemblies. BMC Genomics, 20(1), 641. https://doi.org/10.1186/S12864-019-5996-3
  • Redfern, O. C., Harrison, A., Dallman, T., Pearl, F. M. G., & Orengo, C. A. (2007). CATHEDRAL: A fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures. PLoS Computational Biology, 3(11), e232. https://doi.org/10.1371/JOURNAL.PCBI.0030232
  • Sander, O., Sommer, I., & Lengauer, T. (2006). Local protein structure prediction using discriminative models. BMC Bioinformatics, 7(1), 14. https://doi.org/10.1186/1471-2105-7-14
  • Schuchhardt, J., Schneider, G., Reichelt, J., Schomburg, D., & Wrede, P. (1996). Local structural motifs of protein backbones are classified by self-organizing neural networks. Protein Engineering, 9(10), 833–842. https://doi.org/10.1093/protein/9.10.833
  • Sehnal, D., Bittrich, S., Deshpande, M., Svobodová, R., Berka, K., Bazgier, V., Velankar, S., Burley, S. K., Koča, J., & Rose, A. S. (2021). Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures. Nucleic Acids Research, 49(W1), W431–W437. https://doi.org/10.1093/nar/gkab314
  • Sillitoe, I., Bordin, N., Dawson, N., Waman, V. P., Ashford, P., Scholes, H. M., Pang, C. S. M., Woodridge, L., Rauer, C., Sen, N., Abbasian, M., Le Cornu, S., Lam, S. D., Berka, K., Varekova, I. H., Svobodova, R., Lees, J., & Orengo, C. A. (2021). CATH: Increased structural coverage of functional space. Nucleic Acids Research, 49(D1), D266–D273. https://doi.org/10.1093/NAR/GKAA1079
  • Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35(11), 1026–1028. https://doi.org/10.1038/nbt.3988
  • Steinegger, M., & Söding, J. (2018). Clustering huge protein sequence sets in linear time. Nature Communications, 9(1), 2542. https://doi.org/10.1038/s41467-018-04964-5
  • Steinegger, M., Meier, M., Mirdita, M., Vöhringer, H., Haunsberger, S. J., & Söding, J. (2019). HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics, 20(1), 473. https://doi.org/10.1186/S12859-019-3019-7
  • Suresh, V., & Parthasarathy, S. (2014). SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures. Protein and Peptide Letters, 21(8), 736–742. https://doi.org/10.2174/09298665113209990064
  • Suresh, V., Ganesan, K., & Parthasarathy, S. (2013). A protein block based fold recognition method for the annotation of twilight zone sequences. Protein and Peptide Letters, 20(3), 249–254. https://doi.org/10.2174/0929866511320030003
  • Thomas, A., Deshayes, S., Decaffmeyer, M., Van Eyck, M. H., Charloteaux, B., & Brasseur, R. (2006). Prediction of peptide structure: How far are we? Proteins, 65(4), 889–897. https://doi.org/10.1002/PROT.21151
  • Tung, C. H., Huang, J. W., & Yang, J. M. (2007). Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database. Genome Biology, 8(3), R31. https://doi.org/10.1186/GB-2007-8-3-R31
  • Tung, C.-H., & Nacher, J. C. (2013). A complex network approach for the analysis of protein units similarity using structural alphabet. International Journal of Bioscience, Biochemistry and Bioinformatics, 3, 433–437. https://doi.org/10.7763/IJBBB.2013.V3.250
  • Tunyasuvunakool, K., Adler, J., Wu, Z., Green, T., Zielinski, M., Žídek, A., Bridgland, A., Cowie, A., Meyer, C., Laydon, A., Velankar, S., Kleywegt, G. J., Bateman, A., Evans, R., Pritzel, A., Figurnov, M., Ronneberger, O., Bates, R., Kohl, S. A. A., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 590–596. https://doi.org/10.1038/s41586-021-03819-2
  • Unger, R., Harel, D., Wherland, S., & Sussman, J. L. (1989). A 3D building blocks approach to analyzing and predicting structure of proteins. Proteins, 5(4), 355–373. https://doi.org/10.1002/PROT.340050410
  • van Kempen, M., Kim, S. S., Tumescheit, C., Mirdita, M., Lee, J., Gilchrist, C. L. M., Söding, J., & Steinegger, M. (2023). Fast and accurate protein structure search with Foldseek. Nature Biotechnology, 42(2), 243–246. https://doi.org/10.1038/s41587-023-01773-0
  • Varadi, M., Anyango, S., Deshpande, M., Nair, S., Natassia, C., Yordanova, G., Yuan, D., Stroe, O., Wood, G., Laydon, A., Žídek, A., Green, T., Tunyasuvunakool, K., Petersen, S., Jumper, J., Clancy, E., Green, R., Vora, A., Lutfi, M., … Velankar, S. (2022). AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1), D439–D444. https://doi.org/10.1093/NAR/GKAB1061
  • Vetrivel, I., Mahajan, S., Tyagi, M., Hoffmann, L., Sanejouand, Y.-H., Srinivasan, N., de Brevern, A. G., Cadet, F., & Offmann, B. (2017). Knowledge-based prediction of protein backbone conformation using a structural alphabet. PLoS One, 12(11), e0186215. https://doi.org/10.1371/journal.pone.0186215
  • Wang, Y., Fu, L., Ren, J., Yu, Z., Chen, T., & Sun, F. (2018). Identifying group-specific sequences for microbial communities using long k-mer sequence signatures. Frontiers in Microbiology, 9, 872. https://doi.org/10.3389/FMICB.2018.00872
  • Wenzheng, B., Yuehui, C., & Dong, W. (2014). Prediction of protein structure classes with flexible neural tree. Bio-Medical Materials and Engineering, 24(6), 3797–3806. https://doi.org/10.3233/BME-141209
  • Zhu, L., Davari, M. D., & Li, W. (2021). Recent advances in the prediction of protein structural classes: Feature descriptors and machine learning algorithms. Crystals, 11(4), 324. https://doi.org/10.3390/cryst11040324
  • Zhu, X. J., Feng, C. Q., Lai, H. Y., Chen, W., & Hao, L. (2019). Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowledge-Based Systems, 163, 787–793. https://doi.org/10.1016/j.knosys.2018.10.007
  • Zimmermann, O., & Hansmann, U. H. E. (2008). LOCUSTRA: Accurate prediction of local protein structure using a two-layer support vector machine approach. Journal of Chemical Information and Modeling, 48(9), 1903–1908. https://doi.org/10.1021/CI800178A
  • Zuo, Y. C., & Li, Q. Z. (2009). Using reduced amino acid composition to predict defensin family and subfamily: Integrating similarity measure and structural alphabet. Peptides, 30(10), 1788–1793. https://doi.org/10.1016/J.PEPTIDES.2009.06.032