Research Article

EightyDVec: a method for protein sequence similarity analysis using physicochemical properties of amino acids

Pages 3-13 | Received 20 Feb 2021, Accepted 13 Jul 2021, Published online: 02 Sep 2021

