Review

Recent progress in predicting protein sub-subcellular locations

Pages 391-404 | Published online: 09 Jan 2014

References

  • UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38(Database issue), D142–D148 (2010).
  • Hongzhan H, Shukla HD, Cathy W, Satya S. Challenges and solutions in proteomics. Curr. Genomics 8(1), 21–28 (2007).
  • Chou KC. Chapter 4. In: Gene Cloning & Expression Technologies. Weinrer PW, Lu Q (Eds). Eaton Publishing, MA, USA, 57–70 (2002).
  • Chou KC. Chapter 5. In: Automation in Proteomics and Genomics: An Engineering Case-Based Approach (Harvard–MIT interdisciplinary special studies courses). Alterovitz G, Benson R, Ramoni MF (Eds). Wiley & Sons Ltd., West Sussex, UK (2009).
  • Murphy RF, Boland MV, Velliste M. Towards a systematics for protein subcellular location: quantitative description of protein localization patterns and automated analysis of fluorescence microscope images. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 251–259 (2000).
  • Chou KC, Shen HB. Recent progress in protein subcellular location prediction. Anal. Biochem. 370(1), 1–16 (2007).
  • Shen HB, Yang J, Chou KC. Methodology development for predicting subcellular localization and other attributes of proteins. Expert Rev. Proteomics 4(4), 453–463 (2007).
  • Feng ZP. An overview on predicting the subcellular location of a protein. In Silico Biol. 2(3), 291–303 (2002).
  • Nakai K, Horton P. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem. Sci. 24(1), 34–36 (1999).
  • Horton P, Park KJ, Obayashi T et al. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 35(Web Server issue), W585–W587 (2007).
  • Briesemeister S, Rahnenfuhrer J, Kohlbacher O. Going from where to why – interpretable prediction of protein subcellular localization. Bioinformatics 26(9), 1232–1238 (2010).
  • Emanuelsson O, von Heijne G. Prediction of organellar targeting signals. Biochim. Biophys. Acta 1541(1–2), 114–119 (2001).
  • Matsuda S, Vert JP, Saigo H, Ueda N, Toh H, Akutsu T. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14(11), 2804–2813 (2005).
  • Rastogi S, Rost B. Bioinformatics predictions of localization and targeting. Methods Mol. Biol. 619, 285–305 (2010).
  • Cedano J, Aloy P, Perez-Pons JA, Querol E. Relation between amino acid composition and cellular location of proteins. J. Mol. Biol. 266(3), 594–600 (1997).
  • Andrade MA, O’Donoghue SI, Rost B. Adaptation of protein surfaces to subcellular location. J. Mol. Biol. 276(2), 517–525 (1998).
  • Chou KC. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 43(3), 246–255 (2001).
  • Chou KC. Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Curr. Proteomics 6, 262–274 (2009).
  • Chou KC. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21(1), 10–19 (2005).
  • Chou KC, Cai YD. Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochem. Biophys. Res. Commun. 320(4), 1236–1239 (2004).
  • Chou KC, Cai YD. Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition. J. Cell. Biochem. 91(6), 1197–1203 (2004).
  • Cai YD, Chou KC. Predicting subcellular localization of proteins in a hybridization space. Bioinformatics 20(7), 1151–1156 (2004).
  • Du P, Li Y. Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinformatics 7, 518 (2006).
  • Hua S, Sun Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17(8), 721–728 (2001).
  • Reinhardt A, Hubbard T. Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res. 26(9), 2230–2236 (1998).
  • Huang Y, Li Y. Prediction of protein subcellular locations using fuzzy k-NN method. Bioinformatics 20(1), 21–28 (2004).
  • Chou KC, Shen HB. Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-Nearest Neighbor classifiers. J. Proteome Res. 5(8), 1888–1897 (2006).
  • Huang WL, Tung CW, Ho SW, Hwang SF, Ho SY. ProLoc-GO: utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization. BMC Bioinformatics 9, 80 (2008).
  • Yuan Z. Prediction of protein subcellular locations using Markov chain models. FEBS Lett. 451(1), 23–26 (1999).
  • Chou KC, Shen HB. Cell-PLoc: a package of web servers for predicting subcellular localization of proteins in various organisms. Nat. Protoc. 3(2), 153–162 (2008).
  • Chou KC, Shen HB. Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms. Nat. Sci. 2, 1090–1103 (2010).
  • Pierleoni A, Martelli PL, Fariselli P, Casadio R. BaCelLo: a balanced subcellular localization predictor. Bioinformatics 22(14), e408–e416 (2006).
  • Lin HN, Chen CT, Sung TY, Ho SY, Hsu WL. Protein subcellular localization prediction of eukaryotes using a knowledge-based approach. BMC Bioinformatics 10(Suppl. 15), S8 (2009).
  • Blum T, Briesemeister S, Kohlbacher O. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinformatics 10, 274 (2009).
  • Nanni L, Lumini A. Ensemblator: an ensemble of classifiers for reliable classification of biological data. Pattern Recogn. Lett. 28(5), 622–630 (2007).
  • Nair R, Rost B. Mimicking cellular sorting improves prediction of subcellular localization. J. Mol. Biol. 348(1), 85–100 (2005).
  • Xu Q, Hu DH, Xue H, Yu W, Yang Q. Semi-supervised protein subcellular localization. BMC Bioinformatics 10(Suppl. 1), S47 (2009).
  • Shen YQ, Burger G. ‘Unite and conquer’: enhanced prediction of protein subcellular localization by integrating multiple specialized tools. BMC Bioinformatics 8, 420 (2007).
  • Cai YD, Lu L, Chen L, He JF. Predicting subcellular location of proteins using integrated-algorithm method. Mol. Divers. 14(3), 551–558 (2009).
  • Nanni L, Brahnam S, Lumini A. High performance set of PseAAC and sequence based descriptors for protein classification. J. Theor. Biol. 266(1), 1–10 (2010).
  • Nanni L, Lumini A. Using ensemble of classifiers in bioinformatics. In: Machine Learning Research Progress. Peters H, Vogel M (Eds). Nova Publisher, NY, USA (2008).
  • Chou KC, Zhang CT. Prediction of protein structural classes. Crit. Rev. Biochem. Mol. Biol. 30(4), 275–349 (1995).
  • Chen C, Chen L, Zou X, Cai P. Prediction of protein secondary structure content by using the concept of Chou’s pseudo amino acid composition and support vector machine. Protein Pept. Lett. 16(1), 27–31 (2009).
  • Chen C, Chen LX, Zou XY, Cai PX. Predicting protein structural class based on multi-features fusion. J. Theor. Biol. 253(2), 388–392 (2008).
  • Ding H, Luo L, Lin H. Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo amino acid composition. Protein Pept. Lett. 16(4), 351–355 (2009).
  • Du P, Li Y. Prediction of C-to-U RNA editing sites in plant mitochondria using both biochemical and evolutionary information. J. Theor. Biol. 253(3), 579–586 (2008).
  • Jiang X, Wei R, Zhao Y, Zhang T. Using Chou’s pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location. Amino Acids 34(4), 669–675 (2008).
  • Joshi RR, Sekharan S. Characteristic peptides of protein secondary structural motifs. Protein Pept. Lett. 17(10), 1198–1206 (2010).
  • Li FM, Li QZ. Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach. Protein Pept. Lett. 15(6), 612–616 (2008).
  • Li S, Li H, Li M, Shyr Y, Xie L, Li Y. Improved prediction of lysine acetylation by support vector machines. Protein Pept. Lett. 16(8), 977–983 (2009).
  • Lin H. The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition. J. Theor. Biol. 252(2), 350–356 (2008).
  • Lin H, Ding H, Guo FB, Zhang AY, Huang J. Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition. Protein Pept. Lett. 15(7), 739–744 (2008).
  • Liu T, Zheng X, Wang C, Wang J. Prediction of subcellular location of apoptosis proteins using pseudo amino acid composition: an approach from auto covariance transformation. Protein Pept. Lett. 17(10), 1263–1269 (2010).
  • Lu J, Niu B, Liu L, Lu WC, Cai YD. Prediction of small molecules’ metabolic pathways based on functional group composition. Protein Pept. Lett. 16(8), 969–976 (2009).
  • Mohabatkar H. Prediction of cyclin proteins using Chou’s pseudo amino acid composition. Protein Pept. Lett. 17(10), 1207–1214 (2010).
  • Nanni L, Lumini A. A further step toward an optimal ensemble of classifiers for peptide classification, a case study: HIV protease. Protein Pept. Lett. 16(2), 163–167 (2009).
  • Shi MG, Huang DS, Li XL. A protein interaction network analysis for yeast integral membrane protein. Protein Pept. Lett. 15(7), 692–699 (2008).
  • Tian F, Lv F, Zhou P, Yang Q, Jalbout AF. Toward prediction of binding affinities between the MHC protein and its peptide ligands using quantitative structure–affinity relationship approach. Protein Pept. Lett. 15(10), 1033–1043 (2008).
  • Vilar S, Gonzalez-Diaz H, Santana L, Uriarte E. A network-QSAR model for prediction of genetic-component biomarkers in human colorectal cancer. J. Theor. Biol. 261(3), 449–458 (2009).
  • Wang T, Xia T, Hu XM. Geometry preserving projections algorithm for predicting membrane protein types. J. Theor. Biol. 262(2), 208–213 (2010).
  • Yang XY, Shi XH, Meng X et al. Classification of transcription factors using protein primary structure. Protein Pept. Lett. 17(7), 899–908 (2010).
  • Zeng YH, Guo YZ, Xiao RQ, Yang L, Yu LZ, Li ML. Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. J. Theor. Biol. 259(2), 366–372 (2009).
  • Zhou GP. An intriguing controversy over protein structural class prediction. J. Protein Chem. 17(8), 729–738 (1998).
  • Zhou GP, Assa-Munt N. Some insights into protein structural class prediction. Proteins 44(1), 57–59 (2001).
  • Zhou GP, Doctor K. Subcellular location prediction of apoptosis proteins. Proteins 50(1), 44–48 (2003).
  • Zhou XB, Chen C, Li ZC, Zou XY. Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J. Theor. Biol. 248(3), 546–551 (2007).
  • Zhi-Hua L, Huai-Liang W, Bo Z, Yuan-Qiang W, Yong L, Yu-Zhang W. Estimation of affinity of HLA-A*0201 restricted CTL epitope based on the SCORE function. Protein Pept. Lett. 16(5), 561–569 (2009).
  • Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 405(2), 442–451 (1975).
  • Fawcett T. An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006).
  • Qin Z-C. ROC analysis for predictions made by probabilistic classifiers. In: International Conference on Machine Learning and Cybernetics. Yeung DS, Liu Z-Q, Wang X-Z, Yan H (Eds). Springer-Verlag Berlin, Heidelberg, Germany, 3119–3124 (2005).
  • Honzik P, Kucera P, Hyncica O, Jirsik V. Novel method for evaluation of multi-class area under receiver operating characteristic. In: The Fifth International Conference on Soft Computing, Computing With Words and Perceptions in System Analysis, Decision and Control. IEEE, Famagusta, Cyprus (2009).
  • Chou KC. Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem. Biophys. Res. Commun. 278(2), 477–483 (2000).
  • Cui Q, Jiang T, Liu B, Ma S. Esub8: a novel tool to predict protein subcellular localizations in eukaryotic organisms. BMC Bioinformatics 5, 66 (2004).
  • Chou KC, Elrod DW. Protein subcellular location prediction. Protein Eng. 12(2), 107–118 (1999).
  • Chou KC, Shen HB. A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0. PLoS One 5(4), e9931 (2010).
  • Cai YD, Chou KC. Predicting 22 protein localizations in budding yeast. Biochem. Biophys. Res. Commun. 323(2), 425–428 (2004).
  • Chou KC, Shen HB. Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. Biochem. Biophys. Res. Commun. 347(1), 150–157 (2006).
  • Shen HB, Chou KC. A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0. Anal. Biochem. 394(2), 269–274 (2009).
  • Shen HB, Chou KC. Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem. Biophys. Res. Commun. 355(4), 1006–1011 (2007).
  • Shen HB, Chou KC. Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. Biopolymers 85(3), 233–240 (2007).
  • Shen HB, Chou KC. Virus-mPLoc: a fusion classifier for viral protein subcellular location prediction by incorporating multiple sites. J. Biomol. Struct. Dyn. 28(2), 175–186 (2010).
  • Chou KC, Shen HB. Large-scale plant protein subcellular location prediction. J. Cell. Biochem. 100(3), 665–678 (2007).
  • Scott MS, Thomas DY, Hallett MT. Predicting subcellular localization via protein motif co-occurrence. Genome Res. 14(10A), 1957–1966 (2004).
  • Chou KC, Shen HB. Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization. PLoS One 5(6), e11335 (2010).
  • Cooper GM. The Cell – A Molecular Approach. Sinauer Associates, MA, USA (2000).
  • Lei Z, Dai Y. An SVM-based system for predicting protein subnuclear localizations. BMC Bioinformatics 6, 291 (2005).
  • Shen HB, Chou KC. Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition. Biochem. Biophys. Res. Commun. 337(3), 752–756 (2005).
  • Denoeux T. A k-nearest neighbor classification rule based on Dempster–Shafer Theory. IEEE Trans. Syst. Man. Cybern. 25, 804–813 (1995).
  • Zouhal LM, Denoeux T. An evidence-theoretic k-NN rule with parameter optimization. IEEE Trans. Syst. Man. Cybern. 28, 263–271 (1998).
  • Shafer G. A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ, USA (1976).
  • Schaffer AA, Aravind L, Madden TL et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 29(14), 2994–3005 (2001).
  • Lei Z, Dai Y. Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction. BMC Bioinformatics 7, 491 (2006).
  • Huang WL, Tung CW, Huang HL, Ho SY. Predicting protein subnuclear localization using GO-amino-acid composition features. Biosystems 98(2), 73–79 (2009).
  • Huang WL, Tung CW, Huang HL, Hwang SF, Ho SY. ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features. Biosystems 90(2), 573–581 (2007).
  • Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res. 36(Database issue), D202–D205 (2008).
  • Ho SY, Chen JH, Huang MH. Inheritable genetic algorithm for biobjective 0/1 combinatorial optimization problems and its applications. IEEE Trans. Syst. Man. Cybern. B Cybern. 34(1), 609–620 (2004).
  • Leslie C, Eskin E, Noble WS. The spectrum kernel: a string kernel for SVM protein classification. Pac. Symp. Biocomput. 2002, 564–575 (2002).
  • Mei S, Fei W. Amino acid classification based spectrum kernel fusion for protein subnuclear localization. BMC Bioinformatics 11(Suppl. 1), S17 (2010).
  • Li FM, Li QZ. Using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach. Amino Acids 34(1), 119–125 (2008).
  • Lio P, Vannucci M. Wavelet change-point prediction of transmembrane proteins. Bioinformatics 16(4), 376–382 (2000).
  • Gao QB, Wang ZZ, Yan C, Du YH. Prediction of protein subcellular location using a combined feature of sequence. FEBS Lett. 579(16), 3444–3448 (2005).
  • Nanni L, Lumini A. Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids 34(4), 653–660 (2008).
  • Du P, Cao S, Li Y. SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm. J. Theor. Biol. 261(2), 330–335 (2009).
  • Dellaire G, Farrall R, Bickmore WA. The nuclear protein database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome. Nucleic Acids Res. 31(1), 328–330 (2003).
  • Bickmore WA, Sutherland HG. Addressing protein localization within the nucleus. EMBO J. 21(6), 1248–1254 (2002).
  • Shen HB, Chou KC. Nuc-PLoc: a new web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM. Protein Eng. Des. Sel. 20(11), 561–567 (2007).
  • Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26(5), 680–682 (2010).
  • Brendel V. PROSET – a fast procedure to create non-redundant sets of protein sequences. Math. Comput. Model. 16(6–7), 37–43 (1992).
  • Wang G, Dunbrack RL Jr. PISCES: a protein sequence culling server. Bioinformatics 19(12), 1589–1591 (2003).
  • Garg A, Bhasin M, Raghava GP. Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J. Biol. Chem. 280(15), 14427–14432 (2005).
  • Casadio R, Martelli PL, Pierleoni A. The prediction of protein subcellular localization from sequence: a shortcut to functional genome annotation. Brief Funct. Genomic Proteomic 7(1), 63–73 (2008).
  • Guo T, Hua S, Ji X, Sun Z. DBSubLoc: database of protein subcellular localization. Nucleic Acids Res. 32(Database issue), D122–D124 (2004).
  • van Dijk AD, Bosch D, ter Braak CJ, van der Krol AR, van Ham RC. Predicting sub-Golgi localization of type II membrane proteins. Bioinformatics 24(16), 1779–1786 (2008).
  • Chou WC, Yin Y, Xu Y. GolgiP: prediction of Golgi-resident proteins in plants. Bioinformatics 26(19), 2464–2465 (2010).