Article; Bioinformatics

Predicting protein function via multi-label supervised topic model on gene ontology

Pages 630-638 | Received 01 Aug 2016, Accepted 14 Mar 2017, Published online: 24 Mar 2017

References

  • Pandey G, Kumar V, Steinbach M. Computational approaches for protein function prediction: a survey. Twin Cities: Department of Computer Science and Engineering, University of Minnesota; 2006.
  • Ruepp A, Zollner A, Maier D, et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res. 2004;32(18):5539–5545.
  • Gene Ontology Consortium. The gene ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32(Suppl 1):D258–D261.
  • Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–29.
  • Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–3402.
  • Yu G, Fu G, Wang J, et al. Predicting protein function via semantic integration of multiple networks. IEEE/ACM Trans Comput Biol Bioinf. 2016;13(2):220–232.
  • Piovesan D, Giollo M, Leonardi E, et al. INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity. Nucleic Acids Res. 2015;43(W1):W134–W140.
  • Piovesan D, Giollo M, Ferrari C, et al. Protein function prediction using guilty by association from interaction networks. Amino Acids. 2015;47(12):2583–2592.
  • Pellegrini M, Haynor D, Johnson JM. Protein interaction networks. Expert Rev Proteomics. 2004;1(2):239–249.
  • Cerri R, Barros RC, de Carvalho AC. A genetic algorithm for hierarchical multi-label classification. Proceedings of the 27th annual ACM symposium on applied computing. Trento: ACM; 2012. p. 250–255.
  • Clark WT, Radivojac P. Analysis of protein function and its prediction from amino acid sequence. Proteins Struct Funct Bioinf. 2011;79(7):2086–2096.
  • Zhang ML, Zhou ZH. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng. 2014;26(8):1819–1837.
  • Wu JS, Huang SJ, Zhou ZH. Genome-wide protein function prediction through multi-instance multi-label learning. IEEE/ACM Trans Comput Biol Bioinf. 2014;11(5):891–902.
  • Wu JS, Hu HF, Yan SC, et al. Multi-instance multilabel learning with weak-label for predicting protein function in electricigens. BioMed Res Int. 2015;2015(1):1–9.
  • Vens C, Struyf J, Schietgat L, et al. Decision trees for hierarchical multi-label classification. Mach Learn. 2008;73(2):185–214.
  • Yu G, Rangwala H, Domeniconi C, et al. Predicting protein function using multiple kernels. IEEE/ACM Trans Comput Biol Bioinf. 2015;12(1):219–233.
  • Otero FEB, Freitas AA, Johnson CG. A hierarchical multi-label classification ant colony algorithm for protein function prediction. Memetic Comput. 2010;2(3):165–181.
  • Dumais ST. Latent semantic analysis. Annu Rev Inf Sci Technol. 2004;38(1):188–230.
  • Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993–1022.
  • Lin L, Lin T, Wen D, et al. An overview of topic modeling and its current applications in bioinformatics. SpringerPlus. 2016;5(1):1608.
  • Rubin TN, Chambers A, Smyth P, et al. Statistical topic models for multi-label document classification. Mach Learn. 2012;88(1–2):157–208.
  • Ramage D, Hall D, Nallapati R, et al. Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. Proceedings of the 2009 conference on empirical methods in natural language processing. Vol. 1. Singapore: Association for Computational Linguistics; 2009. p. 248–256.
  • Ramage D, Manning CD, Dumais S. Partially labeled topic models for interpretable text mining. Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. San Diego, CA: ACM; 2011. p. 457–465.
  • Yang Y, Downey D, Boyd-Graber J, et al. Efficient methods for incorporating knowledge into topic models. Conference on empirical methods in natural language processing; 2015 Sep 17–21; Lisbon, Portugal. p. 308–317.
  • La Rosa M, Fiannaca A, Rizzo R, et al. Probabilistic topic modeling for the analysis and classification of genomic sequences. BMC Bioinformatics. 2015;16(6):S2.
  • Bisgin H, Liu Z, Fang H, et al. Mining FDA drug labels using an unsupervised learning technique-topic modeling. BMC Bioinformatics. 2011;12(10):S11.
  • Pinoli P, Chicco D, Masseroli M. Enhanced probabilistic latent semantic analysis with weighting schemes to predict genomic annotations. 2013 IEEE 13th international conference on bioinformatics and bioengineering (BIBE). Chania: IEEE; 2013. p. 1–4.
  • Masseroli M, Chicco D, Pinoli P. Probabilistic latent semantic analysis for prediction of gene ontology annotations. The 2012 international joint conference on neural networks (IJCNN). Brisbane: IEEE; 2012. p. 1–8.
  • Pinoli P, Chicco D, Masseroli M. Latent Dirichlet allocation based on Gibbs sampling for gene function prediction. 2014 IEEE conference on computational intelligence in bioinformatics and computational biology. Honolulu: IEEE; 2014. p. 1–8.
  • Schietgat L, Vens C, Struyf J. HMC software and datasets [Internet]; 2009. Available from: https://dtai.cs.kuleuven.be/clus/hmc-ens/
  • Krogel MA, Scheffer T. Multi-relational learning, text mining, and semi-supervised learning for functional genomics. Mach Learn. 2004;57(1–2):61–81.
  • Pan XY, Zhang YN, Shen HB. Large-scale prediction of human protein–protein interactions from amino acid sequence based on latent topic features. J Proteome Res. 2010;9(10):4992–5001.
  • Chua HN, Sung WK, Wong L. Exploiting indirect neighbours and topological weight to predict protein function from protein–protein interactions. Bioinformatics. 2006;22(13):1623–1630.
  • Apweiler R, Bairoch A, Wu CH, et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2004;32(Suppl 1):D115–D119.
  • Gene Ontology Consortium. Gene ontology [Internet]; 1999–2015. Available from: http://geneontology.org/
  • Valentini G. True path rule hierarchical ensembles for genome-wide gene function prediction. IEEE/ACM Trans Comput Biol Bioinf. 2011;8(3):832–847.
  • Zhang ML, Zhou ZH. Multilabel neural networks with applications to functional genomics and text categorization. IEEE Trans Knowl Data Eng. 2006;18(10):1338–1351.
  • Tsoumakas G, Katakis I, Vlahavas I. Effective and efficient multilabel classification in domains with large number of labels. Proceedings of the ECML/PKDD 2008 workshop on mining multidimensional data (MMD’08). Antwerp: MMD2008; 2008. p. 30–44.
  • Cheng W, Hüllermeier E. Combining instance-based learning and logistic regression for multilabel classification. Mach Learn. 2009;76(2–3):211–225.
  • Tsoumakas G, Katakis I, Vlahavas I. Mining multi-label data. In: Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook. New York, NY: Springer US; 2009. p. 667–685.