Research Article

iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning

Pages 317-331 | Received 21 Dec 2020, Accepted 23 Feb 2021, Published online: 18 Mar 2021

