Research Articles

M6A-GSMS: Computational identification of N6-methyladenosine sites with GBDT and stacking learning in multiple species

Pages 12380-12391 | Received 24 Jun 2021, Accepted 16 Aug 2021, Published online: 30 Aug 2021

References

  • Aggarwal, C. C. (2015). Data classification: algorithms and applications. CRC Press.
  • Akbar, S., & Hayat, M. (2018). iMethyl-STTNC: Identification of N6-methyladenosine sites by extending the idea of SAAC into Chou's PseAAC to formulate RNA sequences. Journal of Theoretical Biology, 455, 205–211. https://doi.org/10.1016/j.jtbi.2018.07.018
  • Alam, W., Ali, S. D., Tayara, H., & Chong, K. T. (2020). A CNN-based RNA N6-methyladenosine site predictor for multiple species using heterogeneous features representation. IEEE Access, 8, 138203–138209. https://doi.org/10.1109/ACCESS.2020.3002995
  • Barzilay, I., Sussman, J. L., & Lapidot, Y. (1973). Further studies on the chromatographic behaviour of dinucleoside monophosphates. Journal of Chromatography, 79, 139–146. https://doi.org/10.1016/S0021-9673(01)85282-1
  • Bermingham, M. L., Pong-Wong, R., Spiliopoulou, A., Hayward, C., Rudan, I., Campbell, H., Wright, A. F., Wilson, J. F., Agakov, F., Navarro, P., & Haley, C. S. (2015). Application of high-dimensional feature selection: Evaluation for genomic prediction in man. Scientific Reports, 5, 10312. https://doi.org/10.1038/srep10312
  • Boccaletto, P., Machnicka, M. A., Purta, E., Piatkowski, P., Baginski, B., Wirecki, T. K., de Crécy-Lagard, V., Ross, R., Limbach, P. A., Kotter, A., Helm, M., & Bujnicki, J. M. (2018). MODOMICS: A database of RNA modification pathways. 2017 update. Nucleic Acids Research, 46(D1), D303–D307. https://doi.org/10.1093/nar/gkx1030
  • Bodi, Z., Button, J. D., Grierson, D., & Fray, R. G. (2010). Yeast targets for mRNA methylation. Nucleic Acids Research, 38(16), 5327–5335. https://doi.org/10.1093/nar/gkq266
  • Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/BF00058655
  • Chen, T., Hao, Y.-J., Zhang, Y., Li, M.-M., Wang, M., Han, W., Wu, Y., Lv, Y., Hao, J., Wang, L., Li, A., Yang, Y., Jin, K.-X., Zhao, X., Li, Y., Ping, X.-L., Lai, W.-Y., Wu, L.-G., Jiang, G., … Zhou, Q. (2015). m(6)A RNA methylation is regulated by microRNAs and promotes reprogramming to pluripotency. Cell Stem Cell, 16(3), 289–301. https://doi.org/10.1016/j.stem.2015.01.016
  • Chen, W., Ding, H., Feng, P., Lin, H., & Chou, K.-C. (2016). iACP: A sequence-based tool for identifying anticancer peptides. Oncotarget, 7(13), 16895–16909. https://doi.org/10.18632/oncotarget.7815
  • Chen, W., Feng, P., Ding, H., Lin, H., & Chou, K.-C. (2015). iRNA-methyl: Identifying N(6)-methyladenosine sites using pseudo nucleotide composition. Analytical Biochemistry, 490, 26–33. https://doi.org/10.1016/j.ab.2015.08.021
  • Chen, W., Tang, H., & Lin, H. (2017). MethyRNA: A web server for identification of N6-methyladenosine sites. Journal of Biomolecular Structure & Dynamics, 35(3), 683–687. https://doi.org/10.1080/07391102.2016.1157761
  • Chen, W., Tran, H., Liang, Z., Lin, H., & Zhang, L. (2015). Identification and analysis of the N(6)-methyladenosine in the Saccharomyces cerevisiae transcriptome. Scientific Reports, 5(1), 13859. https://doi.org/10.1038/srep13859
  • Chen, W., Xing, P., & Zou, Q. (2017). Detecting N6-methyladenosine sites from RNA transcriptomes using ensemble Support Vector Machines. Scientific Reports, 7, 40242. https://doi.org/10.1038/srep40242
  • Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press. https://doi.org/10.1609/aimag.v22i2.1566
  • Dominissini, D., Moshitch-Moshkovitz, S., Salmon-Divon, M., Amariglio, N., & Rechavi, G. (2013). Transcriptome-wide mapping of N(6)-methyladenosine by m(6)A-seq based on immunocapturing and massively parallel sequencing. Nature Protocols, 8(1), 176–189. https://doi.org/10.1038/nprot.2012.148
  • Dominissini, D., Moshitch-Moshkovitz, S., Schwartz, S., Salmon-Divon, M., Ungar, L., Osenberg, S., Cesarkas, K., Jacob-Hirsch, J., Amariglio, N., Kupiec, M., Sorek, R., & Rechavi, G. (2012). Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature, 485(7397), 201–206. https://doi.org/10.1038/nature11112
  • Feng, Z. P., & Zhang, C. T. (2000). Prediction of membrane protein types based on the hydrophobic index of amino acids. Journal of Protein Chemistry, 19(4), 269–275. https://doi.org/10.1023/A:1007091128394
  • Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N., Caruthers, M. H., Neilson, T., & Turner, D. H. (1986). Improved free-energy parameters for predictions of RNA duplex stability. Proceedings of the National Academy of Sciences of the United States of America, 83(24), 9373–9377. https://doi.org/10.2307/28623
  • Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451
  • Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2/3), 131–163. https://doi.org/10.1023/A:1007465528199
  • Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics (Oxford, England), 28(23), 3150–3152. https://doi.org/10.1093/bioinformatics/bts565
  • Geula, S., Moshitch-Moshkovitz, S., Dominissini, D., Mansour, A. A., Kol, N., Salmon-Divon, M., Hershkovitz, V., Peer, E., Mor, N., Manor, Y. S., Ben-Haim, M. S., Eyal, E., Yunger, S., Pinto, Y., Jaitin, D. A., Viukov, S., Rais, Y., Krupalnik, V., Chomsky, E., … Hanna, J. H. (2015). Stem cells. m6A mRNA methylation facilitates resolution of naïve pluripotency toward differentiation. Science (New York, N.Y.), 347(6225), 1002–1006. https://doi.org/10.1126/science.1261417
  • Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42. https://doi.org/10.1007/s10994-006-6226-1
  • Gislason, P., Benediktsson, J., & Sveinsson, J. (2006). Random forests for land cover classification. Pattern Recognition Letters, 27(4), 294–300. https://doi.org/10.1016/j.patrec.2005.08.011
  • Goni, J. R., Perez, A., Torrents, D., & Orozco, M. (2007). Determining promoter location based on DNA structure first-principles calculations. Genome Biology, 8(12), R263. https://doi.org/10.1186/gb-2007-8-12-r263
  • Guo, S.-H., Deng, E.-Z., Xu, L.-Q., Ding, H., Lin, H., Chen, W., & Chou, K.-C. (2014). iNuc-PseKNC: A sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics (Oxford, England), 30(11), 1522–1529. https://doi.org/10.1093/bioinformatics/btu083
  • Huang, Y., He, N., Chen, Y., Chen, Z., & Li, L. (2018). BERMP: A cross-species classifier for predicting m6A sites by integrating a deep learning algorithm and a random forest approach. International Journal of Biological Sciences, 14(12), 1669–1677. https://doi.org/10.7150/ijbs.27819
  • Iqbal, M. J., Faye, I., Samir, B. B., & Said, A. M. (2014). Efficient feature selection and classification of protein sequence data in bioinformatics. TheScientificWorldJournal, 2014, 173319–173869. https://doi.org/10.1155/2014/173869
  • Jović, A., Brkić, K., & Bogunović, N. (2015). A review of feature selection methods with applications. 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics. MIPRO, IEEE (pp. 1200–1205).
  • Ke, G., Meng, Q., Finley, T. (2017). LightGBM: A highly efficient gradient boosting decision tree. The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS2017), 4-9 Dec 2017, Long Beach, CA, USA.
  • Khan, A., Shah, S., Wahid, F., Khan, F. G., & Jabeen, S. (2017). Identification of microRNA precursors using reduced and hybrid features. Molecular BioSystems, 13(8), 1640–1645. https://doi.org/10.1039/c7mb00115k
  • Lai, H.-Y., Zhang, Z.-Y., Su, Z.-D., Su, W., Ding, H., Chen, W., & Lin, H. (2019). iProEP: A computational predictor for predicting promoter. Molecular Therapy. Nucleic Acids, 17, 337–346. https://doi.org/10.1016/j.omtn.2019.05.028
  • Li, G.-Q., Liu, Z., Shen, H.-B., & Yu, D.-J. (2016). TargetM6A: Identifying N6-methyladenosine sites from RNA sequences via position-specific nucleotide propensities and a support vector machine. IEEE Transactions on Nanobioscience, 15(7), 674–682. https://doi.org/10.1109/TNB.2016.2599115
  • Li, X. C., Wang, L., & Sung, E. (2008). AdaBoost with SVM-based component classifiers. Engineering Applications of Artificial Intelligence, 21(5), 785–795. https://doi.org/10.1016/j.engappai.2007.07.001
  • Lin, H., Deng, E.-Z., Ding, H., Chen, W., & Chou, K.-C. (2014). iPro54-PseKNC: A sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Research, 42(21), 12961–12972. https://doi.org/10.1093/nar/gku1019
  • Liu, B., Fang, L., Liu, F., Wang, X., & Chou, K.-C. (2016). iMiRNA-PseDPC: MicroRNA precursor identification with a pseudo distance-pair composition approach. Journal of Biomolecular Structure & Dynamics, 34(1), 223–235. https://doi.org/10.1080/07391102.2015.1014422
  • Liu, B., Long, R., & Chou, K. C. (2016). iDHS-EL: Identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics (Oxford, England), 32(16), 2411–2418. https://doi.org/10.1093/bioinformatics/btw186
  • Liu, B., Wang, S. Y., Long, R., & Chou, K. C. (2017). iRSpot-EL: Identify recombination spots with an ensemble learning approach. Bioinformatics (Oxford, England), 33(1), 35–41. https://doi.org/10.1093/bioinformatics/btw539
  • Liu, J., Dou, X., Chen, C., Chen, C., Liu, C., Xu, M. M., Zhao, S., Shen, B., Gao, Y., Han, D., & He, C. (2020). N6-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science (New York, N.Y.), 367(6477), 580–586. https://doi.org/10.1126/science.aay6018
  • Liu, K. W., & Chen, W. (2020). IMRM: A platform for simultaneously identifying multiple kinds of RNA modifications. Bioinformatics (Oxford, England), 36(11), 3336–3342. https://doi.org/10.1093/bioinformatics/btaa155
  • Liu, N., Dai, Q., Zheng, G., He, C., Parisien, M., & Pan, T. (2015). N(6)-methyladenosine-dependent RNA structural switches regulate RNA-protein interactions. Nature, 518(7540), 560–564. https://doi.org/10.1038/nature14234
  • Liu, Z., Xiao, X., Yu, D.-J., Jia, J., Qiu, W.-R., & Chou, K.-C. (2016). pRNAm-PC: Predicting N(6)-methyladenosine sites in RNA sequences via physical-chemical properties. Analytical Biochemistry, 497, 60–67. https://doi.org/10.1016/j.ab.2015.12.017
  • Lorenz, R., Bernhart, S. H., Höner Zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P. F., & Hofacker, I. L. (2011). ViennaRNA Package 2.0. Algorithms for Molecular Biology: AMB, 6, 14–26. https://doi.org/10.1186/1748-7188-6-26
  • Ma, S., & Huang, J. (2008). Penalized feature selection and classification in bioinformatics. Briefings in Bioinformatics, 9(5), 392–403. https://doi.org/10.1093/bib/bbn027
  • Matthews, B. W. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta, 405(2), 442–451. https://doi.org/10.1016/0005-2795(75)90109-9
  • Meyer, K. D., Saletore, Y., Zumbo, P., Elemento, O., Mason, C. E., & Jaffrey, S. R. (2012). Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons. Cell, 149(7), 1635–1646. https://doi.org/10.1016/j.cell.2012.05.003
  • Kearns, M. J., & Valiant, L. G. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM, 41(1), 67–95. https://doi.org/10.1145/174644.174647
  • Nazari, I., Tahir, M., Tayara, H., & Chong, K. T. (2019). iN6-Methyl (5-step): Identifying RNA N6-methyladenosine sites using deep learning mode via Chou's 5-step rules and Chou's general PseKNC. Chemometrics and Intelligent Laboratory Systems, 193, 103811. https://doi.org/10.1016/j.chemolab.2019.103811
  • Neumann, U., Genze, N., & Heider, D. (2017). EFS: An ensemble feature selection tool implemented as R-package and web-application. BioData Mining, 10(1), 21–29. https://doi.org/10.1186/s13040-017-0142-8
  • Pérez, A., Noy, A., Lankas, F., Luque, F. J., & Orozco, M. (2004). The relative flexibility of B-DNA and A-RNA duplexes: Database analysis. Nucleic Acids Research, 32(20), 6144–6151. https://doi.org/10.1093/nar/gkh954
  • Qiang, X., Chen, H., Ye, X., Su, R., & Wei, L. (2018). M6AMRFS: Robust prediction of N6-methyladenosine sites with sequence-based features in multiple species. Frontiers in Genetics, 9, 495. https://doi.org/10.3389/fgene.2018.00495
  • Rehman, M. U., Hong, K. J., Tayara, H., & Chong, K. T. (2021). m6A-NeuralTool: Convolution neural tool for RNA N6-methyladenosine site identification in different species. IEEE Access, 9, 17779–17786. https://doi.org/10.1109/ACCESS.2021.3054361
  • Roost, C., Lynch, S. R., Batista, P. J., Qu, K., Chang, H. Y., & Kool, E. T. (2015). Correction to “Structure and thermodynamics of N(6)-methyladenosine in RNA: A spring-loaded base modification”. Journal of the American Chemical Society, 137(5), 2107–2115. https://doi.org/10.1021/jacs.5b05858
  • Sanz, H., Valim, C., Vegas, E., Oller, J. M., & Reverter, F. (2018). SVM-RFE: Selection and visualization of the most relevant features through non-linear kernels. BMC Bioinformatics, 19(1), 418–432. https://doi.org/10.1186/s12859-018-2451-4
  • Tahir, M., Tayara, H., & Chong, K. T. (2019). iPseU-CNN: Identifying RNA pseudouridine sites using convolutional neural networks. Molecular Therapy. Nucleic Acids, 16, 463–470. https://doi.org/10.1016/j.omtn.2019.03.010
  • Wang, X. F., & Yan, R. X. (2018). RFAthM6A: A new tool for predicting m6A sites in Arabidopsis thaliana. Plant Molecular Biology, 96(3), 327–337. https://doi.org/10.1007/s11103-018-0698-9
  • Wang, X., Lu, Z., Gomez, A., Hon, G. C., Yue, Y., Han, D., Fu, Y., Parisien, M., Dai, Q., Jia, G., Ren, B., Pan, T., & He, C. (2014). N6-methyladenosine-dependent regulation of messenger RNA stability. Nature, 505(7481), 117–120. https://doi.org/10.1038/nature12730
  • Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259. https://doi.org/10.1016/S0893-6080(05)80023-1
  • Xiang, S., Liu, K., Yan, Z., Zhang, Y., & Sun, Z. (2016). RNAMethPre: A web server for the prediction and query of mRNA m6A sites. PLoS One, 11(10), e0162707–13. https://doi.org/10.1371/journal.pone.0162707
  • Xuan, J.-J., Sun, W.-J., Lin, P.-H., Zhou, K.-R., Liu, S., Zheng, L.-L., Qu, L.-H., & Yang, J.-H. (2018). RMBase v2.0: Deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Research, 46(D1), D327–D334. https://doi.org/10.1093/nar/gkx934
  • Zhang, M., Sun, J.-W., Liu, Z., Ren, M.-W., Shen, H.-B., & Yu, D.-J. (2016). Improving N(6)-methyladenosine site prediction with heuristic selection of nucleotide physical-chemical properties. Analytical Biochemistry, 508, 104–113. https://doi.org/10.1016/j.ab.2016.06.001
  • Zhang, S., Duan, Z., Yang, W., Qian, C., & You, Y. (2021). iDHS-DASTS: Identifying DNase I hypersensitive sites based on LASSO and stacking learning. Molecular Omics, 17(1), 130–141. https://doi.org/10.1039/d0mo00115e
  • Zhang, Y., & Hamada, M. (2018). DeepM6ASeq: Prediction and characterization of m6A-containing sequences using deep learning. BMC Bioinformatics, 19(Suppl 19), 511–524. https://doi.org/10.1186/s12859-018-2516-4
  • Zhou, Y., Zeng, P., Li, Y.-H., Zhang, Z., & Cui, Q. (2016). SRAMP: Prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Research, 44(10), e91. https://doi.org/10.1093/nar/gkw104
  • Zou, Q., Xing, P., Wei, L., & Liu, B. (2019). Gene2Vec: Gene subsequence embedding for prediction of mammalian N6-methyladenosine sites from mRNA. RNA (New York, NY), 25(2), 205–218. https://doi.org/10.1261/rna.069112.118
