Research Paper

Accurate identification of RNA D modification using multiple features

Pages 2236-2246 | Received 21 Sep 2020, Accepted 23 Feb 2021, Published online: 17 Mar 2021

