
Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach

Pages 1720-1730 | Received 29 Aug 2014, Accepted 18 Sep 2014, Published online: 28 Oct 2014
 

Abstract

DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Because all biological species have evolved from a very limited number of ancestral species, it is important to take evolutionary information into account in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. In both the jackknife test and the independent data-set test, the new predictor outperformed the existing methods it was compared with. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach of incorporating evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.
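The abstract names the top-n-gram approach without giving its details, so the following is only a minimal sketch of the general idea for n = 1, under stated assumptions: a protein is described by a per-position amino acid frequency profile (for example, one derived from a multiple sequence alignment), each position is replaced by its most frequent residue, and the resulting profile-based sequence is summarized as a 20-dimensional composition vector. The function names, the residue ordering, and the toy input format are hypothetical and are not taken from the article; the predictor's actual feature set may differ.

```python
from collections import Counter

# Assumed column order for each profile row (hypothetical convention).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_1_gram_sequence(profile):
    """Replace each profile position by its most frequent amino acid.

    profile: list of rows, one per sequence position; each row holds
    20 frequencies ordered as in AMINO_ACIDS.
    """
    residues = []
    for row in profile:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each profile row must have 20 entries")
        # Index of the largest frequency; ties resolve to the first index.
        best = max(range(len(AMINO_ACIDS)), key=lambda i: row[i])
        residues.append(AMINO_ACIDS[best])
    return "".join(residues)


def composition(sequence):
    """Return the 20-dimensional amino acid composition of a sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]
```

A feature vector built this way reflects which residues are evolutionarily favored at each position rather than which residues happen to occur in the query sequence; it could then be passed to any standard classifier.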

Acknowledgments

The authors wish to thank the two anonymous reviewers, whose constructive comments were very helpful for strengthening the presentation of this study.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China (No. 61300112, No. 61370165, and No. 61370010), the Natural Science Foundation of Guangdong Province (No. S2012040007390 and No. S2013010014475), the Scientific Research Innovation Foundation in Harbin Institute of Technology (Project No. HIT.NSRIF.2013103), and the Shanghai Key Laboratory of Intelligent Information Processing, China (Grant No. IIPL-2012-002). The project was also sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.
