Research Article

RaFoSA: Random forests secondary structure assignment for coarse-grained and all-atom protein systems

Article: 1214061 | Received 11 May 2016, Accepted 14 Jul 2016, Published online: 08 Aug 2016
 

Abstract

Secondary structures (SS) of proteins are of great importance to structural, molecular, and computational biology and chemistry. An accurate and reliable method for automatic SS assignment is needed for cases where only coarse-grained (CG) information is available. RaFoSA, a novel, accurate, and reliable method for automatic SS assignment based on the coordinates of alpha carbon (CAC) atoms alone, is presented here. Results from RaFoSA have been rigorously compared with those from the Dictionary of Protein SS (DSSP, the acclaimed gold standard for automatic SS assignment) and STRIDE. Requiring only CAC, RaFoSA achieves an agreement of 96% with DSSP and 94% with STRIDE, both of which require all-atom and hydrogen-bonding information. No known automatic SS assignment method based on a CG system has ever achieved such agreement with DSSP and STRIDE. Furthermore, RaFoSA has been applied to a real-life problem, and its possible use for ranking proteins in order of their SS-based stability is shown in this paper. Overall, RaFoSA’s ability to accurately and reliably assign SS to CG or all-atom protein systems makes this work important. It must also be emphasized that SS assignment by RaFoSA is different from (and more rigorous than) SS prediction from amino acid sequence. Indeed, SS assignment by RaFoSA can differentiate between frames from molecular dynamics simulation trajectories, while existing methods for SS prediction from amino acid sequence cannot. Source code and a webserver implementation of RaFoSA are available at http://bioinformatics.center/RaFoSA.
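
To make the notion of "agreement" between two assignment methods concrete, the following is a minimal sketch of a per-residue comparison. It assumes that each method outputs one SS label per residue as a string, and that agreement is simply the percentage of positions with identical labels; the exact metric and label set used in the paper are not stated in this section. The function name `ss_agreement`, the three-letter alphabet (H, E, C), and the example strings are hypothetical and are not RaFoSA or DSSP output.

```python
def ss_agreement(assigned: str, reference: str) -> float:
    """Return the percentage of residues whose SS labels are identical.

    Both arguments are per-residue label strings of equal length,
    e.g. 'H' (helix), 'E' (strand), 'C' (coil). This label set is an
    assumption made for illustration only.
    """
    if len(assigned) != len(reference):
        raise ValueError("label strings must cover the same residues")
    if not assigned:
        raise ValueError("label strings must not be empty")
    # Count positions where the two methods give the same label.
    matches = sum(a == b for a, b in zip(assigned, reference))
    return 100.0 * matches / len(assigned)


# Hypothetical labels for a 17-residue fragment (not real program output).
cg_based_labels = "CCHHHHHHCCEEEEECC"
all_atom_labels = "CCHHHHHHHCEEEEECC"

print(f"agreement: {ss_agreement(cg_based_labels, all_atom_labels):.1f}%")
```

A per-residue score of this kind treats every position equally; a reported figure such as 96% would additionally depend on which proteins are included and on how the finer-grained DSSP classes are mapped to a common label set.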

Public Interest Statement

Secondary structures (SS) of proteins are of great importance to structural, molecular, and computational biologists and chemists. There is therefore a need for an accurate and reliable method for SS assignment when only alpha carbon information is available. This is why RaFoSA, a novel and reliable method for automatic SS assignment based on the coordinates of alpha carbon (CAC) atoms alone, has been developed and is presented here. RaFoSA accurately and reliably assigns SS to the amino acids of proteins and achieves an agreement of 96% with DSSP and 94% with STRIDE, both of which require all-atom and hydrogen-bonding information. RaFoSA may also help in ranking proteins based on the stability of their secondary structures. The source code of RaFoSA is available at http://bioinformatics.center/tools/rafosa. A webserver that implements RaFoSA is available at http://bioinformatics.center/servers/rafosa.

Competing Interests

The author declares no competing interests.

Acknowledgements

I thank Dr Claus Andersen (for making a copy of DSSPCont available), Dr Gilles Labesse (for making a copy of PSEA available), and Dr David Case (for providing a free academic license of AMBER14). The extensive feedback from people (Dr Jiří Koubek, Chris YC Lo, ChiHong ChangChien, etc.) who have been using RaFoSA and/or offered criticisms that helped improve RaFoSA and this paper is gratefully acknowledged.

Additional information

Funding

This work was financially supported by stipends from Academia Sinica, Taipei, Taiwan, and from National Tsing Hua University, Hsinchu, Taiwan.

Notes on contributors

Emmanuel Oluwatobi Salawu

Emmanuel Oluwatobi Salawu received a Bachelor of Technology (with Honors) in Physiology from Ladoke Akintola University of Technology, and later studied Computer Science at the University of Hertfordshire, where he earned a Master of Science (with Distinction). He is currently a PhD candidate in Bioinformatics and Structural Biology at National Tsing Hua University. His interests include molecular dynamics simulations, protein structures, machine learning, image analysis, epidemiology, and malaria research. The work presented here is part of his research activities involving secondary structures of proteins and machine learning.