Abstract
Selecting effective features from the many available descriptors of chemical compounds and choosing the most appropriate classifier are both important for improving the overall accuracy of virtual screening. In this article, we compare the performance of various feature-selection methods and various classifiers in predicting compound-protein binding affinities, using six series of compounds: cytochrome P450 2C9 inhibitors, multi-drug-resistance reversal compounds, estrogen receptor ligands, inhibitors of the human ether-a-go-go-related gene (hERG), and ligands of the serotonin receptors 5HT1A and 5HT2A. We found that the genetic algorithm was superior to the other feature-selection methods, and that its combination with Random Forests and AdaBoost or Bagging performed almost as well as support vector machines and better than the other classifiers. The precision and recall of these methods were comparable or superior to those reported in previous work. The descriptors selected automatically for each protein-compound affinity prediction were plausible and should be informative when interpreting the resulting model.
Acknowledgements
We wish to thank Dr Nishikawa for supporting our research and for useful comments. We would also like to thank Mr Shimada, Mr Horiuchi, Mr Nemoto, and Dr Yamaguchi of REPRORI (Reverse Proteomics Research Institute Co., Ltd) for helpful discussions. This work was supported in part by a grant from the NEDO project of the Ministry of Economy, Trade, and Industry of Japan.