Search in:

Journal of Enzyme Inhibition and Medicinal Chemistry Volume 31, 2016 - Issue 6

Submit an article Journal homepage

Free access

2,678

Views

CrossRef citations to date

Altmetric

Listen

Research Article

Pred-binding: large-scale protein–ligand binding affinity prediction

Piar Ali SharBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Weiyang TaoBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Shuo GaoBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Chao HuangBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Bohui LiBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Wenjuan ZhangBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Mohamed ShahenBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Chunli ZhengBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Yaofei BaiBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China

Yonghua WangBioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, ChinaCorrespondence[email protected]

show all

Pages 1443-1450 | Received 10 Jul 2015, Accepted 05 Jan 2016, Published online: 18 Feb 2016

Cite this article
https://doi.org/10.3109/14756366.2016.1144594
CrossMark

In this article

Abstract
Introduction
Methods
Results
Discussion
Conclusions
Supplemental material
Acknowledgements
References

Full Article
Figures & data
References
Supplemental
Citations
Metrics
Reprints & Permissions
View PDF PDF

Abstract

Drug target interactions (DTIs) are crucial in pharmacology and drug discovery. Presently, experimental determination of compound–protein interactions remains challenging because of funding investment and difficulties of purifying proteins. In this study, we proposed two in silico models based on support vector machine (SVM) and random forest (RF), using 1589 molecular descriptors and 1080 protein descriptors in 9948 ligand–protein pairs to predict DTIs that were quantified by K_i values. The cross-validation coefficient of determination of 0.6079 for SVM and 0.6267 for RF were obtained, respectively. In addition, the two-dimensional (2D) autocorrelation, topological charge indices and three-dimensional (3D)-MoRSE descriptors of compounds, the autocorrelation descriptors and the amphiphilic pseudo-amino acid composition of protein are found most important for K_i predictions. These models provide a new opportunity for the prediction of ligand–receptor interactions that will facilitate the target discovery and toxicity evaluation in drug development.

Keywords

Binding affinity prediction
drug target interaction
random forest
support vector machine

Introduction

The interactions of proteins and targets can regulate several functions of pharmaceutically useful protein targets which include enzymes, ion channels, G protein-coupled receptors (GPCRs) and nuclear receptorsCitation1,Citation2. These drug target interactions (DTIs) are meaningful to explain biological activities of these proteins. Indeed, some small chemical binders target to multiple proteins and sometimes unfortunately bind to unwanted off-targetsCitation3. Such unwanted DTIs could be severe and cause harmful side effects. Therefore, identifying suitable targets is a leading urgency in recent drug discoveryCitation4. Although several methods, such as fluorescence assays, are implementable, the experimental determination of compound–protein interactions remains challenging because of funding investment and difficulties in purifying proteins even nowadays. Thus, the prediction of DTIs based on the structure of drugs and/or their potential targets is crucial for medicinal chemistryCitation5.

Computational approaches are reliable and preferable for identifying the drug–target interactionCitation4 and have provided more than 50% food and drug administration (FDA)-approved drugs targeting to membranous proteins (GPCRs), nuclear receptors and transporters proteinsCitation6. Presently, computational methodologies for DTI prediction are divided into two categories, ligand-based and receptor-based approachesCitation7. Among ligand-based methods, we can cite quantitative structure–activity relationships (QSAR) and a similarity search-based approach are most reported and used widelyCitation1. On the other hand, receptor-based methods, such as reverse docking, have also been applied in drug–target-binding affinity prediction, DTI prediction and drug repositioningCitation3,Citation7. In docking method, the structures are evaluated on the basis of a force field or a scoring functionCitation8. It predicts the preferred conformations and binding strength of a ligand molecule, typically a small organic molecule, as bound to a protein pocketCitation9. Docking provides a reasonable accuracy in predicting DTI when 3D structure of protein and large quantities of data are presentCitation10. Although there are some limitations, it is not accurate when those proteins whose 3D structures are unknown, especially for membranous proteins whose 3D structure is difficult to be crystallizedCitation11,Citation12. Although homologous modeling provides predicted 3D structures, but it is limited by availability of homologous templatesCitation13.

A drug's ability to affect a given protein target is related to the drug-target affinity which is basically determined by the chemical and protein structureCitation14. K_i value, representing protein–ligand binding affinity, is the inhibition constant for a drug to bind a targeted protein under a certain experimental conditionCitation15. For instance, drug binding to a receptor with high K_i level implies low-binding affinity, and vice versa for low K_i. Proteochemometric modeling, based on chemical descriptors (features of the ligand) in combination with particular protein descriptors (features of the target), has been successfully applied for studying bioactivity of moleculesCitation16. Inspired by these successes, we proposed fast predicting models by integrating molecular descriptors, protein descriptors with two powerful statistical methods support vector machine (SVM) and random forest (RF). The models consisted of internal cross-validation and external tests, which provides a reliable performance with high accuracy. Consequently, the new computational methods for binding-affinity prediction of protein-ligand interaction will provide a compelling opportunity to drug target discovery.

Methods

Dataset

The ligand and target dataset information with known binding affinity was collected from Psychoactive Drug Screening Program (PDSP)Citation17 K_i database (http://pdsp.med.unc.edu/kidb.php, as of 1 October 2014), which is a unique resource in the public domain and provides information on the abilities of drugs to interact with an expanding number of molecular targets. In the process of building dataset, some drugs and targets were removed due to their chemical descriptors could not be evaluated. Besides, after analyzing the original K_i values, we found that there are hundreds of same K_i values in the database, such as 1000 nM and 10 000 nM, which are potential censored data. These repeat K_i values may strongly affect the accuracy of the following modeling. So we excluded the ligand-target-K_i entries with the repeats number of K_i more than 70. This threshold was selected according to the following two criteria: (1) maintain entries as many as possible; (2) exclude the censored data as many as possible. Consequently, a dataset which is consisted of 9948 ligand-target-K_i pairs was constructed. There are 2003 ligands and 209 targets without redundancy in this dataset, which were used as a benchmark dataset.

Computation of molecular and protein descriptors

The ligand structures were collected from PubChemCitation18, DrugBankCitation19 and ChemSpider databasesCitation20. The protein sequences were collected from Uniprot databaseCitation21. To describe the structural and physiochemical features of a molecule, the molecular descriptors were calculated by DRAGON program 5.4 () (http://www.talete.mi.it/index.htm)Citation22. The Dragon software implements about more than 20 molecular descriptors categories (topological descriptors, constitutional descriptors, 2D autocorrelations, topological charge indices, eigenvalue-based indices, molecular properties, etc.), while 75 descriptors, such as nArCSOH, nRCSSH, nN = N and nSOOH, were eliminated, as these descriptors are constant values for all drugs. Eventually, 1589 descriptors were used for subsequent analysis (Supplementary Table S1). Similarly, to describe the amino acid sequence features of a protein, the protein sequence descriptors like peptide composition, dipeptide composition, autocorrelation descriptors, etc., were calculated using PROFEAT web serverCitation23. In total, for each protein, we obtained 1080 protein descriptors for following analysis (Supplementary Table S2).

Figure 1. The flowchart of the proposed method for predicting binding affinity.

Construction of training set and test set

To provide a more rigorous evaluation of a model's predictive capability, the dataset was split into training (used to build the model) and test (used to validate the model's accuracy) sets. Furthermore, the aforementioned dataset was randomly and equally split into five subsets, and one subset was selected as the test set, while others were selected as the training set. Besides, in this study, all descriptors data were prescaled to the range from −1 to 1, while the K_i data were transformed to log₂(K_i) for modeling.

Support vector machine

Support vector machine (SVM) is one of the state-of-the-art algorithms and has been extensively used for regression (SVR) and classification (SVC). When using SVM in regression tasks, the support vector regressor uses a cost function to measure the empirical risk in order to minimize the regression errorCitation24. In this work, this method has been applied to estimate the nonlinear relationships between the K_i values of ligand–protein pairs and their related descriptors. Briefly, given training samples where x_i denotes the ith input vector; y_i indicates the ith output value and n is the total number of samples. The modeling aim is to provide a regression function y = f(x) that predict outputs y_j as less error as much by using a new set of input vector x_j. Mathematically, an SVM regressor is produced as following: (1) where α_i and are Lagrange multipliers, which have been obtained by minimizing the regularized cost function, while, the kernel function K(x, x_i) was the radial basis function (RBF) which has the highest performance: (2) or (3)

There are two parameters associated with RBF kernels: regularization parameter C and the kernel parameter γ. Improper selection of these two parameters can cause overfitting or under fitting problems. Therefore, parameter C and γ play a crucial role in the performance of SVMCitation25. Here, the selection of C and γ were based on the overall accuracy of the internal fivefold cross-validation using the grid search methodCitation26. In this study, we used a portion of the codes from the LIBSVM suite of programs, which employs a modified version of the sequential minimal optimization (SMO) algorithmCitation27.

Random forest

Random forest (RF) is considered to be one of the most successful ensemble methods which is fast, robust to noise, does not overfit but provides possibilities for explanation and visualization of its outputCitation28. In this study, RF was applied as a regression method by constructing a multitude of decision trees at training time and outputting the mean prediction of the individual trees. The procedure can be briefly summarized as follows: suppose the number of training cases were N, and the total number of descriptors in the regressor were M. (I) Make n bootstrap sample sets from the original sample set. (II) Set up an unpruned tree with each sample set B_p. At each node of the tree, randomly choose m_try descriptors on which to make the decision at that node, and then calculate the best splits based on these m_try descriptors. (III) Output the mean prediction of the individual trees. During the training process, importantly, RF applies a built-in cross validation to estimate test set error via the use of out-of-bag (OOB) samples that are the data not in the bootstrap sample.

In this work, we built RF models using Random Forest soft package which is developed by Leo Breiman et al (available at http://www.stat.berkeley.edu/users/breiman/). As the out-of-bag (OOB) error is used to get the estimates of feature importance, the feature selection is not quite necessary in building the RF modelsCitation27,Citation28. Here, the number of trees and number of randomly selected descriptors, two tuning parameters in RF, were set to 500 and M/3, respectively.

Model validation

In order to wholly evaluate the performance of these in silico models, the internal and external validation were performed. Briefly, (1) the dataset was randomly and equally split into five subsets as mentioned previously, and four subsets were selected as training set for modeling, while remaining subset serves as test set for validating models. This process was repeated five times until every subset served as test set. (2) Five external independent validations were carried out for all models using different test sets. (3) The performance of RF model was compared with of SVM model using F test. (4)

Here, n₁ and n₂ are the number of samples in the test sets of two compared models, MSE₁ is the higher mean square error while MSE₂ is lower mean square error. The MSEs were defined as follows: (5) where y_i is the ith experimental log₂ (K_i) value, denotes ith predicted log₂ (K_i) value, n is the total number of samples in the test sets.

Results

Support vector machines

We adopted Gaussian radial basis function as kernel function for SVM to perform prediction. To provide the highest predicting performance, we optimized two parameters, that is, the penalty parameter C and the Gaussian function parameter γ using the grid search method. As shown in , the optimal pair of (C, γ) is found at the crimson zone which have C = 2⁷ and γ = 2 ⁻ ^7.5, respectively. Using these parameters, we run the SVM models and make prediction to training set and test set ().

Figure 2. Grid search ranging from C = 2⁴ to 2¹⁰ and γ = 2⁻¹⁰ to 2⁻⁴. The optimal prediction was obtained at the point of log₂^C=7 and log₂^γ = −7.5.

Figure 2. Grid search ranging from C = 24 to 210 and γ = 2−10 to 2−4. The optimal prediction was obtained at the point of log2C = 7 and log2γ = −7.5.

For training and test sets, the cross-validation coefficient of determination (R²) was 0.8596 ± 0.0043 and 0.6079 ± 0.0117, respectively. The mean squared error (MSE) for the training and test sets were 2.4591 ± 0.0729 and 7.0487 ± 0.2619, respectively (). No obvious overfitting can be observed from these models. The predicted values fall close to the experimental log₂ (K_i) values (). This good agreement between experimental and predicted log₂ (K_i) values for the test set compounds in the SVM model suggests that this model is reliable and can be applied for binding affinity prediction of certain protein and ligand.

Figure 3. Experimental versus RF predicted log₂ (K_i) values for training set (blue) and test set (red), the blue line is the fitted curve for training set. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

Figure 3. Experimental versus RF predicted log2 (Ki) values for training set (blue) and test set (red), the blue line is the fitted curve for training set. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

**Table 1. Fivefold cross validation of SVM and RF for log₂ (K_i) prediction of ligand target.**

Download CSV Display Table

Random forest

In this investigation, attempts have been made to predict the K_i values by RF algorithm. In order to build reliable models, all protein descriptors and ligand descriptors were used as features to build the model and both the training and test sets, which were also constructed. Two adjustable parameters in RF, the number of trees and the number of randomly selected features, were set to 500 and M/3, respectively, according to the defaults. The results obtained by the RF were shown R² of 0.8802 ± 0.0025, MSE of 2.2855 ± 0.0465 for the internal validation and R² of 0.6267 ± 0.0238, MSE 6.5828 ± 0.4075 for the external validation, respectively (). The MSE of the external validation is the same order of magnitude as the MSE of the external validation, which indicates that there is no obvious overfitting problem existing in this model. It can be clearly concluded that the present RF model exhibits satisfactory predictability from both the internal and external points of view with respect to the prediction of the test sets ().

Figure 4. Experimental versus RF predicted log₂ (K_i) values for training set (blue) and test set (red), the blue line is the fitted curve for training set. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

Figure 4. Experimental versus RF predicted log2 (Ki) values for training set (blue) and test set (red), the blue line is the fitted curve for training set. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.).

Additionally, one of the most significant functions of RF is providing possibilities for the explanation of its output for the accessible variable importance. An analysis of the underlying information of the most important descriptors extracted by the RF algorithm allows further conclusions to be drawn. By mapping the descriptors to the block (for compound) or group (for protein) where they belong to, we found a significant difference among the blocks and groups of descriptor ranks (one-way ANOVA, p < 0.001). Such as the blocks of 2D autocorrelation, topological charge indices and 3D-MoRSE descriptors for compound and the group of autocorrelation descriptors and the amphiphilic pseudoamino acid composition for protein are most important to K_i prediction (). On the basis of the variable importance outputs of RF model, the top 30 chemical descriptors and 30 protein descriptors were picked out for examples. The top 30 chemical descriptors mainly come from four blocks, including the 2D autocorrelation indices, 3D-MoRSE descriptors, RDF descriptors and topological descriptors (, Supplementary Table S3), and the top 30 protein descriptors were mainly from three groups, that is, autocorrelation descriptors, quasi-sequence-order descriptors and amphiphilic pseudoamino acid composition (, Supplementary Table S4), which are corresponding to the overall importance ().

Figure 5. The relative importance of descriptors: (A) the median of importance order of chemical descriptor blocks base on five-fold cross validation, (B) the median of importance order of protein descriptor groups base on fivefold cross validation, (C) the blocks of top 30 chemical descriptors in a RF model, (D) the blocks of top 30 protein descriptors in a RF model.

The chemical and protein descriptors employed in these statistical models may give some insights into the affinity of a ligand binding to a specific protein target. For example, about compound descriptor blocks, the 3D-MoRSE descriptors, such as Mor03p, Mor03v, Mor31p, Mor31v and Mor14m, are the descriptors characterizing the molecular size and 3D information, which are clearly important for the affinity of a ligand binding with the targetCitation29. The 2D autocorrelation, such as GATS1v, GATS1m, GATS1p, MATS7e, MATS6p, etc., describes how a considered property is distributed along a topological molecular structure, which have been reported a robust model for inhibitory activity toward p34^cdc2/cyclin b kinase enzyme of cytokinin-derived compoundCitation30,Citation31. As for the protein descriptor groups, amphiphilic pseudoamino acid composition were depicted by hydrophobicity and hydrophilicity of the constituent amino acids, which plays a pivotal role in protein foldingCitation32. And autocorrelation descriptors are defined based on the distribution of amino acid properties along the protein sequence, which can be used to test whether the value of a property at one residue is independent of the values of the property at neighboring residuesCitation31.

Discussion

The molecular information encoding into molecular descriptors is the first step into in silico chemoinformatics methods in drug design. There are a range of publications with prediction models for specific drug biological activity, drug toxicity, protein target interactionCitation33. Some studies developed models based on Shannon entropy measures and encompassing a multitarget network to predict multitarget drugs. For example, Riera–Fernandez et al. reported a new method based on the Markov–Shannon entropy to evaluate connectivity quality in complex networks, which show overall accuracy of 76.3% and can be applied for molecular and biomedical science and compound–protein bindingCitation34,Citation35. Many different computational chemistry, cheminformatics, bioinformatics and systems biology methods were applied to predict drug activities. LuanCitation36, AlonsoCitation37, Prado-PradoCitation38,Citation39 and coauthors developed multitarget/multiplexing quantitative structure–property relationships models for multiplexing assays of neurotoxicity/neuroprotective effects of drugs. These models are based on based on two different well-known software MARCH-INSIDE and DRAGON and show accuracy, specificity as well as sensitivity above 80–90%. Also, entropy methods are universal parameters useful to codify biologically relevant information in many systems. Tenorio-Borroto et al. trained and validated for the first time a quantitative structure–toxicity relationships model that correctly classified 91.1% multiplexing assay endpoints of 7903 drugsCitation40. All of these works focused on classification problems. If using classifier to predict the DTIs, the accuracy can be influenced by the criteria of classification. Besides, for different proteins, the definition of successful binding may vary. Fortunately, regression models, predicting the discrete rather than binary outcomes according to experimental data, are able to address these problems.

The quality of a QSAR model depends heavily on many factors, including particularly the representation of the object features, features selection and the application of the statistical approaches.Citation41 By mapping input variables into a high-dimensional feature space, SVM transforms complex problems into simpler problems to solve, which has been successfully applied in drug targeting, QSAR, artificial intelligence, financial domain, etcCitation2,42–46. Combining bagging strategy and decision tree theory, RF sometimes performs very well compared to many other methods, including support vector machines and neural networks and is robust against overfittingCitation25,Citation47. In this work, the object features were represented by compound and protein descriptors, and all descriptors were selected to build models except several descriptors which cannot be calculated. For statistical approach, K_i values were predicted by employing two state-of-art regression models including SVM and RF. Accordingly, as the difference among the performance could be found (), we applied the F-test to quantify the difference between these two modelsCitation44. Although the R² of RF seems to be higher than that of SVM for test sets, the F value (1.071) shows that it is not significant (p > 0.05), which suggests that the RF model and SVM model have a similar performance to predict K_i, at least for this dataset.

In this work, all the programs were implemented on a Huawei computer (Debian GNU/Linux 8.1 (jessie), two Intel® Xeon® processors X5650 (2.67 GHz) and 24 GB RAM). The total execution time of the fivefold cross-validation experiment of SVM (22 h, including 20 h for parameters optimization) is longer than that of the RF approach (15 h, without parameters optimization because of default setting). Therefore, RF is much more efficient than SVM for this study if including the optimization procedure. In addition to the robust and efficient features, an ideal regression model would be also interpretable. An important point is that SVM generates black box models that do not have the ability to explain, in an understandable form, the process of how the output was generatedCitation48. While RF provided the relative importance of variables, it should be useful for guiding the computer-aided drug design and assessing drug potential prior to synthesis.

Conclusions

In this study, we applied SVM and RF for binding affinity prediction problem based on a large scale data set, which generated two models with an attractive prediction power. The statistical results depicted as R² = 0.6079 and MSE = 7.0487 using SVM, R²= 0.6267 and MSE = 6.5828 using RF for the test set, which indicates that both models provide a potent K_i predictability without overfitting. To test which model is more reliable, we applied F test to compare the performance of RF and SVM. And it was found that they had similar prediction performance. Further, by deep analyzing on the importance of descriptors provided by RF model, we discovered that 2D autocorrelation, topological charge indices and 3D-MoRSE descriptors for compound, and amphiphilic pseudoamino acid composition, autocorrelation descriptors and quasi-sequence-order descriptors for protein are most important to K_i prediction. This adopted models and included above information will be helpful for screening and prediction of novel potent target inhibitors, and for further researches on the subject matter.

Supplementary material available online

Supplemental material

IENZ_1144594_Supplementray_Information.zip

Download Zip (52.1 KB)

Declaration of interest

The authors declared that they have no competing interests. The research is supported by the Fund of Northwest A & F University and is financially supported by the National Natural Science Foundation of China (Grant No. 31170796 and 81373892).

Related Research Data

Pred-binding: large-scale protein–ligand binding affinity prediction

Source: Figshare

Pred-binding: large-scale protein–ligand binding affinity prediction

Source: Figshare

Linking provided by

References

Yamanishi Y, Kotera M, Kanehisa M, et al Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics 2010;26:i246–54
PubMed Web of Science ®Google Scholar
Li S, Xi L, Wang C, et al A novel method for protein-ligand binding affinity prediction and the related descriptors exploration. J Comput Chem 2009;30:900–9
PubMed Web of Science ®Google Scholar
Wang K, Sun J, Zhou S, et al Prediction of drug-target interactions for drug repositioning only based on genomic expression similarity. PLoS Comput Biol 2013;9:e1003315
PubMed Web of Science ®Google Scholar
Gilson MK, Zhou HX. Calculation of protein-ligand binding affinities. Annu Rev Biophys Biomol Struct 2007;36:21–42
PubMedGoogle Scholar
Gonzalez-Diaz H. Computational prediction of drug-target interactions in medicinal chemistry. Curr Top Med Chem 2013;13:1619–21
PubMed Web of Science ®Google Scholar
Shim J, Mackerell AD. Jr. Computational ligand-based rational design: role of conformational sampling and force fields in model development. Med Chem Comm 2011;2:356–70
Web of Science ®Google Scholar
Alaimo S, Pulvirenti A, Giugno R, et al Drug-target interaction prediction through domain-tuned network-based inference. Bioinformatics 2013;29:2004–8
PubMed Web of Science ®Google Scholar
Imai T, Hiraoka R, Seto T, et al Three-dimensional distribution function theory for the prediction of protein-ligand binding sites and affinities: application to the binding of noble gases to hen egg-white lysozyme in aqueous solution. J Phys Chem B 2007;111:11585–91
PubMed Web of Science ®Google Scholar
Li H, Leung KS, Wong MH, et al Improving AutoDock Vina using random forest: the growing accuracy of binding affinity prediction by the effective exploitation of larger data sets. Mol Inf 2015;34:115–26
PubMed Web of Science ®Google Scholar
Nagamine N, Sakakibara Y. Statistical prediction of protein chemical interactions based on chemical structure and mass spectrometry data. Bioinformatics 2007;23:2004–12
PubMed Web of Science ®Google Scholar
Yamanishi Y, Araki M, Gutteridge A, et al Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 2008;24:i232–40
PubMed Web of Science ®Google Scholar
Koehler Leman J, Ulmschneider MB, Gray JJ. Computational modeling of membrane proteins. Proteins 2015;83:1–24
PubMed Web of Science ®Google Scholar
Bordoli L, Kiefer F, Arnold K, et al Protein structure homology modeling using SWISS-MODEL workspace. Nat Protoc 2009;4:1–13
PubMed Web of Science ®Google Scholar
Galdeano C, Gadd MS, Soares P, et al Structure-guided design and optimization of small molecules targeting the protein-protein interaction between the von Hippel-Lindau (VHL) E3 ubiquitin ligase and the hypoxia inducible factor (HIF) alpha subunit with in vitro nanomolar affinities. J Med Chem 2014;57:8657–63
PubMed Web of Science ®Google Scholar
Cao DS, Liang YZ, Deng Z, et al Genome-scale screening of drug-target associations relevant to Ki using a chemogenomics approach. PLoS One 2013;8:e57680
PubMed Web of Science ®Google Scholar
Van Westen GJ, Swier RF, Cortes-Ciriano I, et al Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets. J Cheminform 2013;5:42
PubMed Web of Science ®Google Scholar
Roth BL, Lopez E, Patel S, et al The multiplicity of serotonin receptors: uselessly diverse molecules or an embarrassment of riches? Neuroscientist 2000;6:252–62
Web of Science ®Google Scholar
Wang Y, Xiao J, Suzek TO, et al PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 2009;37:W623–33
PubMed Web of Science ®Google Scholar
Law V, Knox C, Djoumbou Y, et al DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res 2014;42:D1091–7
PubMed Web of Science ®Google Scholar
Pence HE, Williams A. ChemSpider: an online chemical information resource. J Chem Edu 2010;87:1123–4
Web of Science ®Google Scholar
Consortium U. Update on activities at the universal protein resource (UniProt) in 2013. Nucleic Acids Res 2013;41:D43–D7
PubMed Web of Science ®Google Scholar
Talete SRL. DRAGON For Windows (Software for Molecular Descriptor Calculations). version. 5.4, Milano, Italy; 2006. Available from: http://www.talete.mi.it
Google Scholar
Li ZR, Lin HH, Han LY, et al PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 2006;34:W32–7
PubMed Web of Science ®Google Scholar
Basak D, Pal S, Patranabis DC. Support vector regression. Neu InfPro - Lett Rev 2007;11:203–24
Google Scholar
Meyer D, Leisch F, Hornik K. The support vector machine under test. Neurocomputing 2003;55:169–86
Web of Science ®Google Scholar
Lin SW, Ying KC, Chen SC, et al Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Syst Appl 2008;35:1817–24
Web of Science ®Google Scholar
Platt J. Sequential minimal optimization: a fast algorithm for training support vector machines. Technical Report MSR-TR-98-14. Microsoft Res; 1998
Google Scholar
Robnik-Šikonja M. Improving random forests, Machine Learning: ECML 2004, Heidelberg, Berlin: Springer; 2004: 359–70
Google Scholar
Quillin ML, Breyer WA, Griswold IJ, et al Size versus polarizability in protein-ligand interactions: binding of noble gases within engineered cavities in phage T4 lysozyme. J Mol Biol 2000;302:955–77
PubMed Web of Science ®Google Scholar
Gonzalez MP, Caballero J, Helguera AM, et al 2D autocorrelation modelling of the inhibitory activity of cytokinin-derived cyclin-dependent kinase inhibitors. Bull Math Biol 2006;68:735–51
PubMed Web of Science ®Google Scholar
Caballero J, Fernandez L, Abreu JI, et al Amino acid sequence autocorrelation vectors and ensembles of Bayesian-regularized genetic neural networks for prediction of conformational stability of human lysozyme mutants. J Chem Inf Model 2006;46:1255–68
PubMed Web of Science ®Google Scholar
Chou KC. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2005;21:10–19
PubMed Web of Science ®Google Scholar
Munteanu CR, Gonzalez-Diaz H, Garcia R, et al Bio-AIMS collection of chemoinformatics web tools based on molecular graph information and artificial intelligence models. Comb Chem High Throughput Screen 2015;18:735–50
PubMed Web of Science ®Google Scholar
Romero Duran FJ, Alonso N, Caamano O, et al Prediction of multi-target networks of neuroprotective compounds with entropy indices and synthesis, assay, and theoretical study of new asymmetric 1,2-rasagiline carbamates. Int J Mol 2014;15:17035–64
Web of Science ®Google Scholar
Riera-Fernandez P, Munteanu CR, Escobar M, et al New Markov-Shannon entropy models to assess connectivity quality in complex networks: from molecular to cellular pathway, parasite-host, neural, industry, and legal-social networks. J Theor Biol 2012;293:174–88
PubMed Web of Science ®Google Scholar
Luan F, Cordeiro MN, Alonso N, et al TOPS-MODE model of multiplexing neuroprotective effects of drugs and experimental-theoretic study of new 1,3-rasagiline derivatives potentially useful in neurodegenerative diseases. Bioorg Med Chem 2013;21:1870–9
PubMed Web of Science ®Google Scholar
Alonso N, Caamano O, Romero-Duran FJ, et al Model for high-throughput screening of multitarget drugs in chemical neurosciences: synthesis, assay, and theoretic study of rasagiline carbamates. ACS Chem Neurosci 2013;4:1393–403
PubMed Web of Science ®Google Scholar
Prado-Prado F, García-Mera X, Escobar M, et al 2D MI-DRAGON: a new predictor for protein–ligands interactions and theoretic-experimental studies of US FDA drug-target network, oxoisoaporphine inhibitors for MAO-A and human parasite proteins. Eur J Med Chem 2011;46:5838–51
PubMed Web of Science ®Google Scholar
Prado-Prado F, García-Mera X, Alonso N, et al MI-DRA 3D: new model for reconstruction of US FDA drug-target network and theoretic-experimental studies of rasagiline derivatives inhibitors of AChE. ECSOC-16th International Electronic Conference on Synthetic Organic Chemistry. Spain: University of Santiago de Compostela; 2012
Google Scholar
Tenorio-Borroto E, Garcia-Mera X, Penuelas-Rivas CG, et al Entropy model for multiplex drug-target interaction endpoints of drug immunotoxicity. Curr Top Med Chem 2013;13:1636–49
PubMed Web of Science ®Google Scholar
Wang Y, Li Y, Ding J, et al Prediction of binding affinity for estrogen receptor alpha modulators using statistical learning approaches. Mol Divers 2008;12:93–102
PubMed Web of Science ®Google Scholar
Zhou W, Huang C, Li Y, et al A systematic identification of multiple toxin-target interactions based on chemical, genomic and toxicological data. Toxicology 2013;304:173–84
PubMed Web of Science ®Google Scholar
Yu H, Chen J, Xu X, et al A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data. PLoS One 2012;7:e37608
PubMed Web of Science ®Google Scholar
Xu X, Zhang W, Huang C, et al A novel chemometric method for the prediction of human oral bioavailability. Int J Mol Sci 2012;13:6964–82
PubMed Web of Science ®Google Scholar
Wang W-C, Chau K-W, Cheng C-T, et al A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J Hydrol 2009;374:294–306
Web of Science ®Google Scholar
Min JH, Lee Y-C. Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Syst Appl 2005;28:603–14
Web of Science ®Google Scholar
Li Y, Wang Y, Ding J, et al In silico prediction of androgenic and nonandrogenic compounds using random forest. QSAR Comb Sci 2009;28:396–405
Google Scholar
Han L, Luo S, Yu J, et al Rule extraction from support vector machines using ensemble learning approach: an application for diagnosis of diabetes. IEEE J Biomed Health Inform 2015;19:728–34
PubMed Web of Science ®Google Scholar

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Order Reprints Request Corporate Permissions

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

Request Academic Permissions

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.

Download PDF

Share icon
Back to Top

Related research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.

People also read
Recommended articles
Cited by

To cite this article:

Reference style: APA Chicago Harvard

Citation copied to clipboard

Reference styles above use APA (6th edition), Chicago (16th edition) & Harvard (10th edition)

Download citation

Download a citation file in RIS format that can be imported by citation management software including EndNote, ProCite, RefWorks and Reference Manager.

Choose format: RIS BibTex RefWorks Direct Export

Choose options: Citation Citation & abstract Citation & references

Your download is now in progress and you may close this window

Did you know that with a free Taylor & Francis Online account you can gain access to the following benefits?

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

Have an account?
Login now Don't have an account?
Register for free

Login or register to access this feature

Have an account?
Login now Don't have an account?
Register for free

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

Pred-binding: large-scale protein–ligand binding affinity prediction

Abstract

Introduction