SAR and QSAR models of cyclooxygenase-1 (COX-1) inhibitors

Pages 755-784 | Received 01 Jun 2018, Accepted 14 Aug 2018, Published online: 02 Oct 2018
 

ABSTRACT

Cyclooxygenase-1 (COX-1) is one of the two isoforms of COX and a major target of nonsteroidal anti-inflammatory drugs (NSAIDs), so it is important to develop efficient and selective COX-1 inhibitors. In this work, 12 classification models for 1530 COX-1 inhibitors were built by support vector machine (SVM), decision tree (DT) and random forest (RF) methods. The best classification model (model 1A) was built by SVM with MACCS fingerprints; its classification accuracies for the training and test sets were 99.67% and 97.39%, respectively, and the Matthews correlation coefficient (MCC) of the test set was 0.94. We also divided the 1530 COX-1 inhibitors into nine subsets according to their different scaffolds using Kohonen's self-organizing map (SOM). In addition, six quantitative structure–activity relationship (QSAR) models for 181 COX-1 inhibitors whose IC50 values were measured by enzyme immunoassay were built by multiple linear regression (MLR) and SVM. The best QSAR model (model 5A) was built by SVM with CORINA Symphony descriptors; its correlation coefficients for the training and test sets were 0.93 and 0.84, respectively. The models built in this study can be obtained from the authors.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (21675010) and the 'Chemical Grid Project' of Beijing University of Chemical Technology. We thank Molecular Networks GmbH, Nuremberg, Germany, for providing the CORINA Symphony and SONNIA software programs for our scientific work.

Disclosure statement

The authors report no conflicts of interest.

Supplementary material

The supplementary material for this article is available at: https://doi.org/10.1080/1062936X.2018.1513952.

1. The optimum parameters and details of twelve classification models

Models 1A–1D were built by SVM. Their optimized penalty parameter C and kernel parameter gamma, the classification accuracies of the training and test sets, the 5-fold, 10-fold and leave-one-out (LOO) cross-validation accuracies on the training set, and the sensitivity (SE), specificity (SP) and Matthews correlation coefficient (MCC) on the test set are listed below.

Model | C | gamma | Train acc. | Test acc. | 5-fold CV | 10-fold CV | LOO CV | SE | SP | MCC
----- | --- | -------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ----
1A | 256 | 0.003906 | 99.67% | 97.39% | 95.43% | 96.24% | 96.41% | 96.94% | 97.60% | 0.94
1B | 512 | 0.001953 | 99.67% | 96.87% | 95.87% | 95.70% | 95.71% | 95.50% | 97.60% | 0.93
1C | 16 | 128 | 98.37% | 86.19% | 86.36% | 86.36% | 86.68% | 67.35% | 96.15% | 0.69
1D | 4 | 128 | 96.20% | 89.34% | 86.62% | 86.85% | 86.46% | 77.48% | 95.67% | 0.76
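As an illustration of how models such as 1A–1D can be tuned and scored, the sketch below (not the authors' code; random bit vectors stand in for the MACCS fingerprints, and scikit-learn is assumed) grid-searches C and gamma for an RBF-kernel SVM and computes the accuracy, SE, SP and MCC statistics reported above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 166)).astype(float)  # stand-ins for 166-bit MACCS keys
y = (X[:, :20].sum(axis=1) > 10).astype(int)           # toy active/inactive labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Powers-of-two grid over C and gamma, as is conventional for RBF-kernel SVMs
grid = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": [1.0, 16.0, 256.0], "gamma": [2**-8, 2**-4, 1.0]},
    cv=5,
)
grid.fit(X_tr, y_tr)

y_pred = grid.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
se = tp / (tp + fn)                    # sensitivity (SE)
sp = tn / (tn + fp)                    # specificity (SP)
mcc = matthews_corrcoef(y_te, y_pred)  # MCC on the test set
print(grid.best_params_, accuracy_score(y_te, y_pred), se, sp, mcc)
```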

Models 2A–2D were built by DT. Their optimized values of criterion and max_features and the corresponding statistics are:

Model | criterion | max_features | Train acc. | Test acc. | 5-fold CV | 10-fold CV | LOO CV | SE | SP | MCC
----- | --------- | ------------ | ------- | ------ | ------ | ------ | ------ | ------ | ------ | ----
2A | entropy | None | 100.00% | 94.77% | 90.12% | 91.02% | 90.85% | 87.76% | 98.08% | 0.88
2B | entropy | None | 100.00% | 96.24% | 92.24% | 92.98% | 92.82% | 93.69% | 97.60% | 0.92
2C | entropy | sqrt | 100.00% | 83.99% | 83.33% | 83.99% | 84.15% | 67.35% | 91.83% | 0.62
2D | entropy | None | 100.00% | 87.46% | 82.16% | 83.90% | 82.33% | 81.98% | 90.38% | 0.72
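The criterion and max_features names above are scikit-learn DecisionTreeClassifier hyperparameters; a hedged sketch of how such values can be selected by cross-validated grid search (on synthetic fingerprint stand-ins, not the paper's data set) is:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 166)).astype(float)  # fingerprint stand-ins
y = ((X[:, 0] + X[:, 1]) > 0).astype(int)              # toy label driven by two bits

# criterion and max_features are the two hyperparameters reported for models 2A-2D
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"criterion": ["gini", "entropy"], "max_features": [None, "sqrt", "log2"]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```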

Models 3A–3D were built by RF. Their optimized values of n_estimators, criterion and max_features and the corresponding statistics are:

Model | n_estimators | criterion | max_features | Train acc. | Test acc. | 5-fold CV | 10-fold CV | LOO CV | SE | SP | MCC
----- | --- | ------- | ---- | ------- | ------ | ------ | ------ | ------ | ------ | ------ | ----
3A | 87 | entropy | log2 | 100.00% | 95.42% | 94.28% | 94.12% | 94.36% | 86.73% | 99.52% | 0.90
3B | 72 | entropy | log2 | 100.00% | 95.92% | 94.05% | 94.54% | 95.05% | 90.99% | 98.56% | 0.91
3C | 27 | entropy | log2 | 99.92% | 89.22% | 86.68% | 86.68% | 88.89% | 72.45% | 97.12% | 0.75
3D | 53 | gini | sqrt | 100.00% | 91.22% | 86.86% | 87.36% | 86.62% | 80.18% | 97.12% | 0.81
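A minimal sketch (again on synthetic data, not the authors' pipeline) of a random forest configured with the model 3A hyperparameters quoted above, evaluated by 5-fold cross-validation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(300, 166)).astype(float)  # fingerprint stand-ins
y = (X[:, :10].sum(axis=1) > 5).astype(int)            # toy active/inactive labels

# Hyperparameters reported for model 3A
clf = RandomForestClassifier(
    n_estimators=87, criterion="entropy", max_features="log2", random_state=0
)
scores = cross_val_score(clf, X, y, cv=5)              # 5-fold CV accuracies
print(scores.mean())
```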

2. The optimum parameters and details of six QSAR models

Models 4A–4C were built by MLR. Their correlation coefficients (r), mean absolute errors (MAE) and mean squared errors (MSE) on the training and test sets are:

Model | r (train) | r (test) | MAE (train) | MAE (test) | MSE (train) | MSE (test)
----- | ---- | ---- | ---- | ---- | ---- | ----
4A | 0.90 | 0.82 | 0.34 | 0.43 | 0.17 | 0.32
4B | 0.88 | 0.86 | 0.37 | 0.36 | 0.22 | 0.21
4C | 0.89 | 0.83 | 0.34 | 0.41 | 0.18 | 0.30
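A sketch of fitting a multiple linear regression and computing these r, MAE and MSE statistics (synthetic descriptors and toy activity values stand in for the CORINA Symphony descriptors and measured IC50 data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(181, 8))                                  # 8 hypothetical descriptors
y = X @ rng.normal(size=8) + rng.normal(scale=0.3, size=181)   # toy activity values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
mlr = LinearRegression().fit(X_tr, y_tr)

y_pred = mlr.predict(X_te)
r = np.corrcoef(y_te, y_pred)[0, 1]       # Pearson correlation coefficient
mae = mean_absolute_error(y_te, y_pred)
mse = mean_squared_error(y_te, y_pred)
print(r, mae, mse)
```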

Models 5A–5C were built by SVM. Their optimized penalty parameter C, gamma and epsilon values and the corresponding statistics are:

Model | C | gamma | epsilon | r (train) | r (test) | MAE (train) | MAE (test) | MSE (train) | MSE (test)
----- | -- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----
5A | 16 | 0.5 | 0.25 | 0.93 | 0.84 | 0.28 | 0.40 | 0.12 | 0.30
5B | 32 | 0.25 | 0 | 0.91 | 0.86 | 0.28 | 0.36 | 0.18 | 0.21
5C | 32 | 0.5 | 0.5 | 0.92 | 0.85 | 0.31 | 0.40 | 0.14 | 0.27
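The C, gamma and epsilon parameters above are those of epsilon-SVR; a hedged sketch using the model 5A values on synthetic descriptor data (not the authors' data set) is:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.normal(size=(181, 2))                                  # two toy descriptors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=181)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Hyperparameters quoted for model 5A: C=16, gamma=0.5, epsilon=0.25
svr = SVR(kernel="rbf", C=16, gamma=0.5, epsilon=0.25).fit(X_tr, y_tr)

y_pred = svr.predict(X_te)
r = np.corrcoef(y_te, y_pred)[0, 1]       # correlation coefficient on the test set
print(r, mean_absolute_error(y_te, y_pred), mean_squared_error(y_te, y_pred))
```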

3. MACCS fingerprints details

The MACCS fingerprint bits with positive information gain (IG) values are ranked in Figure S1. Model 1A has 52 such fingerprint bits and model 1B has 61; all of them contributed to the respective models.
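A sketch of ranking fingerprint bits by information gain, as done for Figure S1 (here scikit-learn's mutual_info_classif serves as the IG estimate, on synthetic bit vectors rather than the paper's fingerprints):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(5)
X = rng.integers(0, 2, size=(200, 166))          # fingerprint stand-ins
y = (X[:, 5] | X[:, 7]).astype(int)              # toy activity driven by bits 5 and 7

# Information gain of each bit with respect to the activity class
ig = mutual_info_classif(X, y, discrete_features=True, random_state=0)
ranked = np.argsort(ig)[::-1]                    # bit indices, most informative first
positive_bits = ranked[ig[ranked] > 0]           # only bits with positive IG are kept
print(positive_bits[:10])
```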
