Figures & data
Figure 1 The data analysis and machine learning schema.
Abbreviations: ER, estrogen receptor; SVM, support vector machine; ROC, receiver operating characteristic; PCA, principal component analysis.
![Figure 1 The data analysis and machine learning schema.](/cms/asset/7e242af6-6753-4d21-b5d1-07af64c55704/dddt_a_12172952_f0001_c.jpg)
Figure 2 Principal component analysis (PCA) of the dataset.
Abbreviations: Ext, extended; AP2D, 2D atom pairs; FP, fingerprints.
![Figure 2 Principal component analysis (PCA) of the dataset.](/cms/asset/d4629c83-47a0-4343-939b-b873a8d82e85/dddt_a_12172952_f0002_b.jpg)
Figure 3 Heat map of the distance matrix for the compounds in the collected dataset.
![Figure 3 Heat map of the distance matrix for the compounds in the collected dataset.](/cms/asset/8be59fef-ef05-4185-9389-2c1643341785/dddt_a_12172952_f0003_c.jpg)
Table 1 Model performance in 5-fold cross-validation
Figure 4 The ROC curves of the 5-fold cross validation models based on four types of fingerprints (FP) and four machine learning approaches.
Abbreviations: ROC, receiver operating characteristic; NB, Naïve Bayesian; KNN, k-nearest neighbor; RF, random forest; SVM, support vector machine; Ext, extended; AP2D, 2D atom pairs; TP, true positives; FPos, false positives.
![Figure 4 The ROC curves of the 5-fold cross validation models based on four types of fingerprints (FP) and four machine learning approaches.](/cms/asset/565c9c9f-e7e6-41f0-ad31-1c70beef7e8c/dddt_a_12172952_f0004_c.jpg)
Figure 5 Performance ranking of machine learning methods with various fingerprints (FP).
Abbreviations: NB, Naïve Bayesian; KNN, k-nearest neighbor; RF, random forest; SVM, support vector machine; Ext, extended; AP2D, 2D atom pairs.
![Figure 5 Performance ranking of machine learning methods with various fingerprints (FP).](/cms/asset/ac82e821-5e50-4831-8cd2-0a0b26f34c0b/dddt_a_12172952_f0005_c.jpg)
Figure 6 Performance ranking of fingerprints (FP) in various machine learning methods.
Abbreviations: NB, Naïve Bayesian; KNN, k-nearest neighbor; RF, random forest; SVM, support vector machine; Ext, extended; AP2D, 2D atom pairs.
![Figure 6 Performance ranking of fingerprints (FP) in various machine learning methods.](/cms/asset/66ce99b1-4d2d-40f1-8188-1d76f93ada7a/dddt_a_12172952_f0006_c.jpg)
Table 2 Model performance on the test set
Table 3 Model performance on the external test set
Table S1 Ten-fold cross validation model performance
Table S2 Five-fold cross validation model performance using experimental inactive agonists