ABSTRACT
The n-octanol/buffer solution distribution coefficient (or n-octanol/water partition coefficient) is a critical measure of the lipophilicity of drug candidates. From 4885 generated molecular descriptors, 15 were selected to develop quantitative structure–property relationship (QSPR) models for the distribution coefficient at pH 7.4 (log D7.4), using a large data set of 1043 organic compounds divided into a training set (600 compounds) and a test set (443 compounds). A support vector machine (SVM) based on a genetic algorithm was used to develop a model for log D7.4 with a coefficient of determination (r²) of 0.919 for the training set and 0.893 for the test set. The results suggest that the SVM model predicts log D7.4 accurately.
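As an illustration of the evaluation described above, the following minimal sketch shows how a 1043-compound data set might be divided into 600 training and 443 test compounds, and how a coefficient of determination could be computed for each subset. It rests on assumptions not stated in the abstract: the split is taken to be random (the actual partitioning scheme is not given), the coefficient of determination is taken to be 1 − SS_res/SS_tot (it may instead be the squared correlation coefficient), and the function names `split_indices` and `r_squared` are hypothetical.

```python
import random


def split_indices(n_total=1043, n_train=600, seed=0):
    """Return (train, test) index lists.

    Assumption: a seeded random split. The abstract does not say how
    the 600/443 partition was actually made.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: observed versus model-predicted values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: observed values versus their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; r^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

Under these assumptions, the same `r_squared` function would be applied separately to the training-set and test-set predictions of a fitted model, giving one value per subset as in the abstract.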
Disclosure statement
No potential conflict of interest was reported by the authors.
Supplementary material
Supplemental data for this article can be accessed at: https://doi.org/10.1080/1062936X.2020.1782468.