Abstract
Computational data mining is of interest in the pharmaceutical arena for analyzing massive amounts of data and for assisting in their management and utilization. In this study, a data mining approach was used to predict the miscibility of a drug with several excipients, using Hansen solubility parameters (HSPs) as the data set. The K-means clustering algorithm was applied to predict the miscibility of indomethacin with a set of more than 30 compounds based on their partial solubility parameters [dispersion forces (δd), polar forces (δp), and hydrogen bonding (δh)]. The miscibility of the compounds was determined experimentally in a separate study using differential scanning calorimetry (DSC). The K-means and DSC results were compared to evaluate the prediction performance of K-means clustering using the three-dimensional HSPs, the two-dimensional parameters volume-dependent solubility (δv) and hydrogen bonding (δh), and selected single (one-dimensional) parameters. Using HSPs, the K-means predictions of miscibility agreed well with the DSC results, with an overall accuracy of 94%. The prediction accuracy was the same (94%) when the two-dimensional parameters or the hydrogen-bonding (one-dimensional) parameter alone were used. The hydrogen-bonding parameter was thus a determining factor in predicting miscibility in this set of compounds, whereas the dispersive and polar parameters showed only a weak correlation with miscibility. The results show that the data mining approach is a valuable tool for predicting drug–excipient miscibility because it is easy to use, time- and cost-effective, and material sparing.
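The clustering workflow summarized above can be illustrated with a minimal sketch. It assumes k = 2 clusters, Euclidean distance, and a deterministic farthest-first initialization, none of which the abstract specifies for the study itself. The drug, the excipient names A to F, their HSP values, and the DSC labels are hypothetical placeholders, not data from this study. Each compound is described by its dispersion (δd), polar (δp), and hydrogen-bonding (δh) parameters, by the two-dimensional pair (δv, δh) with δv = sqrt(δd² + δp²), or by δh alone. An excipient is predicted miscible when it falls in the same cluster as the drug, and accuracy is the fraction of predictions that agree with the DSC labels.

```python
import math

# Hypothetical HSP values (δd, δp, δh) in MPa^0.5: illustrative only, not the study's data.
DRUG = (19.0, 10.0, 14.0)
EXCIPIENTS = {
    "A": (18.5, 9.5, 13.0),
    "B": (19.5, 10.5, 15.0),
    "C": (18.0, 8.0, 12.5),
    "D": (17.0, 3.0, 2.0),
    "E": (16.5, 2.0, 1.0),
    "F": (17.5, 4.0, 3.0),
}
# Hypothetical DSC outcomes (True = miscible); F is deliberately set to disagree.
DSC_MISCIBLE = {"A": True, "B": True, "C": True, "D": False, "E": False, "F": True}


def features(hsp, mode):
    """Project (δd, δp, δh) onto the 3-D, 2-D (δv, δh) or 1-D (δh) representation."""
    dd, dp, dh = hsp
    if mode == "3d":
        return (dd, dp, dh)
    if mode == "2d":
        return (math.hypot(dd, dp), dh)  # δv = sqrt(δd² + δp²)
    if mode == "1d":
        return (dh,)
    raise ValueError(f"unknown mode: {mode}")


def kmeans(points, k=2, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-first initialization (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def predict_and_score(mode):
    """Cluster drug + excipients, predict miscibility, and score against the DSC labels."""
    names = list(EXCIPIENTS)
    points = [features(DRUG, mode)] + [features(EXCIPIENTS[n], mode) for n in names]
    labels = kmeans(points)
    # An excipient is predicted miscible if it shares the drug's cluster (the drug is index 0).
    predicted = {n: labels[i + 1] == labels[0] for i, n in enumerate(names)}
    accuracy = sum(predicted[n] == DSC_MISCIBLE[n] for n in names) / len(names)
    return predicted, accuracy


for mode in ("3d", "2d", "1d"):
    predicted, accuracy = predict_and_score(mode)
    print(mode, predicted, f"accuracy={accuracy:.2f}")
```

With these made-up values the δh axis dominates the separation, so the three representations yield the same clusters; with real HSP data, the agreement between representations has to be checked case by case.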
Acknowledgements
Waseem Kaialy thanks Damascus University for providing a PhD scholarship.