ABSTRACT
Pharmaceutical researchers are continually searching for techniques to improve both drug development processes and patient outcomes. An area of recent interest is the potential for machine learning (ML) applications within pharmacology. One such application not yet given close study is the unsupervised clustering of plasma concentration-time curves, hereafter referred to as pharmacokinetic (PK) curves. In this paper, we present our findings on how to cluster PK curves by their similarity. Specifically, we find clustering to be effective at identifying PK curves of similar shape and informative for understanding patterns within each cluster. Because PK curves are time series data objects, our approach takes the extensive body of research on clustering time series data as its starting point. Accordingly, we examine many dissimilarity measures between time series to find those most suitable for PK curves. We identify Euclidean distance as generally the most appropriate measure for clustering PK curves, and we further show that dynamic time warping, Fréchet distance, and structure-based measures of dissimilarity such as correlation may produce unexpected results. As an illustration, we apply these methods in a case study of 250 PK curves used in a previous pharmacogenomic study. The case study finds that an unsupervised ML clustering with Euclidean distance, using no subject genetic information, independently reaches the same conclusions as the reference pharmacogenomic study. To our knowledge, this is the first such demonstration. Further, the case study demonstrates how clustering PK curves may generate insights that could be difficult to perceive solely from population-level summary statistics of PK metrics.
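The following is a minimal sketch, not the analysis used in this paper, of one way correlation and dynamic time warping (DTW) can behave unexpectedly on PK curves. It assumes a hypothetical one-compartment model with first-order oral absorption (the Bateman function); the parameter values and the helper names `one_compartment_oral`, `euclidean`, `correlation_dissimilarity`, and `dtw` are illustrative assumptions. A curve at double the dose has the same shape as the reference, so its correlation dissimilarity is essentially zero despite a twofold difference in exposure. A curve delayed by a 2 h absorption lag can be warped onto the reference, so its DTW distance is small despite a different time to peak. Euclidean distance separates both pairs.

```python
import numpy as np


def one_compartment_oral(t, dose=100.0, ka=1.5, ke=0.2, vd=20.0, tlag=0.0):
    """Hypothetical one-compartment model with first-order absorption
    (Bateman function). Concentration is zero before the lag time."""
    te = np.clip(t - tlag, 0.0, None)  # effective time since absorption began
    return dose * ka / (vd * (ka - ke)) * (np.exp(-ke * te) - np.exp(-ka * te))


def euclidean(x, y):
    """Euclidean distance between two curves sampled at the same times."""
    return float(np.linalg.norm(x - y))


def correlation_dissimilarity(x, y):
    """1 - Pearson correlation: sensitive to shape only, not to scale."""
    return float(1.0 - np.corrcoef(x, y)[0, 1])


def dtw(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


# Sampling every 0.5 h over 24 h (an assumed schedule).
t = np.linspace(0.0, 24.0, 49)
ref = one_compartment_oral(t)
double_dose = one_compartment_oral(t, dose=200.0)  # same shape, 2x exposure
delayed = one_compartment_oral(t, tlag=2.0)        # same curve shifted by 2 h

for name, other in [("double dose", double_dose), ("2 h lag", delayed)]:
    print(
        f"{name:>11}: euclidean={euclidean(ref, other):.3f}  "
        f"1-corr={correlation_dissimilarity(ref, other):.3f}  "
        f"dtw={dtw(ref, other):.3f}"
    )
```

The pairs are chosen so that each alternative measure is blind to a difference that matters clinically: scale for correlation and timing for DTW. Euclidean distance on a common sampling grid penalizes both.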
Acknowledgements
This project was supported in part by an appointment to the Research Participation Program at the U.S. Food and Drug Administration by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy and the FDA. Jackson Lautier’s contribution was supported in part by the ORISE training program at the FDA and a National Science Foundation Graduate Research Fellowship under Grant No. DGE 1747453. We further thank seminar participants at the 2022 Joint Statistical Meetings in Washington, DC, and two anonymous reviewers for their insightful comments, which improved the quality of our work.
Disclosure statement
The present study reflects the views of the authors and should not be construed to represent the views or recommendations of the U.S. Food and Drug Administration. The authors have no other potential conflicts of interest to report.