DISCLOSURE STATEMENT
No potential conflict of interest was reported by the author(s).
Notes
1 Earp et al. discuss the autonomy problem, but they do not say how their proposal aims to avoid what I have elsewhere called the “scope” and the “multiple models” problems (Sharadin 2019). Indeed, P4s appear to exacerbate both problems. The scope problem: Earp et al. suggest a P4 might be implemented using a system trained on an individual’s “emails, blog posts, or social media posts […] or even Facebook ‘liking’ activity” (6). But which emails and posts? Only public ones? Which social media posts, and on which platforms? The multiple models problem: Earp et al. suggest at least five different implementations of the P4 (6), and there are multiple reasonable ways of executing each of these implementations, each of which will vary in its predictions concerning patient care. What principled way can there be for clinicians and other healthcare providers to decide among these models? Worse, the kind of ML model Earp et al. propose to use (LLMs) is particularly prone to producing widely divergent outputs depending on technical design choices made in deploying the model, such as those involving the inference procedure and other “background conditions.” For discussion, see Harding and Sharadin (2024). I say more about this issue below, in Section “TECHNICAL ISSUES.”
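The divergence point in note 1 can be made concrete. The following is a minimal illustrative sketch (my own, not part of Earp et al.’s proposal, and with hypothetical numbers): it shows how a single inference-time deployment choice, the sampling temperature, reshapes the probability distribution a language model assigns to candidate outputs, so that two deployments of the very same trained model can yield systematically different predictions.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores (logits) into a probability distribution,
    scaled by the deployment's sampling temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate answers to a care-preference query.
logits = [2.0, 1.5, 0.5]

low_t = softmax_with_temperature(logits, 0.5)   # a "conservative" deployment
high_t = softmax_with_temperature(logits, 2.0)  # a more "exploratory" one

# At T=0.5 the top answer dominates; at T=2.0 probability mass spreads
# across the answers, so sampled outputs will frequently differ between
# the two deployments even though the underlying model is identical.
print([round(p, 3) for p in low_t])   # roughly [0.705, 0.259, 0.035]
print([round(p, 3) for p in high_t])  # roughly [0.444, 0.346, 0.210]
```

The logits and temperatures here are invented for illustration; the structural point is only that inference-procedure choices like these are genuine free parameters of any deployed P4.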
2 Earp et al.’s view seems to be that the primary technical challenge to using LLMs as P4s involves acquiring (enough, good) data: “In general, the primary function of LLMs is prediction: given data of a sufficient quality and relevance, prima facie LLMs should be able to predict medical preferences, too” (9). I agree that (enough, good) data will be a barrier to using any LLM as a P4. Here, I’ll set this (very big) issue to one side.
3 For related work on the difficulties associated with evaluating the capabilities of LLMs, see Harding and Sharadin (2024).