Open Peer Commentaries

Predicting Patient Preferences with Artificial Intelligence: The Problem of the Data Source


The concept of a Patient Preference Predictor—an algorithm that supplements or replaces the process of surrogate decision-making for incapacitated patients—was first suggested a decade ago (Rid and Wendler 2014a). The underlying idea is that certain key demographic characteristics, such as age or income, could be statistically predictive of an individual’s preferences in health care (Rid and Wendler 2014b). In their intriguing Target Article, Earp et al. (2024) expand this initial proposal into a personalized version: instead of relying on demographic data, large language models are to extract personal preferences or values from a variety of sources generated by, or pertaining to, individual patients. The hope, for both the original and the revised proposal, is that the preferences so derived would be more accurate, on average, than those typically identified by human surrogate decision-makers (Shalowitz, Garrett-Mayer, and Wendler 2006).

It is a widely held view about artificial intelligence that machine-learning models are only as good as the quality of the data on which they were trained (Moseley 2024). Since the authors suggest utilizing entirely novel types of sources, the issue of dataset quality is especially pertinent to the proposed predictor algorithm, and it is this concern on which I shall focus in this comment.

The Target Article considers four potential data sources: (1) a patient’s explicit responses to questions about treatment preferences, collected—for example—from questionnaires filled in for healthcare institutions at a previous time of competence; (2) electronic health records, documenting past treatments and associated decisions; (3) a patient’s online activities in any context; and (4) value-eliciting discrete-choice experiments.

The first two of these appear rather uncontroversial. Both the patient’s own responses when asked about treatment preferences as well as electronic health records fulfill two decisive criteria, which we may denote as thematic relevance and individual applicability: they deal specifically with medical affairs, and they are generated precisely for the individual in question rather than for anybody else. Consequently, data sources of this kind promise to offer a reliable basis for the envisioned predictor.

However, as these sources thus already contain data of the required kind in a form that is directly usable by traditional means as well, the transformative potential of automated preference prediction would likely be rather limited if they were the only sources on which it could rely. Ideally, the software would be able to provide advice especially for patients who did not leave behind a substantial trail of personal medico-ethical information from previous interactions with the healthcare system. The much bolder part of the authors’ proposal therefore concerns the two remaining types of data source, which I shall now consider in a little more detail.

PRIVATE PREFERENCES, PUBLIC PREFERENCES

The third potential data source that Earp et al. consider comprises social-media posts, blog entries, and e-mails. For the sake of the argument, let us assume that the first major hurdle—finding sufficiently many statements authored by the respective patient that are actually medically relevant, or from which medically relevant preferences can somehow be inferred—had already been cleared. Thematic relevance is therefore fulfilled. What about individual applicability?

Consider the following example. Commenting on a friend’s social-media post about the severe illness of his mother, I write: “In this condition, I would definitely have her treatment discontinued”. This statement should be an ideal addition to the dataset as it clearly links a diagnosis to a preferred decision; most personalized online sources would be of much lower quality in this regard. Still—is my remark about the medical condition of my friend’s mother really indicative of the treatment I would prefer for myself in the same situation? Does “I would have her treatment discontinued” automatically translate to “you should discontinue my treatment, if in this condition, when the time comes”? Not necessarily. One’s own case is always special, and therefore it must be probed explicitly if the responses are to be faithful in this regard.

In a survey study that simulated crash scenarios with autonomous vehicles, with the aim of establishing people’s preferences for moral principles built into the cars, researchers found that the vast majority of participants favored utilitarian decision rules, so that casualties would be minimized. But this changed as soon as they were placed into the car themselves. Suddenly, the car was meant to prioritize protecting its passengers, even if this resulted in additional deaths overall (Bonnefon, Shariff, and Rahwan 2016). The only difference between the scenarios was that the person in the car about to crash was I rather than someone else.

What would be required, then, are signals in the text corpus that permit the classification of statements into general and idiosyncratic ones—and ideally also allow an estimate of the accuracy of the respective allocation. A lengthy, contemplative blog entry or personal e-mail may contain certain markers to this effect that the algorithm could pick up. For most online activities of the average person, however, this differentiation will be rather difficult to obtain. Individual applicability does not simply follow from the use of reflexive pronouns in the context.
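To make this concrete, consider a toy heuristic of my own devising (purely illustrative, and not part of the authors’ proposal) that flags a statement as individually applicable whenever it contains a first-person pronoun. It confidently misclassifies the friend’s-mother comment from the example above:

```python
import re

# Toy heuristic: treat a statement as individually applicable whenever
# it contains a first-person pronoun. This is exactly the kind of
# shallow surface signal that the argument above warns against.
FIRST_PERSON = re.compile(r"\b(I|me|my|myself|mine)\b")

def naively_individually_applicable(statement: str) -> bool:
    """Return True if the statement mentions its author in the first person."""
    return bool(FIRST_PERSON.search(statement))

# The comment about a friend's mother: first-person phrasing, but the
# expressed preference concerns someone else's treatment.
comment = "In this condition, I would definitely have her treatment discontinued"
print(naively_individually_applicable(comment))  # True — a false positive
```

A genuine classifier would need far richer contextual signals than pronoun use, plus calibrated confidence estimates for each allocation; this sketch merely shows how easily the naive criterion conflates statements about others with statements about oneself.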

PLAYFUL PREFERENCES, PAINFUL PREFERENCES

The final type of potential data source that the authors consider is immune to this problem. Value-eliciting discrete-choice experiments can explicitly put the user, rather than anyone else, into the respective scenarios. Data so collected must, so it seems, be optimal with regard to individual applicability. But how relevant would it be?

There are several reasons that deter people from drafting advance directives, including the necessary time investment, insufficient health literacy, and the somber nature of the topic (Scholten et al. 2018). To attract potential users, Earp et al. therefore envision preference collection in the form of a gamified app or internet platform.

Not everybody will enjoy pondering end-of-life decisions as much as we ethicists do. To incentivize people beyond the rather modest allure of advance directives, the presented discrete-choice situations—even if gamified—would therefore likely have to depart substantially from familiar clinical setups (“Your avatar is severely demented and does not recognize her family—specify the next move!” would in all probability not fare any better in mobilizing wide participation). The hope would have to be that individual preferences of medical relevance can still be extrapolated from answers in well-chosen, more inspiriting contexts. Let us stipulate that, technologically, this could indeed be achieved. Would we therefore be able to elicit the type of responses we desire?

If the game, or series of scenarios, is enjoyable enough to attract people, it will likely also be distant enough from the gravity of medical decision-making in situations of incapacitation to fail to evoke responses that reflect true treatment preferences. Being in the right headspace for balancing preferences of the kind required can be unsettling. Many people avoid drafting advance directives because doing so would entail contemplating disease, suffering, and the inescapable finitude of one’s life. Subjecting oneself to the emotional force of these topics may not be pleasant; but it may be necessary to achieve the desired result. Hence, reactions to entertaining, gamified prompts may diverge significantly from the truly invested, self-reflective perspective that disconcerting scenarios of one’s own diseased future command. Extracting treatment preferences regarding these solemn issues and simultaneously sparing users the associated psychological costs to incentivize them may therefore be mutually exclusive. For these costs are the prerequisite for said preferences to form in the first place; and what has not formed cannot be brought out—not even in a gamified way.

CONCLUSION

I sketched two potential problems with the two arguably most transformative types of data source on which a personalized version of a preference-predicting algorithm could rely: the difficulty of differentiating between general and idiosyncratic statements in online activities, and the tension between incentivizing users and the faithfulness of the preferences so elicited. If I am correct in these assessments, then the accuracy of a personalized predictor algorithm might be lower than hoped, or its scope narrower.

Does this mean that one should not attempt to develop the envisioned predictor? Certainly not. The idea is promising, and if it could be realized, it would address one of the major challenges in healthcare. There is thus a clear need and a strong use case, although there may still be a long way to go.

Two years ago, in this journal, our research group presented an algorithm for solving moral dilemma situations in medicine—the first of its kind (Meier et al. 2022a, 2022b; Hein et al. 2022). Only when we were actually building it did we find that certain aspects we had not deemed particularly difficult during the conception phase were in fact very tricky, but also that some obstacles we had anticipated to be nearly insurmountable had rather straightforward solutions.

Which type of data source works best for a preference predictor? Do personalized or population-based approaches deliver the better results? Would we need hybrid solutions that unify the two paradigms? Let’s build them and find out.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by Churchill College, University of Cambridge.

REFERENCES

  • Bonnefon, J.-F., A. Shariff, and I. Rahwan. 2016. The social dilemma of autonomous vehicles. Science 352 (6293):1573–6. doi: 10.1126/science.aaf2654.
  • Earp, B. D., S. Porsdam Mann, J. Allen, S. Salloch, V. Suren, K. Jongsma, M. Braun, D. Wilkinson, W. Sinnott-Armstrong, A. Rid, et al. 2024. A personalized patient preference predictor for substituted judgments in healthcare: Technically feasible and ethically desirable. The American Journal of Bioethics 24 (7):13–26. doi: 10.1080/15265161.2023.2296402.
  • Hein, A., L. J. Meier, A. Buyx, and K. Diepold. 2022. A fuzzy-cognitive-maps approach to decision-making in medical ethics. 2022 IEEE international conference on fuzzy systems (FUZZ-IEEE), 1–8. doi: 10.1109/FUZZ-IEEE55066.2022.9882615.
  • Meier, L. J., A. Hein, K. Diepold, and A. Buyx. 2022a. Algorithms for ethical decision-making in the clinic: A proof of concept. The American Journal of Bioethics 22 (7):4–20. doi: 10.1080/15265161.2022.2040647.
  • Meier, L. J., A. Hein, K. Diepold, and A. Buyx. 2022b. Clinical ethics – to compute, or not to compute? The American Journal of Bioethics 22 (12):W1–W4. doi: 10.1080/15265161.2022.2127970.
  • Moseley, H. 2024. In the AI science boom, beware: Your results are only as good as your data. Nature. doi: 10.1038/d41586-024-00306-2.
  • Rid, A., and D. Wendler. 2014a. Treatment decision making for incapacitated patients: Is development and use of a patient preference predictor feasible? Journal of Medicine and Philosophy 39 (2):130–52. doi: 10.1093/jmp/jhu006.
  • Rid, A., and D. Wendler. 2014b. Use of a patient preference predictor to help make medical decisions for incapacitated patients. Journal of Medicine and Philosophy 39 (2):104–29. doi: 10.1093/jmp/jhu001.
  • Scholten, G., S. Bourguignon, A. Delanote, B. Vermeulen, G. van Boxem, and B. Schoenmakers. 2018. Advance directive: Does the GP know and address what the patient wants? Advance directive in primary care. BMC Medical Ethics 19 (1):58. doi: 10.1186/s12910-018-0305-2.
  • Shalowitz, D. I., E. Garrett-Mayer, and D. Wendler. 2006. The accuracy of surrogate decision makers: A systematic review. Archives of Internal Medicine 166 (5):493–7. doi: 10.1001/archinte.166.5.493.