Abstract
The objective of this research was to evaluate the relative capabilities of ChatGPT-3.5, ChatGPT-4, and Bard (later renamed Gemini) in responding to health-related queries with respect to comprehensiveness, accuracy, and currency. A panel of general practitioners rated the answers provided by all three chatbots to five questions on a five-point scale. In terms of comprehensiveness, ChatGPT-4 achieved a higher mean score than Bard, which in turn scored higher than ChatGPT-3.5. No statistically significant difference was found between the three chatbots in terms of accuracy. In the currency domain, Bard outperformed ChatGPT-4, which in turn performed better than ChatGPT-3.5. Overall, all three chatbots received high ratings for accuracy, currency, and comprehensiveness.
Disclosure statement
No potential conflict of interest was reported by the author.
Additional information
Notes on contributors
Mustafa Said Yıldız
Mustafa Said Yıldız, PhD ([email protected]) is an Associate Professor and Internal Auditor in the Internal Audit Department, Turkiye Ministry of Health. His research interests are health policy, management, and economics.