For Severe Acute Pancreatitis, How Does ChatGPT Measure Up?

TOPLINE:
ChatGPT (version 3.5) demonstrates moderate accuracy in responding to guideline-based clinical questions about the management of severe acute pancreatitis.
METHODOLOGY:
Researchers evaluated the accuracy of ChatGPT in answering 34 short-answer questions and 15 true/false questions regarding the management of severe acute pancreatitis.
The artificial intelligence (AI) tool was tested in both English and Chinese, with each question asked twice to evaluate reproducibility.
Two senior critical care medicine specialists assessed the responses for accuracy, and a third expert resolved any disagreements.
Researchers compared the accuracy rates between the English and Chinese responses, as well as between the short-answer and true/false questions.
TAKEAWAY:
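ChatGPT was numerically more accurate in English than in Chinese, but the difference was not statistically significant (71% vs 59%; P = .203).
In English, the AI tool's accuracy was numerically higher for short-answer questions than for true/false questions, again without statistical significance (76% vs 60%; P = .405).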
In Chinese, no significant difference in accuracy was observed between the short-answer and true/false questions (59% vs 60%; P = .938).
The reproducibility of ChatGPT responses was moderate to good in English and average in Chinese, indicating some reliability in the tool’s output.
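The English vs Chinese comparison above can be illustrated with a short calculation. This is a minimal sketch and not the authors' analysis: this summary does not name the statistical test, so a Pearson chi-square test without continuity correction is assumed here, and the counts of correct answers (35 and 29 of 49) are inferred from the reported percentages and the 49 total questions rather than taken from the study. The function name chi_square_2x2 is hypothetical.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square test (no continuity correction) for two proportions."""
    observed = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [
        observed[0][0] + observed[1][0],
        observed[0][1] + observed[1][1],
    ]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if accuracy were identical in both groups.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMPTION: counts inferred from the reported 71% and 59% of 49 questions.
total_questions = 34 + 15
english_correct = 35
chinese_correct = 29

chi2, p_value = chi_square_2x2(
    english_correct, total_questions, chinese_correct, total_questions
)
print(f"English accuracy: {english_correct / total_questions:.0%}")
print(f"Chinese accuracy: {chinese_correct / total_questions:.0%}")
print(f"chi-square = {chi2:.3f}, P = {p_value:.3f}")
```

Under these assumptions the P value comes out at roughly .20, in line with the reported P = .203 and well above the conventional .05 threshold. With only 49 questions per language, a 12-percentage-point gap is not enough to rule out chance.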
IN PRACTICE:
“For clinicians who need rapid access to essential information regarding the management of severe acute pancreatitis, [ChatGPT] has the potential to be a valuable tool,” the authors wrote. “However, it is important to note that the current accuracy of ChatGPT is insufficient to assist clinicians in making judgments and determining the disposition of diseases.”
SOURCE:
The study, with first author Jun Qiu, Department of Critical Care Medicine, Chengdu First People’s Hospital, Chengdu, China, was published online in BMC Gastroenterology.
LIMITATIONS:
The study’s sample size was relatively small. The questions were drawn solely from the 2019 guidelines for managing severe acute pancreatitis, which limits the generalizability of the findings. The evolving nature of AI models could result in different responses over time. Additionally, subjectivity may have been introduced because evaluating some responses required clinical knowledge.
DISCLOSURES:
The study had no funding. The authors declared no conflicts of interest.
