Artificial Intelligence-Driven Large Language Models for Health Consultation in Patients with AIDS
Authors: Zhao CY, Song C, Yang T, Huang AC, Qiang HB, Gong CM, Chen JS, Zhu QD
Received: 29 April 2025
Accepted for publication: 6 August 2025
Published: 25 August 2025; Volume 2025:18; Pages 5187–5198
DOI: https://doi.org/10.2147/JMDH.S533621
Checked for plagiarism: Yes
Review by: Single anonymous peer review
Peer reviewer comments: 3
Editor who approved publication: Dr Scott Fraser
Chun-Yan Zhao,1,2,* Chang Song,1,2,* Tong Yang,3,* Ai-Chun Huang,1 Hang-Biao Qiang,1 Chun-Ming Gong,1 Jing-Song Chen,4 Qing-Dong Zhu1
1Department of Tuberculosis, The Fourth People’s Hospital of Nanning, Nanning, Guangxi, People’s Republic of China; 2Clinical Medical School, Guangxi Medical University, Nanning, Guangxi, People’s Republic of China; 3Department of Rehabilitation, Hepu County People’s Hospital, Beihai, Guangxi, People’s Republic of China; 4Department of Gastroenterology, Hepu County People’s Hospital, Beihai, Guangxi, People’s Republic of China
*These authors contributed equally to this work
Correspondence: Qing-Dong Zhu, Department of Tuberculosis, The Fourth People’s Hospital of Nanning, No. 1 Changgang Two-Li, Xingning District, Nanning, Guangxi, 530023, People’s Republic of China, Tel +86 0771-5636973, Email zhuqingdong2003@163.com; Jing-Song Chen, Department of Gastroenterology, Hepu County People’s Hospital, No. 95, Dinghai North Road, Hepu County, Beihai, Guangxi, 536100, People’s Republic of China, Tel +86 0779-7106010, Email 410155791@qq.com
Purpose: This study aims to comprehensively assess the performance of large language models (LLMs) in health consultation for people living with HIV, explore their applicability across multiple dimensions, and provide evidence to support their clinical deployment.
Patients and Methods: A 23-question, multi-dimensional, HIV-specific question bank was developed, covering fundamental knowledge, diagnosis, treatment, prognosis, and case analysis. Four advanced LLMs (ChatGPT-4o, Copilot, Gemini, and Claude) were tested using a multi-dimensional evaluation system that assessed medical accuracy, comprehensiveness, understandability, reliability, and humanistic care (encompassing attention to individual needs, emotional support, and ethical considerations). Three experts independently scored each response on a five-point Likert scale. Descriptive statistics (mean, standard deviation, and standard error) were calculated, followed by consistency analysis, difference analysis, and post-hoc testing.
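The descriptive statistics named in the Methods can be illustrated with a minimal Python sketch. Because the abstract does not state how the three experts' ratings were combined, the sketch assumes that each question's score is the average of the three raters' five-point Likert ratings, and that the standard deviation is the sample SD (n − 1 denominator), so the standard error is SD/√n. The helper names `pool_ratings` and `describe`, and the ratings in the usage example, are hypothetical; they are not the study's code or data. The consistency, difference, and post-hoc analyses are not sketched because the abstract does not name the specific tests used.

```python
import math
import statistics
from typing import Sequence


def pool_ratings(ratings_by_rater: Sequence[Sequence[int]]) -> list[float]:
    """Hypothetical helper: average each question's Likert ratings across raters.

    Assumes one list of 1-5 scores per rater, all covering the same
    questions in the same order.
    """
    if not ratings_by_rater:
        raise ValueError("no ratings supplied")
    n_questions = len(ratings_by_rater[0])
    for ratings in ratings_by_rater:
        if len(ratings) != n_questions:
            raise ValueError("every rater must score the same questions")
        if not all(1 <= s <= 5 for s in ratings):
            raise ValueError("five-point Likert scores must lie in 1..5")
    # zip(*...) regroups the raters' scores question by question
    return [float(statistics.mean(scores)) for scores in zip(*ratings_by_rater)]


def describe(scores: Sequence[float]) -> dict[str, float]:
    """Hypothetical helper: mean, sample SD, and standard error of the mean."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate SD and SE")
    sd = statistics.stdev(scores)  # sample SD, n - 1 denominator
    return {
        "mean": statistics.mean(scores),
        "sd": sd,
        "se": sd / math.sqrt(len(scores)),  # SE = SD / sqrt(n)
    }


if __name__ == "__main__":
    # Made-up ratings from three raters on four questions (illustration only).
    ratings = [
        [4, 5, 3, 4],
        [4, 4, 3, 5],
        [5, 4, 3, 4],
    ]
    stats = describe(pool_ratings(ratings))
    print({k: round(v, 3) for k, v in stats.items()})  # {'mean': 4.0, 'sd': 0.667, 'se': 0.333}
```

Averaging the raters first yields one score per question; treating all individual ratings as separate observations would instead enlarge n and shrink the standard error, so the aggregation step should follow whatever the full paper specifies.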
Results: Claude performed best in information comprehensiveness (mean score 4.333), understandability (mean score 3.797), and humanistic care (mean score 2.855); Copilot performed well on diagnostic questions (mean score 3.880); and Gemini performed exceptionally well in case analysis (mean score 4.111). Post-hoc analysis showed that Claude outperformed the other models in comprehensiveness and humanistic care (P < 0.05), Copilot outperformed ChatGPT in understandability (P = 0.045), and Gemini performed significantly better than ChatGPT in case analysis (P < 0.001). Performance varied across tasks, and humanistic care remained a weak point for all models.
Conclusion: Different models excel at different tasks, suggesting that LLMs have broad potential for application in the management of patients with HIV. Nevertheless, their performance in humanistic care still needs improvement.
Keywords: artificial intelligence, large language model, HIV, health consultation, performance analysis