Are Large Language Models True Healthcare Jacks
Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated their potential to deliver accurate answers to questions about world knowledge. Despite this, existing benchmarks for evaluating LLMs in healthcare predominantly focus on medical doctors, leaving other critical healthcare professions underrepresented. To fill this research gap, we introduce the Examinations for Medical Personnel in Chinese (EMPEC), a pioneering large-scale healthcare knowledge benchmark in traditional Chinese. EMPEC consists of 157,803 exam questions across 124 subjects and 20 healthcare professions, including underrepresented occupations such as Optometrists and Audiologists. Each question is tagged with its release time and source, ensuring relevance and authenticity. We conducted extensive experiments on 17 LLMs, spanning proprietary and open-source models as well as general-domain and medical-specific models, and evaluated their performance under various settings. Our findings reveal that while leading models such as GPT-4 achieve over 75% accuracy, they still struggle with specialized fields and alternative medicine. Surprisingly, general-purpose LLMs outperformed medical-specific models, and incorporating EMPEC's training data significantly enhanced performance. Additionally, the results on questions released after the models' training cutoff dates were consistent with overall performance trends, suggesting that the models' performance on the test set can predict their effectiveness on unseen healthcare-related queries. The transition from traditional to simplified Chinese characters had a negligible impact on model performance, indicating robust linguistic versatility. Our study underscores the importance of expanding benchmarks to cover a broader range of healthcare professions in order to better assess the applicability of LLMs in real-world healthcare scenarios.
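The abstract describes tagging each question with its release time and comparing accuracy on questions released after a model's training cutoff against overall accuracy, alongside per-profession results. The following is a minimal Python sketch of that kind of breakdown, assuming a hypothetical per-question record (`ExamQuestion` with `profession`, `release_date`, `gold`, and `predicted` fields); this is not EMPEC's published data format or the authors' evaluation code, and the cutoff date and toy records are purely illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExamQuestion:
    # Hypothetical record layout for illustration; EMPEC's actual schema may differ.
    profession: str
    release_date: date
    gold: str        # correct option letter, e.g. "A"
    predicted: str   # option letter chosen by the model


def accuracy(questions):
    """Fraction of questions whose predicted option matches the gold answer."""
    if not questions:
        return float("nan")
    return sum(q.gold == q.predicted for q in questions) / len(questions)


def split_by_cutoff(questions, cutoff):
    """Split questions into those released on/before a training cutoff and those released after it."""
    seen = [q for q in questions if q.release_date <= cutoff]
    unseen = [q for q in questions if q.release_date > cutoff]
    return seen, unseen


def accuracy_by_profession(questions):
    """Group questions by profession and compute accuracy within each group."""
    groups = {}
    for q in questions:
        groups.setdefault(q.profession, []).append(q)
    return {p: accuracy(qs) for p, qs in sorted(groups.items())}


if __name__ == "__main__":
    # Toy records for illustration only; these are not EMPEC results.
    qs = [
        ExamQuestion("Optometrist", date(2022, 7, 1), "A", "A"),
        ExamQuestion("Optometrist", date(2024, 2, 1), "C", "B"),
        ExamQuestion("Audiologist", date(2023, 1, 15), "D", "D"),
        ExamQuestion("Audiologist", date(2024, 3, 1), "B", "B"),
    ]
    seen, unseen = split_by_cutoff(qs, cutoff=date(2023, 12, 31))
    print(f"overall: {accuracy(qs):.2f}")  # overall: 0.75
    print(f"pre-cutoff: {accuracy(seen):.2f}, post-cutoff: {accuracy(unseen):.2f}")  # 1.00, 0.50
    print(accuracy_by_profession(qs))  # {'Audiologist': 1.0, 'Optometrist': 0.5}
```

Comparing the post-cutoff accuracy with the overall figure is one simple way to check whether a model's test-set score carries over to questions it cannot have seen during training, which is the kind of consistency the abstract reports.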
Comments: 15 pages, 4 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2406.11328 [cs.CL] (or arXiv:2406.11328v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2406.11328 (arXiv-issued DOI via DataCite)
Submission history
From: Zheheng Luo
[v1] Mon, 17 Jun 2024 08:40:36 UTC (2,313 KB)