Are Large Language Models True Healthcare Jacks

Source: Tairan Health Network (泰然健康網) | Posted: August 21, 2025, 01:04

Abstract: Recent advances in Large Language Models (LLMs) have demonstrated their potential to answer questions about world knowledge accurately. Despite this, existing benchmarks for evaluating LLMs in healthcare focus predominantly on medical doctors, leaving other critical healthcare professions underrepresented. To fill this research gap, we introduce the Examinations for Medical Personnel in Chinese (EMPEC), a pioneering large-scale healthcare knowledge benchmark in traditional Chinese. EMPEC consists of 157,803 exam questions across 124 subjects and 20 healthcare professions, including underrepresented occupations such as optometrists and audiologists. Each question is tagged with its release time and source, ensuring relevance and authenticity. We conducted extensive experiments on 17 LLMs, spanning proprietary and open-source models as well as general-domain and medical-specific models, and evaluated their performance under various settings. Our findings reveal that while leading models such as GPT-4 achieve over 75% accuracy, they still struggle with specialized fields and alternative medicine. Surprisingly, general-purpose LLMs outperformed medical-specific models, and incorporating EMPEC's training data significantly enhanced performance. Additionally, results on questions released after the models' training cutoff dates were consistent with overall performance trends, suggesting that performance on the test set can predict the models' effectiveness on unseen healthcare-related queries. Converting the questions from traditional to simplified Chinese characters had a negligible impact on model performance, indicating robust linguistic versatility. Our study underscores the importance of expanding benchmarks to cover a broader range of healthcare professions in order to better assess the applicability of LLMs in real-world healthcare scenarios.
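The abstract reports three kinds of scores: overall accuracy, gaps across professions, and a consistency check on questions released after each model's training cutoff. Below is a minimal Python sketch of how such slices could be computed. It assumes each question carries a profession label, a release date, and a single gold option label. The `ExamQuestion` record and the function names are hypothetical. They do not reflect EMPEC's actual data format or the authors' evaluation code, and the toy data is invented purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExamQuestion:
    # Hypothetical record layout; field names are illustrative, not EMPEC's schema.
    profession: str   # e.g. "Optometrist"
    released: date    # release-time tag attached to each question
    answer: str       # gold option label, e.g. "B"


def accuracy(pairs):
    """Share of (question, predicted_label) pairs answered correctly."""
    pairs = list(pairs)
    if not pairs:
        return None  # no questions fall into this slice
    return sum(q.answer == pred for q, pred in pairs) / len(pairs)


def accuracy_by_profession(questions, predictions):
    """Group predictions by profession and score each group separately."""
    groups = defaultdict(list)
    for q, pred in zip(questions, predictions):
        groups[q.profession].append((q, pred))
    return {prof: accuracy(pairs) for prof, pairs in groups.items()}


def accuracy_after_cutoff(questions, predictions, cutoff):
    """Score only questions released strictly after a model's training cutoff."""
    return accuracy(
        (q, pred) for q, pred in zip(questions, predictions) if q.released > cutoff
    )


if __name__ == "__main__":
    # Toy data for illustration only; these are not EMPEC questions or results.
    qs = [
        ExamQuestion("Optometrist", date(2022, 5, 1), "A"),
        ExamQuestion("Optometrist", date(2024, 2, 1), "C"),
        ExamQuestion("Audiologist", date(2023, 11, 1), "B"),
        ExamQuestion("Audiologist", date(2024, 3, 1), "D"),
    ]
    preds = ["A", "B", "B", "D"]
    print(accuracy(zip(qs, preds)))                                    # 0.75
    print(accuracy_by_profession(qs, preds))                           # {'Optometrist': 0.5, 'Audiologist': 1.0}
    print(accuracy_after_cutoff(qs, preds, cutoff=date(2023, 12, 31)))  # 0.5
```

Under these assumptions, the post-cutoff check in the abstract amounts to comparing `accuracy_after_cutoff` against the overall `accuracy` for each model. If the two track each other across models, the full test set is a reasonable proxy for unseen questions.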
Comments: 15 pages, 4 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2406.11328 [cs.CL] (or arXiv:2406.11328v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2406.11328 (arXiv-issued DOI via DataCite)

Submission history

From: Zheheng Luo
[v1] Mon, 17 Jun 2024 08:40:36 UTC (2,313 KB)

URL: Are Large Language Models True Healthcare Jacks http://www.u1s5d6.cn/newsview1706456.html
