ChatGPT can almost pass the US Medical Licensing Exam

AI software was able to achieve passing scores for the exam, which usually requires years of medical training

ChatGPT can score at or near the roughly 60 percent passing threshold for the United States Medical Licensing Exam (USMLE), with responses that are coherent, internally consistent, and frequently insightful, according to a study published February 9, 2023, in the open-access journal PLOS Digital Health by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth.

ChatGPT is a new artificial intelligence (AI) system, known as a large language model (LLM), designed to generate human-like writing by predicting upcoming word sequences. Unlike most chatbots, ChatGPT cannot search the internet. Instead, it generates text using word relationships predicted by its internal processes.

Kung and colleagues tested ChatGPT’s performance on the USMLE, a highly standardized and regulated series of three exams (Steps 1, 2CK, and 3) required for medical licensure in the United States. Taken by medical students and physicians-in-training, the USMLE assesses knowledge spanning most medical disciplines, from biochemistry to diagnostic reasoning to bioethics.

After screening to remove image-based questions, the authors tested the software on 350 of the 376 public questions available from the June 2022 USMLE release.

After indeterminate responses were removed, ChatGPT scored between 52.4% and 75.0% across the three USMLE exams. The passing threshold each year is approximately 60%. ChatGPT also demonstrated 94.6% concordance across all its responses and produced at least one significant insight (something that was new, non-obvious, and clinically valid) for 88.9% of its responses. Notably, ChatGPT exceeded the performance of PubMedGPT, a counterpart model trained exclusively on biomedical domain literature, which scored 50.8% on an older dataset of USMLE-style questions.

While the relatively small input size restricted the depth and range of analyses, the authors note their findings provide a glimpse of ChatGPT’s potential to enhance medical education, and eventually, clinical practice. For example, they add, clinicians at AnsibleHealth already use ChatGPT to rewrite jargon-heavy reports for easier patient comprehension.

“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” say the authors.

Author Dr. Tiffany Kung added that ChatGPT’s role in this research went beyond being the study subject: “ChatGPT contributed substantially to the writing of [our] manuscript... We interacted with ChatGPT much like a colleague, asking it to synthesize, simplify, and offer counterpoints to drafts in progress... All of the co-authors valued ChatGPT’s input.”
