Journalists adore a certain kind of medical story: a quiet hospital hallway, a baffled physician, a last-minute rescue. A new Science paper out of Boston inverts that scene. This time, while the human team is still adjusting the anticoagulation drip, a model sitting somewhere in an OpenAI data center reads through clumsy electronic health records and silently arrives at lupus.
It’s worth reading twice, in part because Dylan Scott’s Vox piece doesn’t oversell it. The numbers are striking on their own. OpenAI’s o1 model correctly diagnosed patients at triage 67% of the time; the two physicians tested against it scored 50% and 55%. By the admission stage, the model had climbed to 81%. To their credit, the doctors finished at 70% and 79%, closing much of the gap. Still, the headline writes itself, and plenty of outlets wrote it.
| Item | Detail |
|---|---|
| Study Title | Reasoning model performance in emergency department diagnosis |
| Published In | Science, April 30, 2026 |
| Lead Institutions | Harvard Medical School; Beth Israel Deaconess Medical Center |
| Co-Author Quoted | Dr. Adam Rodman, general internist and medical educator |
| AI Model Tested | OpenAI’s o1 reasoning model (released 2024) |
| Triage Accuracy | AI: 67% — Doctors: 50% and 55% |
| Admission-Stage Accuracy | AI: 81% — Doctors: 70% and 79% |
| Data Source | Real ER cases from Beth Israel Deaconess |
| Reporter | Dylan Scott, health correspondent at Vox |
| Key Caution | Authors warn against using results to replace physicians |
| Earlier Counter-Study | Nature Medicine, Feb 2026 — ChatGPT underestimated severity in 52% of cases |
Every thorough article hides its catch in the second half, and Scott seems particularly interested in this one. One of the co-authors, Dr. Adam Rodman, told reporters he feels “a little bit queasy” about how the findings might be applied. That is not a throwaway line. A researcher watching his own paper leave the lab is worried about what hospital administrators, insurance executives, and tech investors will do with it. Reading between the lines, he has already seen the PowerPoint slide it will eventually become.
In reality, the study measured diagnostic reasoning on paper. Cold text. No bedside chatter, no patient recoiling when a resident presses on the lower right quadrant, no parent quietly mentioning that their child hasn’t eaten in two days. The model never saw any of the human signal that has always mattered to emergency medicine as much as labs and imaging. It saw the chart. It performed remarkably well on the chart. Whether it would perform nearly as well on a real shift, with interruptions and incomplete information arriving in waves, is genuinely unclear.
The timing is also hard to ignore. A separate Nature Medicine paper in February found that ChatGPT, a generalist model rather than a reasoning model, underestimated patient severity in more than half of test scenarios. In one instance, it advised someone on the verge of diabetic shock to monitor things at home. That study barely registered. The April paper, with its prestigious co-authors and cleaner numbers, has circulated widely.

Watching this unfold, the cultural pattern feels familiar. Self-driving cars aced their closed-course tests for years before anyone trusted them on a foggy freeway. AI was supposed to replace radiologists by 2020; six years past that deadline, radiologists are still reading scans, often with software assisting them in the background. This will most likely take the same shape. Not a replacement. A second pair of eyes that never tires, never drops lupus from the differential, and never, ever has to explain things to the family in the waiting room.
The authors have explicitly called for clinical trials before any of this is applied to actual patients. The more intriguing question is whether anyone listens.