OpenAI's o1 Reasoning Model Outperformed Doctors at Diagnosis in a Real-World Harvard-Stanford Study
On April 30, 2026, the journal Science published a study by researchers at Harvard Medical School, Beth Israel Deaconess Medical Center, and Stanford evaluating how OpenAI's reasoning model "o1" performs on real emergency-room cases. NPR and STAT covered the result the same day: in a head-to-head test using only the electronic health records available at the time of care, the AI model matched or exceeded experienced physicians on diagnosis, recommended tests, and case-management decisions.
The most striking number came from the clinical-reasoning evaluations, in which graders scored how well a respondent explained diagnostic thinking and next steps. The o1 model received perfect scores on 98% of the cases reviewed, while attending physicians earned them on 35%. The model was particularly strong on rare diseases and complex multi-system cases — exactly the situations where human cognitive bias tends to fail. It also outperformed GPT-4, the previous AI baseline.
The authors are explicit that this does not mean AI should replace doctors. The study evaluated reasoning quality, not patient outcomes; physicians make decisions in noisy, time-pressured environments that text-based benchmarks do not fully capture. They argue the right next step is controlled trials embedded in real clinical workflows, where AI assists clinicians on hard cases while staying under human oversight.
The result lands inside a wider debate about the role of AI in medicine. STAT and Science have called for new regulatory frameworks, and a separate WHO/Europe report covering all 27 EU member states found 74% already deploying AI in diagnostics. The promise — clinicians supported by tireless reasoning copilots that catch what tired humans miss — is starting to look credible. The hard work is integrating it ethically and equitably.
Editorial Team
Our editorial team curates and verifies positive news from credible sources worldwide.
Last reviewed: April 30, 2026