The paradox arrived quietly in a Harvard research paper. AI systems that can diagnose emergency room patients more accurately than physicians create a problem no one has solved: who gets blamed when the machine is wrong?
A study published this week by Harvard researchers found that large language models offered more accurate emergency room diagnoses than two human doctors presented with identical cases. The AI did not merely match clinical performance. It exceeded it. Researchers tested the models across a range of medical contexts, including real emergency room scenarios where the stakes are measured in minutes and lives.
The findings drew predictable headlines across medical and tech media, celebrating machine superiority over human judgment. But beneath the benchmark numbers lies an uncomfortable problem that the research raises but cannot resolve: accountability does not scale with accuracy.
Consider what happens when this AI errs. If a doctor misdiagnoses a patient, the physician bears liability, the hospital reviews its protocols, and the system learns from the failure. If an AI misdiagnoses the same patient while a physician stands in the room, nothing in the current legal infrastructure clarifies who bears responsibility. Is it the doctor who trusted the model's output? The hospital that deployed it? The company that trained it on data it did not fully understand? The regulator who approved it under frameworks designed for software, not autonomous decision-making?
Medical AI has crossed a threshold. The Harvard study adds to a growing body of evidence that large language models can match or exceed specialist-level performance in defined diagnostic tasks. Models trained on vast bodies of medical literature now parse symptoms, weigh rare conditions, and synthesize complex presentations faster than clinicians working from memory and experience alone. The technical problem of building an AI accurate enough to be useful is largely solved.
What remains is a governance crisis dressed as a technical one. Healthcare systems are moving toward AI-assisted diagnosis without established frameworks for error attribution. Malpractice law assumes human decision-makers. Hospital liability structures assume physician agency. When a machine recommendation sits between doctor and patient, neither assumption holds.
The Harvard researchers noted this gap without resolving it. Their methodology demonstrated AI superiority; they did not propose who answers for the cases where superiority fails. That question belongs to regulators, insurers, hospital administrators, and courts—none of whom have moved quickly enough to keep pace with the technology.
Some health systems are already piloting AI diagnostic tools in emergency departments, attracted by efficiency gains and accuracy improvements. The legal departments advising those hospitals are, by most accounts, improvising. One senior physician at a major urban hospital, speaking on condition of anonymity, described the current approach as "deploying the technology and hoping the liability questions work themselves out."
They will not work themselves out. When an AI system errs in an emergency room and a patient suffers harm, the accountability vacuum will become a courtroom. The Harvard study tells us the machines have arrived. What it cannot tell us is who pays when they fail.