Walk the hallways of most large hospitals today and you will find AI quietly built into the walls. Software flags patients who might need a follow-up. It transcribes notes in real time while listening in on doctor-patient conversations. It reads chest scans and X-rays before a radiologist has had their morning coffee. Something changed in a matter of years. The question “could AI work in medicine?” gave way to the statement “AI is already here.” What no one seems to be asking loudly enough is whether any of this is genuinely improving patients’ health.
After months of researching the state of health-care AI, MIT Technology Review found this unsettling question at the core. Journalist Jessica Hamzelou spoke with researchers from the University of Michigan and the University of Toronto, who have spent years observing how these tools are used in clinical settings. What they found wasn’t exactly a scandal. It was something more subdued and, in some respects, more concerning: a widespread assumption, unsupported by data, that accuracy equates to benefit.
Over the past few years, Jenna Wiens, a computer scientist at the University of Michigan, has witnessed a cultural shift. She spent a large part of her career trying to persuade medical professionals even to consider using AI tools. That resistance mostly vanished in the early 2020s. Healthcare providers not only showed interest but also started implementing these technologies quickly, sometimes without stopping to consider what the data actually showed about patient outcomes. According to a study conducted in early 2025, roughly 65% of American hospitals were already using AI-assisted predictive tools. Of those hospitals, only two-thirds had assessed the accuracy of the tools they were using. Fewer still had examined them for bias.
Consider “ambient AI,” also known as AI scribes: tools that listen to doctor-patient conversations and automatically produce summaries and notes. Several companies now offer them, and uptake has been quick. Anecdotally, the response has been positive. According to a staff member at a large medical facility in New York, doctors are “overjoyed” by the technology because it lets them concentrate fully on the person in front of them instead of a keyboard.
According to preliminary research, these tools lessen clinician burnout, which is a genuine and significant problem. This is where the trail ends, though: researchers have examined clinician productivity, provider satisfaction, and the accuracy of the notes, but what any of this means for the patients themselves has not been sufficiently studied. Does relieving a physician of the burden of documentation genuinely alter the way that physician approaches a diagnosis? When AI summarizes patient data, does it affect how medical students, who are still honing their clinical intuition, process the information? These are not rhetorical questions. They are open questions with no known answers.
The gap between tool accuracy and clinical benefit is real and worth acknowledging. An AI that correctly identifies a suspicious shadow on a chest X-ray 94% of the time sounds impressive, and it is. However, a number of human factors determine whether that accuracy results in better treatment choices, quicker interventions, or reduced mortality.

These factors include how much a doctor trusts the AI’s read, whether the clinical workflow can handle the output, and whether the patient population resembles the one the system was trained on. A system optimized on data from a large urban academic hospital might behave differently in a rural clinic with a different patient profile and fewer specialists to respond to its alerts. These contextual factors matter a great deal, yet most current deployments ignore them.
Anyone who has watched industries rush to adopt powerful technologies will recognize a larger pattern here. Under pressure to modernize, stay ahead of the curve, and show innovation, rigorous evaluation becomes an afterthought. AI in health care might be accurate. In many situations, it might even be truly useful. However, “useful” must be defined in terms of patient health rather than just workflow metrics or clinician satisfaction ratings.
Wiens and her colleagues take care to clarify that they are not against the use of AI in healthcare. Stopping adoption is not the aim. The aim is to demand that adoption be accompanied by the kind of methodical assessment that ought to have been conducted from the beginning, and to acknowledge that patients, not software providers, bear the consequences of mistakes.
It’s difficult to ignore the fact that studying these tools’ downstream effects hasn’t received the same urgency as developing and marketing them. Wiens points out that some AI tools may actually be making patients worse off, but she believes the more likely scenario is that many of them are simply less helpful than medical professionals believe. In either case, the industry is outpacing the evidence. And once the evidence arrives, it may demand some awkward adjustments.

