Almost every AI vendor pitch has a point, usually around slide fourteen, where the numbers seem a bit too tidy. Over 90% accuracy. Two-week integration timelines. Flawless EHR compatibility. The branding is polished, the representative across the table exudes confidence, and the case studies come from health systems reputable enough to be reassuring. It’s easy to get carried away. Plenty of hospital procurement teams have. And many of them have quietly regretted it.
It’s not that AI in healthcare isn’t effective. Some of it genuinely is. The issue is the gap that can exist between a vendor demo and a deployed clinical environment, and contracts signed without the right questions tend to fall straight into it. Watch this happen repeatedly across health systems and patterns begin to emerge.
Consider performance guarantees. Most vendors quote technical metrics, such as AUC scores and sensitivity rates, which look impressive in a boardroom but mean something different in an intensive care unit at two in the morning. For example, Epic’s sepsis model reportedly achieved an AUC of between 0.76 and 0.83 in internal testing. When externally validated, it detected only 7% of sepsis cases before a clinician did and had an 88% false positive rate. The metric wasn’t exactly wrong. It was simply the wrong metric. Before signing any contract, healthcare organizations must demand clinical outcome benchmarks, such as mortality, readmissions, and length of stay, rather than technical scores alone. Ideally, those benchmarks should be validated at independent sites.
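To see why a respectable technical score can still produce a flood of false alarms, it helps to work through the arithmetic. The sketch below applies Bayes’ rule to turn sensitivity, specificity, and prevalence into the share of alerts that are real. The input numbers are invented for illustration and are not Epic’s or any vendor’s figures, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of alerts that correspond to real cases (Bayes' rule).

    All three inputs are fractions between 0 and 1.
    """
    # Fraction of all patients who are sick AND trigger an alert.
    true_alerts = sensitivity * prevalence
    # Fraction of all patients who are healthy AND still trigger an alert.
    false_alerts = (1.0 - specificity) * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)


if __name__ == "__main__":
    # Illustrative assumptions only: 80% sensitivity, 90% specificity,
    # and a condition present in 2% of the monitored population.
    ppv = positive_predictive_value(0.80, 0.90, 0.02)
    print(f"Share of alerts that are real: {ppv:.1%}")
    print(f"Share of alerts that are false alarms: {1 - ppv:.1%}")
```

With these made-up inputs, only about 14% of alerts point to a real case, even though both headline rates look strong. Running the same calculation with your own prevalence figures is one way to probe a vendor’s quoted sensitivity and specificity.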
The next area where vendor language tends to become ambiguous is data protection. A Business Associate Agreement must be in place for any AI system that handles Protected Health Information. That is a HIPAA baseline, not a point of negotiation.

However, the details of that agreement matter a great deal. Does it forbid the vendor from training its models on your patients’ data? Is the window for reporting breaches 24 hours or 60 days? What happens to the data when the contract expires? One hospital learned this the hard way when a purportedly de-identified dataset containing ZIP codes, ages, and diagnosis codes was linked back to specific patients. A $2.3 million fine followed. The reputational cost was harder to calculate.
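The re-identification risk described above can be checked before any dataset leaves the building. The sketch below counts how many records share each combination of ZIP code, age, and diagnosis code, then lists the records whose combination is rare. The field names, sample records, and threshold `k` are illustrative assumptions, not a regulatory standard or any vendor’s method.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def records_below_k(records, quasi_identifiers, k):
    """Return records whose quasi-identifier combination appears fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] < k
    ]


if __name__ == "__main__":
    # Hypothetical sample rows; a real export would have many more columns.
    sample = [
        {"zip": "75201", "age": 34, "dx": "E11.9"},
        {"zip": "75201", "age": 34, "dx": "E11.9"},
        {"zip": "75201", "age": 71, "dx": "C50.911"},
    ]
    fields = ("zip", "age", "dx")
    for record in records_below_k(sample, fields, k=2):
        print("Potentially identifiable:", record)
```

A record that is unique on these three fields can often be matched against outside data, which is why a contract should say who runs a check like this and what happens to the rows it flags.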
Strangely, given how much can go wrong, liability terms typically receive the least attention. When an AI system causes a clinical error, such as a missed diagnosis or an incorrect drug interaction flag, you shouldn’t have to work out who is at fault after the fact. Contracts must specify indemnity precisely and set liability limits that are commensurate with the risk. Vendors don’t always give you the whole story, as the September 2024 settlement between the Texas Attorney General and a Dallas-based health tech company over false accuracy claims demonstrated. A safety exit clause linked to performance benchmarks is reasonable governance, not paranoia.
There’s also the slower, less dramatic risk of model drift. An AI system trained on 2022 patient data may behave differently by 2025. Populations shift, treatment protocols change, and models that once performed well can quietly degrade. Healthcare organizations need to ask vendors, directly and in writing, how they monitor ongoing performance, how often models are retested, and who initiates a review when something looks off. Long-term governance isn’t flashy, but it’s probably where the most preventable failures happen.
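What “monitoring ongoing performance” means in practice can be quite simple. The sketch below compares each month’s observed sensitivity against the figure agreed at contract time and flags any month that falls more than a set tolerance below it. The function name, the monthly counts, the baseline, and the tolerance are all hypothetical; a real contract would name its own metric and thresholds.

```python
def flag_drift(monthly_counts, baseline_sensitivity, tolerance=0.05):
    """Return (month, sensitivity) pairs that fall below baseline minus tolerance.

    monthly_counts maps a month label to (true_positives, false_negatives),
    i.e. confirmed cases the model caught versus confirmed cases it missed.
    """
    threshold = baseline_sensitivity - tolerance
    flagged = []
    for month, (true_positives, false_negatives) in monthly_counts.items():
        confirmed_cases = true_positives + false_negatives
        if confirmed_cases == 0:
            # No confirmed cases this month, so sensitivity is undefined.
            continue
        sensitivity = true_positives / confirmed_cases
        if sensitivity < threshold:
            flagged.append((month, sensitivity))
    return flagged


if __name__ == "__main__":
    # Invented counts for illustration only.
    counts = {
        "2025-01": (80, 20),
        "2025-02": (78, 22),
        "2025-03": (70, 30),
    }
    for month, sensitivity in flag_drift(counts, baseline_sensitivity=0.80):
        print(f"{month}: sensitivity {sensitivity:.0%} is below the agreed floor")
```

The code is the easy part. What the contract needs to settle is who runs a check like this, how often, and what a flagged month obliges the vendor to do.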
In 2026, regulatory expectations are also tightening. The FDA, the EU AI Act, and frameworks like the “Guiding Principles of Good AI Practice” introduced in January 2026 mean that compliance isn’t a static checklist. It’s a moving target. Vendors need to demonstrate ongoing alignment with these standards, not just promise it. Similarly, the question of explainability has moved from a philosophical nicety to a practical necessity. Clinicians are increasingly being asked to justify AI-assisted decisions, and systems that can’t produce an audit trail or a readable rationale for their output create legal exposure that no hospital legal team should be comfortable with.
The remaining questions cover AI-specific security, third-party subprocessor chains, human oversight procedures, and collaborative risk management. They may sound like vendor relations housekeeping. They’re not. When third-party providers were involved in AI model-related breaches in 2025, the average damages were $4.91 million. Requesting a full subprocessor list and a SOC 2 Type II report is not excessive due diligence. It is the bare minimum.
It’s hard to miss that organizations that handle AI procurement well tend to share one trait: they are willing to make the vendor uncomfortable before signing. That isn’t hostility. It is how contracts that bear on patient safety ought to work.

