Ontario AI Scribes: 60% Recorded the Wrong Medication

Ontario's Auditor General tested 20 AI scribes approved by the province. Average accuracy: 12 out of 20. The easiest use case for AI in healthcare just collapsed at scale.

Five thousand Ontario physicians now rely on artificial intelligence to transcribe their consultations for them. On average, they save between five and seven hours per week. On May 12, provincial Auditor General Shelley Spence released the findings of an audit covering 20 of these tools, all approved by Supply Ontario for clinical use. Average accuracy on simulated consultations: 12 out of 20.

Nine of those twenty scribes invented treatments that were never mentioned in the conversation, including a reference to a total heart ablation. Twelve out of twenty recorded a different medication from the one prescribed. Seventeen out of twenty missed critical information related to the patient's mental health. The audit does not concern an academic prototype. It concerns tools already deployed in physicians' offices.

The use case that was supposed to be easy

For two years, AI scribes have held the same spot in the connected-health narrative. They were the tools that would prove the value of medical AI by example. The benefit was easy to read: free the physician from note-taking so they could actually look at the patient. The risk seemed bounded: you transcribe, you don't diagnose. The institutional validation was in place, with Ontario officially listing 20 vendors after a Supply Ontario procurement process.

This is the gentlest possible slope for introducing a language model into a consultation. No dosage calculations, no imaging interpretation, no therapeutic decisions. Just an enriched transcription. If this case holds, you can start talking about the others.

The May 12 audit dismantles that sequence. The simplest case does not hold, and it does not hold for reasons that have very little to do with the technology itself.

The real bug lives in the procedure

The most revealing element of the report is not the list of errors. It is the weighting that allowed these tools to be approved in the first place.

In the Supply Ontario evaluation grid, the accuracy of AI-generated medical notes counted for 4% of the total score. The "local presence in Ontario" criterion counted for 30%. Eleven of the twenty vendors did not submit the required third-party audit reports or ISO 27001 certification. Five of them failed to provide either a risk assessment or a privacy impact study. All were approved.

In other words, on Ontario's public market for medical transcription, being based in the province mattered more than producing accurate notes. The government had built a tender in which regional economic development outweighed note accuracy by a factor of more than seven (30% against 4%). AI made no decision here. The people who drafted the procurement rules did.
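The arithmetic behind that weighting can be sketched in a few lines. This is an illustration only, not Supply Ontario's actual scoring method: the 4% and 30% weights come from the audit figures quoted above, while the "other" bucket for the remaining 66%, the two vendors, and all of their scores are hypothetical.

```python
# Weights in percentage points. Accuracy (4) and local presence (30) are the
# figures quoted from the audit; "other" is a hypothetical bucket for the
# remaining 66 points, whose real breakdown is not given here.
WEIGHTS_PCT = {"accuracy": 4, "local_presence": 30, "other": 66}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0 to 100) into a weighted total (0 to 100)."""
    return sum(WEIGHTS_PCT[name] * scores[name] for name in WEIGHTS_PCT) / 100


# Two made-up vendors that tie on everything except accuracy and location.
accurate_but_remote = {"accuracy": 100, "local_presence": 0, "other": 70}
inaccurate_but_local = {"accuracy": 0, "local_presence": 100, "other": 70}

# How many times heavier local presence is than accuracy in the grid.
weight_ratio = WEIGHTS_PCT["local_presence"] / WEIGHTS_PCT["accuracy"]

if __name__ == "__main__":
    print(f"local presence vs accuracy weight: {weight_ratio}x")
    print(f"accurate but remote:  {weighted_score(accurate_but_remote):.1f}")
    print(f"inaccurate but local: {weighted_score(inaccurate_but_local):.1f}")
```

Under these assumed inputs, the weight ratio is 7.5, and the vendor with perfect notes but no local presence scores 50.2 while the vendor with useless notes and a local office scores 76.2. With accuracy capped at 4 points out of 100, no level of transcription quality can close a gap opened by the location criterion.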

The ministerial defense does not hold

When questioned on the report, Minister Stephen Crawford offered two arguments. First, that the hallucinations appeared during the testing phase, not during real consultations. Second, that physicians review the notes before validating them in the patient record.

Both objections wobble at the first round of fact-checking.

The initial evaluation that led to the approval of the 20 tools was, according to the report, "performed several years ago". In the meantime, 5,000 Ontario physicians have been recording their consultations with these tools. The gap between "testing phase" and a deployment of that scale suggests that the testing phase is happening right now, on real patients. The Auditor herself reports having observed an inaccurate note after one of her own medical consultations.

As for the mandatory physician review, it does not exist as a regulatory requirement. The government published guidelines recommending manual review. Recommending is not enforcing. When a tool saves a physician between five and seven hours per week, the economic incentive works against re-reading every line.

Why this shifts the read on AI in healthcare

The episode says less about whether AI is suited to medicine than about the failure of an approval procedure that under-weighted clinical accuracy in favour of an economic criterion. The core problem lies in the validation method, not in the language model running under the hood.

The comparison with Europe is instructive. In the Netherlands, the RIGH:T consortium brings together healthcare institutions to build a validation framework for AI scribes before deployment, with explicit measurement of hallucinations, missing information and bias. France and several other European Union countries are at the pilot stage, with no public audit report comparable to Ontario's. The AI Act has classified these tools as "high risk" since 2025, which in theory imposes a more demanding prior evaluation.

The European sequencing is slower and more cautious. Ontario did the opposite: deploy, then audit. Shelley Spence's report is the bill for that choice.

What this means for what comes next

The report contains ten recommendations. The ministry has accepted five. None of them, at this stage, mandates the withdrawal of the failing tools. None of them mandates compulsory review. The machine keeps running while the corrections are being discussed.

The scribes were supposed to be the easy test. Diagnostic AI, prescription assistance and automated imaging are materially riskier cases. If a public procurement procedure was willing to weight accuracy at 4% for a transcription tool, one can wonder about the grid that will be used for the next round of use cases.

AI in healthcare will work. Not this way, and not right away. On May 12, 2026, Ontario demonstrated that the operational question has shifted its centre of gravity: the technology is now ahead of the administrations responsible for buying it.

Frequently asked questions

What is an AI medical scribe?
An AI medical scribe is software that listens to the consultation between a doctor and a patient, then automatically drafts the clinical note for the patient record. The goal is to free the physician from manual note-taking.

How many Ontario physicians use an AI scribe?
Roughly 5,000 Ontario physicians use an AI scribe, according to the report published on May 12, 2026 by Auditor General Shelley Spence. These doctors reportedly save between five and seven hours per week.

What were the main findings of the Ontario audit on AI scribes?
Across 20 audited tools, 12 logged a different medication than the one prescribed, 9 invented treatments that were never discussed, and 17 missed critical mental health information. Average accuracy: 12 out of 20.

Why were these AI scribes approved despite the errors?
In the Supply Ontario scorecard, note accuracy counted for only 4% of the total score, against 30% for local presence in Ontario. Eleven of the twenty vendors had not produced the required third-party audits, and were approved anyway.

Are physicians required to review AI-generated notes?
No. The Ontario government published guidelines recommending manual review, but did not make it mandatory. Systematic review is therefore neither enforced nor verified.