Warm handoffs at data scale.
Some patients need a clinician's attention but never get it. Not because the model is wrong, but because no human can review every overnight SpO₂ trace or call every post-cataract patient daily. AI absorbs the data volume that was hiding those patients, then delivers a warm handoff a clinician can act on. Two domains, same posture: NICU monitoring that surfaces the overnight trace a neonatologist needs to see; post-cataract symptom check-ins that route escalations into same-day call-backs with the transcript pre-loaded. Both examples are AI evals. The pipeline and scoring are real, based on synthetic patient data.
Overnight neonatal SpO₂ traces turned into warm clinical handoffs, not cold alerts. Three LLM judges scored clinical accuracy, handoff quality, and artifact handling across four iterations (10 traces, 30 evals each). V2 regressed handoff quality from 90% to 30%. The eval caught it. V3 (2026-04-29): 90 / 100 / 100, and 0 urgent false negatives across 400 traces.
Open Project ↗Between-visit patient symptoms turned into same-day clinical escalations. A conversational agent runs a 5-symptom interview (Wanten 2025), classifies each response, and on escalate hands the clinician a same-day call-back with the patient's transcript pre-loaded, not a blank intake form. 12 of 13 should-escalate cases caught. The miss, week-4 asymptomatic PCO, sits in the 92%: symptom interviews can't catch structural findings.
Open Project ↗