Most of the toolkit works on documents. The harder questions need an agent that can actually touch the data. The ones I've built run on a small, reusable engine, and that same engine is what the rest could be built on. I've been practicing it on synthetic claims before it matters on real ones.
It's one loop. I ask for something in plain English, it works out the codes, runs a single read-only query, and hands back the members and the count. When the ask is ambiguous, like what counts as a GLP-1, it stops and asks instead of guessing. That loop is the reusable part. Everything below is the same loop with one more step on the cohort it returns.
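Here is a minimal sketch of the ask-first gate, not the plugin's actual code. The `AMBIGUOUS` table, the `Clarify` and `Resolved` types, and the `plan` function are all hypothetical names, and the substring matching is deliberately naive. The point is only the shape: an ambiguous term returns a question, never a guess.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical table of terms the agent should not resolve on its own,
# each with the readings it would offer back to the user.
AMBIGUOUS = {
    "glp-1": (
        "diabetes-indicated GLP-1s only",
        "all GLP-1s, including weight-management brands",
    ),
}


@dataclass
class Clarify:
    """Returned instead of a result when the ask needs a decision."""
    term: str
    options: tuple[str, ...]


@dataclass
class Resolved:
    """The ask plus the choices the user made; safe to turn into a query."""
    ask: str
    choices: dict[str, str] = field(default_factory=dict)


def plan(ask: str, answers: dict[str, str] | None = None) -> Clarify | Resolved:
    answers = answers or {}
    text = ask.lower()
    for term, options in AMBIGUOUS.items():
        # Stop at the first ambiguous term that has no valid answer yet.
        if term in text and answers.get(term) not in options:
            return Clarify(term, options)
    return Resolved(ask, {t: a for t, a in answers.items() if t in text})
```

Calling `plan("find diabetic patients on a GLP-1")` returns a `Clarify`; calling it again with an answer for `"glp-1"` returns a `Resolved` that records the choice.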
The first thing I pointed it at is building the cohort itself. It's a Claude Code plugin: a read-only connection to a synthetic claims database, and a subagent that knows how to query it. I ask for a cohort in plain English, it works out the codes, and it pulls the members in one query. When the ask is ambiguous, like what counts as a GLP-1, it stops and asks me instead of guessing. It hands back the count and a small sample rather than the whole member list, so the same ask works no matter how big the panel gets.
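A sketch of the "count and a small sample" return, using SQLite as a stand-in for the claims database. The `diagnoses` and `labs` tables, their columns, and the simplified query (diagnosis plus no HbA1c in 2025, with the enrollment, age, and drug criteria left out) are assumptions for illustration only.

```python
import sqlite3

# Hypothetical, simplified cohort: diabetics (E10/E11/E13) with no HbA1c
# lab row dated in 2025. Table and column names are assumed.
COHORT_SQL = """
SELECT DISTINCT d.member_id
FROM diagnoses AS d
WHERE substr(d.dx_code, 1, 3) IN ('E10', 'E11', 'E13')
  AND NOT EXISTS (
      SELECT 1 FROM labs AS l
      WHERE l.member_id = d.member_id
        AND l.test = 'HbA1c'
        AND l.test_date BETWEEN '2025-01-01' AND '2025-12-31'
  )
ORDER BY d.member_id
"""


def cohort_summary(conn: sqlite3.Connection, sql: str, sample_size: int = 5) -> dict:
    # Ask SQLite to refuse writes on this connection from here on.
    conn.execute("PRAGMA query_only = ON")
    count, sample = 0, []
    # One query, streamed: count every row but keep only the first few IDs,
    # so the full member list is never held or handed back.
    for (member_id,) in conn.execute(sql):
        count += 1
        if len(sample) < sample_size:
            sample.append(member_id)
    return {"count": count, "sample": sample}
```

The return value has the same shape whether the cohort has twelve members or two million.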
find diabetic patients (E10/E11/E13) on a diabetes-indicated GLP-1 only, continuously enrolled in 2025, age 18-75, with no HbA1c test in 2025
It runs on synthetic claims by design. The mechanics transfer, the moat doesn't.
The next thing I pointed it at: once it has a cohort, who pays for those patients and where they are. Same plugin, same ask-first discipline, with one more step on the cohort the loop returns. I give it a cohort in plain English, it builds the cohort the same way, then it breaks the members down by plan type and by state, with counts and percentages and a plan-by-state cross-tab. It counts each member once, and it suppresses any cell too small to share, the way you have to with real data. When it's not clear who belongs in the denominator (everyone with the diagnosis, or only the continuously enrolled), it stops and asks.
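A sketch of the breakdown step in plain Python. The `breakdown` function and the row shape are hypothetical, the threshold of 11 is an assumed value (the plugin's real cutoff isn't stated here), and "first row wins" is one assumed way to count a member once when they appear on several rows.

```python
from collections import Counter

MIN_CELL = 11  # assumed threshold; cells smaller than this are suppressed


def breakdown(rows, key, min_cell=MIN_CELL):
    """Count distinct members per group and suppress small cells.

    rows: iterable of dicts, each with a "member_id" plus grouping fields.
    key:  function mapping a row to its group, e.g. lambda r: r["state"].
    """
    first_seen = {}
    for row in rows:
        # A member with several rows is counted once, under the first group seen.
        first_seen.setdefault(row["member_id"], key(row))

    total = len(first_seen)
    cells = {}
    for group, n in Counter(first_seen.values()).items():
        if n < min_cell:
            cells[group] = {"count": None, "pct": None, "suppressed": True}
        else:
            cells[group] = {
                "count": n,
                "pct": round(100 * n / total, 1),
                "suppressed": False,
            }
    return cells
```

The cross-tab is the same call with a tuple key, `key=lambda r: (r["plan_type"], r["state"])`. The sketch does no complementary suppression, so a lone suppressed cell could still be backed out from the other percentages.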
It also holds at real volume. The plan-by-state breakdown on a two-million-member panel comes back in about two seconds. The first cut was slow, and the slow part was never the database; it was handing the whole member list back through the model on every step. Now it builds the cohort once and passes back a handle and a count instead of the list, so the size of the cohort stops mattering.
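One way to get the handle-and-count pattern, sketched with SQLite temp tables. How the plugin actually stores a cohort behind its handle isn't described here, so treat the temp table, the `members` and `diagnoses` schema, and both function names as assumptions.

```python
import re
import sqlite3
import uuid

# Handles are generated here, never taken from free text, and are checked
# against this pattern before being spliced into SQL.
HANDLE_RE = re.compile(r"cohort_[0-9a-f]{12}")


def build_cohort(conn: sqlite3.Connection, dx_prefixes: list[str]) -> tuple[str, int]:
    """Materialize the cohort once; return only a handle and a count."""
    if not dx_prefixes:
        raise ValueError("need at least one diagnosis prefix")
    handle = f"cohort_{uuid.uuid4().hex[:12]}"
    conn.execute(f"CREATE TEMP TABLE {handle} (member_id TEXT PRIMARY KEY)")
    where = " OR ".join("dx_code LIKE ?" for _ in dx_prefixes)
    conn.execute(
        f"INSERT INTO {handle} "
        f"SELECT DISTINCT member_id FROM diagnoses WHERE {where}",
        [p + "%" for p in dx_prefixes],
    )
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {handle}").fetchone()
    return handle, count


def breakdown_by_plan(conn: sqlite3.Connection, handle: str) -> dict[str, int]:
    """A later step joins against the handle; the member list never leaves the database."""
    if not HANDLE_RE.fullmatch(handle):
        raise ValueError(f"not a cohort handle: {handle!r}")
    rows = conn.execute(
        f"SELECT m.plan_type, COUNT(*) FROM {handle} AS c "
        "JOIN members AS m ON m.member_id = c.member_id "
        "GROUP BY m.plan_type ORDER BY m.plan_type"
    ).fetchall()
    return dict(rows)
```

What passes between steps is a short string and an integer, so the payload is the same size for a hundred members or two million.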
take everyone with a diabetes diagnosis (E10/E11/E13) and break them down by plan type and state
The third thing I pointed it at: once it has a cohort, put their pharmacy claims in order over time. Same plugin, same ask-first discipline, one more step on the cohort. I give it a cohort and a therapy in plain English, it builds the cohort the same way, then it walks each member's fills across the year and reports lines of therapy, switches from one drug to another, add-ons where a second drug starts alongside the first, and how long people stay on before they stop. When the ask doesn't name a therapy to follow, or says "a GLP-1" without saying which, it stops and asks.
The one real choice here is what counts as a switch. By default it counts at the brand level, the way payers usually do, so Ozempic to Rybelsus counts even though both are semaglutide. Ask for the molecule and that kind of same-drug brand change drops out; ask for the therapeutic class and the within-class switches drop out too. It's the same cohort either way; the granularity is a real decision, and the agent states the one it used. Small groups are suppressed, the way you have to with real data.
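A small sketch of how the granularity changes the switch count. The `Fill` record and `count_switches` are hypothetical, and the rule is simplified: any change between a member's consecutive fills counts as a switch, with no handling of add-ons or overlapping supply.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fill:
    member_id: str
    fill_date: date
    brand: str
    molecule: str
    drug_class: str


# Which attribute of a fill is compared at each granularity.
LEVEL_FIELD = {"brand": "brand", "molecule": "molecule", "class": "drug_class"}


def count_switches(fills: list[Fill], level: str = "brand") -> int:
    """Count changes between consecutive fills, per member, at the given level."""
    attr = LEVEL_FIELD[level]
    by_member: dict[str, list[str]] = {}
    for f in sorted(fills, key=lambda f: (f.member_id, f.fill_date)):
        by_member.setdefault(f.member_id, []).append(getattr(f, attr))
    return sum(
        1
        for seq in by_member.values()
        for prev, cur in zip(seq, seq[1:])
        if prev != cur
    )


fills = [
    Fill("a", date(2025, 1, 10), "Ozempic", "semaglutide", "GLP-1"),
    Fill("a", date(2025, 4, 10), "Rybelsus", "semaglutide", "GLP-1"),
    Fill("a", date(2025, 7, 10), "Trulicity", "dulaglutide", "GLP-1"),
]
```

For this one member, `count_switches(fills, "brand")` is 2, `"molecule"` is 1, and `"class"` is 0: the Ozempic-to-Rybelsus change drops out at the molecule level, and the move to Trulicity drops out at the class level.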
take type 2 diabetics continuously enrolled in 2025 and put their GLP-1 and metformin therapy in order: lines of therapy, switches, and how long they stay on
The fourth thing I pointed it at: once it has a cohort, how a product is doing over time. Same plugin, same ask-first discipline, one more step on the cohort. I give it a product and a cohort in plain English, it builds the cohort the same way, then it counts that product's scripts by quarter and by plan, and breaks out the most recent quarter by plan. When I want a specific plan's number for last quarter, it reads that cell. When the ask doesn't name a product, or asks how something is performing without saying whether I mean total scripts or new patients, it stops and asks.
The one real choice here is what to count. Total scripts is every fill; new starts is each patient's first one. They answer different questions: how much is being dispensed versus how fast a product is being picked up. The agent makes me pick and says which it used. The trend only means something if the data has a real one: most of the synthetic fills land in January, so almost any product looks like it's sliding off through the year, which is just an artifact of how the data was built. So the panel seeds a real shift, one product steadily gaining new patients across 2025 and another losing them, and the new-start curve picks it up. Small groups are suppressed, the way you have to with real data.
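The two metrics side by side, as a sketch. `scripts_by_quarter` is a hypothetical name, fills are reduced to `(member_id, fill_date)` pairs for one product, and a "new start" is assumed to mean the member's earliest fill in the data, with no lookback window.

```python
from collections import Counter
from datetime import date


def quarter(d: date) -> str:
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def scripts_by_quarter(fills: list[tuple[str, date]], metric: str) -> Counter:
    """fills: (member_id, fill_date) pairs for a single product."""
    if metric == "total":
        # Every fill counts.
        return Counter(quarter(d) for _, d in fills)
    if metric == "new_starts":
        # Only each member's earliest fill counts.
        first: dict[str, date] = {}
        for member_id, d in fills:
            if member_id not in first or d < first[member_id]:
                first[member_id] = d
        return Counter(quarter(d) for d in first.values())
    raise ValueError("metric must be 'total' or 'new_starts'")


fills = [
    ("a", date(2025, 1, 5)),
    ("a", date(2025, 2, 5)),
    ("a", date(2025, 4, 5)),
    ("b", date(2025, 4, 10)),
    ("b", date(2025, 7, 1)),
]
```

On this toy data, total scripts is 2, 2, 1 across Q1 to Q3, while new starts is 1 in Q1 and 1 in Q2. Same fills, different curves.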
take everyone with an Ozempic fill and show scripts by quarter and by plan, with last quarter broken out by plan
Same engine, same ask-first discipline, pointed at a different question. Each one is the cohort plus a single step, built for external teams: pharma and payers. These are possibilities, not a roadmap. The rest aren't built yet.
These don't ask the agent to read the notes. They assume the notes and EHR are already linked in as structured fields, and point the same engine at them. Same loop, richer data.