POV · Essay 01 · 2026-05

Why most SMB AI pilots die at month 3.

Three death-modes that have nothing to do with the model. And what survives them.

A four-week pilot is a coronation. The twelfth week is a divorce.

The agent ships. The dashboard says it’s working. Tier-1 deflection rate is climbing through the 60s; the CSM team has their first quiet Tuesday in two years; the founder has already mentioned the win on LinkedIn. Then somewhere between week eight and week fourteen, the Slack messages slow down, the deflection rate flattens, and the people who were calling this thing magical six weeks ago start asking if anyone is still running it.

Talk to enough buyers of completed pilots, and the pattern shows up: month three is when most SMB AI pilots actually die. Not the visible kind of death — nobody announces it. It’s a slow disengagement, and by the time someone notices the agent has been hallucinating for two weeks, the team has already silently routed around it.

There are three death-modes. They have nothing to do with the model.

01

The maintenance gap.

The pilot was a project. After cutover, it has to become a product. Nobody plans for that.

Specifically: somebody has to own the prompts. Not “the team” — a person, by name, with calendar time. Prompts drift the same way any other code does. New customer questions come in. Edge cases reveal themselves. The KB gets a new section. The model provider issues a quiet capability change. Each of these requires a small adjustment — a few sentences, twenty minutes — and if nobody is on the hook for those twenty minutes, the agent’s accuracy decays at maybe two percentage points per week. That’s invisible at week one. Fatal by week ten.

The buyer of the pilot was almost always a manager looking to free up their team. The product hasn’t found an owner. The agency walked away after cutover because they were paid to ship, not to maintain. The CSM team isn’t an engineering team. The agent is now an orphan.

02

The knowledge-base drift.

The agent’s outputs are only as accurate as the corpus it draws from. Most pilots wire up a KB connector — Notion, Confluence, internal docs — and assume the corpus is stable. It isn’t.

In a healthy SMB, the KB is changing every week. New product features. Old workarounds that became permanent. Pricing-page revisions. A pilot launched against the corpus on March 1st is sourcing answers from a March 1st reality by month three, and customers are asking June 15th questions. The agent will confidently tell a customer that pricing tier B includes feature X, when feature X moved to tier A two months ago. Nobody will know until a customer complains.

The fix is operational, not technical: someone needs to flag KB changes that affect agent outputs and update the prompts — or the retrieval logic — accordingly. That role doesn’t exist on most SMB teams. The agent fills the void, then drowns in it.

03

The boundary creep.

The pilot worked because we tested narrowly. The agent answered tier-1 questions, drafted replies, escalated tier-2 with context. That was the contract.

After two weeks of success, the human team starts pushing on the boundaries. “Could it also handle billing questions?” Sure, with a tweak. “Could it draft escalation summaries?” Of course. “Could it kick off the weekly NPS report?” Each yes is reasonable in isolation. None of them was tested in shadow. None of them was measured against the original baseline.

By month three, the agent is doing a job nobody designed it to do, on edge cases nobody validated, with prompts that have been quietly extended past their tested envelope. When it breaks, the failure looks like an AI failure. It’s actually a process failure: scope grew faster than measurement.

What these three death-modes have in common is that they’re operational, not technical. None of them gets fixed by upgrading from Claude Sonnet to Opus or by switching agent frameworks. They get fixed by treating the agent as a product that needs ongoing care — not a project that ended at cutover.

Which is why the unit of pricing matters. If the engagement is “we built the agent for $40k,” you’ve created an artifact. Artifacts decay. If the engagement is “we built the agent and we’re attached to it for $X/week as long as it’s running,” you’ve created an obligation. Obligations get maintained.

Most agencies don’t structure it that way because it’s a less impressive number on the proposal. $40k flat sounds like a real product engagement. $400/month for two years sounds like a lawyer billing. But the lawyer-billing version is the one that’s still alive at month twelve.

This is also why our pricing has a $100/week support tier — and why we’d rather not ship without it.

It’s not an upsell. It’s the survival mechanism.

When clients push back — “we’ll just maintain it ourselves” — we ask who specifically, with what calendar time, with what monitoring infrastructure, with what process for rolling back a prompt change that broke something subtle. The answer, almost always, is “we’ll figure it out.” That’s the answer that produces month-three death.

A hundred dollars a week isn’t much. That’s the point. It’s small enough that it doesn’t get cancelled when budgets tighten, large enough that an actual person at our end has weekly skin in the game. Two hours a week — monitoring, prompt tuning, drift fixes, a written check-in. Cancel anytime, but most clients don’t, because they remember what month three felt like the last time.

The pilot is the first four weeks. Adoption is the next year. Agencies that price the four weeks and walk away are selling you a coronation. Agencies that price the year are selling you something durable.

Pick the second kind.

If the math on a hundred dollars a week doesn’t work for the agent we’d build, the agent isn’t worth building. Discovery sessions exist to figure that out before either of us has spent serious money.

More essays coming

One essay per quarter. No newsletter.

Want the next one when it ships? Email us with “POV” in the subject and we’ll add you to the short list. We don’t do drip campaigns. You’ll hear from us four times a year, max.