Building an Agent Fleet at Scale

I spent the last quarter turning a single overworked assistant into a fleet of forty-odd narrow agents. Here is what actually changed — and the handful of decisions I'd make again on day one.


The pitch for an agent fleet is seductive and mostly wrong. You imagine a swarm that divides work, talks amongst itself, and quietly clears your backlog overnight. What you get on the first try is forty processes all confidently editing the same file, three of them undoing the other one’s work, and a bill that arrives faster than any of the results. The interesting problems are not in the model. They are in the plumbing around it: how work is handed off, who owns a branch, and what happens when two agents wake up believing they’re in charge.

So before any of the clever parts, I want to be honest about the boring premise. A fleet is a distributed system that happens to think. Every failure mode you already know from queues and workers shows up again, except now the workers are non-deterministic and occasionally argue with you. If you’ve operated services at scale, you have most of the instincts you need. If you haven’t, the model will teach you those instincts the expensive way.

Three decisions did most of the work. Split the job into narrow agents so each one keeps a small, sharp context. Give every agent its own worktree and branch so nothing can stomp anything else. Fan out only when the math says the saved time beats the spawn cost. The rest was detail.

Why a fleet instead of one big agent

The single-agent design fails for the same reason a single overloaded engineer fails: context. One agent holding the entire problem in its head spends most of its budget re-reading state it already saw, and the quality of its output sags as the conversation grows. A long session ships worse work on its tenth task than its first — not because the model got dumber, but because the signal it needs is now buried under everything it has already done.

Splitting the work fixes the economics more than the intelligence. A narrow agent with a sharp brief and a small context window is cheaper, faster, and easier to reason about than one generalist trying to be everything. The cost isn’t free — you pay for it in coordination — but coordination is a problem you can engineer. Lost context is not.

A fleet that spawns on reflex is just a more expensive way to be slow. The skill isn’t parallelism — it’s knowing the handful of moments parallelism actually pays. — a note I taped to my own dispatch logic

Isolation is the whole game

The first week I let agents share a working directory. By Thursday I had a HEAD that pointed at a commit no agent remembered making, an index full of half-staged changes from three different tasks, and a watcher process that had helpfully stashed someone’s uncommitted work mid-edit. The fix was unglamorous and total: one task, one git worktree, one branch, one pull request. Every agent that mutates the repo gets its own checkout. If a run dies, its partial work survives on its own branch instead of poisoning everyone else’s.

Worktrees buy you the thing distributed systems people care about most — failure containment. A crashed agent can’t corrupt a sibling’s tree because it never touched it. Merge conflicts become a deliberate, reviewable step at the end rather than a silent race in the middle. The moment I adopted hard isolation, the class of bug that ate that first week simply stopped existing.

# one task = one worktree = one branch = one PR
REPO="$(git rev-parse --show-toplevel)"
WT="${REPO}/.work/task-${SLUG}-$(date +%s)"

git -C "$REPO" fetch origin master
git -C "$REPO" worktree add "$WT" origin/master
cd "$WT"

# pause the rebase watcher so it can't stash mid-run
touch "$(git rev-parse --git-path skip-auto-rebase-watcher)"

echo "agent isolated in $WT"

Notice what the script does not do: it never assumes the parent directory is clean, and it never shares a checkout. That paranoia is the point. The cost is a few seconds of setup per task and some disk; the payoff is that no two agents can ever stomp each other’s HEAD.

Knowing when to fan out

Parallelism is not free, and the instinct to spawn is usually too eager. Every cold spawn re-pays a fixed setup cost — loading the brief, warming the cache, establishing the worktree — so a fleet that fans out for trivial work spends more on overhead than it saves on wall-clock time. The rule I settled on is almost embarrassingly simple: parallel wins only when the saved time beats the spawn tax.

Concretely, if you have N independent pieces of work and each takes t seconds, fanning out beats doing them in sequence only when (N − 1) × t is larger than the spawn cost. For two tasks that means each must be genuinely slow to justify a second agent; for ten, you almost always fan out. Below that threshold, you inline the work and skip the ceremony. The discipline of doing that arithmetic — instead of spawning on reflex — is what keeps the bill sane.

Three months in, the fleet does real work: it audits its own pull requests, files the findings as issues, routes each to the right narrow agent, and loops until a human is the only thing standing between a fix and the main branch. None of that came from a smarter model. It came from treating the agents like what they are — a herd of fast, forgetful, occasionally brilliant workers — and building the fences that let a herd be useful. Part three will get into the review loop itself, which is where the fleet finally started earning its keep.

Common questions

Why use a fleet of narrow agents instead of one large agent?

A single agent holding the whole problem spends most of its budget re-reading state it already saw, and quality sags as the session grows. A narrow agent with a sharp brief and a small context window is cheaper, faster, and easier to reason about. You pay for it in coordination, which you can engineer. Lost context you cannot.

What is the most important rule for running agents safely?

Hard isolation: one task, one git worktree, one branch, one pull request. Every agent that touches the repo gets its own checkout, so a crashed run cannot corrupt a sibling, and merge conflicts become a deliberate review step instead of a silent race.

When is it worth fanning out to parallel agents?

Only when the saved time beats the spawn tax. With N independent pieces of work that each take t seconds, parallel beats sequential when (N − 1) × t is larger than the cost of a cold spawn. For two slow tasks it can pay. For ten it almost always does. Below that, inline the work.

What problems show up first when running many agents?

The hard problems are in the plumbing, not the model: handoffs, branch ownership, and two agents both believing they are in charge. A fleet is a distributed system that happens to think, so the failure modes you know from queues and workers all return.