The pitch for an agent fleet is seductive and mostly wrong. You imagine a swarm that divides work, talks amongst itself, and quietly clears your backlog overnight. What you get on the first try is forty processes all confidently editing the same file, three of them undoing one another’s work, and a bill that arrives faster than any of the results. The interesting problems are not in the model. They are in the plumbing around it: how work is handed off, who owns a branch, and what happens when two agents wake up believing they’re in charge.
So before any of the clever parts, I want to be honest about the boring premise. A fleet is a distributed system that happens to think. Every failure mode you already know from queues and workers shows up again, except now the workers are non-deterministic and occasionally argue with you. If you’ve operated services at scale, you have most of the instincts you need. If you haven’t, the model will teach you those instincts the expensive way.
Three decisions did most of the work. Split the job into narrow agents so each one keeps a small, sharp context. Give every agent its own worktree and branch so nothing can stomp anything else. Fan out only when the math says the saved time beats the spawn cost. The rest was detail.
Why a fleet instead of one big agent
The single-agent design fails for the same reason a single overloaded engineer fails: context. One agent holding the entire problem in its head spends most of its budget re-reading state it already saw, and the quality of its output sags as the conversation grows. A long session ships worse work on its tenth task than its first — not because the model got dumber, but because the signal it needs is now buried under everything it has already done.
Splitting the work fixes the economics more than the intelligence. A narrow agent with a sharp brief and a small context window is cheaper, faster, and easier to reason about than one generalist trying to be everything. The split isn’t free: you pay for it in coordination. But coordination is a problem you can engineer. Lost context is not.
“A fleet that spawns on reflex is just a more expensive way to be slow. The skill isn’t parallelism; it’s knowing the handful of moments parallelism actually pays.”
(A note I taped to my own dispatch logic.)
Isolation is the whole game
The first week I let agents share a working directory. By Thursday I had a HEAD that pointed at a commit no agent remembered making, an index full of half-staged changes from three different tasks, and a watcher process that had helpfully stashed someone’s uncommitted work mid-edit. The fix was unglamorous and total: one task, one git worktree, one branch, one pull request. Every agent that mutates the repo gets its own checkout. If a run dies, its partial work survives on its own branch instead of poisoning everyone else’s.
Worktrees buy you the thing distributed systems people care about most — failure containment. A crashed agent can’t corrupt a sibling’s tree because it never touched it. Merge conflicts become a deliberate, reviewable step at the end rather than a silent race in the middle. The moment I adopted hard isolation, the class of bug that ate that first week simply stopped existing.
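# one task = one worktree = one branch = one PR
: "${SLUG:?set SLUG to a short task name}"   # task identifier, supplied by the caller
REPO="$(git rev-parse --show-toplevel)"
STAMP="$(date +%s)"
WT="${REPO}/.work/task-${SLUG}-${STAMP}"
git -C "$REPO" fetch origin master
# -b gives the task its own branch; without it the worktree starts on a detached HEAD
# (the task/ branch prefix is an illustrative naming choice)
git -C "$REPO" worktree add -b "task/${SLUG}-${STAMP}" "$WT" origin/master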
cd "$WT"
# pause the rebase watcher so it can't stash mid-run
touch "$(git rev-parse --git-path skip-auto-rebase-watcher)"
echo "agent isolated in $WT"
Notice what the script does not do: it never assumes the parent directory is clean, and it never shares a checkout. That paranoia is the point. The cost is a few seconds of setup per task and some disk; the payoff is that no two agents can ever stomp each other’s HEAD.
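If you want to convince yourself of the containment claim before trusting it with real work, a throwaway repo is enough. The sketch below is an illustration, not the fleet's actual tooling: the function name `demo_isolation`, the branch names `task/a` and `task/b`, and the temporary repo layout are all hypothetical. It creates two worktrees from the same base commit, lets one "agent" commit, and checks that the sibling's HEAD and files did not move.

```shell
#!/usr/bin/env bash
# Sketch: two worktrees on separate branches do not share HEAD or working files.
# demo_isolation, task/a, task/b and the temp repo are hypothetical names.
set -euo pipefail

demo_isolation() {
  local repo base head_b
  repo="$(mktemp -d)"
  git -C "$repo" init -q
  # local identity so the commits work on a machine with no global git config
  git -C "$repo" config user.email "agent@example.invalid"
  git -C "$repo" config user.name "agent"
  git -C "$repo" config commit.gpgsign false
  git -C "$repo" commit -q --allow-empty -m "base"
  base="$(git -C "$repo" rev-parse HEAD)"

  # one worktree and one branch per task, both starting from the same commit
  mkdir -p "$repo/.work"
  git -C "$repo" worktree add -b task/a "$repo/.work/a" "$base" >/dev/null 2>&1
  git -C "$repo" worktree add -b task/b "$repo/.work/b" "$base" >/dev/null 2>&1

  # "agent A" commits a change in its own checkout
  echo "change" > "$repo/.work/a/file.txt"
  git -C "$repo/.work/a" add file.txt
  git -C "$repo/.work/a" commit -q -m "task a work"

  # "agent B" should still be on the base commit, with no trace of A's file
  head_b="$(git -C "$repo/.work/b" rev-parse HEAD)"
  if [ "$head_b" = "$base" ] && [ ! -e "$repo/.work/b/file.txt" ]; then
    echo "isolated"
  else
    echo "leaked"
  fi
  rm -rf "$repo"
}

demo_isolation
```

The two checkouts share one object store, so the setup stays cheap, but each has its own HEAD and index. That separation is what turns a crashed run into a stale branch rather than a corrupted sibling.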
Knowing when to fan out
Parallelism is not free, and the instinct to spawn is usually too eager. Every cold spawn re-pays a fixed setup cost — loading the brief, warming the cache, establishing the worktree — so a fleet that fans out for trivial work spends more on overhead than it saves on wall-clock time. The rule I settled on is almost embarrassingly simple: parallel wins only when the saved time beats the spawn tax.
Concretely, if you have N independent pieces of work and each takes t seconds, running them in sequence costs N × t, while running them in parallel costs roughly t plus the spawn cost, assuming the agents start together. Fanning out therefore wins only when (N − 1) × t is larger than the spawn cost. For two tasks that means each must be genuinely slow to justify a second agent; for ten, you almost always fan out. Below that threshold, you inline the work and skip the ceremony. The discipline of doing that arithmetic, instead of spawning on reflex, is what keeps the bill sane.
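The rule is small enough to write down as a guard in front of whatever does the spawning. This is a minimal sketch under stated assumptions: the function name `should_fan_out` is hypothetical, all times are whole seconds, every task is assumed to take about the same time t, and the 45-second spawn cost in the examples is an illustrative number, not a measurement.

```shell
#!/usr/bin/env bash
# Sketch of the fan-out rule: parallel wins only when (N - 1) * t > spawn cost.
# Usage: should_fan_out N T_SECONDS SPAWN_COST_SECONDS
# Prints "fan-out" or "inline". Integer seconds only (shell arithmetic).
should_fan_out() {
  local n="$1" t="$2" spawn="$3"
  local saved=$(( (n - 1) * t ))   # wall-clock time saved versus running in sequence
  if [ "$saved" -gt "$spawn" ]; then
    echo "fan-out"
  else
    echo "inline"
  fi
}

# Illustrative numbers only: 20-second tasks, 45-second spawn cost.
should_fan_out 2 20 45    # saves 20s against a 45s spawn cost
should_fan_out 10 20 45   # saves 180s against a 45s spawn cost
```

With those numbers the first call prints `inline` and the second prints `fan-out`, which matches the intuition in the text: two quick tasks are not worth a second agent, ten almost always are.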
Three months in, the fleet does real work: it audits its own pull requests, files the findings as issues, routes each to the right narrow agent, and loops until a human is the only thing standing between a fix and the main branch. None of that came from a smarter model. It came from treating the agents like what they are — a herd of fast, forgetful, occasionally brilliant workers — and building the fences that let a herd be useful. Part three will get into the review loop itself, which is where the fleet finally started earning its keep.