Herding AI agents

How AI-focused development shifted from pairing with one junior agent to managing several AI workers in parallel.

7 min read · ai-engineering · workflow · agents · essay

Six months ago, AI-focused development still had a fairly simple shape: I was the developer, and next to me was an AI agent acting like a very fast junior. It could do almost anything. But it needed close management. A lot of it. I had to break down the task, feed it context, correct its decisions, review the result, intervene again, and then sharpen the instruction.

That was powerful, but it was still coding with assistance. I still had the keyboard in my hands. The agent was support, not the operating model.

That has turned around completely for me. The interesting question is no longer, "How do I get one agent to solve this one task correctly?" The more interesting question is, "How do I manage several AI workers so useful work happens in parallel?"

The bottleneck is no longer whether an agent can write code. The bottleneck is whether I can structure the work well enough.

From junior assistant to small team

The first mode was easy to understand. One agent gets one task. It starts working. I watch. If it drifts, I stop it. If it forgets something, I remind it. If it produces something wrong, I write the instruction more clearly and send it back in.

That is a familiar dynamic. It feels like working with a junior developer, not as an insult, but as an operational reality: someone fast, capable, and willing to attempt almost anything, yet not reliable enough to own the shape of the work without guidance.

The new mode feels different. I am no longer just working with one agent. I am working with three to ten AI workers in parallel. One looks at architecture. One looks at UI details. One looks at tests. One audits security. One looks at SEO. One reviews code quality. Not always all at once, and not always with such clean separation, but the pattern is clear: I am less the person producing every line myself and more the person assigning work, reviewing intermediate states, and making decisions.
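The shape of that dispatch is simple enough to sketch. Here is a rough illustration, assuming a hypothetical runAgent helper that stands in for whatever agent runtime you actually use; none of these names come from a real tool:

```typescript
// Hypothetical stand-in for whatever agent runtime is actually in use.
async function runAgent(lens: string, scope: string): Promise<string> {
  return `${lens} findings for ${scope}`;
}

// One inspection lens per worker, mirroring the list above.
const lenses = [
  "architecture",
  "ui-details",
  "tests",
  "security",
  "seo",
  "code-quality",
];

// Fan the same scope out to every lens in parallel. Results come back
// together so a human can review them side by side; nothing is auto-merged.
async function runRound(scope: string): Promise<string[]> {
  return Promise.all(lenses.map((lens) => runAgent(lens, scope)));
}
```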

That sounds bigger than it sometimes feels day to day. It is not magic. It is more like the job changed. The old reflex was: "I will implement this." The new reflex is: "Which parts of this can run in parallel, and which decisions must stay with me?"

Control replaces typing

The surprising part is how quickly your sense of productivity changes.

A good day used to be a day where I wrote a lot of code. Now a good day can be one where I barely write code at all, but keep several agents moving in the right direction. I read plans. I review diffs. I reject proposals. I ask for second opinions. I decide which recommendation becomes work and which one only sounded good in the abstract.

That is not less work. It is different work.

It also creates a different class of mistakes. When I write code myself, I make local mistakes: a wrong assumption, a bad name, a bug in an edge case. When I manage several agents, the mistakes are more often coordination mistakes. Two pieces of work do not fit together. An agent optimizes in the wrong direction. An audit finds real issues but does not prioritize them usefully. A proposed solution is technically correct and still strategically wrong.

The human does not disappear from the process. The human moves one level up.

Audits as AI workers

My biggest lesson in this new mode is the value of specialized AI workers that I send out not to implement, but to inspect.

A security audit is a different task than "fix this component." A code quality audit is a different task than "clean up this file." An SEO audit is a different task than "change this title." The value is not that the agent immediately fixes everything. The value is that it gives me an inventory I can read, evaluate, and turn into decisions.

That distinction matters.

When an agent implements directly, I need to review its solution. When an agent audits, I need to review its observations. The second is often more useful because it gives me a map before I decide where to go.

I have come to like this role a lot: send an agent out with a clear inspection lens, let it create an inventory, then sort the results. What is actually a problem? What is just taste? What is urgent? What matters later? What is a correction, and what would be unnecessary reconstruction?
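If it helps to see that sorting pass as code, here is a minimal sketch. Finding, Verdict, and the severity thresholds are illustrative assumptions of mine, not the output format of any real audit agent:

```typescript
// Illustrative shapes for audit findings; all names are my own invention.
type Verdict = "fix-now" | "fix-later" | "taste" | "leave-alone";

interface Finding {
  summary: string;
  severity: "high" | "medium" | "low";
  isStructural: boolean; // an actual problem vs. a matter of taste
}

// The sorting questions from above, encoded as one pass per finding.
function triage(finding: Finding): Verdict {
  if (!finding.isStructural) return "taste";             // just taste
  if (finding.severity === "high") return "fix-now";     // urgent
  if (finding.severity === "medium") return "fix-later"; // matters later
  return "leave-alone"; // a fix here would be unnecessary reconstruction
}
```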

That is where the leverage is. Not in blindly executing agent output, but in making better decisions because several useful perspectives arrived at the same time.

The hard part is task design

The more agents work at the same time, the less the system forgives unclear tasks.

With a single agent, I can still absorb ambiguity. I see quickly when it turns in the wrong direction. With several agents in parallel, bad task design becomes expensive. Then it is not one agent producing mediocre output. It is several agents producing work at once, all of which I have to untangle later.

A task cannot just say what should happen. It also has to say what the agent is not allowed to decide. Which files are off limits. What kind of output is expected. Whether the agent should write or only read. Whether it may make changes or should first produce a recommendation. Whether it must respect existing patterns or explicitly look for alternatives.
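To make that concrete, here is a minimal sketch of what such a brief could look like. The interface and every field name are illustrative assumptions of mine, not the schema of any particular tool; each field mirrors one constraint from the paragraph above:

```typescript
// A task brief as a data structure; all names here are hypothetical.
interface TaskBrief {
  goal: string;                         // what should happen
  offLimitsFiles: string[];             // what the agent may not touch
  expectedOutput: "diff" | "report" | "recommendation";
  mode: "read-only" | "read-write";     // only read, or also write?
  recommendBeforeChanging: boolean;     // propose first, change later
  patterns: "respect-existing" | "explore-alternatives";
}

// Example: a narrowly scoped implementation brief.
const uiPolish: TaskBrief = {
  goal: "Tighten spacing and focus states on the settings page",
  offLimitsFiles: ["src/api/**", "src/auth/**"], // UI only, no backend edits
  expectedOutput: "diff",
  mode: "read-write",
  recommendBeforeChanging: false,
  patterns: "respect-existing",
};
```

The point is not this particular structure. The point is that every field answers a question the agent would otherwise answer for itself.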

This feels less like prompting and more like management. Not in the bureaucratic sense, but in the practical sense: shaping work so someone else can do it usefully.

That is the skill I currently care about most. Not "writing better prompts" as an isolated trick, but describing work so several semi-autonomous systems can produce useful results in parallel without everything collapsing afterwards.

The developer as project manager

I do not think this made development easier. It made development faster, but not automatically easier.

The old bottleneck was often execution. I knew what needed to happen, but it still had to be written, adjusted, tested, and refined. The new bottleneck is more often judgment. What is the right task in the first place? Which work can run in parallel? Which result do I trust? Which recommendation do I ignore? When do I ask a second agent to check something? When do I stop the round because there is enough information?

That is a very different identity than "I am the developer."

Technical understanding still matters. Maybe it matters more. Without my own technical foundation, I cannot evaluate agent output. I cannot set good boundaries. I cannot tell whether an audit found a real structural issue or merely sounds convincing. I cannot know whether a solution fits the style of the project.

But the daily work shifts. I am less a worker at the machine and more someone leading a small, fast, sometimes chaotic team. A team that does not get tired, but also does not have real ownership. A team that can do a lot, as long as I am clear enough.

What remains

For me, this is the real change of the last few months. Not that AI agents can suddenly write code. They could already do that impressively well. The change is that they are becoming useful as parallel workers.

That does not make the job passive. It makes it active on another level. More overview. More interfaces. More control. More decision-making.

Maybe that is why "herding AI agents" is the best description I have right now. It is no longer one tool I operate. It is a small group of specialized, fast, unevenly reliable workers that need direction.

And my job is no longer to do every task myself.

My job is to make sure the right tasks get done at all.
