The multi-agent temptation

The Part 1 single-agent loop handled one question, one tool, one answer. The Part 4 state machine handled a task with phases. But some work naturally wants to be divided: research, challenge, synthesize—three different jobs, three different system prompts, three different temperatures, three different lenses on the same input. Asking one agent to wear all three hats produces the kind of confidently mediocre middle that nobody ordered.

The pattern that survives is a crew: a small collection of specialists, each focused on one job, coordinated by a runner that decides who answers what. Claude Code CLI builds this directly into the product. Its AgentTool spawns built-in subagents (explore for read-only search, plan for read-only planning, verification for adversarial checking, general-purpose for the catch-all) and the main agent delegates to them by name. The architecture is what makes a single CLI feel like five.

Figure: Multi-agent crew pattern. The researcher and skeptic fan out in parallel; the synthesizer composes both outputs into one balanced answer.

This post writes the same shape in Rust. Eugene v0.5 introduces eugene-crew: an Agent trait, a Crew that runs several agents in parallel via tokio, a Router to pick the right specialist for a free-form query, and a debate function that pits two agents against each other when accuracy matters more than throughput. The Part 3 skill registry and the Part 4 graph runner both come along; what is new is the layer above them where multiple loops talk to each other.


The multi-agent trap

Before the architecture, the warning. The source material on multi-agent design opens with a one-word sentiment: stop it. Adding agents is the easy answer to a hard problem, and it almost always makes things worse before it makes them better. Coordination tax is brutal: every extra agent means more latency, more tokens, more places for a hallucination to enter the chain, and more confusion when something goes wrong. A two-agent setup costs roughly twice as much as a one-agent setup for the same end result and, when the agents run one after another, takes roughly twice as long. Three agents cost three times as much. Five agents is the architecture that looks impressive in a slide deck and falls apart the first time anyone has to debug a real failure.

A multi-agent crew is worth its overhead only when the roles are genuinely distinct. Researcher and writer are distinct roles: one looks outward and gathers, the other looks inward and composes. Each is well served by its own prompt. Researcher and researcher-with-slightly-different-temperature are not distinct roles. Splitting one job into two does not buy you anything; it doubles your bill for the privilege of averaging.

The right test for adding an agent is: would a person doing this job ask their colleague the same question, or a different one? If the same question, you have one agent. If the colleague’s specialism would let them notice something the first one missed, you have two.


The Agent trait

An agent is a function from a free-form query to a free-form answer. Implementations decide how to produce that answer: an LLM call with a focused system prompt, a chain of tool uses through Part 3’s registry, a database lookup. The runtime does not care.

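use async_trait::async_trait;

/// A specialist: one free-form query in, one free-form answer out.
#[async_trait]
pub trait Agent: Send + Sync + 'static {
    /// Name and one-line summary, used for dispatch and for routing.
    fn description(&self) -> AgentDescription;
    /// Produce an answer; how it is produced is up to the implementation.
    async fn handle(&self, query: &str) -> Result<String, CrewError>;
}

pub struct AgentDescription {
    pub name: String,
    pub summary: String,
}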

The description does two jobs. It identifies the agent by name so the crew can dispatch to it. It carries a one-line summary that a router can show to a model when picking which specialist to invoke. Claude Code’s AgentDefinition has roughly the same shape: a name and a description the model sees as part of AgentTool’s tool list.
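To make the two jobs of the description concrete, here is a minimal sketch that needs no async runtime. It is an assumption-laden stand-in, not the eugene-crew code: the trait is synchronous, `CrewError` is reduced to the one variant shown in this post, and `CannedAgent` and `describe` are hypothetical names invented for the example.

```rust
// Dependency-free sketch: a synchronous stand-in for the async Agent trait.
// `CannedAgent` and `describe` are hypothetical, not part of eugene-crew.

#[derive(Debug, Clone, PartialEq)]
pub struct AgentDescription {
    pub name: String,
    pub summary: String,
}

#[derive(Debug, PartialEq)]
pub enum CrewError {
    UnknownAgent(String),
}

pub trait Agent: Send + Sync + 'static {
    fn description(&self) -> AgentDescription;
    fn handle(&self, query: &str) -> Result<String, CrewError>;
}

/// An agent with no model behind it: it prefixes every query with fixed text.
pub struct CannedAgent {
    pub name: &'static str,
    pub summary: &'static str,
    pub prefix: &'static str,
}

impl Agent for CannedAgent {
    fn description(&self) -> AgentDescription {
        AgentDescription {
            name: self.name.to_string(),
            summary: self.summary.to_string(),
        }
    }

    fn handle(&self, query: &str) -> Result<String, CrewError> {
        Ok(format!("{} {}", self.prefix, query))
    }
}

/// Render the one-line listing a router could show to a model.
pub fn describe(agent: &dyn Agent) -> String {
    let d = agent.description();
    format!("{}: {}", d.name, d.summary)
}
```

The name is what a dispatch table would key on; the rendered line is what a router prompt would carry. Nothing in the trait reveals whether the answer came from a model, a tool chain, or a constant.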

The concrete agents in the Part 5 gist are all the same shape: a Specialist struct holding a system prompt, an HTTP client, and an API key. The differences are in the prompts.

fn researcher(http: reqwest::Client, api_key: String) -> Specialist {
    Specialist {
        name: "researcher",
        summary: "gathers facts and concrete examples on the topic",
        system: "You are the researcher on a small editorial team. Given a \
                  topic, list 3-5 concrete facts, trends, or examples that an \
                  informed person would care about. Be specific and current. \
                  No hedging. Plain prose, no markdown.",
        http,
        api_key,
    }
}

skeptic and synthesizer follow the same shape with their own system prompts. The persona is the entire difference between them. Three structs, identical fields, divergent voices.


The Crew runner

The crew owns the dispatch table. Adding an agent is one line; running one is one async call. The interesting methods are the parallel and sequential variants.

// Requires `use futures::future::join_all;` at the top of the module.
async fn run_parallel(
    &self,
    calls: &[(String, String)],
) -> Vec<(String, Result<String, CrewError>)> {
    // Build one future per call; nothing runs until join_all polls them.
    let futs = calls.iter().map(|(name, query)| async move {
        let result = match self.agents.get(name) {
            Some(a) => a.handle(query).await,
            None => Err(CrewError::UnknownAgent(name.clone())),
        };
        (name.clone(), result)
    });
    // Results come back in the same order as `calls`.
    join_all(futs).await
}

join_all from futures is what makes this cheap. Each call becomes a future, all futures wait on independent network round-trips, and tokio polls them concurrently in the same task. Two agents that each take a second produce both answers in one second total, not two. The same join_all powered the Part 3 registry’s parallel tool_use dispatch. Multiple tool calls inside one agent and multiple agents inside one crew use the exact same primitive.
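The fan-out shape can be tried without tokio or the futures crate. The sketch below is a stand-in under stated assumptions: it swaps `join_all` for `std::thread::scope`, so each call gets an OS thread instead of a future, and the `Agent` trait is synchronous. `SlowAgent` is a hypothetical agent that sleeps to imitate a network round-trip; `Crew::new` and `Crew::add` are assumed helper names.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum CrewError {
    UnknownAgent(String),
}

// Synchronous stand-in for the post's async Agent trait.
trait Agent: Send + Sync {
    fn handle(&self, query: &str) -> Result<String, CrewError>;
}

// Hypothetical agent that sleeps to simulate a network round-trip.
struct SlowAgent {
    reply: &'static str,
    delay: Duration,
}

impl Agent for SlowAgent {
    fn handle(&self, query: &str) -> Result<String, CrewError> {
        thread::sleep(self.delay);
        Ok(format!("{}: {}", self.reply, query))
    }
}

struct Crew {
    agents: HashMap<String, Box<dyn Agent>>,
}

impl Crew {
    fn new() -> Self {
        Crew { agents: HashMap::new() }
    }

    fn add(&mut self, name: &str, agent: Box<dyn Agent>) {
        self.agents.insert(name.to_string(), agent);
    }

    // One scoped thread per call; results come back in call order.
    fn run_parallel(
        &self,
        calls: &[(String, String)],
    ) -> Vec<(String, Result<String, CrewError>)> {
        thread::scope(|s| {
            let handles: Vec<_> = calls
                .iter()
                .map(|(name, query)| {
                    s.spawn(move || {
                        let result = match self.agents.get(name) {
                            Some(a) => a.handle(query),
                            None => Err(CrewError::UnknownAgent(name.clone())),
                        };
                        (name.clone(), result)
                    })
                })
                .collect();
            // Joining in spawn order preserves the order of `calls`.
            handles
                .into_iter()
                .map(|h| h.join().expect("agent thread panicked"))
                .collect()
        })
    }
}
```

Threads are heavier than futures, so this is a way to see the pattern rather than a replacement for the async version. The property that carries over is that an unknown agent name produces an error for that call only, while the other calls still return their answers.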

run_sequential is the chain variant: each agent’s output becomes the next agent’s input. Useful for pipelines where the second specialist refines or formats what the first produced. It is also exactly what Part 4’s state machine already does, with named goto-edges between nodes. The two crates compose: an Agent impl can drive a Graph<S>, and a Node<S> can dispatch to an Agent. You pick the model that fits the work.
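The chain variant is short enough to sketch in full. What follows is an assumed shape, not the crate's actual signature: a synchronous `Agent` trait, a `run_sequential` that takes agent names in order, and a hypothetical `Tagger` agent that wraps its input so the threading is visible in the output.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum CrewError {
    UnknownAgent(String),
}

// Synchronous stand-in for the post's async Agent trait.
trait Agent {
    fn handle(&self, query: &str) -> Result<String, CrewError>;
}

// Hypothetical agent: wraps whatever it receives in its own label.
struct Tagger(&'static str);

impl Agent for Tagger {
    fn handle(&self, query: &str) -> Result<String, CrewError> {
        Ok(format!("{}({})", self.0, query))
    }
}

struct Crew {
    agents: HashMap<String, Box<dyn Agent>>,
}

impl Crew {
    fn new() -> Self {
        Crew { agents: HashMap::new() }
    }

    fn add(&mut self, name: &str, agent: Box<dyn Agent>) {
        self.agents.insert(name.to_string(), agent);
    }

    // Dispatch a single query to a single agent by name.
    fn run(&self, name: &str, query: &str) -> Result<String, CrewError> {
        self.agents
            .get(name)
            .ok_or_else(|| CrewError::UnknownAgent(name.to_string()))?
            .handle(query)
    }

    // Each agent's output becomes the next agent's input.
    // The first error stops the chain.
    fn run_sequential(&self, names: &[&str], input: &str) -> Result<String, CrewError> {
        let mut current = input.to_string();
        for name in names {
            current = self.run(name, &current)?;
        }
        Ok(current)
    }
}
```

Stopping at the first error is a design choice made for this sketch; a pipeline that prefers partial output could return the last good value alongside the error instead.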


Routing: letting the model pick

Two patterns drive almost every multi-agent system. Either the caller knows which specialist to ask (run one explicitly, or fan out to several), or the caller cannot tell and wants someone else to decide. The second case is what a router is for.

The crate ships a Router trait and a minimal KeywordRouter that picks the first agent whose name appears in the query. Real-world routers ask the model. The system prompt for a router looks something like:

You are a triage clerk. Given a user message and a list of specialists,
pick exactly one specialist by name. Reply with only the name, no other text.

The router shows the model a list of AgentDescriptions, gets back a name, and the crew dispatches. The Claude Code CLI uses this pattern in AgentTool: the parent agent reads the descriptions of the available subagents in its own system prompt and decides which subagent_type to invoke.
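For comparison with the model-driven router, the keyword variant mentioned above fits in a dozen lines. This is a guess at its shape rather than the crate's code: the `route` signature, the case-insensitive match, and the reading of "first" as "first in the list handed to the router" are all assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
struct AgentDescription {
    name: String,
    summary: String,
}

// Assumed signature: return the chosen agent's name, or None if nothing fits.
trait Router {
    fn route(&self, query: &str, agents: &[AgentDescription]) -> Option<String>;
}

struct KeywordRouter;

impl Router for KeywordRouter {
    fn route(&self, query: &str, agents: &[AgentDescription]) -> Option<String> {
        // Case-insensitive substring match on the agent's name.
        let query = query.to_lowercase();
        agents
            .iter()
            .find(|a| query.contains(a.name.to_lowercase().as_str()))
            .map(|a| a.name.clone())
    }
}
```

A router like this costs no tokens, which makes it a useful baseline: if it picks correctly often enough for your traffic, the model call may not be worth paying for.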

Two cost notes are worth keeping in mind. First, a router is itself a model call. If the routing decision is between two specialists and a wrong choice costs less than the routing call, do not route; calling both in parallel is the cheaper option. Second, the router decision is a hallucination opportunity like any other: keep the agent list short, make the descriptions sharp, and refuse silently when the model picks an agent that does not exist.
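The second note can be enforced in code on both sides of the model call. The sketch below assumes a router that sends the model one line per specialist and gets back free text; `router_listing` and `resolve_choice` are hypothetical helper names, and the model call itself is left out.

```rust
struct AgentDescription {
    name: String,
    summary: String,
}

/// Build the specialist list for a router prompt: one line per agent.
fn router_listing(agents: &[AgentDescription]) -> String {
    agents
        .iter()
        .map(|a| format!("- {}: {}", a.name, a.summary))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Accept the model's reply only if it names a registered agent exactly
/// (surrounding whitespace is ignored). Anything else yields None.
fn resolve_choice<'a>(reply: &str, agents: &'a [AgentDescription]) -> Option<&'a str> {
    let picked = reply.trim();
    agents
        .iter()
        .map(|a| a.name.as_str())
        .find(|name| *name == picked)
}
```

Returning `None` for an unknown name keeps an invented agent from ever reaching the dispatch table. What the caller does with `None` (fall back to a default specialist, or fan out to all of them) is a separate decision that this sketch does not make.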


The debate protocol

When accuracy matters more than throughput, the right structure is sometimes two agents who disagree on purpose. A pro agent defends a position. A con agent attacks it. A judge reads the transcript and renders a verdict. The model that ends up answering the user is the judge, not either advocate.

pub async fn debate(
    pro: &dyn Agent,
    con: &dyn Agent,
    judge: Option<&dyn Agent>,
    topic: &str,
    rounds: u32,
) -> Result<String, CrewError>;

Each round, pro produces an argument; con reads it and produces a counter-argument; the transcript accumulates. After rounds exchanges, the optional judge reads the full transcript and gives a verdict. Without a judge, the function returns the transcript itself, useful when a human is doing the final weighing.
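The round structure described above can be sketched as a plain loop. This is one possible body for that signature, written synchronously so it runs without a runtime; the transcript format, the `Fixed` advocate, and the `LineCountJudge` are hypothetical choices made for the example.

```rust
#[derive(Debug, PartialEq)]
enum CrewError {
    UnknownAgent(String),
}

// Synchronous stand-in for the post's async Agent trait.
trait Agent {
    fn handle(&self, query: &str) -> Result<String, CrewError>;
}

fn debate(
    pro: &dyn Agent,
    con: &dyn Agent,
    judge: Option<&dyn Agent>,
    topic: &str,
    rounds: u32,
) -> Result<String, CrewError> {
    let mut transcript = format!("Topic: {topic}\n");
    for round in 1..=rounds {
        // Pro argues with the whole transcript so far in view.
        let argument = pro.handle(&transcript)?;
        transcript.push_str(&format!("[round {round}] PRO: {argument}\n"));
        // Con reads the transcript, including pro's latest argument.
        let rebuttal = con.handle(&transcript)?;
        transcript.push_str(&format!("[round {round}] CON: {rebuttal}\n"));
    }
    match judge {
        Some(j) => j.handle(&transcript), // the verdict is the answer
        None => Ok(transcript),           // a human does the weighing
    }
}

// Hypothetical advocate that always says the same thing.
struct Fixed(&'static str);

impl Agent for Fixed {
    fn handle(&self, _query: &str) -> Result<String, CrewError> {
        Ok(self.0.to_string())
    }
}

// Hypothetical judge that only reports how much it was given to read.
struct LineCountJudge;

impl Agent for LineCountJudge {
    fn handle(&self, query: &str) -> Result<String, CrewError> {
        Ok(format!("verdict after {} transcript lines", query.lines().count()))
    }
}
```

Note that the calls inside one round are inherently sequential: con cannot answer an argument it has not seen. Debate therefore pays the full serial latency of `2 * rounds` calls plus the judge, which is part of why it is reserved for cases where accuracy matters more than throughput.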

The debate protocol is the rough Rust analogue of Claude Code’s verification agent: a specialist whose entire job is to try to break what another agent produced, with a stricter prompt than the implementer and explicit failure-mode awareness. The verification agent’s system prompt opens with “Your job is not to confirm the implementation works; it’s to try to break it” and lists two documented failure modes the agent should resist. That hostility is the feature: a critic who agrees with you is no critic.

Pull the debate protocol out for high-stakes accuracy domains: anything legal, medical, or security-adjacent. Pull the simpler crew pattern out for everything else. Both are too expensive to use casually; both are worth their weight when the alternative is shipping wrong answers.


Composing with earlier crates

Crews do not replace skills or graphs; they sit on top. A reasonable production agent uses all three.

Crew
├── Agent: researcher
│   └── Skill Registry (Part 3): search, read_file, fetch
├── Agent: skeptic
│   └── Skill Registry (Part 3): search, fact-check
└── Agent: synthesizer
    └── Graph (Part 4): draft → review → revise

The crew dispatches to agents. Each agent internally drives Part 3’s skill registry to use tools. The synthesizer agent may itself run a Part 4 graph to draft, review, and revise its answer before returning it. Composition is the point. Each crate solves one layer; the agent is the stack of layers.


Eugene v0.5 in practice

The Part 5 gist runs the three-specialist crew on a single topic. The Researcher and Skeptic dispatch in parallel; the Synthesizer composes from both:

let calls = vec![
    ("researcher".to_string(), topic.clone()),
    ("skeptic".to_string(), topic.clone()),
];
let results = crew.run_parallel(&calls).await;

let mut researcher_out = String::new();
let mut skeptic_out = String::new();
for (name, result) in results {
    let text = result.map_err(|e| anyhow!("{name}: {e}"))?;
    match name.as_str() {
        "researcher" => researcher_out = text,
        "skeptic" => skeptic_out = text,
        other => bail!("unexpected agent: {other}"),
    }
}

let synthesis_input = format!(
    "Topic: {topic}\n\n--- Researcher's findings ---\n{researcher_out}\n\n\
     --- Skeptic's objections ---\n{skeptic_out}\n\nWrite the balanced answer."
);
let final_answer = crew.run("synthesizer", &synthesis_input).await?;

Run it on "Should I learn Rust in 2026?" and you get three sections: a researcher’s enthusiastic but specific list of reasons, a skeptic’s catalogue of real downsides, and a synthesizer’s calm one-paragraph answer that takes both into account. The same crew on "Should we add a Postgres dependency?" produces something completely different and equally useful. The agents do not need to know about each other beyond what the orchestrator passes between them.

The total wall-clock is the time of the slowest agent in the parallel phase, plus the synthesizer. Two and a half seconds for a question that, asked naively, takes a single agent five and produces a worse answer.
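The wall-clock rule is simple arithmetic, sketched here with `std::time::Duration`. The function names are hypothetical, and any latencies you feed in are your own measurements; none are taken from a benchmark in this post.

```rust
use std::time::Duration;

/// Wall-clock for a fan-out followed by one composing step:
/// the slowest parallel agent, plus the synthesizer.
fn crew_wall_clock(parallel: &[Duration], synthesizer: Duration) -> Duration {
    parallel.iter().copied().max().unwrap_or(Duration::ZERO) + synthesizer
}

/// The same calls made one after another, for comparison.
fn serial_wall_clock(parallel: &[Duration], synthesizer: Duration) -> Duration {
    parallel.iter().copied().sum::<Duration>() + synthesizer
}
```

With illustrative latencies of 1.2 s for the researcher, 1.5 s for the skeptic, and 1.0 s for the synthesizer, the parallel crew finishes in 1.5 + 1.0 = 2.5 s, while the serial order takes 1.2 + 1.5 + 1.0 = 3.7 s. Parallelism shortens the wait only; the token bill is the sum of all three calls either way.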


What this reveals

A crew is not a framework. It is a HashMap<String, Box<dyn Agent>> and a couple of dispatch methods. The interesting work is in the prompts of the specialists and in the decision of when to split a job in two. The orchestration is mechanical; the choice of who to put in the room is editorial.

Tokio makes the cost of parallelism approximately zero in code complexity. join_all is one function call. The cost of parallelism in dollars and latency is non-zero, which is why the warning at the top of the post mattered. The two thoughts hold together: use crews when the roles are genuinely distinct, then split them with confidence because the runtime support is cheap.

The same Agent trait covers four shapes of work without modification. A specialist agent talking to a model. A wrapper around a state machine. A wrapper around a single skill. A debate participant. Anything that takes a query and returns text is an Agent. The composition rules are: parallel for fan-out, sequential for refinement, route for dispatch, debate for accuracy. Four patterns, one trait.
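The claim that anything from query to text is an Agent can be shown with a small adapter. This is a sketch under assumptions: the trait is a synchronous stand-in, and `FnAgent`, `word_count`, and `shout` are hypothetical names standing in for a wrapped skill.

```rust
#[derive(Debug, PartialEq)]
enum CrewError {
    UnknownAgent(String),
}

// Synchronous stand-in for the post's async Agent trait.
trait Agent: Send + Sync {
    fn handle(&self, query: &str) -> Result<String, CrewError>;
}

/// Hypothetical adapter: wraps any thread-safe function from query to text.
struct FnAgent<F>(F);

impl<F> Agent for FnAgent<F>
where
    F: Fn(&str) -> Result<String, CrewError> + Send + Sync,
{
    fn handle(&self, query: &str) -> Result<String, CrewError> {
        (self.0)(query)
    }
}

// A stand-in "skill": count the words in the query.
fn word_count(query: &str) -> Result<String, CrewError> {
    Ok(query.split_whitespace().count().to_string())
}

// Another stand-in skill: upper-case the query.
fn shout(query: &str) -> Result<String, CrewError> {
    Ok(query.to_uppercase())
}
```

Once wrapped, both functions can sit in the same `Vec<Box<dyn Agent>>` as a model-backed specialist, and the crew cannot tell them apart. The same adapter shape would work for a state machine or a debate participant, provided it exposes a query-in, text-out entry point.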


What comes next

Every agent in this post talks to one provider, Anthropic. That works as long as you only ever want Claude. As soon as a user needs to swap in OpenAI for cost, Gemini for some specific capability, or Ollama for local inference during development, the hard-coded URL and headers in the HTTP code become a problem. Part 6 introduces eugene-providers: a Provider trait with adapters for the four major providers, a cost/latency comparison sidebar, and the realization that almost every meaningful difference between providers is at the edges of the wire format, not in the middle. The agents you wrote in Part 5 stay the same.


The workspace

The polished version of the crew runner lives in the workspace as the eugene-crew crate. Six unit tests cover dispatch by name, parallel execution, sequential threading, the keyword router, the debate protocol, and the unknown-agent error path. See eugene/crates/eugene-crew. The crate is deliberately small: nothing about the orchestration layer needs to be a framework, and most of the leverage comes from tokio::spawn, join_all, and a focused system prompt per specialist.


