Multi-Agent Collaboration Patterns
Six distinct multi-agent topologies — debate, voting, expert panel, hierarchical, swarm, and round-robin. Tradeoffs, failure modes, and when to use each pattern.
Why Multiple Agents?
Single agents fail at complex tasks because one system prompt can't hold every expertise. A legal contract review needs legal analysis, financial modeling, and technical assessment — three different mindsets. Multi-agent collaboration splits the work across specialized agents, each with its own system prompt and tools.
The question is not whether to use multiple agents, but how they should coordinate. The coordination pattern determines throughput, accuracy, cost, and failure behavior.
The Six Patterns
1. Debate
Agents argue opposing positions. A judge agent (or voting mechanism) selects the winner.
Agent A: "This contract clause is unfavorable because..."
Agent B: "This clause is standard for the industry because..."
Agent A: "That's incorrect — the liability cap is half the industry norm..."
Judge: "Agent A wins. The clause is unfavorable. Reasoning: ..."
Coordination: Turn-based discussion. Agents respond to each other's arguments. Judge evaluates the full debate transcript.
Best for: Complex reasoning with no clear ground truth. Fact-checking. Policy analysis. Quality assessment of open-ended outputs (ChatEval uses multi-agent debate for LLM evaluation).
Failure mode: Deadlock — agents never converge. Gish gallop — one agent overwhelms with volume, not quality. Judge bias — judge consistently favors one agent's style.
Cost: N agents × K rounds of debate + 1 judge call. Expensive for long debates.
2. Voting
Agents evaluate independently, then votes are aggregated. No interaction between agents.
Agent 1: "Approve — risk is acceptable"
Agent 2: "Approve — within policy limits"
Agent 3: "Reject — missing GDPR compliance clause"
Agent 4: "Approve — standard terms"
Agent 5: "Approve — precedent supports"
Result: 4/5 approve → APPROVED
Coordination: Parallel execution. Each agent receives the same input and returns a judgment. Vote aggregation can be majority, supermajority, weighted (senior agents get more votes), or unanimous.
Best for: High-stakes decisions requiring confidence. Safety checks. Content moderation. Loan approval. Medical diagnosis.
Failure mode: Herd behavior — all agents use the same model and make the same mistake. Low-quality agents dilute the vote. No deliberation — bad arguments go unchallenged.
Cost: N parallel LLM calls per decision. Lower latency than debate (parallel, not sequential).
3. Expert Panel
Specialized agents each analyze from their domain. A synthesizer merges findings into one output.
Legal Agent: "Clause 7 violates GDPR Article 17..."
Financial Agent: "The pricing model costs 15% more than market rate..."
Technical Agent: "SLA guarantees 99.5% uptime, which meets requirements..."
Synthesizer: "CONTRACT REVIEW: Three concerns identified. Legal: GDPR violation.
Financial: 15% above market. Technical: SLA acceptable."
Coordination: Parallel analysis → sequential synthesis. Each expert works independently on the same input. Synthesizer merges findings.
Best for: Multi-domain analysis. Contract review. M&A due diligence. Competitive analysis. Code review (security + performance + style).
Failure mode: Integration gaps — experts find issues the synthesizer can't reconcile. Conflicting recommendations. Expert blind spots — no agent checks another agent's domain.
Cost: N parallel expert calls + 1 synthesizer call. Most cost-efficient of the interactive patterns.
4. Hierarchical (Manager-Worker)
A manager decomposes the task, delegates to workers, validates results, and synthesizes.
Manager: "Task: Research carbon impact + estimate CO2 + write report."
"Step 1: Researcher — gather carbon impact data."
"Step 2: Coder — write CO2 estimation script using research."
"Step 3: Writer — produce executive summary from both outputs."
Worker 1: [Researches] → "Training GPT-3 emitted ~552 tons CO2eq..."
Worker 2: [Codes using Worker 1's data] → "estimate_co2.py"
Worker 3: [Writes using Workers 1+2 outputs] → "Executive Summary: ..."
Manager: "All workers done. Aggregating into final report."
Coordination: Top-down delegation. Manager controls task assignment, sequencing, and quality checks. Workers may receive context from prior workers.
Best for: Complex multi-step workflows with clear dependencies. Our Multi-Agent Orchestrator blueprint. CrewAI hierarchical process. Software development pipelines.
Failure mode: Manager bottleneck — manager becomes the single point of failure. Poor decomposition — manager misjudges which worker needs what. Workers drift — no peer review between workers.
Cost: 1 manager call + N worker calls + aggregation. Most expensive pattern but also most reliable for structured workflows.
5. Swarm / Handoff
Agents hand off control to each other dynamically. No central coordinator — each agent decides whether to handle the task or pass it to another.
User: "I was charged twice for my subscription."
Triage: "This is a billing issue. Handing off to Billing Agent."
Billing: "I see two charges on your account. Processing refund..."
User: "Also, the app crashes when I open it."
Billing: "That's a technical issue. Handing off to Technical Agent."
Technical: "What device and OS version are you using?"
Coordination: Tool-based routing. Each agent registers a handoff tool. Agents call handoff tools to transfer control. The runner framework (OpenAI Agents SDK, AutoGen Swarm) manages the transfer.
Best for: Customer support. Multi-domain task routing. Dynamic workflows where the next step is unpredictable.
Failure mode: Infinite loops — Agent A hands to B, B hands back to A. Wrong routing — triage sends billing question to tech support. Context loss — each handoff loses some conversation context.
Cost: Variable. Best case: 1 handoff, 2 agent calls. Worst case: N handoffs across M agents.
6. Round-Robin
Agents speak in a fixed order, each building on the previous agent's output.
Round 1: Agent A: "Here are 3 marketing angles for the product."
Agent B: "Angle 1 is strongest. Weakness in angle 2: ..."
Agent C: "Agreed on angle 1. Add personalization to angle 3."
Round 2: Agent A: "Refining all angles with B and C's feedback."
Agent B: "Angle 3 is now competitive. Final check: ..."
Agent C: "All three angles are strong. TERMINATE."
Coordination: Fixed rotation. Every agent gets a turn each round. Termination when an agent signals completion or max rounds reached.
Best for: Brainstorming. Multi-perspective review. Creative ideation. AutoGen's RoundRobinGroupChat.
Failure mode: Forced participation — every agent must speak, even with nothing to add. Slow — N agents × K rounds is N×K turns before a result. Padding — agents add filler to meet their turn.
Cost: Linear with agents × rounds. Most predictable cost but highest for simple tasks.
Decision Matrix
| Question | Pattern |
|---|---|
| Two agents disagree on a classification, need resolution? | Debate |
| High-stakes decision where you want redundant checks? | Voting |
| Multi-domain analysis (legal + financial + technical)? | Expert Panel |
| Complex multi-step workflow with clear dependencies? | Hierarchical |
| Dynamic routing where the next step is unpredictable? | Swarm/Handoff |
| Brainstorming or creative ideation needing multiple perspectives? | Round-Robin |
Combining Patterns
Production systems often nest patterns:
Outer: Hierarchical — Manager decomposes the project
Sub-workflow 1: Expert Panel — Contract review (legal + financial + technical)
Sub-workflow 2: Voting — Approve/reject the contract based on panel findings
Sub-workflow 3: Debating agents — If vote is split, debate to resolution
Final: Manager aggregates expert panel findings + vote outcome + debate resolution
Cost and Latency Comparison
Values: N calls, ~5-15s
Values: N+1 calls, ~5-20s
Values: N×K calls, ~10-60s
Values: N×K+1 calls, ~10-60s
Values: 1-N calls, ~3-30s
Values: N+2 calls, ~10-45s
Note:
Start with one agent. Every multi-agent pattern adds cost, latency, and failure modes. A single well-prompted agent with good tools solves 80% of tasks. Only add multiple agents when you can point to a specific failure that specialization would fix. The most common multi-agent mistake: using 3 agents for a task one agent could handle in half the time and cost.
Key Takeaway
The pattern follows the problem shape. Structured workflows with dependencies → Hierarchical. Multi-perspective analysis → Expert Panel. High-stakes decisions → Voting. Dynamic unpredictable routing → Swarm. Disagreement resolution → Debate. Ideation → Round-Robin. Don't pick a pattern first and force the problem to fit — let the problem's coordination needs dictate the topology.
Related Articles
Incident Runbook Agent Blueprint
AI agent that reads your on-call runbook, analyzes incident details, classifies severity, matches remediation steps, generates timelines, and drafts postmortems. Self-contained — works with markdown runbooks and pasted error logs.
Content Writer Agent Blueprint
Multi-step content creation agent with outline, research, draft, edit, and finalization stages. Includes grammar checking, tone adjustment, and SEO optimization tools.
Agent Evaluation & Benchmarking
How to measure agent performance — standard benchmarks (SWE-bench, AgentBench, WebArena), custom evaluation dimensions, trajectory scoring, and building an eval harness.