Multi-Agent
Single-agent systems hit a ceiling. Context fills, tool sets conflict, and latency compounds when one model tries to do everything. Multi-agent architectures break a problem into scoped responsibilities — each handled by an agent with its own context, tools, and constraints. The pattern is not universally better. On sequential reasoning tasks, decomposition can degrade performance by 39-70%. Start with the failure modes of your single agent before you reach for a second one.
Memory Over Messaging
Choosing shared memory over messaging is the single most important architectural decision for a multi-agent system. The instinct is to design chatty networks that mirror human collaboration: agents sending messages, negotiating, clarifying. That design is slow, token-expensive, and brittle.
| Factor | Messaging | Shared State |
|---|---|---|
| Token cost | Duplicates context across every exchange | Each agent reads only what it needs |
| Latency | Sequential round-trips between agents | Agents operate independently on latest state |
| Debugging | Reconstruct from scattered conversation logs | Single source of truth, inspectable at any point |
| Fault tolerance | Lost message breaks the chain | Agent retries by re-reading current state |
| Scalability | O(n²) message paths | O(n) state readers/writers |
Agents read from and write to shared state — filesystem, session dict, graph state, database. They do not talk to each other; they talk to the state. Design around data flow, not conversation flow. Memory engineering — deciding what state to expose, when to persist it, how to scope access — matters more than optimizing inter-agent prompts.
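A minimal sketch of the idea, assuming an in-memory store: `SharedState`, `research_agent`, and `code_agent` are hypothetical names, and the "agents" are plain functions standing in for model calls. The point it illustrates is that the second agent reads one key from the state and never sees the first agent's prompt or history.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SharedState:
    """Hypothetical in-memory store; a real system might use files or a database."""
    data: dict[str, Any] = field(default_factory=dict)
    log: list[tuple[str, str]] = field(default_factory=list)  # (agent, key) audit trail

    def read(self, key: str, default: Any = None) -> Any:
        return self.data.get(key, default)

    def write(self, agent: str, key: str, value: Any) -> None:
        self.data[key] = value
        self.log.append((agent, key))


def research_agent(state: SharedState) -> None:
    # Stand-in for a model call: records findings under a key it owns.
    state.write("research", "auth_findings", ["session-based auth in src/auth/"])


def code_agent(state: SharedState) -> None:
    # Reads only the key it needs, not the research agent's conversation.
    findings = state.read("auth_findings", [])
    state.write("code", "plan", f"replace {len(findings)} legacy auth path(s) with OAuth")


state = SharedState()
research_agent(state)
code_agent(state)
print(state.read("plan"))
print(state.log)
```

The `log` list is what makes the "single source of truth, inspectable at any point" row of the table concrete: every write is attributable to one agent and one key.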
Orchestration Patterns
Every multi-agent system is a variation or composition of four patterns.
| Pattern | One-line description | Implementations |
|---|---|---|
| Orchestrator/Workers | Central agent decomposes a task, delegates to specialists, synthesizes results. | Claude Code Task tool; OpenAI agent-as-tool; ADK AgentTool; LangGraph supervisor node |
| Handoff Chain | Agent A finishes its stage, then transfers full control to Agent B — no central coordinator. | OpenAI handoffs (first-class); ADK transfer_to_agent; LangGraph conditional edges over shared State |
| Parallel Fan-Out | Multiple agents run simultaneously on independent subtasks; a gather step merges outputs. | Claude Code --background; ADK ParallelAgent; CrewAI tasks with async_execution=True; LangGraph parallel branches with join |
| Peer Mesh | Agents discover and invoke each other directly without a central orchestrator. Most flexible, hardest to debug. | A2A protocol (framework-agnostic, HTTP + signed Agent Cards) |
Start with the simplest pattern that solves the problem. A single orchestrator with two workers handles most cases. Add complexity only when you observe bottlenecks, not in anticipation of them.
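As one illustration of the patterns in the table, Parallel Fan-Out can be sketched with nothing more than the standard library. This is a framework-free approximation, not how any of the listed implementations work internally; `worker` and `fan_out` are hypothetical names, and the worker body is a placeholder for a real agent run.

```python
import asyncio


async def worker(name: str, subtask: str) -> dict[str, str]:
    # Stand-in for an agent run; asyncio.sleep(0) just yields to the event loop.
    await asyncio.sleep(0)
    return {"agent": name, "result": f"done: {subtask}"}


async def fan_out(subtasks: dict[str, str]) -> list[dict[str, str]]:
    # Launch one worker per independent subtask, then gather results in input order.
    runs = [worker(name, task) for name, task in subtasks.items()]
    return await asyncio.gather(*runs)


results = asyncio.run(fan_out({
    "docs": "summarize README",
    "tests": "list failing tests",
}))

# The gather step: merge independent outputs into one answer.
merged = "; ".join(r["result"] for r in results)
print(merged)
```

The pattern only pays off when the subtasks are truly independent; if one worker needs another's output, the gather step becomes a hidden sequential dependency.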
Orchestrator/Workers in action
sequenceDiagram
participant U as User
participant O as Orchestrator
participant W1 as Research Agent
participant W2 as Code Agent
participant W3 as Review Agent
U->>O: "Add OAuth to the API"
O->>W1: Explore auth patterns in codebase
W1-->>O: Found: session-based auth in src/auth/
O->>W2: Implement OAuth flow
W2-->>O: Created 3 files, updated 2
O->>W3: Review changes for security
W3-->>O: LGTM, 1 suggestion
O->>W2: Apply suggestion
W2-->>O: Done
O-->>U: OAuth implemented, PR ready
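The flow in the diagram can be approximated in a few lines. This is a sketch under stated assumptions: the plan is hard-coded rather than produced by a model, and `research`, `code`, `review`, `WORKERS`, and `orchestrate` are hypothetical names whose canned replies echo the diagram.

```python
from typing import Callable

Worker = Callable[[str], str]


# Hypothetical workers: each would wrap a model call in a real system.
def research(task: str) -> str:
    return "Found: session-based auth in src/auth/"


def code(task: str) -> str:
    return f"Done: {task}"


def review(task: str) -> str:
    return "LGTM, 1 suggestion"


WORKERS: dict[str, Worker] = {"research": research, "code": code, "review": review}


def orchestrate(request: str) -> list[tuple[str, str]]:
    """Run the fixed plan from the diagram and return (worker, reply) pairs."""
    transcript: list[tuple[str, str]] = []

    def delegate(worker: str, task: str) -> str:
        reply = WORKERS[worker](task)
        transcript.append((worker, reply))
        return reply

    delegate("research", f"Explore existing patterns for: {request}")
    delegate("code", f"Implement: {request}")
    verdict = delegate("review", "Review changes for security")
    if "suggestion" in verdict:
        # Loop back to the code worker only when the reviewer asks for a change.
        delegate("code", "Apply suggestion")
    return transcript


transcript = orchestrate("Add OAuth to the API")
for worker_name, reply in transcript:
    print(f"{worker_name}: {reply}")
```

Workers never call each other here; every exchange passes through `delegate`, which is what keeps the orchestrator the only place where control flow has to be debugged.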
Delegation With Isolation
Each sub-agent needs strict boundaries defined at spawn time. Sharing a single context window, tool set, and permission level across agents defeats the purpose of decomposition.
- Own context window. Don’t inherit the parent’s full history. Pass only what’s relevant to the subtask.
- Restricted tool set. Explicit allowlist per sub-agent, deny by default. A research agent has no business writing to the filesystem; a deploy agent should not search the web.
- Defined return format. Sub-agents return structured data, not free-form prose. Output schemas make the orchestrator’s synthesis deterministic.
- Optional model override. Match the model to task complexity — a triage agent runs on a smaller, faster model; a reviewer benefits from a reasoning model.
Context isolation and model selection are the easy parts. Tool-registry isolation is the load-bearing one. A sub-agent with the full parent toolset is not a sub-agent — it is the parent under a different name, with a fresh context window and the same blast radius.
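Deny-by-default tool access can be enforced at the point of the call rather than in the prompt. The sketch below assumes a hypothetical parent registry (`PARENT_TOOLS`) and a hypothetical `SubAgentSpec` type; the tool names and file path are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical parent tool registry; the tool names are illustrative.
PARENT_TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "write_file": lambda path: f"wrote {path}",
    "web_search": lambda query: f"<results for {query}>",
}


@dataclass(frozen=True)
class SubAgentSpec:
    name: str
    allowed_tools: frozenset[str]   # explicit allowlist; anything else is denied
    model: Optional[str] = None     # optional override; None means inherit the parent's model

    def call_tool(self, tool: str, arg: str) -> str:
        if tool not in self.allowed_tools:
            raise PermissionError(f"{self.name} may not call {tool}")
        return PARENT_TOOLS[tool](arg)


researcher = SubAgentSpec("researcher", frozenset({"read_file", "web_search"}))

print(researcher.call_tool("read_file", "src/auth/login.py"))
try:
    researcher.call_tool("write_file", "src/auth/login.py")
except PermissionError as err:
    print(err)
```

Because the check lives in `call_tool`, a sub-agent that is talked into attempting a write still fails; the blast radius is set by the spec, not by the model's good behavior.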
Communication Protocols
| Protocol | Scope | Mechanism | When to use |
|---|---|---|---|
| MCP | Agent ↔ tool | JSON-RPC 2.0 over stdio or Streamable HTTP. Universal tool discovery and invocation. | Giving any agent access to any tool. Does not handle agent-to-agent communication. |
| A2A | Agent ↔ agent | Agents publish signed Agent Cards at /.well-known/agent-card.json; peers discover and invoke over HTTP. v1.2, donated to the Linux Foundation June 2025, 150+ adopting organizations. | Peer Mesh, cross-framework or cross-org interop, long-running delegated tasks. |
| Shared State | Implicit coordination | Key-value store, filesystem, database, or graph state. Agents react to state changes. | Most multi-agent architectures (memory-over-messaging). Decoupled, easy to extend. |
| Agent-as-Tool | Parent → child | One agent calls another as a tool: sends a prompt, receives a result. Called agent has no autonomy. | Orchestrator/Workers inside a single framework. Familiar tool-call ergonomics. |
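To make the MCP row concrete, here is roughly what the JSON-RPC 2.0 requests for tool discovery and invocation look like on the wire. `tools/list` and `tools/call` are MCP method names; the helper `mcp_request`, the tool name `search_code`, and its arguments are hypothetical, and the initialization handshake and transport are omitted.

```python
import json
from itertools import count

_ids = count(1)  # each request in a session gets a fresh id


def mcp_request(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request of the kind MCP sends over stdio or HTTP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    })


# Discover tools, then invoke one. "search_code" and its arguments are hypothetical.
list_msg = mcp_request("tools/list", {})
call_msg = mcp_request("tools/call", {
    "name": "search_code",
    "arguments": {"query": "OAuth"},
})
print(list_msg)
print(call_msg)
```

Agent-as-Tool has the same shape from the parent's point of view: a named call with arguments and a single result, which is why it fits so naturally inside one framework.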
A2A task lifecycle
A2A defines a stateful task lifecycle for cross-agent work:
submitted → working → input-required → auth-required
→ completed | failed | canceled | rejected
input-required and auth-required are the states that make A2A more than a remote function call: they let a delegated agent suspend, ask its caller (or a human) for clarification or credentials, and resume. This matters for long-running cross-org delegation where a task can sit for hours waiting on a human signature or an OAuth approval.
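The suspend-and-resume behavior is easiest to see as a small state machine. The state names below come from the lifecycle above; the transition table, the `Task` class, and the grouping into `TERMINAL` and `INTERRUPTED` are assumptions made for illustration, not the normative A2A rules.

```python
TERMINAL = {"completed", "failed", "canceled", "rejected"}
INTERRUPTED = {"input-required", "auth-required"}

# Assumed transition table for illustration; the A2A specification is the authority.
TRANSITIONS: dict[str, set[str]] = {
    "submitted": {"working", "rejected", "canceled"},
    "working": INTERRUPTED | TERMINAL,
    "input-required": {"working", "canceled", "failed"},
    "auth-required": {"working", "canceled", "failed"},
}


class Task:
    def __init__(self) -> None:
        self.state = "submitted"
        self.history = [self.state]

    def advance(self, new_state: str) -> None:
        allowed = TRANSITIONS.get(self.state, set())  # terminal states allow nothing
        if new_state not in allowed:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

    @property
    def done(self) -> bool:
        return self.state in TERMINAL


# A task that pauses for credentials, resumes, and finishes.
task = Task()
for step in ("working", "auth-required", "working", "completed"):
    task.advance(step)
print(task.history)
```

In this sketch the interrupted states lead back to `working` rather than forward to a result, which is the difference between a paused task and a finished one.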
Performance
Multi-agent architectures show roughly 45% faster problem resolution and 60% more accurate outcomes than equivalent single-agent implementations on suitable tasks, driven by specialization, parallelism, and review loops. Token efficiency improves 30-50% from smaller per-agent context windows. The caveat is sharp: on tasks that are fundamentally sequential — long chains of reasoning where each step depends on the last — decomposition can degrade performance by 39-70%. Multi-agent is task-dependent, not universally superior. If your problem is one long thought, one long thought is what it needs.
Choosing a Pattern
| If your problem… | Use… |
|---|---|
| Has independent subtasks | Parallel Fan-Out |
| Requires sequential stages | Handoff Chain |
| Needs centralized control and synthesis | Orchestrator/Workers |
| Spans multiple frameworks or organizations | Peer Mesh (A2A) |
| Is one long chain of reasoning | A single agent — don’t decompose |