Multi-Agent

Single-agent systems hit a ceiling. Context fills, tool sets conflict, and latency compounds when one model tries to do everything. Multi-agent architectures break a problem into scoped responsibilities — each handled by an agent with its own context, tools, and constraints. The pattern is not universally better. On sequential reasoning tasks, decomposition can degrade performance by 39-70%. Start with the failure modes of your single agent before you reach for a second one.

Memory Over Messaging

Choosing shared memory over messaging is the single most important architectural decision for a multi-agent system. The instinct is to design chatty networks that mirror human collaboration, with agents sending messages, negotiating, and clarifying. That design is slow, token-expensive, and brittle.

Factor	Messaging	Shared State
Token cost	Duplicates context across every exchange	Each agent reads only what it needs
Latency	Sequential round-trips between agents	Agents operate independently on latest state
Debugging	Reconstruct from scattered conversation logs	Single source of truth, inspectable at any point
Fault tolerance	Lost message breaks the chain	Agent retries by re-reading current state
Scalability	O(n²) message paths	O(n) state readers/writers

Agents read from and write to shared state — filesystem, session dict, graph state, database. They do not talk to each other; they talk to the state. Design around data flow, not conversation flow. Memory engineering — deciding what state to expose, when to persist it, how to scope access — matters more than optimizing inter-agent prompts.
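
A minimal sketch of the idea, assuming an in-memory dict as the shared state. The `SharedState` and `ScopedView` classes, the key names, and both agent functions are hypothetical stand-ins, not any framework's API. Each agent receives a view limited to the keys it may read and write, and neither agent calls the other.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SharedState:
    """In-memory stand-in for a filesystem, session dict, or database."""
    _data: dict[str, Any] = field(default_factory=dict)

    def scoped(self, readable: set[str], writable: set[str]) -> "ScopedView":
        # Hand out a view restricted to the granted keys.
        return ScopedView(self, frozenset(readable), frozenset(writable))


@dataclass(frozen=True)
class ScopedView:
    """What one agent sees: only the keys it was granted."""
    state: SharedState
    readable: frozenset[str]
    writable: frozenset[str]

    def read(self, key: str) -> Any:
        if key not in self.readable:
            raise PermissionError(f"read of {key!r} not granted")
        return self.state._data.get(key)

    def write(self, key: str, value: Any) -> None:
        if key not in self.writable:
            raise PermissionError(f"write of {key!r} not granted")
        self.state._data[key] = value


def research_agent(view: ScopedView) -> None:
    # Hypothetical agent: reads the task, writes findings. No model call here.
    task = view.read("task")
    view.write("findings", f"notes on: {task}")


def writer_agent(view: ScopedView) -> None:
    # Reads only the findings; never sees the task or the research agent.
    findings = view.read("findings")
    view.write("draft", f"draft based on {findings}")


state = SharedState()
state.scoped(set(), {"task"}).write("task", "add OAuth")
research_agent(state.scoped({"task"}, {"findings"}))
writer_agent(state.scoped({"findings"}, {"draft"}))
```

Because the agents only touch the state, a failed agent can be rerun against the same keys without replaying any conversation.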

Orchestration Patterns

Every multi-agent system is a variation or composition of four patterns.

Pattern	One-line description	Implementations
Orchestrator/Workers	Central agent decomposes a task, delegates to specialists, synthesizes results.	Claude Code `Task` tool; OpenAI agent-as-tool; ADK `AgentTool`; LangGraph supervisor node
Handoff Chain	Agent A finishes its stage, then transfers full control to Agent B — no central coordinator.	OpenAI `handoffs` (first-class); ADK `transfer_to_agent`; LangGraph conditional edges over shared `State`
Parallel Fan-Out	Multiple agents run simultaneously on independent subtasks; a gather step merges outputs.	Claude Code `--background`; ADK `ParallelAgent`; CrewAI tasks with `async_execution=True`; LangGraph parallel branches with join
Peer Mesh	Agents discover and invoke each other directly without a central orchestrator. Most flexible, hardest to debug.	A2A protocol (framework-agnostic, HTTP + signed Agent Cards)

Start with the simplest pattern that solves the problem. A single orchestrator with two workers handles most cases. Add complexity only when you observe bottlenecks, not in anticipation of them.
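
As an illustration of the Parallel Fan-Out row, here is a framework-free sketch using Python's standard `asyncio`. The `run_subtask` coroutine is a placeholder for a real agent call, and the subtask names and delays are made up for the example.

```python
import asyncio


async def run_subtask(name: str, delay: float) -> dict[str, str]:
    # Placeholder for a real agent call; the sleep stands in for model latency.
    await asyncio.sleep(delay)
    return {"agent": name, "result": f"{name} finished"}


async def fan_out(subtasks: dict[str, float]) -> list[dict[str, str]]:
    # Start every subtask concurrently; gather returns results in input order,
    # regardless of which subtask finishes first.
    coros = [run_subtask(name, delay) for name, delay in subtasks.items()]
    return await asyncio.gather(*coros)


def merge(results: list[dict[str, str]]) -> str:
    # Gather step: a trivial merge that joins the per-agent results.
    return "; ".join(r["result"] for r in results)


results = asyncio.run(fan_out({"lint": 0.02, "tests": 0.01, "docs": 0.0}))
summary = merge(results)
```

Total wall-clock time is close to the slowest subtask rather than the sum of all three, which is the whole point of the pattern when the subtasks really are independent.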

Orchestrator/Workers in action

sequenceDiagram
    participant U as User
    participant O as Orchestrator
    participant W1 as Research Agent
    participant W2 as Code Agent
    participant W3 as Review Agent

    U->>O: "Add OAuth to the API"
    O->>W1: Explore auth patterns in codebase
    W1-->>O: Found: session-based auth in src/auth/
    O->>W2: Implement OAuth flow
    W2-->>O: Created 3 files, updated 2
    O->>W3: Review changes for security
    W3-->>O: LGTM, 1 suggestion
    O->>W2: Apply suggestion
    W2-->>O: Done
    O-->>U: OAuth implemented, PR ready
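
The same flow can be sketched in plain Python. This is an assumption-heavy toy: the workers are stub functions returning canned strings, and `WORKERS`, `orchestrate`, and `delegate` are hypothetical names rather than any framework's API. What it shows is the control shape: every result returns to the orchestrator, and only the orchestrator decides what happens next.

```python
from typing import Callable

# A worker maps a prompt to a result. These stubs stand in for real
# model-backed agents and return canned strings.
Worker = Callable[[str], str]

WORKERS: dict[str, Worker] = {
    "research": lambda prompt: "found: session-based auth in src/auth/",
    "code": lambda prompt: f"done: {prompt}",
    "review": lambda prompt: "LGTM, 1 suggestion",
}


def orchestrate(request: str) -> list[tuple[str, str]]:
    """Delegate step by step; every result returns to the orchestrator."""
    log: list[tuple[str, str]] = []

    def delegate(worker: str, prompt: str) -> str:
        result = WORKERS[worker](prompt)
        log.append((worker, result))
        return result

    context = delegate("research", f"explore code relevant to: {request}")
    delegate("code", f"implement {request} given {context}")
    review = delegate("review", "review changes for security")
    if "suggestion" in review:
        # The orchestrator, not the reviewer, decides to loop back.
        delegate("code", "apply review suggestion")
    return log


log = orchestrate("OAuth flow")
```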

Delegation With Isolation

Each sub-agent needs strict boundaries defined at spawn time. Sharing a single context window, tool set, and permission level across agents defeats the purpose of decomposition.

Own context window. Don’t inherit the parent’s full history. Pass only what’s relevant to the subtask.
Restricted tool set. Explicit allowlist per sub-agent, deny by default. A research agent has no business writing to the filesystem; a deploy agent should not search the web.
Defined return format. Sub-agents return structured data, not free-form prose. Output schemas let the orchestrator parse and validate results mechanically, which makes its synthesis far more predictable.
Optional model override. Match the model to task complexity — a triage agent runs on a smaller, faster model; a reviewer benefits from a reasoning model.

Context isolation and model selection are the easy parts. Tool-registry isolation is the load-bearing one. A sub-agent with the full parent toolset is not a sub-agent — it is the parent under a different name, with a fresh context window and the same blast radius.
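
One way to express these boundaries at spawn time is a small spec object checked on every tool call. The sketch below is illustrative only: `SubAgentSpec`, `call_tool`, `validate_result`, the tool names, and the model identifier are all hypothetical, and the registry holds stub functions rather than real tools.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class SubAgentSpec:
    name: str
    allowed_tools: frozenset[str]    # explicit allowlist, deny by default
    required_fields: frozenset[str]  # keys the structured result must contain
    model: str = "default-model"     # hypothetical model identifier


class ToolDenied(Exception):
    """Raised when a sub-agent requests a tool outside its allowlist."""


def call_tool(spec: SubAgentSpec, registry: dict[str, Callable[..., Any]],
              tool: str, **kwargs: Any) -> Any:
    # The check runs before the registry lookup, so an unlisted tool is
    # denied even though the parent registry contains it.
    if tool not in spec.allowed_tools:
        raise ToolDenied(f"{spec.name} may not call {tool}")
    return registry[tool](**kwargs)


def validate_result(spec: SubAgentSpec, result: dict[str, Any]) -> dict[str, Any]:
    # Reject results that do not match the agreed return format.
    missing = spec.required_fields - set(result)
    if missing:
        raise ValueError(f"{spec.name} result missing fields: {sorted(missing)}")
    return result


# Parent-level registry with stub tools.
REGISTRY: dict[str, Callable[..., Any]] = {
    "search_web": lambda query: [f"hit for {query}"],
    "write_file": lambda path, text: len(text),
}

researcher = SubAgentSpec(
    name="researcher",
    allowed_tools=frozenset({"search_web"}),
    required_fields=frozenset({"summary", "sources"}),
    model="small-fast-model",
)
```

With this spec, `call_tool(researcher, REGISTRY, "search_web", query="oauth")` succeeds, while a request for `write_file` raises `ToolDenied` even though the tool exists in the parent registry.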

Communication Protocols

Protocol	Scope	Mechanism	When to use
MCP	Agent ↔ tool	JSON-RPC 2.0 over stdio or Streamable HTTP. Universal tool discovery and invocation.	Giving any agent access to any tool. Does not handle agent-to-agent communication.
A2A	Agent ↔ agent	Agents publish signed Agent Cards at `/.well-known/agent-card.json`; peers discover and invoke over HTTP. v1.2, donated to the Linux Foundation June 2025, 150+ adopting organizations.	Peer Mesh, cross-framework or cross-org interop, long-running delegated tasks.
Shared State	Implicit coordination	Key-value store, filesystem, database, or graph state. Agents react to state changes.	Most multi-agent architectures (memory-over-messaging). Decoupled, easy to extend.
Agent-as-Tool	Parent → child	One agent calls another as a tool: sends a prompt, receives a result. Called agent has no autonomy.	Orchestrator/Workers inside a single framework. Familiar tool-call ergonomics.
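
To make the MCP row concrete, the sketch below builds a JSON-RPC 2.0 `tools/call` request and reads text content from a response. The message shapes follow the MCP tool-calling convention as I understand it. The tool name `search_docs`, its arguments, and the canned response are hypothetical, and no transport (stdio or HTTP) is shown.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value per request works


def tool_call_request(tool_name: str, arguments: dict) -> str:
    # Serialize a tools/call request for the named tool.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


def parse_text_result(raw: str) -> str:
    # Protocol-level failures arrive as "error"; tool-level failures arrive
    # as a normal result flagged with isError.
    msg = json.loads(raw)
    if "error" in msg:
        raise RuntimeError(msg["error"]["message"])
    result = msg["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "".join(p["text"] for p in result["content"] if p["type"] == "text")


request = json.loads(tool_call_request("search_docs", {"query": "oauth"}))

# Canned response standing in for what a server might send back.
canned = json.dumps({
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {"content": [{"type": "text", "text": "3 matches"}], "isError": False},
})
answer = parse_text_result(canned)
```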

A2A task lifecycle

A2A defines a stateful task lifecycle for cross-agent work:

submitted → working ⇄ input-required | auth-required
                    → completed | failed | canceled | rejected

`input-required` and `auth-required` are the states that make A2A more than a remote function call: they let a delegated agent suspend, ask its caller (or a human) for clarification or credentials, and resume. This matters for long-running cross-org delegation where a task can sit for hours waiting on a human signature or an OAuth approval.
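
The lifecycle can be modeled as a small state machine. The state names come from the list above; the transition table is an illustrative reading of that lifecycle, not a copy of the A2A specification, and `TaskState`, `ALLOWED`, and `advance` are hypothetical names.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    AUTH_REQUIRED = "auth-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"
    REJECTED = "rejected"


S = TaskState
TERMINAL = {S.COMPLETED, S.FAILED, S.CANCELED, S.REJECTED}
INTERRUPTED = {S.INPUT_REQUIRED, S.AUTH_REQUIRED}

# Assumed transitions: interrupted states pause the task and resume into
# WORKING; terminal states have no outgoing edges.
ALLOWED: dict[TaskState, set[TaskState]] = {
    S.SUBMITTED: {S.WORKING, S.REJECTED, S.CANCELED},
    S.WORKING: INTERRUPTED | {S.COMPLETED, S.FAILED, S.CANCELED},
    S.INPUT_REQUIRED: {S.WORKING, S.FAILED, S.CANCELED},
    S.AUTH_REQUIRED: {S.WORKING, S.FAILED, S.CANCELED},
}


def advance(current: TaskState, nxt: TaskState) -> TaskState:
    # Refuse any move the table does not list, including leaving a terminal state.
    if nxt not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt


# A task that pauses for credentials, resumes, and finishes.
state = S.SUBMITTED
for step in (S.WORKING, S.AUTH_REQUIRED, S.WORKING, S.COMPLETED):
    state = advance(state, step)
```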

Performance

Reported gains for multi-agent architectures on suitable tasks are roughly 45% faster problem resolution and 60% more accurate outcomes than equivalent single-agent implementations, driven by specialization, parallelism, and review loops. Token efficiency improves 30-50% because each agent carries a smaller context window. The caveat is sharp: on fundamentally sequential tasks, where each reasoning step depends on the last, decomposition can degrade performance by 39-70%. Multi-agent is task-dependent, not universally superior. If your problem is one long thought, one long thought is what it needs.

Choosing a Pattern

If your problem…	Use…
Has independent subtasks	Parallel Fan-Out
Requires sequential stages	Handoff Chain
Needs centralized control and synthesis	Orchestrator/Workers
Spans multiple frameworks or organizations	Peer Mesh (A2A)
Is one long chain of reasoning	A single agent — don’t decompose
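
The table can be read as a short decision function. The sketch below is one possible encoding: the flag names are hypothetical, and the order of the checks (rule out decomposition first, fall back to Orchestrator/Workers last) is an assumption layered on top of the table rather than something it states.

```python
def choose_pattern(
    *,
    one_long_chain: bool = False,
    cross_framework_or_org: bool = False,
    independent_subtasks: bool = False,
    sequential_stages: bool = False,
) -> str:
    # Check the "don't decompose" case first so no other flag can override it.
    if one_long_chain:
        return "single agent"
    if cross_framework_or_org:
        return "peer mesh (A2A)"
    if independent_subtasks:
        return "parallel fan-out"
    if sequential_stages:
        return "handoff chain"
    # Default: centralized control and synthesis.
    return "orchestrator/workers"


choice = choose_pattern(independent_subtasks=True)
```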