Seven teams. One architecture. No coordination.

Claude Code, OpenAI Codex, Gemini CLI, LangGraph, CrewAI, Google ADK, Amazon Bedrock — built by different companies, in different languages, under different constraints. They converged on the same design.

Not because they copied each other. Because the constraints are physics. Finite context windows. Tools that need a protocol. Safety that can’t depend on the model obeying. Tasks too complex for a single invocation. Any team that builds long enough arrives here.


Which kind of system are you building?

The patterns in this guide apply universally, but their weight depends on which seam in the agent ecosystem you’re working on. Read in the order that matches your problem.

| If you are building… | You care most about… | Start with |
| --- | --- | --- |
| A domain context substrate (an MCP server that gives any agent structured access to one domain: a codebase, a screen, a system) | Deterministic extraction, fixed ontology, behavior contracts installed at the user’s project | /tool-protocols, /instructions, /anti-patterns |
| A personal AI runtime (an agent that the user owns, that runs in the background, with long-running state) | Memory architecture, compaction-resident state, hooks, scheduler-gated background work | /memory, /enforcement, /multi-agent |
| A multi-agent shell (an orchestrator over other people’s agents, with chat-platform reach) | Adapter patterns, isolated sub-agent tool registries, settings architecture, cost controls | /multi-agent, /enforcement, /cost-management |

These categories aren’t airtight — many systems blur them. But knowing which one is your load-bearing concern keeps you from over-applying patterns that don’t fit your seam.


The 8 Postulates

These are not suggestions. They are the load-bearing walls of every production agentic system. Violate them and you will rediscover why they exist.

| # | Postulate | What to do |
| --- | --- | --- |
| 1 | Start with a persistent instruction file | Create a CLAUDE.md, AGENTS.md, or GEMINI.md before writing any agent config. Cover conventions, stack, testing, git, and security. Keep it under 200 lines. |
| 2 | Enforce safety outside the prompt | Put style preferences in the instruction file. Put linting in hooks. Put destructive command blocking in permissions. Never rely on the model remembering a safety rule. |
| 3 | Budget your context window | Reserve 10-15% for instructions, 30-40% for conversation, 20-30% for tool results. Compact at 70%. Clear at 80%. Separate cacheable content from compactable content. |
| 4 | Build tools on MCP | Use .mcp.json for tool connections. 97M+ downloads/month across every major platform. If you need agent-to-agent communication across systems, add A2A, but start with MCP. |
| 5 | Coordinate through shared state | Within a system, agents read from and write to shared state, not messages to each other. Between systems or organizations, use messaging protocols (A2A). Default to state; reach for messaging only when you must. |
| 6 | Decompose before you hit the cliff | Agent coherence degrades after extended sessions. The threshold moves with each model generation. Don’t find the limit; stay well under it. Break work into sub-tasks that complete in the safe zone. |
| 7 | Track cost per task from day one | Set token budgets per session. Route simple work to cheap models. Cache stable prompts. Monitor with alerts at 50%, 75%, and 90% of budget. Cost management is infrastructure, not optimization. |
| 8 | Add complexity in weekly increments | Week 1: instruction file. Week 2: hooks. Week 3: MCP tools. Week 4: skills. Month 2+: sub-agents. If your team has distributed systems experience, you can move faster, but still validate each layer before adding the next. |
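
The thresholds in postulates 3 and 7 can be sketched as two small checks. This is a minimal illustration in Python, not any harness's actual API: the function names `context_action` and `budget_alerts` and the action labels are hypothetical, and only the 70%/80% and 50%/75%/90% cut-offs come from the postulates.

```python
from typing import List, Sequence


def context_action(used_tokens: int, window_tokens: int) -> str:
    """Map context-window usage to an action (hypothetical labels).

    Uses integer arithmetic so the 70% and 80% boundaries are exact.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    if used_tokens * 100 >= 80 * window_tokens:
        return "clear"      # at or above 80%: start from a clean context
    if used_tokens * 100 >= 70 * window_tokens:
        return "compact"    # at or above 70%: summarize compactable content
    return "continue"


def budget_alerts(spent_tokens: int, budget_tokens: int,
                  thresholds_pct: Sequence[int] = (50, 75, 90)) -> List[int]:
    """Return every alert threshold (in percent) that spending has reached."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    return [pct for pct in thresholds_pct
            if spent_tokens * 100 >= pct * budget_tokens]


if __name__ == "__main__":
    # Assumed numbers for illustration: a 200k-token window, a 100k-token budget.
    print(context_action(150_000, 200_000))
    print(budget_alerts(80_000, 100_000))
```

Running both checks in the harness rather than asking the model to watch its own usage keeps them consistent with postulate 2: the limit holds whether or not the model remembers it.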

The Architecture

```mermaid
graph TD
    A["<b>Instruction Layer</b><br/>CLAUDE.md · AGENTS.md · GEMINI.md<br/><i>user → project → directory (most specific wins)</i>"] --> B
    B["<b>Settings Layer</b><br/>settings.json · config.toml<br/><i>Permissions, hooks, env vars</i>"] --> C
    C["<b>Tool Registry — MCP</b><br/>.mcp.json<br/><i>stdio (local) · http (remote)</i>"] --> D

    subgraph loop ["Agent Execution Loop"]
        D["Input"] --> E["Pre-Hooks"]
        E -->|"BLOCK if gate fails"| F["Reasoning"]
        F --> G["Tool Selection"]
        G --> H["Tool Hooks"]
        H -->|"BLOCK if denied"| I["Execution"]
        I --> J["Post-Hooks"]
        J --> K{"Continue?"}
        K -->|Yes| F
        K -->|No| L["Output"]
    end

    L --> M1
    L --> M2
    L --> M3

    subgraph ext ["Extensions"]
        M1["Skills<br/><i>reusable prompts</i>"]
        M2["Subagents<br/><i>bounded contexts</i>"]
        M3["Memory<br/><i>state, checkpoints</i>"]
    end
```
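
One way to read the execution loop in the diagram is as ordinary control flow in which hooks, not the model, decide whether a step runs. The Python sketch below is an illustration under stated assumptions, not any product's implementation: a fixed `plan` callable stands in for reasoning and tool selection, a denied call is logged and skipped, and every name (`ToolCall`, `run_agent`, `reject_empty_input`, `deny_destructive_shell`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class ToolCall:
    tool: str
    argument: str


# A gate returns None to allow, or a reason string to block.
InputGate = Callable[[str], Optional[str]]
ToolGate = Callable[[ToolCall], Optional[str]]
PostHook = Callable[[ToolCall, str], None]


def reject_empty_input(text: str) -> Optional[str]:
    """Pre-hook: block empty requests before any reasoning happens."""
    return None if text.strip() else "empty input"


def deny_destructive_shell(call: ToolCall) -> Optional[str]:
    """Tool hook: deny shell commands on a small illustrative blocklist."""
    blocklist = ("rm -rf", "git push --force")
    if call.tool == "shell" and any(bad in call.argument for bad in blocklist):
        return f"destructive command {call.argument!r}"
    return None


def run_agent(user_input: str,
              plan: Callable[[str], Sequence[ToolCall]],
              tools: Dict[str, Callable[[str], str]],
              pre_hooks: Sequence[InputGate],
              tool_hooks: Sequence[ToolGate],
              post_hooks: Sequence[PostHook]) -> List[str]:
    """Run each planned tool call through the hook stages and return a log."""
    # Pre-hooks gate the input: any reason blocks the whole request.
    for gate in pre_hooks:
        reason = gate(user_input)
        if reason is not None:
            return [f"blocked input: {reason}"]

    log: List[str] = []
    for call in plan(user_input):  # stand-in for reasoning + tool selection
        # Tool hooks run before execution; the first denial wins.
        denial = next(
            (r for r in (gate(call) for gate in tool_hooks) if r is not None),
            None,
        )
        if denial is not None:
            log.append(f"denied: {denial}")
            continue
        result = tools[call.tool](call.argument)
        # Post-hooks observe the result (audit logging, linting, and so on).
        for hook in post_hooks:
            hook(call, result)
        log.append(f"ok: {result}")
    return log
```

In this sketch the blocklist lives in code that runs on every call, so a destructive command is refused even if the plan proposes it. That is the property postulate 2 asks for.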

Who This Is For

| Role | What you get |
| --- | --- |
| Agent developers | Patterns for instruction files, hooks, MCP tools, and context management. |
| Platform engineers | Multi-agent architecture, shared state, delegation, and cost controls. |
| Infrastructure teams | Observability, token accounting, safety enforcement, and production runbooks. |
| Engineering managers | Adoption roadmaps, cost models, and risk frameworks. |

Reading Order

| Section | Key questions answered |
| --- | --- |
| Prompt | What does the agent read at session start? What does the harness compile around it? |
| Control | How do you bind the agent’s behavior outside the prompt? |
| Context | What does the agent remember? How do multiple agents coordinate? |
| Interface | How does the agent talk to tools, code, the web, and editors? |
| Operate | How do you run it in production: cost, observability, credentials, lifecycle? |
| Anti-Patterns | What failure looks like, named and citable. |

First agent? Start with Prompt → Control. Skip Context until one agent works reliably.

Scaling? Jump to Context and Operate. That’s where the failure modes live.