Enforcement
The agent reads guidelines. The harness runs rules. The operating system enforces gates. Three tiers, three reliability classes, each with its own failure mode. “Guidelines / Rules / Gates” is our conceptual framing, not canonical industry terminology — OpenAI calls the middle tier Guardrails, Google ADK calls them Callbacks, LangChain calls them Middleware, Claude Code calls them Hooks. The shape underneath is the same.
Why three tiers
An instruction file is the obvious starting point and the obvious ceiling. Soft instructions decay in four predictable ways:
- Compaction discards them. When the context window fills, compaction can reduce 200 lines of rules to a sentence — or drop them entirely.
- Recency wins. Session-start instructions compete with hundreds of recent tool results; models attend more to fresh tokens.
- The agent reasons around them. Given a constraint and a goal in natural language, the model can find a path that satisfies the goal while technically obeying the constraint.
- Costly rules get dropped first. Running tests, reading before editing, verbose commits — the rules that matter most for quality.
graph LR
subgraph g ["Guideline"]
G["Instruction file<br/><i>Soft — agent tries to follow</i>"]
end
subgraph r ["Rule"]
R["Hooks / callbacks / middleware<br/><i>Deterministic — always fires</i>"]
end
subgraph ga ["Gate"]
GA["Permissions / sandbox<br/><i>Hard block — cannot bypass</i>"]
end
G -->|"agent forgets"| R -->|"hook misconfigured"| GA
style g fill:#eef2ff,stroke:#c7d2fe
style r fill:#fefce8,stroke:#fde68a
style ga fill:#fef2f2,stroke:#fecaca
Each arrow is a failure mode the next tier catches. Layers are not alternatives — they are insurance against the layer above.
Tier 1: Guidelines
Natural-language instructions the agent reads at session start and tries to follow. The most expressive tier; the weakest.
| Platform | Mechanism |
|---|---|
| Claude Code | CLAUDE.md (project + ~/.claude/CLAUDE.md global) |
| OpenAI Codex | AGENTS.md / system prompts in the Agents SDK |
| Gemini CLI | GEMINI.md (project + ~/.gemini/GEMINI.md global) |
| Google ADK | System instructions in the Agent constructor |
| LangGraph | System prompts in graph-node configuration |
A guideline followed 95% of the time is fine for “prefer named exports.” It is not fine for “never delete the production database.” See Instructions for authoring discipline.
Tier 2: Rules
Scripts or functions that execute automatically at lifecycle points. They run outside the model — in the host process, harness, or middleware — and fire regardless of what the agent decides.
graph TD
A["User input"] --> B["UserPromptSubmit hook"]
B --> C["Agent reasoning"]
C --> D["Tool selection"]
D --> E["PreToolUse hook"]
E --> F{"permissionDecision"}
F -->|"allow"| G["Tool execution"]
F -->|"deny / ask"| C
G --> H["PostToolUse hook"]
H --> I["Agent response"]
I --> J{"Continue?"}
J -->|"Yes"| C
J -->|"No"| K["Stop hook → output"]
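The lifecycle in the diagram can be sketched as a small dispatch function. This is a minimal illustration, not any harness's real implementation: `run_tool_call`, `deny_force_push`, the type aliases, and the dict-based decision format are hypothetical names, and "ask" is treated like "deny" for brevity.

```python
from typing import Any, Callable, Optional

# Hypothetical types for illustration only; real harnesses define their own.
Decision = dict[str, Any]  # e.g. {"permissionDecision": "deny", "reason": "..."}
PreHook = Callable[[str, dict], Optional[Decision]]
PostHook = Callable[[str, dict, Any], None]
Tool = Callable[[dict], Any]


def run_tool_call(
    tool_name: str,
    tool_input: dict,
    tools: dict[str, Tool],
    pre_hooks: list[PreHook],
    post_hooks: list[PostHook],
) -> dict:
    """Run one tool call with hooks that execute outside the model."""
    # PreToolUse: every hook runs, whatever the model intended.
    for hook in pre_hooks:
        decision = hook(tool_name, tool_input)
        if decision and decision.get("permissionDecision") in ("deny", "ask"):
            # The tool never executes; the reason goes back to the agent.
            return {"executed": False, "feedback": decision.get("reason", "")}

    result = tools[tool_name](tool_input)

    # PostToolUse: observe the result (format, lint, audit).
    for hook in post_hooks:
        hook(tool_name, tool_input, result)
    return {"executed": True, "result": result}


def deny_force_push(tool_name: str, tool_input: dict) -> Optional[Decision]:
    """Example pre-hook: a plain function, so it cannot be talked out of firing."""
    command = tool_input.get("command", "")
    if tool_name == "Bash" and "push" in command and "--force" in command:
        return {"permissionDecision": "deny", "reason": "Force-push is blocked."}
    return None
```

The point of the shape is that the hook loop lives in the harness: the model only ever sees the feedback string, never the code path that produced it.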
Cross-platform event mapping
Claude Code’s hook event list now exceeds 25 (SessionStart, UserPromptSubmit, PreToolUse, PermissionRequest, PermissionDenied, PostToolUse, PostToolBatch, SubagentStart, SubagentStop, PreCompact, PostCompact, WorktreeCreate, InstructionsLoaded, ConfigChange, and more). The table covers the load-bearing equivalents.
| Concept | Claude Code | OpenAI Agents SDK | Google ADK | LangChain v1 |
|---|---|---|---|---|
| Before tool | PreToolUse | Tool guardrails (input) | before_tool_callback | before_model / wrap_tool_call |
| After tool | PostToolUse | Tool guardrails (output) | after_tool_callback | after_model / wrap_tool_call |
| Session start | SessionStart | Run init | before_agent_callback | before_agent |
| Permission ask | PermissionRequest | Approval policy | Action confirmations | HITL nodes |
| Completion | Stop / SubagentStop | Run completion | after_agent_callback | after_agent |
| Context overflow | PreCompact / PostCompact | Compaction API | (per-agent state) | State summarization |
ADK 2.0 (Python Beta, released 2026-05-19) packages reusable callback bundles as Plugins applied across workflows. LangChain v1.0 (Oct 2025) treats middleware as the supported entry via create_agent — both node-style (before_model, after_model) and wrap-style (wrap_model_call, wrap_tool_call).
Four hook types
Four flavors, ordered by complexity and latency. They default to fail-open; set on_failure: "fail-closed" for security-critical hooks.
Command — shell script reads JSON on stdin, returns an exit code.
{
"hooks": {
"PreToolUse": [
{ "type": "command",
"command": "/usr/local/bin/check-tool-safety.sh",
"timeout_ms": 5000 }
]
}
}
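A command hook does not have to be a shell script; any executable that reads the event JSON and returns an exit code fits. The sketch below is one possible Python version, assuming the `tool_name` and `tool_input.command` fields used elsewhere on this page and exit code 2 as the blocking verdict. `run_hook` and `BLOCKED_SUBSTRINGS` are hypothetical names.

```python
import json
import sys
from typing import TextIO

# Hypothetical deny-list for illustration; a real hook would load its own policy.
BLOCKED_SUBSTRINGS = ("rm -rf /", "mkfs")


def run_hook(stdin: TextIO, stderr: TextIO) -> int:
    """Read one hook event as JSON and return the exit code to report."""
    event = json.load(stdin)
    if event.get("tool_name") != "Bash":
        return 0  # not a shell command; let the call proceed
    command = event.get("tool_input", {}).get("command", "")
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            # Exit code 2 is the blocking verdict; stderr is fed back to the agent.
            print(f"Blocked: command contains {needle!r}", file=stderr)
            return 2
    return 0


# When installed as the hook script, finish the file with:
#     sys.exit(run_hook(sys.stdin, sys.stderr))
```

Passing the streams in as arguments keeps the decision logic testable without spawning a process.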
HTTP — event POSTed to an endpoint, response JSON drives the decision.
{
"hooks": {
"PreToolUse": [
{ "type": "http",
"url": "https://policy.internal.corp/v1/evaluate",
"method": "POST",
"timeout_ms": 10000 }
]
}
}
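The receiving side of an HTTP hook can be sketched with the standard library alone. This is an assumption-heavy illustration: the text does not specify the response schema, so the sketch reuses the nested `hookSpecificOutput.permissionDecision` shape this page describes for PreToolUse. `evaluate`, `PolicyHandler`, `serve`, the port, and the curl/wget policy are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def evaluate(event: dict) -> dict:
    """Hypothetical policy: deny Bash commands that invoke curl or wget."""
    command = event.get("tool_input", {}).get("command", "")
    words = command.split()
    if event.get("tool_name") == "Bash" and any(w in ("curl", "wget") for w in words):
        return {"hookSpecificOutput": {
            "permissionDecision": "deny",
            "permissionDecisionReason": "Outbound downloads are not allowed.",
        }}
    return {"hookSpecificOutput": {"permissionDecision": "allow"}}


class PolicyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the declared body length, then parse the event.
        length = int(self.headers.get("Content-Length", "0"))
        try:
            event = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        body = json.dumps(evaluate(event)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Blocking helper; call serve() to run the endpoint locally."""
    HTTPServer(("127.0.0.1", port), PolicyHandler).serve_forever()
```

Because hooks default to fail-open, an unreachable policy service would let calls through; a security-critical endpoint is a natural candidate for the fail-closed setting.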
LLM (prompt) — a second model evaluates the action when the call requires judgment beyond regex.
{
"hooks": {
"PreToolUse": [
{ "type": "prompt",
"model": "claude-sonnet-4-6",
"prompt": "Security reviewer. Return {\"decision\":\"allow\"} or {\"decision\":\"block\",\"reason\":...}.",
"timeout_ms": 30000 }
]
}
}
Agent — a full sub-agent with its own tools validates the action (e.g. inspect schema + data before approving a migration).
{
"hooks": {
"PreToolUse": [
{ "type": "agent",
"agent": { "model": "claude-sonnet-4-6",
"system": "Database migration safety reviewer.",
"tools": ["Read", "Bash"], "max_turns": 5 },
"timeout_ms": 120000 }
]
}
}
Input/output protocol
Hooks receive JSON on stdin: session_id, transcript_path, cwd, permission_mode, hook_event_name, effort.level, plus event-specific fields (e.g. tool_input). Exit codes carry the basic verdict:
- 0: success; stdout JSON is processed.
- 2: blocking error; stderr is fed back to the agent. This applies only to blockable events: PreToolUse, PermissionRequest, UserPromptSubmit, Stop, SubagentStop, PostToolBatch, ConfigChange, PreCompact, WorktreeCreate.
- Any other non-zero code: non-blocking error.
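The exit-code rules above reduce to a small classifier. A sketch, with two assumptions labeled in the code: the return labels are hypothetical, and exit code 2 on a non-blockable event is treated as non-blocking, which the text implies but does not state outright.

```python
# Events for which exit code 2 blocks, as listed in the text above.
BLOCKABLE_EVENTS = frozenset({
    "PreToolUse", "PermissionRequest", "UserPromptSubmit", "Stop",
    "SubagentStop", "PostToolBatch", "ConfigChange", "PreCompact",
    "WorktreeCreate",
})


def interpret_exit(event_name: str, exit_code: int) -> str:
    """Classify a hook's exit code. The returned labels are hypothetical."""
    if exit_code == 0:
        return "success"  # stdout JSON is parsed
    if exit_code == 2 and event_name in BLOCKABLE_EVENTS:
        return "blocking-error"  # stderr goes back to the agent
    # Assumption: 2 on a non-blockable event behaves like any other non-zero code.
    return "non-blocking-error"
```

For example, `interpret_exit("PostToolUse", 2)` yields `"non-blocking-error"` under this reading, because PostToolUse is not in the blockable list.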
Two JSON output shapes coexist; confusing them is the most common hook bug.
PreToolUse uses the nested shape with hookSpecificOutput.permissionDecision:
{
"hookSpecificOutput": {
"permissionDecision": "deny",
"permissionDecisionReason": "Force-push to main is prohibited.",
"modifiedToolInput": null
}
}
Valid values: allow, deny, ask, defer. PermissionRequest uses a similar nested hookSpecificOutput.decision object with {behavior, updatedInput, updateRules}.
All other events use the flat shape with a top-level decision:
{
"decision": "block",
"reason": "Tests failed; do not commit.",
"hookSpecificOutput": {
"additionalContext": "3 of 47 tests failing in auth/login.test.ts"
}
}
There is no suggestions field — the current protocol passes advice back via hookSpecificOutput.additionalContext. Two universal escape hatches override everything else: continue: false halts the agent entirely (outranks any per-event decision), and suppressOutput: true keeps hook output out of the transcript.
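One way to avoid mixing the two shapes is to route every hook's output through a single builder keyed on the event name. A minimal sketch: `build_output` is a hypothetical helper, and omitting `decision` entirely when nothing is blocked is an assumption, since the text only shows the blocking case.

```python
from typing import Optional


def build_output(event_name: str, block: bool, reason: str,
                 context: Optional[str] = None) -> dict:
    """Return the output shape the text describes for the given event."""
    if event_name == "PreToolUse":
        # Nested shape: the verdict lives under hookSpecificOutput.
        return {"hookSpecificOutput": {
            "permissionDecision": "deny" if block else "allow",
            "permissionDecisionReason": reason,
        }}
    if event_name == "PermissionRequest":
        # Its nested decision object is different and is not modelled here.
        raise NotImplementedError("PermissionRequest uses its own decision object")
    # Flat shape: top-level decision, advice under additionalContext.
    out: dict = {}
    if block:
        out["decision"] = "block"
        out["reason"] = reason
    if context is not None:
        out["hookSpecificOutput"] = {"additionalContext": context}
    return out
```

Centralising the choice means a hook author picks an event and a verdict, never a JSON layout.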
Matchers
Matchers are regex filters that decide when a hook fires. All matcher fields must match (logical AND); for OR, define multiple entries.
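The AND-within-an-entry, OR-across-entries semantics can be sketched as follows. Assumptions: patterns are applied with `re.search` (so anchors such as `^` and `$` matter), an entry without a matcher fires on everything, and `cwd` is used as a second matcher field purely for illustration. `hook_matches` and `hooks_to_fire` are hypothetical names.

```python
import re


def hook_matches(matcher: dict[str, str], event: dict) -> bool:
    """True only if every matcher field's regex matches the event (logical AND)."""
    for field, pattern in matcher.items():
        value = event.get(field)
        if not isinstance(value, str) or re.search(pattern, value) is None:
            return False
    return True


def hooks_to_fire(entries: list[dict], event: dict) -> list[dict]:
    """OR across entries: each entry is tested independently."""
    return [e for e in entries if hook_matches(e.get("matcher", {}), event)]
```

With the anchored pattern `^(Bash|Write)$`, a tool named `BashOutput` does not match; dropping the anchors would let it through, which is an easy way to over-trigger a hook.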
{ "matcher": { "tool_name": "^(Bash|Write)$" },
"type": "command",
"command": "bash hooks/block-destructive.sh" }
Practical example: block destructive commands
One hook, one job — refuse force-pushes to main and destructive production database operations before they execute.
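#!/usr/bin/env bash
# hooks/block-destructive.sh
# PreToolUse hook. Requires jq and GNU grep (-P).
set -euo pipefail
INPUT=$(cat)
TOOL_NAME=$(jq -r '.tool_name' <<<"$INPUT")
[[ "$TOOL_NAME" != "Bash" ]] && exit 0
COMMAND=$(jq -r '.tool_input.command // ""' <<<"$INPUT")
# Emit the nested PreToolUse decision and exit 0 so the harness parses stdout.
# With exit 2 the harness reads stderr instead and the JSON would be ignored.
deny() {
  jq -n --arg reason "$1" \
    '{hookSpecificOutput:{permissionDecision:"deny",permissionDecisionReason:$reason}}'
  exit 0
}
# Force flag (--force, --force-with-lease, -f) before or after the branch name.
if grep -qP 'git\s+push\b' <<<"$COMMAND" &&
   grep -qP '(^|\s)(--force\S*|-f)(\s|$)' <<<"$COMMAND" &&
   grep -qP '\b(main|master)\b' <<<"$COMMAND"; then
  deny "Force-push to main/master is not allowed."
fi
if grep -qPi '(DROP|TRUNCATE|DELETE\s+FROM)\s.*(prod|production)' <<<"$COMMAND"; then
  deny "Destructive operations on production databases are prohibited."
fi
exit 0
The same pattern (read stdin, branch on tool name, emit a permissionDecision on stdout, exit 0) generalises to auto-formatters and secret scanners. Different content, identical shape. The alternative for a hook that does not need structured output is to write the reason to stderr and exit 2; mixing the two, JSON on stdout with exit 2, silently discards the JSON.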
Tier 3: Gates
Technical barriers enforced by the operating system, network, or platform runtime. Where a rule is a script that decides, a gate is a capability removed. If the agent has no write access to /etc/, no creative tool argument changes that. Settings configure gates declaratively.
Four-layer scope
Settings resolve through four layers, each with a distinct owner.
| Scope | Shared? | Path | Purpose |
|---|---|---|---|
| Managed (Org) | Admin-controlled | /Library/Application Support/ClaudeCode/managed-settings.json (+ managed-settings.d/) on macOS; /etc/claude-code/ on Linux/WSL; MDM/registry on Windows | Enterprise policy, locked keys |
| User (Global) | No | ~/.claude/settings.json | Personal defaults across all projects |
| Project (Team) | Yes (committed) | .claude/settings.json | Team-agreed permissions, hooks |
| Local (Personal) | No (gitignored) | .claude/settings.local.json | Machine-specific overrides, secrets |
The managed-settings.d/ drop-in directory merges files alphabetically, letting a security team ship 00-network.json, 10-permissions.json, 20-audit.json as modular policy fragments instead of one monolith.
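Alphabetical drop-in merging can be sketched in a few lines. This assumes a shallow merge in which later filenames win on conflicting top-level keys; the text only says files merge alphabetically, so the conflict rule is an assumption, and `load_dropins` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_dropins(directory: Path) -> dict:
    """Merge *.json fragments in alphabetical filename order."""
    merged: dict = {}
    for path in sorted(directory.glob("*.json"), key=lambda p: p.name):
        fragment = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: shallow merge, later files override earlier ones.
        merged.update(fragment)
    return merged
```

The numeric prefixes in names like 00-network.json exist to make this ordering explicit, so a fragment's position in the merge is visible from its filename.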
Precedence
graph TD
A["User (Global)<br/><i>personal defaults</i>"] --> B["Project (Team)<br/><i>shared overrides</i>"]
B --> C["Local (Personal)<br/><i>final say on this machine</i>"]
C --> D["CLI args<br/><i>per-invocation</i>"]
D --> E["Managed (Org)<br/><i>locks keys — wins on locked keys</i>"]
Later layers override earlier ones. Among the file-based scopes, Local wins on most keys, and CLI arguments override it for a single invocation. Managed wins on any key marked as locked, which is how an org disables outbound network or pins audit logging while individuals still choose their model. Array fields (permissions.allow[], hooks, enabledMcpjsonServers) concatenate and dedup across layers rather than override. Changes to model and outputStyle require a restart; everything else hot-reloads and fires ConfigChange.
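The resolution rules can be sketched as a recursive merge plus a final pass for locked keys. Assumptions: how a key is marked as locked is not specified here, so the sketch takes a plain set of top-level key names; only locked keys are taken from the managed layer; and `network` is a hypothetical setting used for illustration. `merge_value` and `resolve` are hypothetical names.

```python
from typing import Any


def merge_value(base: Any, override: Any) -> Any:
    """Dicts merge recursively, lists concatenate and dedup, scalars override."""
    if isinstance(base, dict) and isinstance(override, dict):
        out = dict(base)
        for key, value in override.items():
            out[key] = merge_value(out[key], value) if key in out else value
        return out
    if isinstance(base, list) and isinstance(override, list):
        out_list = list(base)
        for item in override:
            if item not in out_list:
                out_list.append(item)
        return out_list
    return override


def resolve(layers: list[dict], managed: dict, locked: set[str]) -> dict:
    """Apply layers lowest-precedence first, then pin locked keys from managed."""
    result: dict = {}
    for layer in layers:  # e.g. [user, project, local, cli_args]
        result = merge_value(result, layer)
    for key in locked:
        if key in managed:
            result[key] = managed[key]  # locked: lower layers cannot change it
    return result
```

Concatenation for arrays is what lets a project add to a user's allow list instead of replacing it, while the locked pass guarantees that no lower layer can undo an org-level restriction.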
Project settings example
{
"permissions": {
"allow": [
"Read", "Edit", "Write",
"Bash(npm run lint:*)",
"Bash(npm run test:*)",
"Bash(npx tsc:*)",
"Bash(git add:*)", "Bash(git commit:*)", "Bash(git diff:*)",
"mcp__github__create_pull_request"
],
"deny": [
"Bash(rm -rf:*)",
"Bash(git push:*)",
"Bash(curl:*)", "Bash(wget:*)"
]
},
"hooks": {
"PreToolUse": [
{ "matcher": { "tool_name": "^Bash$" },
"type": "command",
"command": "bash .claude/hooks/block-destructive.sh" }
],
"PostToolUse": [
{ "matcher": { "tool_name": "^(Write|Edit)$" },
"type": "command",
"command": "bash .claude/hooks/auto-format.sh" }
]
},
"env": { "NODE_ENV": "development" }
}
Canonical permission syntax is Tool(pattern). A trailing :* means “any args”: Bash(npm test:*) matches npm test, npm test --watch, and npm test foo. A space-separated glob such as Bash(ls *) matches ls -la but not lsof. Evaluation order is deny → ask → allow → defaultMode; within a list, the most specific rule wins regardless of array order.
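The pattern syntax and evaluation order can be approximated in a short matcher. This is a sketch of the behaviour described above, not the harness's parser: treating `:*` as a whole-word prefix match, using `fnmatch` for space-separated globs, and defaulting to `"ask"` are assumptions, and rule specificity within a list is not modelled. `rule_matches` and `evaluate` are hypothetical names.

```python
import fnmatch
import re

# Tool name, optionally followed by a parenthesised pattern.
RULE = re.compile(r"^(?P<tool>\w+)(?:\((?P<pattern>.*)\))?$")


def rule_matches(rule: str, tool: str, arg: str = "") -> bool:
    """Check one permission rule against a tool name and its argument string."""
    m = RULE.match(rule)
    if m is None or m.group("tool") != tool:
        return False
    pattern = m.group("pattern")
    if pattern is None:
        return True  # bare tool name, e.g. "Read"
    if pattern.endswith(":*"):
        prefix = pattern[:-2]
        # Prefix followed by nothing or by a space-separated argument.
        return arg == prefix or arg.startswith(prefix + " ")
    return fnmatch.fnmatchcase(arg, pattern)


def evaluate(tool: str, arg: str, deny: list[str], ask: list[str],
             allow: list[str], default_mode: str = "ask") -> str:
    """Check deny, then ask, then allow; fall back to the default mode."""
    for verdict, rules in (("deny", deny), ("ask", ask), ("allow", allow)):
        if any(rule_matches(r, tool, arg) for r in rules):
            return verdict
    return default_mode
```

Checking deny first is what makes a broad allow such as `Bash(git:*)` safe to combine with a narrow deny such as `Bash(git push:*)`.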
What goes where
| Setting | Scope | Why |
|---|---|---|
| Permission allow/deny lists | Project | Team-agreed surface; everyone on the repo gets the same. |
| Hook definitions | Project | Deterministic automation the whole team benefits from. |
| API keys, secrets, DATABASE_URL | Local | Gitignored by default; never leaves the machine. |
| Model, temperature, cost limits | User or Local | Personal choice; one developer’s smaller-model preference shouldn’t pin the team. |
| Audit logging, network restrictions, locked keys | Managed | Non-negotiable org policy with managed-settings.d/ for modular delivery. |
Context impact
The split between instruction files and settings is the load-bearing decision.
| | Instruction files | Settings files |
|---|---|---|
| Examples | CLAUDE.md, AGENTS.md | settings.json, config.toml |
| Consumed as context? | Yes — injected into the prompt | No — parsed by the harness |
| Token cost | Proportional to file size | Zero |
| Agent visibility | Agent reads and follows | Agent does not see them directly |
| Enforcement | Soft — model may deviate | Hard — harness enforces mechanically |
Writing “never run rm -rf” in CLAUDE.md costs tokens and trusts the model. Putting Bash(rm -rf:*) in deny costs zero tokens and is deterministic. If a behavior can be expressed as a gate or trigger, it belongs in settings — not the prompt.
Layering: “never push to main”
The tiers are layers, not alternatives. Each catches what the layer above missed.
Guideline. CLAUDE.md says “Create a feature branch and open a pull request.” Handles the 90% case where the agent has fresh context and a cooperative goal.
Rule. A PreToolUse hook on Bash matches git push.*main and emits permissionDecision: "deny" with a reason injected back into context. Catches the cases where compaction dropped the guideline or the agent reasoned around it.
Gate. Repository branch protection requires a PR with approval before merging to main. Even if the guideline failed and the hook was misconfigured, the remote rejects the push at the protocol level.
Three independent failures must coincide before damage occurs. That is the point.
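The arithmetic behind that claim is a product of per-layer failure rates. The figures below are purely illustrative: the 5% guideline miss rate echoes the "followed 95% of the time" example earlier on this page, while the hook and gate rates are hypothetical. The calculation also assumes the failures really are independent.

```python
import math


def combined_failure(probabilities: list[float]) -> float:
    """Probability that every independent layer fails on the same action."""
    return math.prod(probabilities)


# Illustrative rates only: guideline 5%, hook 1%, gate 0.1%.
RISK = combined_failure([0.05, 0.01, 0.001])
```

Under these assumed rates the combined figure is on the order of five in ten million. Correlated failures break the multiplication: a single bad settings commit that disables both the hook and the permission rule counts as one failure, not two.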
Decision principle
If a violation is measured in inconvenience, use a guideline. If it is measured in time spent debugging or reverting, use a rule. If it is measured in incidents — security breach, data loss, downtime — use a gate. The cost of getting it wrong is asymmetric: a guideline where a gate belonged is a future incident; a gate where a guideline belonged is a friction tax. See Sandboxing for OS-level gate mechanics and Anti-Patterns for the failure mode where load-bearing logic ends up in a prompt.
Regulatory context
The EU AI Act entered into force on 1 Aug 2024. Prohibitions and AI-literacy obligations have applied since 2 Feb 2025, and GPAI provider obligations and penalties since 2 Aug 2025; the bulk of the high-risk system rules plus full penalty enforcement land on 2 Aug 2026. The OWASP Top 10 for Agentic Applications (the 2026 edition, released 9 Dec 2025) maps directly onto enforcement controls: ASI01 (Agent Goal Hijack), ASI02 (Tool Misuse), ASI03 (Identity & Privilege Abuse), and ASI05 (Unexpected Code Execution) are each mitigated by some combination of PreToolUse rules and permission gates. The three-tier model is how those controls get implemented.