Enforcement

The agent reads guidelines. The harness runs rules. The operating system enforces gates. Three tiers, three reliability classes, each with its own failure mode. “Guidelines / Rules / Gates” is our conceptual framing, not canonical industry terminology — OpenAI calls the middle tier Guardrails, Google ADK calls them Callbacks, LangChain calls them Middleware, Claude Code calls them Hooks. The shape underneath is the same.


Why three tiers

An instruction file is the obvious starting point and the obvious ceiling. Soft instructions decay in four predictable ways:

graph LR
    subgraph g ["Guideline"]
        G["Instruction file<br/><i>Soft — agent tries to follow</i>"]
    end
    subgraph r ["Rule"]
        R["Hooks / callbacks / middleware<br/><i>Deterministic — always fires</i>"]
    end
    subgraph ga ["Gate"]
        GA["Permissions / sandbox<br/><i>Hard block — cannot bypass</i>"]
    end
    G -->|"agent forgets"| R -->|"hook misconfigured"| GA
    style g fill:#eef2ff,stroke:#c7d2fe
    style r fill:#fefce8,stroke:#fde68a
    style ga fill:#fef2f2,stroke:#fecaca

Each arrow is a failure mode the next tier catches. Layers are not alternatives — they are insurance against the layer above.


Tier 1: Guidelines

Natural-language instructions the agent reads at session start and tries to follow. The most expressive tier; the weakest.

| Platform | Mechanism |
| --- | --- |
| Claude Code | CLAUDE.md (project + ~/.claude/CLAUDE.md global) |
| OpenAI Codex | AGENTS.md / system prompts in the Agents SDK |
| Gemini CLI | GEMINI.md (project + ~/.gemini/GEMINI.md global) |
| Google ADK | System instructions in the Agent constructor |
| LangGraph | System prompts in graph-node configuration |

A guideline followed 95% of the time is fine for “prefer named exports.” It is not fine for “never delete the production database.” See Instructions for authoring discipline.


Tier 2: Rules

Scripts or functions that execute automatically at lifecycle points. They run outside the model — in the host process, harness, or middleware — and fire regardless of what the agent decides.

graph TD
    A["User input"] --> B["UserPromptSubmit hook"]
    B --> C["Agent reasoning"]
    C --> D["Tool selection"]
    D --> E["PreToolUse hook"]
    E --> F{"permissionDecision"}
    F -->|"allow"| G["Tool execution"]
    F -->|"deny / ask"| C
    G --> H["PostToolUse hook"]
    H --> I["Agent response"]
    I --> J{"Continue?"}
    J -->|"Yes"| C
    J -->|"No"| K["Stop hook → output"]
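
The lifecycle in the diagram can be sketched as a small harness loop. This is an illustrative model only, not any platform's actual implementation: `run_tool_call`, the hook signatures, and the return dictionaries are all hypothetical names chosen for the example.

```python
from typing import Callable

# Hypothetical hook signatures for illustration:
# a pre-hook returns "allow", "deny", or "ask"; a post-hook observes the result.
PreHook = Callable[[dict], str]
PostHook = Callable[[dict, str], None]


def run_tool_call(call: dict, tools: dict, pre_hooks: list[PreHook],
                  post_hooks: list[PostHook]) -> dict:
    """Run one tool call through PreToolUse -> execution -> PostToolUse."""
    for hook in pre_hooks:
        decision = hook(call)
        if decision != "allow":
            # Control goes back to the agent's reasoning step; the tool never runs.
            return {"executed": False, "decision": decision}
    result = tools[call["tool_name"]](call["tool_input"])
    for hook in post_hooks:
        hook(call, result)
    return {"executed": True, "result": result}
```

The point of the sketch is that the hooks run in the harness, outside the model: the agent's choice of tool is an input to the loop, not something that can skip it.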

Cross-platform event mapping

Claude Code’s hook event list now exceeds 25 (SessionStart, UserPromptSubmit, PreToolUse, PermissionRequest, PermissionDenied, PostToolUse, PostToolBatch, SubagentStart, SubagentStop, PreCompact, PostCompact, WorktreeCreate, InstructionsLoaded, ConfigChange, and more). The table covers the load-bearing equivalents.

| Concept | Claude Code | OpenAI Agents SDK | Google ADK | LangChain v1 |
| --- | --- | --- | --- | --- |
| Before tool | PreToolUse | Tool guardrails (input) | before_tool_callback | before_model / wrap_tool_call |
| After tool | PostToolUse | Tool guardrails (output) | after_tool_callback | after_model / wrap_tool_call |
| Session start | SessionStart | Run init | before_agent_callback | before_agent |
| Permission ask | PermissionRequest | Approval policy | Action confirmations | HITL nodes |
| Completion | Stop / SubagentStop | Run completion | after_agent_callback | after_agent |
| Context overflow | PreCompact / PostCompact | Compaction API | (per-agent state) | State summarization |

ADK 2.0 (Python Beta, released 2026-05-19) packages reusable callback bundles as Plugins applied across workflows. LangChain v1.0 (Oct 2025) treats middleware, passed to create_agent, as the supported extension mechanism, in both node-style (before_model, after_model) and wrap-style (wrap_model_call, wrap_tool_call) forms.

Four hook types

Four flavors, ordered by complexity and latency. They default to fail-open; set on_failure: "fail-closed" for security-critical hooks.

Command — shell script reads JSON on stdin, returns an exit code.

{
  "hooks": {
    "PreToolUse": [
      { "type": "command",
        "command": "/usr/local/bin/check-tool-safety.sh",
        "timeout_ms": 5000 }
    ]
  }
}

HTTP — event POSTed to an endpoint, response JSON drives the decision.

{
  "hooks": {
    "PreToolUse": [
      { "type": "http",
        "url": "https://policy.internal.corp/v1/evaluate",
        "method": "POST",
        "timeout_ms": 10000 }
    ]
  }
}

LLM (prompt) — a second model evaluates the action when the call requires judgment beyond regex.

{
  "hooks": {
    "PreToolUse": [
      { "type": "prompt",
        "model": "claude-sonnet-4-6",
        "prompt": "Security reviewer. Return {\"decision\":\"allow\"} or {\"decision\":\"block\",\"reason\":...}.",
        "timeout_ms": 30000 }
    ]
  }
}

Agent — a full sub-agent with its own tools validates the action (e.g. inspect schema + data before approving a migration).

{
  "hooks": {
    "PreToolUse": [
      { "type": "agent",
        "agent": { "model": "claude-sonnet-4-6",
                   "system": "Database migration safety reviewer.",
                   "tools": ["Read", "Bash"], "max_turns": 5 },
        "timeout_ms": 120000 }
    ]
  }
}
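
The fail-open default mentioned at the top of this section matters most when a hook itself breaks. A minimal sketch of the difference, assuming a hypothetical in-process wrapper (`run_hook_guarded` is not a real harness API, and a real harness would also enforce the configured timeout):

```python
from typing import Callable


def run_hook_guarded(hook: Callable[[dict], str], event: dict,
                     on_failure: str = "fail-open") -> str:
    """Return the hook's decision, or a fallback if the hook itself breaks."""
    try:
        return hook(event)
    except Exception:
        # The hook crashed or timed out. Fail-open lets the action through;
        # fail-closed treats a broken hook as a denial.
        return "deny" if on_failure == "fail-closed" else "allow"
```

Under fail-open, an unreachable policy endpoint silently approves everything, which is why security-critical hooks are the ones to mark fail-closed.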

Input/output protocol

Hooks receive JSON on stdin: session_id, transcript_path, cwd, permission_mode, hook_event_name, effort.level, plus event-specific fields (e.g. tool_input). Exit codes carry the basic verdict: 0 = success (stdout JSON processed), 2 = blocking error (stderr fed back to the agent — only for blockable events: PreToolUse, PermissionRequest, UserPromptSubmit, Stop, SubagentStop, PostToolBatch, ConfigChange, PreCompact, WorktreeCreate). Other non-zero codes are non-blocking.
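
The exit-code rules above can be written down as a small lookup. This is a sketch of the rules as stated in this section, with one labeled assumption: exit code 2 on an event outside the blockable list is treated like any other non-zero code. The function name and return strings are hypothetical.

```python
# Events for which exit code 2 blocks, as listed in the paragraph above.
BLOCKABLE_EVENTS = {
    "PreToolUse", "PermissionRequest", "UserPromptSubmit", "Stop",
    "SubagentStop", "PostToolBatch", "ConfigChange", "PreCompact",
    "WorktreeCreate",
}


def interpret_exit(event_name: str, exit_code: int) -> str:
    """Map a hook's exit code to the harness-side outcome."""
    if exit_code == 0:
        return "success"            # stdout JSON is processed
    if exit_code == 2 and event_name in BLOCKABLE_EVENTS:
        return "blocked"            # stderr is fed back to the agent
    # Assumption: exit 2 on a non-blockable event is handled like other errors.
    return "non-blocking-error"
```

One consequence worth noting: a script that prints decision JSON and then exits 2 has its JSON ignored, because stdout is only processed on exit 0.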

Two JSON output shapes coexist; confusing them is the most common hook bug.

PreToolUse uses the nested shape with hookSpecificOutput.permissionDecision:

{
  "hookSpecificOutput": {
    "permissionDecision": "deny",
    "permissionDecisionReason": "Force-push to main is prohibited.",
    "modifiedToolInput": null
  }
}

Valid values: allow, deny, ask, defer. PermissionRequest uses a similar nested hookSpecificOutput.decision object with {behavior, updatedInput, updateRules}.

All other events use the flat shape with a top-level decision:

{
  "decision": "block",
  "reason": "Tests failed; do not commit.",
  "hookSpecificOutput": {
    "additionalContext": "3 of 47 tests failing in auth/login.test.ts"
  }
}

There is no suggestions field — the current protocol passes advice back via hookSpecificOutput.additionalContext. Two universal escape hatches override everything else: continue: false halts the agent entirely (outranks any per-event decision), and suppressOutput: true keeps hook output out of the transcript.
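
One way to avoid mixing the two shapes is to build hook output through a single helper that picks the shape from the event name. The sketch below is hypothetical (`build_hook_output` is not part of any SDK) and deliberately refuses PermissionRequest, whose nested decision object is not modeled here.

```python
import json

PRE_TOOL_USE_VALUES = {"allow", "deny", "ask", "defer"}


def build_hook_output(event_name: str, verdict: str, reason: str) -> str:
    """Serialize a verdict in the shape the given event expects."""
    if event_name == "PermissionRequest":
        raise NotImplementedError("PermissionRequest uses its own nested shape")
    if event_name == "PreToolUse":
        if verdict not in PRE_TOOL_USE_VALUES:
            raise ValueError(f"invalid permissionDecision: {verdict!r}")
        # Nested shape: the verdict lives under hookSpecificOutput.
        payload = {"hookSpecificOutput": {
            "permissionDecision": verdict,
            "permissionDecisionReason": reason,
        }}
    else:
        # Flat shape: top-level decision and reason.
        payload = {"decision": verdict, "reason": reason}
    return json.dumps(payload)
```

With this helper, passing "block" for a PreToolUse event raises immediately instead of producing output the harness would not recognize.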

Matchers

Matchers are regex filters that decide when a hook fires. All matcher fields must match (logical AND); for OR, define multiple entries.

{ "matcher": { "tool_name": "^(Bash|Write)$" },
  "type": "command",
  "command": "bash hooks/block-destructive.sh" }

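The AND-within-an-entry, OR-across-entries behaviour can be sketched in a few lines. This is an assumed model of the matching logic, not the harness source: it uses regex search semantics, treats an entry without a matcher as always firing, and the function names are hypothetical.

```python
import re


def matcher_applies(matcher: dict, event: dict) -> bool:
    """True only if every matcher field's regex matches the same-named event field."""
    return all(
        re.search(pattern, str(event.get(field, ""))) is not None
        for field, pattern in matcher.items()
    )


def hooks_to_fire(entries: list[dict], event: dict) -> list[dict]:
    """Entries are evaluated independently, so several entries act as OR."""
    return [e for e in entries if matcher_applies(e.get("matcher", {}), event)]
```

Because the match is a regex search, the anchors in `^(Bash|Write)$` matter: without them, the pattern `Bash` would also match a tool named `BashOutput`.
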
Practical example: block destructive commands

One hook, one job — refuse force-pushes to main and destructive production database operations before they execute.

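#!/usr/bin/env bash
# hooks/block-destructive.sh
# PreToolUse hook: deny force-pushes to main/master and destructive SQL aimed at production.
set -euo pipefail

INPUT=$(cat)
TOOL_NAME=$(jq -r '.tool_name // ""' <<<"$INPUT")
# Only Bash tool calls are inspected; everything else passes through.
if [[ "$TOOL_NAME" != "Bash" ]]; then exit 0; fi

COMMAND=$(jq -r '.tool_input.command // ""' <<<"$INPUT")

# Emit the nested PreToolUse decision, then exit 0 so the stdout JSON is processed.
# (Exit 2 would discard stdout and feed only stderr back to the agent.)
deny() {
  jq -n --arg reason "$1" \
    '{hookSpecificOutput:{permissionDecision:"deny",permissionDecisionReason:$reason}}'
  exit 0
}

# Matches --force placed before the branch name; -f and a trailing --force need extra patterns.
# grep -P requires GNU grep.
if grep -qP 'git\s+push\s+.*--force.*\b(main|master)\b' <<<"$COMMAND"; then
  deny "Force-push to main/master is not allowed."
fi

if grep -qPi '(DROP|TRUNCATE|DELETE\s+FROM)\s.*(prod|production)' <<<"$COMMAND"; then
  deny "Destructive operations on production databases are prohibited."
fi

exit 0

The same pattern (read stdin, branch on tool name, emit a permissionDecision, exit 0 so the JSON is processed) generalises to auto-formatters and secret scanners. Different content, identical shape.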
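
As an illustration of how the shape carries over, here is a sketch of a secret scanner written as a Python hook. Everything specific in it is an assumption for the example: the two patterns are a small illustrative sample, the text being written is assumed to arrive in `tool_input.content`, and `decide` and `main` are hypothetical names.

```python
import json
import re
import sys
from typing import Optional

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def decide(event: dict) -> Optional[dict]:
    """Return a deny payload if a Write/Edit call appears to contain a credential."""
    if event.get("tool_name") not in {"Write", "Edit"}:
        return None
    # Assumption: the text being written arrives in tool_input.content.
    text = str(event.get("tool_input", {}).get("content", ""))
    if any(p.search(text) for p in SECRET_PATTERNS):
        return {"hookSpecificOutput": {
            "permissionDecision": "deny",
            "permissionDecisionReason": "Content appears to contain a credential.",
        }}
    return None


def main(stdin=sys.stdin, stdout=sys.stdout) -> int:
    """Read the event from stdin, emit a decision if any, and return the exit code."""
    decision = decide(json.load(stdin))
    if decision is not None:
        json.dump(decision, stdout)
    return 0  # exit 0 so that any JSON on stdout is processed
```

A script built this way would end with `sys.exit(main())`. Keeping `decide` free of I/O makes the policy testable without spawning a process.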


Tier 3: Gates

Technical barriers enforced by the operating system, network, or platform runtime. Where a rule is a script that decides, a gate is a capability removed. If the agent has no write access to /etc/, no creative tool argument changes that. Settings configure gates declaratively.

Four-tier scope

Settings resolve through four layers, each with a distinct owner.

| Scope | Shared? | Path | Purpose |
| --- | --- | --- | --- |
| Managed (Org) | Admin-controlled | /Library/Application Support/ClaudeCode/managed-settings.json (+ managed-settings.d/) on macOS; /etc/claude-code/ on Linux/WSL; MDM/registry on Windows | Enterprise policy, locked keys |
| User (Global) | No | ~/.claude/settings.json | Personal defaults across all projects |
| Project (Team) | Yes (committed) | .claude/settings.json | Team-agreed permissions, hooks |
| Local (Personal) | No (gitignored) | .claude/settings.local.json | Machine-specific overrides, secrets |

The managed-settings.d/ drop-in directory merges files alphabetically, letting a security team ship 00-network.json, 10-permissions.json, 20-audit.json as modular policy fragments instead of one monolith.
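
A minimal sketch of alphabetical drop-in loading follows. The merge rule is an assumption, since the text above only states the ordering: here later files shallowly override earlier ones on the same top-level key. The function name and the example keys in the note below are hypothetical.

```python
import json
from pathlib import Path


def load_dropins(directory: Path) -> dict:
    """Merge every *.json fragment in the directory in alphabetical filename order."""
    merged: dict = {}
    for path in sorted(directory.glob("*.json")):
        # Assumption: a later fragment replaces an earlier one's top-level keys.
        merged.update(json.loads(path.read_text()))
    return merged
```

Under that assumption, a numeric prefix is what decides the winner: a key set in both 00-network.json and 20-audit.json ends up with the value from 20-audit.json.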

Precedence

graph TD
    A["User (Global)<br/><i>personal defaults</i>"] --> B["Project (Team)<br/><i>shared overrides</i>"]
    B --> C["Local (Personal)<br/><i>final say on this machine</i>"]
    C --> D["CLI args<br/><i>per-invocation</i>"]
    D --> E["Managed (Org)<br/><i>locks keys — wins on locked keys</i>"]

Among the settings files, Local wins on most keys, and CLI args override it for a single invocation. Managed wins on any key marked as locked, which is how an org disables outbound network or pins audit logging while individuals still choose their model. Array fields (permissions.allow[], hooks, enabledMcpjsonServers) concatenate and dedup across layers rather than override. model and outputStyle require a restart; everything else hot-reloads and fires ConfigChange.
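
The resolution rules can be sketched as a single merge function. This is a simplified model under stated assumptions: keys are flat rather than nested, unlocked managed values act as the lowest-precedence base, and `resolve` with its `locked` parameter is a hypothetical interface, not the harness's own.

```python
def resolve(layers: list[dict], managed: dict, locked: set[str]) -> dict:
    """Merge settings; layers run from lowest to highest precedence (user, project, local, CLI)."""
    result: dict = {}
    for layer in [managed, *layers]:
        for key, value in layer.items():
            if isinstance(value, list):
                # Array fields concatenate and dedup instead of overriding.
                merged = list(result[key]) if isinstance(result.get(key), list) else []
                for item in value:
                    if item not in merged:
                        merged.append(item)
                result[key] = merged
            else:
                result[key] = value  # scalar: the later layer wins
    for key in locked:
        # Locked scalar keys are reset to the managed value after all layers apply.
        if key in managed and not isinstance(managed[key], list):
            result[key] = managed[key]
    return result
```

For example, with a managed layer that locks a hypothetical `network` key to "off", a Local file setting it to "on" has no effect, while a Local `model` choice still overrides the User default.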

Project settings example

{
  "permissions": {
    "allow": [
      "Read", "Edit", "Write",
      "Bash(npm run lint:*)",
      "Bash(npm run test:*)",
      "Bash(npx tsc:*)",
      "Bash(git add:*)", "Bash(git commit:*)", "Bash(git diff:*)",
      "mcp__github__create_pull_request"
    ],
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(git push:*)",
      "Bash(curl:*)", "Bash(wget:*)"
    ]
  },
  "hooks": {
    "PreToolUse": [
      { "matcher": { "tool_name": "^Bash$" },
        "type": "command",
        "command": "bash .claude/hooks/block-destructive.sh" }
    ],
    "PostToolUse": [
      { "matcher": { "tool_name": "^(Write|Edit)$" },
        "type": "command",
        "command": "bash .claude/hooks/auto-format.sh" }
    ]
  },
  "env": { "NODE_ENV": "development" }
}

Canonical permission syntax is Tool(pattern), with a trailing :* meaning “any args”: Bash(npm test:*) matches npm test, npm test --watch, and npm test foo. The space-separated form Bash(ls *) matches ls -la but not lsof. Rule types are evaluated in the order deny → ask → allow → defaultMode; within a rule type, the most specific rule wins regardless of array order.
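
The matching and evaluation order can be sketched as follows. This is an approximation under stated assumptions, not the harness's matcher: the colon form is treated as a word-boundary prefix, the space form is delegated to shell-style globbing via `fnmatch`, specificity ranking is not modeled, and the "ask" fallback for defaultMode is a placeholder value.

```python
from fnmatch import fnmatchcase


def rule_matches(rule: str, tool: str, command: str = "") -> bool:
    """Check one permission rule against a tool call."""
    if "(" not in rule:
        return rule == tool                       # bare tool name, e.g. "Read"
    name, _, rest = rule.partition("(")
    pattern = rest[:-1] if rest.endswith(")") else rest
    if name != tool:
        return False
    if pattern.endswith(":*"):
        prefix = pattern[:-2]
        # "npm test:*" matches "npm test" and "npm test <anything>".
        return command == prefix or command.startswith(prefix + " ")
    return fnmatchcase(command, pattern)          # "ls *" matches "ls -la", not "lsof"


def evaluate(permissions: dict, tool: str, command: str = "",
             default_mode: str = "ask") -> str:
    """Apply rule lists in the order deny, ask, allow, then fall back to the default."""
    for verdict in ("deny", "ask", "allow"):
        if any(rule_matches(r, tool, command) for r in permissions.get(verdict, [])):
            return verdict
    return default_mode
```

Because deny is checked first, a command matched by both an allow and a deny rule is denied regardless of which list was written first.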

What goes where

| Setting | Scope | Why |
| --- | --- | --- |
| Permission allow/deny lists | Project | Team-agreed surface; everyone on the repo gets the same. |
| Hook definitions | Project | Deterministic automation the whole team benefits from. |
| API keys, secrets, DATABASE_URL | Local | Gitignored by default; never leaves the machine. |
| Model, temperature, cost limits | User or Local | Personal choice; one developer’s smaller-model preference shouldn’t pin the team. |
| Audit logging, network restrictions, locked keys | Managed | Non-negotiable org policy with managed-settings.d/ for modular delivery. |

Context impact

The split between instruction files and settings is the load-bearing decision.

| | Instruction files | Settings files |
| --- | --- | --- |
| Examples | CLAUDE.md, AGENTS.md | settings.json, config.toml |
| Consumed as context? | Yes, injected into the prompt | No, parsed by the harness |
| Token cost | Proportional to file size | Zero |
| Agent visibility | Agent reads and follows | Agent does not see them directly |
| Enforcement | Soft: model may deviate | Hard: harness enforces mechanically |

Writing “never run rm -rf” in CLAUDE.md costs tokens and trusts the model. Putting Bash(rm -rf:*) in deny costs zero tokens and is deterministic. If a behavior can be expressed as a permission or a hook, it belongs in settings, not the prompt.


Layering: “never push to main”

The tiers are layers, not alternatives. Each catches what the layer above missed.

Guideline. CLAUDE.md says “Create a feature branch and open a pull request.” Handles the 90% case where the agent has fresh context and a cooperative goal.

Rule. A PreToolUse hook on Bash matches git push.*main and emits permissionDecision: "deny" with a reason injected back into context. Catches the cases where compaction dropped the guideline or the agent reasoned around it.

Gate. Repository branch protection requires a PR with approval before merging to main. Even if the guideline failed and the hook was misconfigured, the remote rejects the push at the protocol level.

Three independent failures must coincide before damage occurs. That is the point.


Decision principle

If a violation is measured in inconvenience, use a guideline. If it is measured in time spent debugging or reverting, use a rule. If it is measured in incidents — security breach, data loss, downtime — use a gate. The cost of getting it wrong is asymmetric: a guideline where a gate belonged is a future incident; a gate where a guideline belonged is a friction tax. See Sandboxing for OS-level gate mechanics and Anti-Patterns for the failure mode where load-bearing logic ends up in a prompt.


Regulatory context

The EU AI Act entered into force on 1 Aug 2024. Prohibitions and AI-literacy obligations have applied since 2 Feb 2025, GPAI provider obligations and penalties since 2 Aug 2025, and the bulk of high-risk system rules plus full penalty enforcement land on 2 Aug 2026. The OWASP Top 10 for Agentic Applications (2026 edition, released 9 Dec 2025) maps directly onto enforcement controls: ASI01 (Agent Goal Hijack), ASI02 (Tool Misuse), ASI03 (Identity & Privilege Abuse), and ASI05 (Unexpected Code Execution) are each mitigated by some combination of PreToolUse rules and permission gates. The three-tier model is how those controls get implemented.