Sandboxing & Permissions

Why Sandboxing Matters

Agents execute code, modify files, run shell commands, and make network calls --- an agent without execution capabilities is a chatbot. But execution without isolation is indistinguishable from handing root access to a probabilistic system: a compromised agent can exfiltrate secrets, delete production files, install malicious dependencies, or provision cloud resources from a single misinterpreted instruction. Instruction files define what an agent should do; sandboxing defines what it can do.


Cross-Platform Comparison

| System | Sandboxing Approach | Permission Model |
| --- | --- | --- |
| Claude Code | Local OS-level process isolation | File-based allow/deny lists + lifecycle hooks |
| OpenAI Codex | OS-enforced containers + cloud sandboxes | Trust model + approval modes (Suggest/Auto-Edit/Full-Auto) |
| OpenAI Responses API | Managed containers, two-phase runtime | Policy-gated; credentials never reach the model |
| Google ADK | Code execution sandbox + VPC Service Controls | Agent-auth vs user-auth + action confirmations |
| LangGraph | Tool-level execution restrictions | Middleware policies + human-in-the-loop gates |

Despite the differences in implementation, all five systems enforce the same structural principle: the agent’s execution environment is a strict subset of the host environment’s capabilities. No platform ships with “run anything, anywhere” as the default.


The Two-Phase Runtime

sequenceDiagram
    participant R as Runtime
    participant S as Setup Phase
    participant E as Execution Phase

    R->>S: Start container
    Note over S: Network: ON
    Note over S: Secrets: AVAILABLE
    S->>S: Install dependencies
    S->>S: Clone repos, authenticate APIs
    S->>R: Setup complete

    R->>E: Flip to execution
    Note over E: Network: OFF
    Note over E: Secrets: SCRUBBED
    E->>E: Agent runs in isolation
    E->>E: Read/write local filesystem only
    Note over E: Cannot exfiltrate secrets —<br/>network is structurally severed

OpenAI’s Responses API introduced a runtime architecture that separates agent execution into two distinct phases. During the setup phase, the container has full network access and can read injected secrets to install dependencies, clone repos, and authenticate to APIs. Once setup completes, the runtime flips to the execution phase: network access is severed, environment variables containing secrets are scrubbed, and the agent runs in a fully isolated sandbox with access only to the local filesystem and pre-installed tools. Exfiltration is prevented structurally rather than by policy: by the time the agent acts on untrusted input, there is no network to exfiltrate over and no secrets left to steal.
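The phase flip can be sketched in a few lines. This is a simplified illustration, not OpenAI's implementation; the class, method names, and secret names are all hypothetical:

```python
import os

class TwoPhaseRuntime:
    """Sketch of a setup/execution phase flip (hypothetical API)."""

    def __init__(self):
        self.network_enabled = False
        self._injected_keys = []

    def setup(self, secrets: dict) -> None:
        # Setup phase: network on, secrets injected into the environment
        # so dependency installs, clones, and API auth can succeed.
        self.network_enabled = True
        self._injected_keys = list(secrets)
        os.environ.update(secrets)

    def flip_to_execution(self) -> None:
        # Execution phase: sever the network and scrub every injected secret.
        self.network_enabled = False
        for key in self._injected_keys:
            os.environ.pop(key, None)

    def fetch(self, url: str) -> str:
        # After the flip, any network call fails structurally, not by policy.
        if not self.network_enabled:
            raise PermissionError("network severed in execution phase")
        return f"GET {url}"  # placeholder for a real outbound request
```

In a real container runtime the "flip" would be enforced below the process (e.g. by removing network namespaces), so the agent cannot re-enable it from inside.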

The tradeoff is reduced flexibility. Agents in the execution phase cannot call external APIs, fetch documentation, or authenticate to services. Tasks that require live network access during reasoning need alternative architectures --- typically MCP servers running outside the sandbox that proxy specific, pre-authorized requests.
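The proxy pattern mentioned above amounts to an allowlist enforced outside the sandbox: the proxy, not the agent, holds credentials and network reach, and it forwards only pre-authorized requests. A minimal sketch, with hypothetical hosts and function names:

```python
from urllib.parse import urlparse

# Hypothetical allowlist: only these (host, path prefix) pairs may be proxied.
ALLOWED = {
    ("docs.example.com", "/api"),
    ("registry.example.com", "/packages"),
}

def is_authorized(url: str) -> bool:
    """True only if the URL matches a pre-authorized (host, path prefix)."""
    parsed = urlparse(url)
    return any(
        parsed.hostname == host and parsed.path.startswith(prefix)
        for host, prefix in ALLOWED
    )

def proxied_fetch(url: str) -> str:
    # Runs outside the sandbox; the sandboxed agent never sees credentials.
    if not is_authorized(url):
        raise PermissionError(f"request not pre-authorized: {url}")
    return f"forwarded: {url}"  # placeholder for the real outbound call
```

The design choice mirrors the two-phase runtime: the agent's reasoning loop stays offline, and network capability lives in a separate component with a narrow, auditable interface.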


Permission Models

| Model | How It Works | Default Posture | Example |
| --- | --- | --- | --- |
| Allow/Deny Lists (Claude Code) | File-based rules in settings.json control which tools and commands the agent can invoke. Anything not listed triggers a user prompt. Lifecycle hooks add runtime validation. | Deny unless explicitly allowed | `"allow": ["Bash(git *)"]`, `"deny": ["Bash(curl *)"]` |
| Trust-Based (OpenAI Codex) | Agent must explicitly trust a project before loading its config. Three operational modes (Suggest/Auto-Edit/Full-Auto) control autonomy level. Even Full-Auto runs inside an OS-enforced sandbox. | Untrusted until user approves AGENTS.md | User clones a repo; Codex displays AGENTS.md and requires explicit approval before loading instructions. |
| Identity-Based (Google ADK) | Distinguishes agent-auth (service account, for the agent's own actions) from user-auth (delegated OAuth, for actions on behalf of the user). Integrates with VPC Service Controls for network-level isolation. | Scoped to declared auth context per tool | `search_docs` uses agent-auth; `send_email` uses user-auth: the agent cannot escalate via user credentials. |
| Role-Based (General Pattern) | RBAC with least-privilege defaults. Roles (reader, developer, deployer) define tool access, filesystem scope, and network reach. | Zero permissions; explicit grants only | A reader role gets `[read_file, search]` with a read-only filesystem and no network. |

The principle is consistent across all models: start with zero permissions and add only what the task requires.
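The role-based pattern from the table reduces to a deny-by-default lookup: an undeclared role or an unlisted tool is simply refused. A minimal sketch, with hypothetical role and tool names:

```python
# Hypothetical role definitions following the zero-permissions default:
# a role grants only the tools, filesystem scope, and network reach it lists.
ROLES = {
    "reader": {
        "tools": {"read_file", "search"},
        "write_fs": False,
        "network": False,
    },
    "developer": {
        "tools": {"read_file", "search", "edit_file", "run_tests"},
        "write_fs": True,
        "network": False,
    },
}

def can_invoke(role: str, tool: str) -> bool:
    """Deny by default: unknown roles and unlisted tools are refused."""
    grants = ROLES.get(role)
    return grants is not None and tool in grants["tools"]
```

Note that the failure mode of a typo (a misspelled role or tool name) is a denial, never an accidental grant, which is the property that makes the zero-permissions default safe.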


Human-in-the-Loop Patterns
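The comparison table above lists human-in-the-loop gates as LangGraph's approval mechanism, and the same pattern appears as action confirmations in ADK and approval modes in Codex. The common shape is a runtime checkpoint: tools tagged as sensitive pause execution until a human approves or rejects them, while everything else runs straight through. A framework-agnostic sketch, with hypothetical tool names and callback signature:

```python
# Hypothetical human-in-the-loop gate: tools tagged sensitive pause for
# approval; everything else executes without interruption.
SENSITIVE_TOOLS = {"send_email", "deploy", "delete_file"}

def run_with_gate(tool: str, action, approve) -> str:
    """Invoke `action` directly, or only after `approve(tool)` returns True.

    `approve` stands in for whatever surfaces the request to a human:
    a terminal prompt, a Slack message, a web approval queue.
    """
    if tool in SENSITIVE_TOOLS and not approve(tool):
        return "rejected"
    return action()
```

In production frameworks the pause is durable (the agent's state is checkpointed so approval can arrive minutes or days later), but the control flow is the same: the sensitive call simply cannot proceed without a human decision.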


Implementation Examples

Claude Code: Locked-Down Project Configuration

// .claude/settings.json
{
  "permissions": {
    "allow": [
      "Read",
      "Glob",
      "Grep",
      "Bash(git *)",
      "Bash(npm test)",
      "Bash(npm run build)",
      "Bash(npm run lint)"
    ],
    "deny": [
      "Bash(curl *)",
      "Bash(wget *)",
      "Bash(rm -rf *)",
      "Bash(chmod *)",
      "Bash(ssh *)",
      "Bash(scp *)",
      "Bash(npm publish *)",
      "Bash(git push *)"
    ]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/check_no_secrets.py \"$TOOL_INPUT\""
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/scan_written_file.py \"$TOOL_INPUT\""
          }
        ]
      }
    ]
  }
}

This configuration lets the agent read and search freely and run a curated set of safe commands, while blocking anything involving network access, destructive operations, or publishing. Hooks add runtime validation that the static allow/deny lists cannot cover: a hook returning a non-zero exit code blocks the tool invocation entirely.
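The config above invokes check_no_secrets.py before every Bash call. A minimal sketch of what such a PreToolUse hook might contain (the patterns are illustrative; a real hook would delegate to a proper secret scanner):

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse hook: exit non-zero to block the tool call."""
import re
import sys

# Illustrative patterns only; real deployments should use a dedicated scanner.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),               # GitHub personal access token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]

def contains_secret(text: str) -> bool:
    return any(p.search(text) for p in SECRET_PATTERNS)

if __name__ == "__main__":
    tool_input = sys.argv[1] if len(sys.argv) > 1 else ""
    if contains_secret(tool_input):
        print("blocked: command appears to contain a secret", file=sys.stderr)
        sys.exit(2)  # non-zero exit blocks the invocation
    sys.exit(0)
```

Because the hook sees the raw tool input before execution, it catches secrets regardless of which allowed command would have carried them, which is exactly the gap static allow/deny lists leave open.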


Regulatory Context

The EU AI Act (effective August 2026) classifies agentic systems operating in high-risk domains under mandatory requirements for human oversight, record-keeping, risk management, and transparency --- making sandboxing, HITL patterns, and audit trails compliance infrastructure rather than optional hardening. The OWASP Top 10 for AI Agents (2025) maps critical risks like excessive agency, privilege escalation, and secret exfiltration directly to sandboxing controls: least-privilege permissions, two-phase runtimes, identity-based auth separation, and pre-execution hooks.


Key Takeaways