Sandboxing & Permissions
Why Sandboxing Matters
Agents execute code, modify files, run shell commands, and make network calls --- an agent without execution capabilities is a chatbot. But execution without isolation is indistinguishable from handing root access to a probabilistic system: a compromised agent can exfiltrate secrets, delete production files, install malicious dependencies, or provision cloud resources from a single misinterpreted instruction. Instruction files define what an agent should do; sandboxing defines what it can do.
Cross-Platform Comparison
| System | Sandboxing Approach | Permission Model |
|---|---|---|
| Claude Code | Local OS-level process isolation | File-based allow/deny lists + lifecycle hooks |
| OpenAI Codex | OS-enforced containers + cloud sandboxes | Trust model + approval modes (Suggest/Auto-Edit/Full-Auto) |
| OpenAI Responses API | Managed containers, two-phase runtime | Policy-gated; credentials never reach the model |
| Google ADK | Code execution sandbox + VPC Service Controls | Agent-auth vs User-auth + action confirmations |
| LangGraph | Tool-level execution restrictions | Middleware policies + human-in-the-loop gates |
Despite the differences in implementation, all five systems enforce the same structural principle: the agent’s execution environment is a strict subset of the host environment’s capabilities. No platform ships with “run anything, anywhere” as the default.
The Two-Phase Runtime
sequenceDiagram
participant R as Runtime
participant S as Setup Phase
participant E as Execution Phase
R->>S: Start container
Note over S: Network: ON
Note over S: Secrets: AVAILABLE
S->>S: Install dependencies
S->>S: Clone repos, authenticate APIs
S->>R: Setup complete
R->>E: Flip to execution
Note over E: Network: OFF
Note over E: Secrets: SCRUBBED
E->>E: Agent runs in isolation
E->>E: Read/write local filesystem only
Note over E: Cannot exfiltrate secrets —<br/>network is structurally severed
OpenAI’s Responses API introduced a runtime architecture that separates agent execution into two distinct phases. During the setup phase, the container has full network access and can read injected secrets to install dependencies, clone repos, and authenticate to APIs. Once setup completes, the runtime flips to the execution phase: network access is severed, environment variables containing secrets are scrubbed, and the agent runs in a fully isolated sandbox with access only to the local filesystem and pre-installed tools. Setup phase (network on, secrets available) then execution phase (network off, secrets removed). This prevents exfiltration structurally.
The tradeoff is reduced flexibility. Agents in the execution phase cannot call external APIs, fetch documentation, or authenticate to services. Tasks that require live network access during reasoning need alternative architectures --- typically MCP servers running outside the sandbox that proxy specific, pre-authorized requests.
Permission Models
| Model | How It Works | Default Posture | Example |
|---|---|---|---|
| Allow/Deny Lists (Claude Code) | File-based rules in settings.json control which tools and commands the agent can invoke. Anything not listed triggers a user prompt. Lifecycle hooks add runtime validation. | Deny unless explicitly allowed | "allow": ["Bash(git *)"], "deny": ["Bash(curl *)"] |
| Trust-Based (OpenAI Codex) | Agent must explicitly trust a project before loading its config. Three operational modes (Suggest/Auto-Edit/Full-Auto) control autonomy level. Even Full-Auto runs inside an OS-enforced sandbox. | Untrusted until user approves AGENTS.md | User clones a repo; Codex displays AGENTS.md and requires explicit approval before loading instructions. |
| Identity-Based (Google ADK) | Distinguishes agent-auth (service account, for agent’s own actions) from user-auth (delegated OAuth, for actions on behalf of the user). Integrates with VPC Service Controls for network-level isolation. | Scoped to declared auth context per tool | search_docs uses agent-auth; send_email uses user-auth --- agent cannot escalate via user credentials. |
| Role-Based (General Pattern) | RBAC with least-privilege defaults. Roles (reader, developer, deployer) define tool access, filesystem scope, and network reach. | Zero permissions; explicit grants only | A reader role gets [read_file, search] with read-only filesystem and no network. |
The principle is consistent across all models: start with zero permissions and add only what the task requires.
Human-in-the-Loop Patterns
- Permission gates: High-value actions pause execution and require explicit human approval before proceeding. Gates should be specific (approve individual actions, not categories), contextual (show the exact command and reasoning), and blocking (no timeout that auto-approves).
- Approval workflows for destructive operations: Irreversible or production-affecting actions require structured approval --- single approval with undo for file deletion, dual approval for production config changes, multi-party approval with thresholds for financial transactions.
- Time-bounded and action-bounded grants: Scope permissions to a window rather than granting them permanently. Examples: “execute shell commands for 30 minutes,” “write up to 10 files,” or session-scoped grants that reset on the next task. Claude Code’s permission prompt model is inherently session-bounded.
- Audit trails: Every agent action should be logged with timestamp, action, permission basis, and result. Audit trails serve incident investigation, compliance (proving oversight was maintained), and optimization (identifying gates that can be safely automated).
Implementation Examples
Claude Code: Locked-Down Project Configuration
// .claude/settings.json
{
"permissions": {
"allow": [
"Read",
"Glob",
"Grep",
"Bash(git *)",
"Bash(npm test)",
"Bash(npm run build)",
"Bash(npm run lint)"
],
"deny": [
"Bash(curl *)",
"Bash(wget *)",
"Bash(rm -rf *)",
"Bash(chmod *)",
"Bash(ssh *)",
"Bash(scp *)",
"Bash(npm publish *)",
"Bash(git push *)"
]
},
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "python3 .claude/hooks/check_no_secrets.py \"$TOOL_INPUT\""
}
]
}
],
"PostToolUse": [
{
"matcher": "Write",
"hooks": [
{
"type": "command",
"command": "python3 .claude/hooks/scan_written_file.py \"$TOOL_INPUT\""
}
]
}
]
}
}
This configuration allows the agent to read and search freely, run a curated set of safe commands, and blocks anything involving network access, destructive operations, or publishing. Hooks add runtime validation that the static allow/deny lists cannot cover --- a hook returning a non-zero exit code blocks the tool invocation entirely.
Regulatory Context
The EU AI Act (effective August 2026) classifies agentic systems operating in high-risk domains under mandatory requirements for human oversight, record-keeping, risk management, and transparency --- making sandboxing, HITL patterns, and audit trails compliance infrastructure rather than optional hardening. The OWASP Top 10 for AI Agents (2025) maps critical risks like excessive agency, privilege escalation, and secret exfiltration directly to sandboxing controls: least-privilege permissions, two-phase runtimes, identity-based auth separation, and pre-execution hooks.
Key Takeaways
- Agents without sandboxing are security liabilities. Every platform enforces isolation; the only question is how.
- The two-phase runtime (setup with network, execution without) is the strongest current defense against secret exfiltration. If your architecture allows it, adopt it.
- Permission models vary (allow/deny, trust-based, identity-based, role-based) but converge on the same principle: least privilege by default, explicit grants for elevated access.
- Human-in-the-loop is not a fallback --- it is a primary control. Gate destructive actions, bound permissions by time and scope, and log everything.
- Regulatory requirements (EU AI Act, OWASP) are making sandboxing and oversight mandatory, not optional. Build compliance into your architecture now.