"Design a system where an AI agent generates and executes code (Python, SQL, shell scripts). Must be fully sandboxed, time-limited, resource-constrained, and auditable."
| Dimension | Clarification | Assumption |
|---|---|---|
| Languages | Which languages to support? | Python, SQL, shell scripts (Bash). Extensible to JS/R. |
| Use Cases | What kinds of code? | Data queries, report generation, automation scripts, data transforms |
| Data Access | What data can code access? | Tenant's own data only, via controlled data access proxy |
| Execution Limits | Time and resource limits? | 30s max execution, 1 CPU, 512MB RAM, 100MB disk |
| Audit | What needs to be logged? | Every execution: code, input, output, duration, user, status |
SANDBOXED CODE EXECUTION PIPELINE
═══════════════════════════════════════════════════════════════════
AI Agent generates code
│
┌────v──────────┐ ┌──────────────┐ ┌──────────────────┐
│ CODE │────>│ SANDBOX │────>│ EXECUTE │
│ VALIDATOR │ │ POOL │ │ │
│ │ │ │ │ Resource limits: │
│ • Static │ │ • gVisor / │ │ • 1 CPU │
│ analysis │ │ Firecracker│ │ • 512MB RAM │
│ • Whitelist │ │ • Fresh per │ │ • 100MB disk │
│ libs │ │ execution │ │ • 30s timeout │
│ • SQL inject │ │ • Warm pool │ │ • SIGKILL on │
│ prevention │ │ (300) │ │ timeout │
│ • LLM safety │ │ • tmpfs │ │ │
│ review │ │ filesystem │ │ │
└───────────────┘ └──────────────┘ └────────┬─────────┘
│
┌──────────────────────────────────────────────┘
│
┌────v──────────┐ ┌──────────────┐
│ OUTPUT │────>│ AUDIT LOG │
│ SANITIZER │ │ │
│ │ │ • Code │
│ • Truncate │ │ • Input │
│ large output│ │ • Output │
│ • Redact PII │ │ • Duration │
│ • Format │ │ • User │
│ results │ │ • Status │
└───────────────┘ └──────────────┘
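The OUTPUT SANITIZER box reduces to a small pure function. The sketch below assumes a 64 KB response cap and regex redaction of email addresses and US Social Security numbers. `sanitize_output`, the cap, and both patterns are illustrative assumptions. Pattern matching only catches well-formed PII, so treat this as a best-effort filter rather than a guarantee.

```python
import re

# Hypothetical cap on output returned to the agent (assumption: 64 KB).
MAX_OUTPUT_BYTES = 64 * 1024

# Illustrative PII patterns only; a real deployment needs a broader detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def sanitize_output(raw: bytes, max_bytes: int = MAX_OUTPUT_BYTES) -> dict:
    """Decode, redact, then truncate sandbox output before it leaves the pipeline."""
    text = raw.decode("utf-8", errors="replace")
    redactions = 0
    # Redact BEFORE truncating so a cut cannot leave a partial, unmatched PII string.
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED_{label}]", text)
        redactions += n
    encoded = text.encode("utf-8")
    truncated = len(encoded) > max_bytes
    if truncated:
        # errors="ignore" drops a multi-byte character split at the cut point.
        text = encoded[:max_bytes].decode("utf-8", errors="ignore")
    return {
        "output": text,
        "truncated": truncated,
        "original_size_bytes": len(raw),
        "redactions": redactions,
    }


if __name__ == "__main__":
    result = sanitize_output(b"owner: jane@acme.com, ssn 123-45-6789\n")
    print(result["output"], end="")  # owner: [REDACTED_EMAIL], ssn [REDACTED_SSN]
```

The ordering is the main design choice here. If the function truncated first, it could leave a fragment such as `jane@acme.c` that the email pattern no longer matches.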
Before any code touches a sandbox, it passes through 4 layers of validation:
1. Static Analysis (AST Parsing)
2. Library Whitelist
3. SQL Injection Prevention
4. LLM Safety Review (for complex cases)
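Layers 1 and 2 can share a single pass over Python's standard `ast` module. The sketch below uses an assumed policy: the `ALLOWED_MODULES` and `BLOCKED_CALLS` sets and the function name `validate_python` are hypothetical. A check like this is easy to evade (for example, by building names dynamically), which is why the sandbox, not the validator, remains the real security boundary.

```python
import ast

# Hypothetical policy: top-level modules the generated code may import.
ALLOWED_MODULES = {"pandas", "numpy", "json", "math", "datetime", "db_client"}
# Builtins that enable dynamic code execution or common escape hatches.
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__", "open", "getattr", "setattr"}


def validate_python(source: str) -> list[str]:
    """Return policy violations; an empty list means the code passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    violations = []
    for node in ast.walk(tree):
        # Layer 2: library whitelist, checked on every import form.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) are always rejected.
            names = [node.module] if node.level == 0 else ["<relative import>"]
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_MODULES:
                violations.append(f"line {node.lineno}: import of '{name}' not allowed")

        # Direct calls to blocked builtins, e.g. eval("...").
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to '{node.func.id}' blocked")

        # Dunder attributes (obj.__class__.__subclasses__) are a classic escape path.
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            violations.append(f"line {node.lineno}: dunder attribute '{node.attr}' blocked")
    return violations


if __name__ == "__main__":
    for msg in validate_python("import os\nos.system('rm -rf /')"):
        print(msg)
```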
Defense in Depth: Validation is the FIRST line of defense, not the ONLY one. Even if validation misses something, the sandbox itself prevents real damage (no network, no persistent filesystem, resource limits, SIGKILL timeout).
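As one more layer inside the sandbox, the caps from the requirements table can also be applied at the process level. The sketch below assumes a Linux/POSIX host and uses only the standard-library `resource` and `subprocess` modules. `run_limited` and its return shape are hypothetical. These rlimits complement the gVisor/Firecracker isolation and cgroup limits rather than replacing them. For example, `RLIMIT_CPU` bounds CPU *time*, so the "1 CPU" core limit still has to come from the cgroup or microVM configuration.

```python
import resource
import subprocess
import sys

# Caps from the requirements table (CPU time, memory, file size, wall clock).
CPU_SECONDS = 30
MEMORY_BYTES = 512 * 1024 * 1024      # 512MB address space
FILE_SIZE_BYTES = 100 * 1024 * 1024   # 100MB per written file (not total disk)
WALL_TIMEOUT_S = 30


def _cap(kind: int, value: int) -> None:
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)      # an unprivileged process cannot raise its hard limit
    resource.setrlimit(kind, (value, value))


def _apply_limits() -> None:
    """Runs in the child after fork() and before exec() (POSIX only)."""
    _cap(resource.RLIMIT_CPU, CPU_SECONDS)
    _cap(resource.RLIMIT_AS, MEMORY_BYTES)
    _cap(resource.RLIMIT_FSIZE, FILE_SIZE_BYTES)


def run_limited(code: str, timeout_s: float = WALL_TIMEOUT_S) -> dict:
    """Run Python source in a resource-limited child process (hypothetical helper)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores PYTHON* env, user site)
            capture_output=True,
            timeout=timeout_s,
            preexec_fn=_apply_limits,
            env={},                              # do not leak the parent's environment
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child (SIGKILL on POSIX) before re-raising.
        return {"status": "timeout", "returncode": None, "stdout": b"", "stderr": b""}
    return {
        "status": "success" if proc.returncode == 0 else "error",
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    print(run_limited("print(6 * 7)")["stdout"])
```

Allocating more than 512MB inside the child fails with `MemoryError`, so the run reports `"error"`. A busy loop that outlives `timeout_s` is killed and reported as `"timeout"`.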
| Technology | Isolation Level | Spin-Up Time | Best For |
|---|---|---|---|
| gVisor (runsc) | User-space kernel intercepts syscalls | <500ms (warm) | Most use cases. Good balance of security + speed. |
| Firecracker | Full microVM isolation (KVM-based) | ~125ms (microVM boot) | Highest security needs. AWS Lambda uses this. |
| Docker + seccomp | Container-level (shared host kernel) | <1s | Development/testing. Not recommended for running untrusted code in production. |
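A common way to use gVisor is as a Docker runtime named `runsc`. The sketch below only assembles the `docker run` argument list and does not execute it. It assumes that gVisor is installed and registered with Docker as `runsc`, and that the image ships GNU coreutils `timeout`. The helper name `build_sandbox_command`, the image, and the mount paths are hypothetical. The flags themselves are standard `docker run` options that mirror the limits in this design.

```python
import shlex


def build_sandbox_command(image: str, code_dir: str, execution_id: str) -> list[str]:
    """Assemble a `docker run` argv mirroring this design's limits (hypothetical helper)."""
    return [
        "docker", "run",
        "--rm",                                # container deleted on exit, never reused
        f"--name=sandbox-{execution_id}",
        "--runtime=runsc",                     # gVisor; assumes runsc is registered with Docker
        "--network=none",                      # no network interfaces besides loopback
        "--cpus=1",
        "--memory=512m",
        "--memory-swap=512m",                  # equal to --memory, so no swap on top
        "--pids-limit=64",                     # assumption: blunt fork bombs
        "--read-only",                         # root filesystem mounted read-only
        "--tmpfs=/tmp:rw,size=100m",           # 100MB in-memory scratch space
        "--cap-drop=ALL",
        "--security-opt=no-new-privileges",
        "--user=65534:65534",                  # unprivileged "nobody" user
        f"--volume={code_dir}:/code:ro",       # generated code mounted read-only
        image,
        # coreutils `timeout` inside the container sends SIGKILL after 30s.
        "timeout", "--signal=KILL", "30s",
        "python3", "/code/main.py",
    ]


if __name__ == "__main__":
    argv = build_sandbox_command("python:3.11-slim", "/srv/exec/abc123", "abc123")
    print(shlex.join(argv))  # copy-pasteable shell command
```

Returning an argv list instead of a shell string avoids quoting bugs when passed to `subprocess.run`. For Firecracker, the equivalent settings would live in the microVM configuration rather than in CLI flags.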
SANDBOX POOL MANAGEMENT
═══════════════════════════════════════════════════════
WARM POOL (300 pre-created sandboxes)
┌──────────────────────────────────────────────────┐
│ [sandbox-001] IDLE   │ Python 3.11 + libs loaded │
│ [sandbox-002] IDLE   │ Python 3.11 + libs loaded │
│ [sandbox-003] IN USE │ Running user code...      │
│ [sandbox-004] IDLE   │ Python 3.11 + libs loaded │
│ ...                                              │
│ [sandbox-300] IDLE   │ Python 3.11 + libs loaded │
└──────────────────────────────────────────────────┘
LIFECYCLE:
─────────────────────────────────────────────────────
1. IDLE      → Checkout (assign to execution request)
2. IN USE    → Code runs inside sandbox
3. COMPLETE  → Sandbox DESTROYED (never reused)
4. REPLENISH → New sandbox created to maintain pool size
WHY DESTROY? A previous execution might have left state behind (variables, temp files, a modified environment). A fresh sandbox per execution means zero leakage between runs.
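The lifecycle above can be sketched as a FIFO of idle sandboxes that are destroyed after a single use. This is a minimal sketch. `Sandbox`, `WarmPool`, and the `create_fn`/`destroy_fn` hooks are hypothetical stand-ins for calls into whichever runtime (gVisor or Firecracker) backs the pool. Replenishment happens synchronously here for brevity. A real pool would likely refill in the background so that `release` never blocks on sandbox creation.

```python
import itertools
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sandbox:
    sandbox_id: str
    state: str = "IDLE"


class WarmPool:
    def __init__(self, size: int, create_fn: Callable[[str], Sandbox],
                 destroy_fn: Callable[[Sandbox], None]) -> None:
        self._create_fn = create_fn
        self._destroy_fn = destroy_fn
        self._counter = itertools.count(1)
        self._lock = threading.Lock()
        self._idle = queue.Queue()           # thread-safe FIFO of IDLE sandboxes
        for _ in range(size):
            self._idle.put(self._new_sandbox())

    def _new_sandbox(self) -> Sandbox:
        with self._lock:
            n = next(self._counter)
        return self._create_fn(f"sandbox-{n:03d}")

    def checkout(self, timeout: float = 5.0) -> Sandbox:
        """IDLE -> IN USE. Raises queue.Empty if nothing frees up in time."""
        sandbox = self._idle.get(timeout=timeout)
        sandbox.state = "IN USE"
        return sandbox

    def release(self, sandbox: Sandbox) -> None:
        """COMPLETE -> DESTROYED, then REPLENISH with a brand-new sandbox."""
        sandbox.state = "DESTROYED"
        self._destroy_fn(sandbox)            # never put back on the idle queue
        self._idle.put(self._new_sandbox())

    def idle_count(self) -> int:
        return self._idle.qsize()


if __name__ == "__main__":
    pool = WarmPool(size=3, create_fn=Sandbox, destroy_fn=lambda sb: None)
    sb = pool.checkout()        # sandbox-001, now IN USE
    pool.release(sb)            # destroyed; sandbox-004 created in its place
    print(pool.idle_count())    # 3
```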
Code inside the sandbox cannot directly access databases. Instead, it talks to a Data Access Proxy that enforces permissions:
DATA ACCESS ARCHITECTURE
═══════════════════════════════════════════════════════
┌──────────┐     ┌──────────────────┐     ┌──────────────┐
│ Sandbox  │────>│ Data Access      │────>│ Read-Only    │
│ (Code)   │     │ Proxy            │     │ Replica      │
│          │     │                  │     │ (Database)   │
│ import   │     │ • Validates SQL  │     │              │
│ db_client│     │ • Checks perms   │     │ SELECT only  │
│          │     │ • Enforces row   │     │ No writes    │
│ result = │     │   limits (10K)   │     │              │
│ db.query(│     │ • No raw creds   │     │              │
│ "SELECT  │     │ • Query timeout  │     │              │
│ ...")    │     │   (10s)          │     │              │
└──────────┘     └──────────────────┘     └──────────────┘
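SQLite can illustrate the proxy's checks because it exposes a per-connection authorizer hook (`Connection.set_authorizer`) and a progress handler that can act as a query timeout. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for the read-only replica. `DataAccessProxy`, the `allowed_tables` set, and the result shape are hypothetical. A proxy in front of PostgreSQL or MySQL would more likely rely on a read-only database role, per-tenant row filtering, and the engine's own statement timeout.

```python
import sqlite3
import time

MAX_ROWS = 10_000        # row limit from the diagram
QUERY_TIMEOUT_S = 10.0   # query timeout from the diagram

# Authorizer actions a plain read needs; everything else is denied.
_READ_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    getattr(sqlite3, "SQLITE_FUNCTION", 31),  # calls such as count(), lower()
}


class DataAccessProxy:
    """Hypothetical proxy: read-only, table allow-list, row cap, query timeout."""

    def __init__(self, conn: sqlite3.Connection, allowed_tables: set[str],
                 max_rows: int = MAX_ROWS, timeout_s: float = QUERY_TIMEOUT_S) -> None:
        self._conn = conn
        self._allowed = {t.lower() for t in allowed_tables}
        self._max_rows = max_rows
        self._timeout_s = timeout_s
        self.tables_accessed = set()              # feeds the audit log's data_accessed
        conn.execute("PRAGMA query_only = ON")    # engine-level backstop against writes
        conn.set_authorizer(self._authorize)

    def _authorize(self, action, arg1, arg2, db_name, trigger_or_view):
        if action not in _READ_ACTIONS:
            return sqlite3.SQLITE_DENY            # INSERT, DELETE, ATTACH, PRAGMA, ...
        if action == sqlite3.SQLITE_READ:
            table = (arg1 or "").lower()
            if table not in self._allowed:
                return sqlite3.SQLITE_DENY        # table outside the tenant's grant
            self.tables_accessed.add(table)
        return sqlite3.SQLITE_OK

    def query(self, sql: str) -> dict:
        """Run one read-only query; denied statements raise sqlite3.DatabaseError."""
        deadline = time.monotonic() + self._timeout_s
        # A truthy return from the handler aborts the running statement.
        self._conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
        try:
            # Fetch one extra row to detect whether the limit was exceeded.
            rows = self._conn.execute(sql).fetchmany(self._max_rows + 1)
        finally:
            self._conn.set_progress_handler(None, 0)
        return {"rows": rows[: self._max_rows], "truncated": len(rows) > self._max_rows}
```

Because the authorizer runs when a statement is compiled, a forbidden table or a write is rejected before any row is touched. `PRAGMA query_only` is a second, independent guard.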
| Field | Example |
|---|---|
| execution_id | exec-2026031510-abc123 |
| tenant_id | acme-corp |
| user_id | jane@acme.com |
| language | python |
| code (sanitized) | import pandas as pd; df = db.query("SELECT...")... |
| duration_ms | 3,450 |
| status | success |
| output_size_bytes | 12,480 |
| data_accessed | ["tickets", "users"] (tables queried) |
| rows_returned | 247 |
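An audit entry with the fields above can be modeled as a frozen dataclass and written as one JSON object per line (JSON Lines). This is a minimal sketch. `AuditRecord`, `append_audit`, and the local-file target are assumptions. A production system would ship records to append-only storage rather than to a file on the executor host.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    """One record per execution, mirroring the fields in the table above."""
    execution_id: str
    tenant_id: str
    user_id: str
    language: str
    code: str                            # sanitized code, as in the table above
    duration_ms: int
    status: str                          # e.g. "success", "error", "timeout"
    output_size_bytes: int
    data_accessed: tuple[str, ...] = ()  # tables queried
    rows_returned: int = 0

    def to_json_line(self) -> str:
        # sort_keys=True gives a stable field order across records.
        return json.dumps(asdict(self), sort_keys=True) + "\n"


def append_audit(path: str, record: AuditRecord) -> None:
    """Append one JSON Lines entry; this function never rewrites earlier lines."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(record.to_json_line())


if __name__ == "__main__":
    rec = AuditRecord(
        execution_id="exec-2026031510-abc123", tenant_id="acme-corp",
        user_id="jane@acme.com", language="python",
        code='import pandas as pd; df = db.query("SELECT...")',
        duration_ms=3450, status="success", output_size_bytes=12480,
        data_accessed=("tickets", "users"), rows_returned=247,
    )
    print(rec.to_json_line(), end="")
```

Freezing the record keeps code paths from mutating an entry after it has been created. The tuple field keeps `data_accessed` immutable too, and it serializes to a JSON array.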