---
title: Trainx
emoji: 🔒
colorFrom: red
colorTo: blue
sdk: docker
pinned: true
license: apache-2.0
---

# SecureCodeEnv

**An RL environment for training LLM agents to write production-ready, secure Python code.**

---

## The Problem

Studies report that **12–65% of LLM-generated code contains security vulnerabilities**. Secure-pass@1 rates remain below 12% for all frontier models, even when functional pass@1 exceeds 50%.

Most existing RL coding environments train agents to write code that **works**. Few train agents to write code that is **safe, consistent, and production-ready**. SecureCodeEnv closes that gap.

---

## What Makes This Environment Different

| Feature | SecureCodeEnv | Typical RL Code Envs |
|---|---|---|
| Dynamic adversarial grading | ✅ Real attacks fired per episode | ❌ Static patterns only |
| CodeGraph memory | ✅ Cross-step convention tracking | ❌ Single-function only |
| CWE-grounded tasks | ✅ 9 tasks, 15 CWE IDs | ❌ Generic correctness |
| Security gate on done | ✅ Attack + static thresholds | ❌ Pass/fail only |
| Anti-reward-hacking | ✅ Seeded random payloads | ❌ Fixed test cases |
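
The seeded-payload idea in the last row can be sketched as below. This is an illustration under stated assumptions, not SecureCodeEnv's actual harness: the fragment lists, the `generate_payloads` helper, and the way a seed is chosen per episode are all hypothetical. The point is only that a per-episode seed makes payloads reproducible for grading while still varying between episodes, so an agent cannot memorize a fixed test set.

```python
import random

# Hypothetical SQL-injection fragments (assumption: the real payload corpus
# used by the environment is not documented in this README).
PREFIXES = ["'", '"', "1", "admin"]
BODIES = [" OR 1=1", " UNION SELECT NULL", "; DROP TABLE users", " AND 1=2"]
SUFFIXES = ["--", "#", "/*", ""]


def generate_payloads(seed: int, count: int = 5) -> list[str]:
    """Build `count` attack strings from a per-episode seed."""
    # A private Random instance keeps the result independent of global RNG state:
    # the same seed always yields the same payload list.
    rng = random.Random(seed)
    return [
        rng.choice(PREFIXES) + rng.choice(BODIES) + rng.choice(SUFFIXES)
        for _ in range(count)
    ]


if __name__ == "__main__":
    for payload in generate_payloads(seed=1234):
        print(repr(payload))
```

Re-running an episode with the same seed replays the same attacks, which helps when debugging a failed submission.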

---

## Reward System — 7 Dimensions

| Dimension | Weight | Tool | What It Measures |
|---|---|---|---|
| correctness | 25% | Custom test runner | Test cases passed |
| attack_resist | 25% | Dynamic harness | Real attack payloads blocked |
| static_security | 20% | bandit + AST | CWE-mapped vulnerability patterns |
| consistency | 10% | CodeGraph | Convention adherence across steps |
| performance | 8% | timeit | Speed vs naive/optimal baselines |
| documentation | 7% | AST | Docstring + type hint coverage |
| code_structure | 5% | AST | Clean code (no bare print/except) |

**Security gate:** an episode cannot complete unless `attack_resist ≥ 0.75` AND `static_security ≥ 0.70` AND `correctness ≥ 0.80`.
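
A minimal sketch of how the weights and the gate could combine, assuming each dimension is scored in the range 0 to 1 and the total is a plain weighted sum. Both are assumptions (the README does not state the aggregation formula), and the names `WEIGHTS`, `GATE`, `total_reward`, and `passes_gate` are illustrative rather than taken from the codebase.

```python
# Weights copied from the table above; they sum to 1.0.
WEIGHTS = {
    "correctness": 0.25,
    "attack_resist": 0.25,
    "static_security": 0.20,
    "consistency": 0.10,
    "performance": 0.08,
    "documentation": 0.07,
    "code_structure": 0.05,
}

# Minimum per-dimension scores required before an episode may complete.
GATE = {"attack_resist": 0.75, "static_security": 0.70, "correctness": 0.80}


def total_reward(scores: dict[str, float]) -> float:
    """Weighted sum of the seven dimension scores (assumed aggregation)."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def passes_gate(scores: dict[str, float]) -> bool:
    """True only if every gated dimension meets its threshold."""
    return all(scores[name] >= minimum for name, minimum in GATE.items())


if __name__ == "__main__":
    scores = dict.fromkeys(WEIGHTS, 0.9)
    scores["attack_resist"] = 0.6  # below the 0.75 threshold
    print(f"total={total_reward(scores):.3f} gate={passes_gate(scores)}")
```

Under these assumptions a submission can earn a high total reward and still fail the gate, which is the intended pressure toward security rather than raw correctness.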

---

## Tasks — 9 Tasks Across 3 Difficulty Levels

### Easy
| Task | CWE Targets |
|---|---|
| Password Validator | CWE-916, CWE-521 |
| Input Sanitizer | CWE-20, CWE-116 |
| Token Generator | CWE-338, CWE-330 |

### Medium
| Task | CWE Targets |
|---|---|
| SQL Query Builder | CWE-89 |
| File Path Handler | CWE-22 |
| Rate Limiter | CWE-770, CWE-400 |

### Hard
| Task | CWE Targets |
|---|---|
| File Upload Handler | CWE-22, CWE-434 |
| JWT Validator | CWE-347, CWE-613 |
| Auth Middleware | CWE-287, CWE-352 |

---

## Quick Start

```python
import requests

BASE = "http://localhost:7860"

# Start episode
ep = requests.post(f"{BASE}/reset", json={"difficulty": "medium"}).json()
sid = ep["session_id"]
print(ep["problem_statement"])

# Submit code (a parameterized query: user input is passed separately, never formatted into the SQL string)
result = requests.post(f"{BASE}/step", json={
    "session_id": sid,
    "code": "def build_user_query(u, r):\n    return ('SELECT * FROM users WHERE username=%s', (u,))",
    "filename": "solution.py"
}).json()

print(f"reward={result['total_reward']:.3f}")
print(result["feedback"]["summary"])
```
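
The example above submits once. A multi-step episode could be driven by a loop like the sketch below. It assumes the `/step` response carries a boolean `done` field, which this README does not confirm; `run_episode` and the injected `post` callable are hypothetical names chosen so the loop can be exercised without a running server.

```python
from typing import Callable, Iterable


def run_episode(
    post: Callable[[str, dict], dict],
    candidates: Iterable[str],
    difficulty: str = "medium",
) -> list[float]:
    """Submit candidate solutions in order and collect the reward of each step."""
    episode = post("/reset", {"difficulty": difficulty})
    session_id = episode["session_id"]

    rewards: list[float] = []
    for code in candidates:
        result = post("/step", {
            "session_id": session_id,
            "code": code,
            "filename": "solution.py",
        })
        rewards.append(result["total_reward"])
        # Assumption: the environment signals completion with a `done` flag.
        if result.get("done"):
            break
    return rewards
```

With `requests`, `post` can be a thin wrapper that calls `requests.post(f"{BASE}{path}", json=payload)` and returns the parsed JSON body.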

---

## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /reset | Start new episode |
| POST | /step | Submit code for grading |
| GET | /state | Current episode state |
| GET | /tasks | List all tasks |
| GET | /tasks/{id} | Task detail + starter code |
| GET | /docs | Swagger UI |

---

## Setup

```bash
# Docker (recommended)
docker build -t secure-code-env .
docker run -p 7860:7860 secure-code-env

# Direct
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

## Run Baseline Inference

```bash
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export HF_TOKEN=your_token
export ENV_URL=http://localhost:7860
python inference.py
```

## Pre-submission Validation

```bash
python validate.py --url http://localhost:7860
```

---

## Environment Variables

| Variable | Required | Description |
|---|---|---|
| `API_BASE_URL` | Yes (inference) | LLM API endpoint |
| `MODEL_NAME` | Yes (inference) | Model identifier |
| `HF_TOKEN` | Yes (inference) | API authentication token |
| `ENV_URL` | No | Override environment URL (default: `http://localhost:7860`) |
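
A small sketch of how a script might read these variables, failing early when a required one is missing. This is not necessarily how `inference.py` does it; `load_config` is a hypothetical helper, and the `http://` scheme on the default URL is an assumption based on the examples above.

```python
import os
from typing import Mapping

REQUIRED = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
DEFAULT_ENV_URL = "http://localhost:7860"  # assumed default, matching the Setup section


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect inference settings, raising if a required variable is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "api_base_url": env["API_BASE_URL"],
        "model_name": env["MODEL_NAME"],
        "hf_token": env["HF_TOKEN"],
        "env_url": env.get("ENV_URL", DEFAULT_ENV_URL),
    }
```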