"""SecureCodeEnv - Interactive HTML Dashboard""" DASHBOARD_HTML = r""" SecureCodeEnv — RL Playground
v2.0.0 RL Environment Live
Episode Control
solution.py
Total Reward
/ 1.000 maximum
Score Breakdown
📊
Submit code to see scores
Feedback
💬
Feedback will appear here
Episode History 0 steps
No submissions yet
9
Security Tasks across 3 difficulty levels
7
Reward Dimensions
12+
CWE IDs Covered
5
Max Steps per Episode
⚔️
Dynamic Attack Grading
Real SQL injection, path traversal, JWT bypass, and XSS payloads are fired at submitted code each episode. Payloads are seeded-random so agents cannot memorise them.
🧠
CodeGraph Memory
The agent's codebase context accumulates across steps. Naming conventions, error handling patterns, and type hint usage are tracked and enforced across submissions.
🔒
Security Gate
An episode cannot be marked done unless attack resistance ≥ 75% AND static security ≥ 70%. Functional code that is insecure will never pass.
📈
Dense Reward Signal
7 dimensions give partial credit at every step. Agents receive granular feedback on what to improve rather than a binary pass/fail signal.
Quick Start
Use the Playground tab for interactive testing, or call the API directly from any HTTP client.
import requests BASE = "http://localhost:7860" # replace with your deployed URL # 1. Start episode ep = requests.post(f"{BASE}/reset", json={"difficulty": "medium"}).json() sid = ep["session_id"] print(ep["problem_statement"]) # 2. Submit code — graded across 7 dimensions result = requests.post(f"{BASE}/step", json={ "session_id": sid, "code": "def build_user_query(u, r):\n return ('SELECT * FROM users WHERE username=%s', (u,))", "filename": "solution.py" }).json() print(f"reward = {result['total_reward']:.3f}") print(result["feedback"]["summary"])
Endpoints
GET /health
Health check. Returns status, version, and tasks_loaded count.
POST /reset
Start new episode. Body: {"difficulty":"medium"} or {"task_id":"..."}. Returns task, starter code, and initial CodeGraph.
POST /step
Submit code. Body: {"session_id":"...","code":"...","filename":"..."}. Returns reward + per-dimension scores + feedback.
GET /state
Current episode state. Query param: session_id
GET /tasks
List all tasks. Optional filter: ?difficulty=easy
GET /tasks/{id}
Full task detail — problem statement, starter code, security checks.
GET /docs
Auto-generated Swagger UI with full schema documentation.
Reward Dimensions
Dimension Weight Tool Measures
correctness 25% Custom runner Test cases passed
attack_resist 25% Dynamic harness Real attack payloads blocked
static_security 20% bandit + AST CWE-mapped vulnerability patterns
consistency 10% CodeGraph Convention adherence across steps
performance 8% timeit Speed vs naive/optimal baselines
documentation 7% AST Docstrings + type hint coverage
code_structure 5% AST No bare print/except, clean structure
⚠ Security gate: episode cannot complete unless attack_resist ≥ 0.75 AND static_security ≥ 0.70 AND correctness ≥ 0.80
Step Response Example
{ "total_reward": 0.847, "scores": { "correctness": 1.0, "attack_resist": 0.875, "static_security": 0.9, "consistency": 0.75, "performance": 0.6, "documentation": 0.75, "code_structure": 0.8 }, "feedback": { "summary": "🟡 Good (0.847) — improve: consistency (0.75)", "attack_resist": "Good — SQL injection attacks blocked (87%)", "security_gate": "PASSED" }, "details": { "correctness": {"passed": 5, "total": 5}, "attacks": {"blocked": 7, "total": 8, "type": "injection"}, "security_gate_passed": true }, "done": false, "step_count": 1 }
"""