name: SecureCodeEnv
version: "2.0.0"
description: >
  An RL environment for training LLM agents to write production-ready,
  secure Python code. Agents are graded on correctness, security attack
  resistance (dynamic adversarial payloads), CWE-based static analysis,
  performance, and codebase consistency via a novel CodeGraph memory system.
  No other public OpenEnv environment combines attack simulation +
  codebase consistency grading. All grading is 100% automated and deterministic.
author: Vishal Dhakad
hf_space: vishaldhakad/SecureCodeEnv
license: MIT

action_space:
  type: text
  description: Python source code string submitted by the agent
  fields:
    - name: code
      type: string
      description: The complete Python function(s) to be graded
    - name: filename
      type: string
      description: Logical filename for CodeGraph tracking (e.g. src/auth/validator.py)
    - name: session_id
      type: string
      description: Session ID returned from /reset

observation_space:
  type: structured
  fields:
    - name: total_reward
      type: float
      range: [0.0, 1.0]
      description: Weighted final score across all 7 dimensions
    - name: scores
      type: dict
      description: >
        Per-dimension scores: correctness, attack_resist, static_security,
        consistency, performance, documentation, code_structure
    - name: feedback
      type: dict
      description: Human-readable feedback string per grading dimension
    - name: codegraph
      type: dict
      description: >
        Full codebase context including components, detected conventions,
        dependency list, and natural-language context prompt for the agent
    - name: done
      type: bool
      description: True if episode is complete (reward >= 0.90 or max steps reached)
    - name: step_count
      type: int
      description: Current step number within the episode

reward:
  type: multi_dimensional
  range: [0.0, 1.0]
  dimensions:
    - name: correctness
      weight: 0.30
      description: Fraction of test cases passed (including edge cases)
    - name: attack_resistance
      weight: 0.20
      description: Fraction of randomized adversarial payloads blocked
    - name: static_security
      weight: 0.15
      description: bandit + AST security linter score (CWE-mapped)
    - name: codegraph_consistency
      weight: 0.15
      description: Adherence to conventions from existing codebase components
    - name: performance
      weight: 0.10
      description: Relative efficiency vs naive/optimal baselines (timeit)
    - name: documentation
      weight: 0.05
      description: Docstring + type hint coverage across all functions
    - name: code_structure
      weight: 0.05
      description: Clean code checks (no bare print, no bare except, etc.)

tasks:
  - id: easy_password_validator
    difficulty: easy
    cwe: [CWE-916, CWE-521]
    description: Validate password strength and hash with bcrypt (not MD5)
  - id: easy_input_sanitizer
    difficulty: easy
    cwe: [CWE-20, CWE-116]
    description: Sanitize HTML (XSS prevention) and filenames
  - id: easy_token_generator
    difficulty: easy
    cwe: [CWE-338, CWE-330]
    description: Generate cryptographically secure tokens using secrets module
  - id: medium_sql_query_builder
    difficulty: medium
    cwe: [CWE-89, CWE-20]
    description: Build parameterized SQL queries; never string-format user input
  - id: medium_file_path_handler
    difficulty: medium
    cwe: [CWE-22, CWE-20]
    description: Resolve file paths safely; block path traversal attacks
  - id: medium_rate_limiter
    difficulty: medium
    cwe: [CWE-770, CWE-400]
    description: Thread-safe sliding window rate limiter
  - id: hard_file_upload_handler
    difficulty: hard
    cwe: [CWE-22, CWE-434]
    description: "Validate uploads: block traversal filenames, executable extensions, MIME spoofing"
  - id: hard_jwt_validator
    difficulty: hard
    cwe: [CWE-347, CWE-613]
    description: "Validate JWTs: enforce HS256, block none-alg attack, check expiry"
  - id: hard_auth_middleware
    difficulty: hard
    cwe: [CWE-287, CWE-352]
    description: CSRF protection and Bearer auth using hmac.compare_digest (timing-safe)

runtime:
  max_steps_per_episode: 5
  done_reward_threshold: 0.90
  max_inference_time_minutes: 20
  min_vcpu: 2
  min_memory_gb: 8
  port: 7860

endpoints:
  health: GET /health
  reset: POST /reset
  step: POST /step
  state: GET /state
  docs: GET /docs