# EntropyEnv / openenv.yaml
# Commit 699f953: Fix Phase 2 OpenEnv validation traps: add grader paths to
# openenv.yaml and safe parameterless defaults
spec_version: 1
name: multi-agent-dev-tools-env
description: >
  A multi-domain RL environment for training AI agents on real-world developer
  and clinical tasks. Covers MCP security auditing, PyTorch migration debugging,
  and clinical workflow chaos recovery. 9 tasks across 3 domains with graded
  difficulty (easy/medium/hard).
type: environment
runtime: docker
port: 7860
# Action and Observation spaces use typed Pydantic models
# See server/models/ for full definitions
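# Note on the `grader` values under `tasks` (an assumption, not confirmed by
# this file): each value reads as a dotted Python path, i.e. a module such as
# `server.graders.security_grader` followed by a callable named `grade`.
# A loader could resolve such a path roughly as in this hypothetical sketch:
#
#   import importlib
#   module_path, func_name = "server.graders.security_grader.grade".rsplit(".", 1)
#   grade = getattr(importlib.import_module(module_path), func_name)
#
# The three tasks in each domain share one grader path, so any per-difficulty
# behavior would have to live inside the grader itself.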
tasks:
  - id: sec_easy
    grader: server.graders.security_grader.grade
    name: Single vulnerability classification
    difficulty: easy
    description: Identify vulnerability type, CVSS score, and severity from a tool-call snippet.
  - id: sec_medium
    grader: server.graders.security_grader.grade
    name: Vulnerability identification + fix proposal
    difficulty: medium
    description: Identify the vulnerability and propose a secure code fix.
  - id: sec_hard
    grader: server.graders.security_grader.grade
    name: Adversarial patch defense with reviewer feedback
    difficulty: hard
    description: Identify, fix, and iteratively revise based on reviewer feedback.
  - id: dep_easy
    grader: server.graders.dependency_grader.grade
    name: PyTorch 1.x deprecated API detection
    difficulty: easy
    description: Flag outdated packages and deprecated API usage.
  - id: dep_medium
    grader: server.graders.dependency_grader.grade
    name: Version conflict chain resolution
    difficulty: medium
    description: Resolve version conflicts using compatibility matrix constraints.
  - id: dep_hard
    grader: server.graders.dependency_grader.grade
    name: torch.compile graph-break hunter
    difficulty: hard
    description: Fix torch.compile graph-break patterns in dependency order.
  - id: cli_easy
    grader: server.graders.clinical_grader.grade
    name: Single workflow gap detection
    difficulty: easy
    description: Detect missing steps in a clinical workflow and assess risk.
  - id: cli_medium
    grader: server.graders.clinical_grader.grade
    name: Multi-gap priority ranking
    difficulty: medium
    description: Detect gaps and rank them by clinical priority.
  - id: cli_hard
    grader: server.graders.clinical_grader.grade
    name: Dependency-ordered recovery planning
    difficulty: hard
    description: Plan a dependency-safe recovery sequence for a disrupted clinical workflow.