---
title: ForgeEnv
emoji: 🔧
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
tags:
  - openenv
  - self-play
  - self-improvement
  - code-repair
  - schema-drift
  - reinforcement-learning
  - huggingface
short_description: Self-improving RL env for HF library-drift repair
---

# ForgeEnv — OpenEnv Server

This Space hosts the **ForgeEnv** OpenEnv-compliant environment as a FastAPI service. It exposes the standard `reset`, `step`, and `state` endpoints and is the runtime that training notebooks (TRL + Unsloth) connect to.

> **Theme:** Self-Improvement (Hackathon Theme #4) — Challenger / Solver
> co-evolution via R-Zero, SPIRAL, and Absolute Zero Reasoner techniques.

## What it does

ForgeEnv simulates **HuggingFace library version drift**. A *Drift Generator* proposes a realistic breakage to a working training script (renamed APIs, deprecated imports, changed argument signatures, etc.). A *Repair Agent* then emits a unified diff that should restore the script. Reward is computed by an execution simulator, an AST checker, and a held-out evaluator — a multi-component design chosen to resist reward hacking.

## API

The server uses [`openenv-core`](https://pypi.org/project/openenv-core/) and follows the Gym-style contract:

| Endpoint  | Method | Purpose                                            |
| --------- | ------ | -------------------------------------------------- |
| `/reset`  | POST   | Sample a fresh task, return drift-gen observation  |
| `/step`   | POST   | Apply a `ForgeAction` (breakage or repair)         |
| `/state`  | GET    | Inspect the current internal state                 |
| `/health` | GET    | Health probe (used by the container HEALTHCHECK)   |

`ForgeAction` is a discriminated union of `BreakageAction` (used in phase 1) and `RepairAction` (used in phase 2). See [`forgeenv/env/actions.py`](forgeenv/env/actions.py).
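A discriminated union of this kind dispatches on a shared tag field. The sketch below illustrates the pattern with plain dataclasses; the field names (`target_file`, `description`, `unified_diff`) and the `apply_action` helper are illustrative assumptions — the authoritative definitions live in `forgeenv/env/actions.py`.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class BreakageAction:
    # Phase 1: the Challenger proposes a realistic breakage.
    kind: Literal["breakage"]
    target_file: str      # hypothetical field name
    description: str      # hypothetical field name


@dataclass
class RepairAction:
    # Phase 2: the Solver emits a unified diff restoring the script.
    kind: Literal["repair"]
    unified_diff: str     # hypothetical field name


ForgeAction = Union[BreakageAction, RepairAction]


def apply_action(action: ForgeAction) -> str:
    # Dispatch on the shared discriminator field.
    if action.kind == "breakage":
        return f"injecting drift: {action.description}"
    return "applying repair diff"
```

The tag field lets a single `/step` endpoint accept either action type and route it to the correct phase handler without inspecting the payload shape.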
## Quick test

```bash
curl -X POST https://akhiilll-forgeenv.hf.space/reset
curl https://akhiilll-forgeenv.hf.space/state
```

```python
import asyncio

from openenv.core.env_client import EnvClient


async def main():
    async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client:
        obs = await client.reset()
        print(obs.observation.current_phase, obs.observation.task_id)


asyncio.run(main())
```

## Project links

- **Main repo / training notebooks / plots:**
- **Repair Agent model (LoRA):**
- **Demo (Gradio + ZeroGPU):**

## Citations

- Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025)
- Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025)
- Liu et al., *SPIRAL: Self-Play on Zero-Sum Games* (2025)
- [arXiv:2408.10215](https://arxiv.org/abs/2408.10215) — Reward engineering & shaping
- [arXiv:2601.19100](https://arxiv.org/abs/2601.19100) — Reward engineering for RL in software tasks