---
title: ForgeEnv
emoji: 🔧
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
tags:
  - openenv
  - self-play
  - self-improvement
  - code-repair
  - schema-drift
  - reinforcement-learning
  - huggingface
short_description: Self-improving RL env for HF library-drift repair
---

# ForgeEnv – OpenEnv Server
|
|
| This Space hosts the **ForgeEnv** OpenEnv-compliant environment as a FastAPI |
| service. It exposes the standard `reset`, `step`, and `state` endpoints and is |
| the runtime that training notebooks (TRL + Unsloth) connect to. |
|
|
> **Theme:** Self-Improvement (Hackathon Theme #4) – Challenger / Solver
> co-evolution via R-Zero, SPIRAL, and Absolute Zero Reasoner techniques.
|
|
| ## What it does |
|
|
| ForgeEnv simulates **HuggingFace library version drift**. A *Drift Generator* |
| proposes a realistic breakage to a working training script (renamed APIs, |
| deprecated imports, changed argument signatures, etc.). A *Repair Agent* then |
| emits a unified diff that should restore the script. Reward is computed by an |
| execution simulator + AST checker + held-out evaluator (multi-component to |
| resist reward hacking). |
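
The multi-component reward can be pictured as a weighted blend of the three checks. This is a minimal sketch only: the component names, clamping, and weights below are illustrative assumptions, not ForgeEnv's actual configuration.

```python
# Hedged sketch of a multi-component repair reward. Component
# weights are assumed values, not ForgeEnv's real ones.

def combined_reward(exec_passed: bool, ast_score: float, heldout_score: float) -> float:
    """Blend execution, AST, and held-out evaluator signals so no
    single component can be gamed in isolation (anti reward-hacking)."""
    W_EXEC, W_AST, W_HELDOUT = 0.5, 0.2, 0.3  # assumed weights
    clamp = lambda x: max(0.0, min(1.0, x))   # keep scores in [0, 1]
    return (
        W_EXEC * (1.0 if exec_passed else 0.0)
        + W_AST * clamp(ast_score)
        + W_HELDOUT * clamp(heldout_score)
    )
```

Splitting the signal this way means a repair that merely makes the script parse (high AST score) but still fails execution or the held-out check earns only partial reward.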
|
|
| ## API |
|
|
| The server uses [`openenv-core`](https://pypi.org/project/openenv-core/) and |
| follows the Gym-style contract: |
|
|
| | Endpoint | Method | Purpose | |
| | -------- | ------ | -------------------------------------------------- | |
| | `/reset` | POST | Sample a fresh task, return drift-gen observation | |
| | `/step` | POST | Apply a `ForgeAction` (breakage or repair) | |
| | `/state` | GET | Inspect the current internal state | |
| | `/health`| GET | Health probe (used by the container HEALTHCHECK) | |
|
|
| `ForgeAction` is a discriminated union of `BreakageAction` (used in phase 1) |
| and `RepairAction` (used in phase 2). See |
| [`forgeenv/env/actions.py`](forgeenv/env/actions.py). |
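
A discriminated union of this kind can be sketched with the standard library alone. The field names below are hypothetical stand-ins; the real schema lives in `forgeenv/env/actions.py`.

```python
# Hedged sketch of the two-phase action union; field names
# ("kind", "description", "unified_diff") are hypothetical.
from dataclasses import dataclass
from typing import Literal, Union

@dataclass
class BreakageAction:
    kind: Literal["breakage"]  # discriminator, phase 1
    description: str           # what the Drift Generator breaks

@dataclass
class RepairAction:
    kind: Literal["repair"]    # discriminator, phase 2
    unified_diff: str          # patch that should restore the script

ForgeAction = Union[BreakageAction, RepairAction]

def phase_of(action: ForgeAction) -> int:
    """Route on the discriminator field."""
    return 1 if action.kind == "breakage" else 2
```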
|
|
| ## Quick test |
|
|
| ```bash |
| curl -X POST https://akhiilll-forgeenv.hf.space/reset |
| curl https://akhiilll-forgeenv.hf.space/state |
| ``` |
|
|
| ```python |
| from openenv.core.env_client import EnvClient |
| |
| async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client: |
| obs = await client.reset() |
| print(obs.observation.current_phase, obs.observation.task_id) |
| ``` |
|
|
| ## Project links |
|
|
| - **Main repo / training notebooks / plots:** |
| <https://github.com/akhiilll/forgeenv> |
| - **Repair Agent model (LoRA):** |
| <https://huggingface.co/akhiilll/forgeenv-repair-agent> |
| - **Demo (Gradio + ZeroGPU):** |
| <https://huggingface.co/spaces/akhiilll/forgeenv-demo> |
|
|
| ## Citations |
|
|
| - Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025) |
| - Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025) |
| - Liu et al., *SPIRAL: Self-Play on Zero-Sum Games* (2025) |
- [arXiv:2408.10215](https://arxiv.org/abs/2408.10215) – Reward engineering & shaping
- [arXiv:2601.19100](https://arxiv.org/abs/2601.19100) – Reward engineering for RL in software tasks
|
|