| ---
|
| title: ForgeEnv
|
emoji: 🔧
|
| colorFrom: indigo
|
| colorTo: green
|
| sdk: docker
|
| app_port: 7860
|
| pinned: true
|
| license: apache-2.0
|
| tags:
|
| - openenv
|
| - self-play
|
| - self-improvement
|
| - code-repair
|
| - schema-drift
|
| - reinforcement-learning
|
| - huggingface
|
| short_description: Self-improving RL env for HF library-drift repair
|
| ---
|
|
|
# ForgeEnv – OpenEnv Server
|
|
|
| This Space hosts the **ForgeEnv** OpenEnv-compliant environment as a FastAPI
|
| service. It exposes the standard `reset`, `step`, and `state` endpoints and is
|
| the runtime that training notebooks (TRL + Unsloth) connect to.
|
|
|
> **Theme:** Self-Improvement (Hackathon Theme #4) – Challenger / Solver
|
| > co-evolution via R-Zero, SPIRAL, and Absolute Zero Reasoner techniques.
|
|
|
| ## What it does
|
|
|
ForgeEnv simulates **Hugging Face library version drift**. A *Drift Generator*
|
| proposes a realistic breakage to a working training script (renamed APIs,
|
| deprecated imports, changed argument signatures, etc.). A *Repair Agent* then
|
| emits a unified diff that should restore the script. Reward is computed by an
|
| execution simulator + AST checker + held-out evaluator (multi-component to
|
| resist reward hacking).
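
The multi-component reward described above might combine along these lines. This is a minimal sketch: the component names, the gating logic, and the 0.5 weights are illustrative assumptions, not ForgeEnv's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class RewardComponents:
    """Illustrative reward signals; names and semantics are assumptions."""
    exec_passed: bool      # execution simulator: does the patched script run?
    ast_valid: bool        # AST checker: is the emitted diff syntactically sound?
    heldout_score: float   # held-out evaluator score in [0, 1]

def combine_reward(c: RewardComponents) -> float:
    # Gate on the hard checks so a reward-hacking diff that fails
    # execution or breaks the AST earns nothing from the soft evaluator.
    if not (c.exec_passed and c.ast_valid):
        return 0.0
    return 0.5 + 0.5 * c.heldout_score

print(combine_reward(RewardComponents(True, True, 0.8)))
```

Gating hard checks before adding the soft evaluator score is one common way to make a composite reward harder to game, since partial credit is only reachable through a valid, executable patch.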
|
|
|
| ## API
|
|
|
| The server uses [`openenv-core`](https://pypi.org/project/openenv-core/) and
|
| follows the Gym-style contract:
|
|
|
| | Endpoint | Method | Purpose |
|
| | -------- | ------ | -------------------------------------------------- |
|
| | `/reset` | POST | Sample a fresh task, return drift-gen observation |
|
| | `/step` | POST | Apply a `ForgeAction` (breakage or repair) |
|
| | `/state` | GET | Inspect the current internal state |
|
| `/health` | GET | Health probe (used by the container HEALTHCHECK) |
|
|
|
| `ForgeAction` is a discriminated union of `BreakageAction` (used in phase 1)
|
| and `RepairAction` (used in phase 2). See
|
| [`forgeenv/env/actions.py`](forgeenv/env/actions.py).
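
A discriminated union like this is typically modeled with a tag field that routes each action to the right phase. The sketch below is illustrative only; the field names (`kind`, `drift_description`, `unified_diff`) are assumptions, so see `forgeenv/env/actions.py` for the real definitions.

```python
from dataclasses import dataclass
from typing import Literal, Union

@dataclass
class BreakageAction:
    kind: Literal["breakage"]  # discriminator tag (assumed name)
    drift_description: str     # e.g. a renamed API or changed signature

@dataclass
class RepairAction:
    kind: Literal["repair"]
    unified_diff: str          # patch intended to restore the script

ForgeAction = Union[BreakageAction, RepairAction]

def dispatch(action: ForgeAction) -> str:
    # Route on the discriminator: phase 1 breaks, phase 2 repairs.
    if action.kind == "breakage":
        return "phase 1: apply breakage"
    return "phase 2: score repair"

print(dispatch(RepairAction(kind="repair", unified_diff="--- a/train.py")))
# → phase 2: score repair
```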
|
|
|
| ## Quick test
|
|
|
| ```bash
|
| curl -X POST https://akhiilll-forgeenv.hf.space/reset
|
| curl https://akhiilll-forgeenv.hf.space/state
|
| ```
|
|
|
```python
import asyncio

from openenv.core.env_client import EnvClient

async def main() -> None:
    async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client:
        obs = await client.reset()
        print(obs.observation.current_phase, obs.observation.task_id)

# In a notebook you can `await main()` directly; as a script:
asyncio.run(main())
```
|
|
|
| ## Project links
|
|
|
| - **Main repo / training notebooks / plots:**
|
| <https://github.com/akhiilll/forgeenv>
|
| - **Repair Agent model (LoRA):**
|
| <https://huggingface.co/akhiilll/forgeenv-repair-agent>
|
| - **Demo (Gradio + ZeroGPU):**
|
| <https://huggingface.co/spaces/akhiilll/forgeenv-demo>
|
|
|
| ## Citations
|
|
|
| - Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025)
|
| - Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025)
|
| - Liu et al., *SPIRAL: Self-Play on Zero-Sum Games* (2025)
|
- [arXiv:2408.10215](https://arxiv.org/abs/2408.10215) – Reward engineering & shaping
|
- [arXiv:2601.19100](https://arxiv.org/abs/2601.19100) – Reward engineering for RL in software tasks
|
|
|