---
title: ForgeEnv
emoji: 🔧
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
tags:
- openenv
- self-play
- self-improvement
- code-repair
- schema-drift
- reinforcement-learning
- huggingface
short_description: Self-improving RL env for HF library-drift repair
---
# ForgeEnv — OpenEnv Server
This Space hosts the **ForgeEnv** OpenEnv-compliant environment as a FastAPI
service. It exposes the standard `reset`, `step`, and `state` endpoints and is
the runtime that training notebooks (TRL + Unsloth) connect to.
> **Theme:** Self-Improvement (Hackathon Theme #4) — Challenger / Solver
> co-evolution via R-Zero, SPIRAL, and Absolute Zero Reasoner techniques.
## What it does
ForgeEnv simulates **Hugging Face library version drift**. A *Drift Generator*
proposes a realistic breakage to a working training script (renamed APIs,
deprecated imports, changed argument signatures, etc.). A *Repair Agent* then
emits a unified diff intended to restore the script. Reward combines an
execution simulator, an AST checker, and a held-out evaluator; using several
independent components makes the signal harder to reward-hack.
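As a rough illustration of how multi-component rewards can be combined, here is a minimal sketch; the function name, weights, and component signatures are illustrative assumptions, not ForgeEnv's actual scoring code:

```python
# Hedged sketch of a weighted multi-component reward; the real ForgeEnv
# evaluator may weight and normalize its components differently.
def combined_reward(exec_ok: bool, ast_score: float, heldout_pass_rate: float,
                    weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of three reward components, clipped to [0, 1].

    exec_ok          -- did the repaired script run in the execution simulator?
    ast_score        -- structural similarity score from the AST checker (0..1)
    heldout_pass_rate -- fraction of held-out checks passed (0..1)
    """
    components = (1.0 if exec_ok else 0.0, ast_score, heldout_pass_rate)
    reward = sum(w * c for w, c in zip(weights, components))
    # Clip so no single component can push the reward out of range.
    return max(0.0, min(1.0, reward))
```

Because no single component dominates, a policy that games one signal (e.g. a diff that parses but never runs) still scores poorly overall.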
## API
The server uses [`openenv-core`](https://pypi.org/project/openenv-core/) and
follows the Gym-style contract:
| Endpoint | Method | Purpose |
| -------- | ------ | -------------------------------------------------- |
| `/reset` | POST | Sample a fresh task, return drift-gen observation |
| `/step` | POST | Apply a `ForgeAction` (breakage or repair) |
| `/state` | GET | Inspect the current internal state |
| `/health`| GET | Health probe (used by the container HEALTHCHECK) |
`ForgeAction` is a discriminated union of `BreakageAction` (used in phase 1)
and `RepairAction` (used in phase 2). See
[`forgeenv/env/actions.py`](forgeenv/env/actions.py).
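The discriminated-union pattern can be sketched with standard-library dataclasses; the field names below are illustrative guesses, and the real definitions live in `forgeenv/env/actions.py`:

```python
# Hedged sketch of a tagged union of actions; field names are assumptions,
# not ForgeEnv's actual schema.
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class BreakageAction:
    kind: Literal["breakage"]
    description: str  # e.g. a short note on which API was drifted
    patch: str        # unified diff that introduces the breakage


@dataclass
class RepairAction:
    kind: Literal["repair"]
    patch: str        # unified diff intended to restore the script


ForgeAction = Union[BreakageAction, RepairAction]


def dispatch(action: ForgeAction) -> str:
    # The server routes on the discriminator field to pick the phase handler.
    if action.kind == "breakage":
        return "phase-1: apply drift"
    return "phase-2: score repair"
```

A server receiving a `ForgeAction` payload can branch on the `kind` discriminator to decide whether it is in the breakage or repair phase.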
## Quick test
```bash
curl -X POST https://akhiilll-forgeenv.hf.space/reset
curl https://akhiilll-forgeenv.hf.space/state
```
```python
import asyncio

from openenv.core.env_client import EnvClient


async def main():
    async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client:
        obs = await client.reset()
        print(obs.observation.current_phase, obs.observation.task_id)


asyncio.run(main())
```
## Project links
- **Main repo / training notebooks / plots:**
<https://github.com/akhiilll/forgeenv>
- **Repair Agent model (LoRA):**
<https://huggingface.co/akhiilll/forgeenv-repair-agent>
- **Demo (Gradio + ZeroGPU):**
<https://huggingface.co/spaces/akhiilll/forgeenv-demo>
## Citations
- Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025)
- Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025)
- Liu et al., *SPIRAL: Self-Play on Zero-Sum Games* (2025)
- [arXiv:2408.10215](https://arxiv.org/abs/2408.10215) — Reward engineering & shaping
- [arXiv:2601.19100](https://arxiv.org/abs/2601.19100) — Reward engineering for RL in software tasks