title: ForgeEnv
emoji: 🔧
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
tags:
- openenv
- self-play
- self-improvement
- code-repair
- schema-drift
- reinforcement-learning
- huggingface
short_description: Self-improving RL env for HF library-drift repair
ForgeEnv — OpenEnv Server
This Space hosts the ForgeEnv OpenEnv-compliant environment as a FastAPI
service. It exposes the standard reset, step, and state endpoints and is
the runtime that training notebooks (TRL + Unsloth) connect to.
Theme: Self-Improvement (Hackathon Theme #4). A Challenger and a Solver co-evolve, using techniques from R-Zero, SPIRAL, and Absolute Zero Reasoner.
What it does
ForgeEnv simulates Hugging Face library version drift. A Drift Generator proposes a realistic breakage to a working training script (renamed APIs, deprecated imports, changed argument signatures, etc.). A Repair Agent then emits a unified diff that should restore the script. The reward combines three components: an execution simulator, an AST checker, and a held-out evaluator. Using several independent signals makes the reward harder to hack than any single check would be.
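As a rough illustration of the loop described above, the sketch below breaks a script through a renamed keyword argument, repairs it, prints the repair as a unified diff, and scores it with a weighted sum of three signals. It is not ForgeEnv's implementation. The drift is modeled on the rename of `evaluation_strategy` to `eval_strategy` in `TrainingArguments`. The helper names, the weights, and the placeholder values for the execution and held-out signals are all assumptions made for this example.

```python
import ast
import difflib

# Illustrative script that breaks once a keyword argument is renamed upstream.
# It is only parsed and diffed here, never executed.
BROKEN = '''\
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", evaluation_strategy="epoch")
'''


def repair(source: str) -> str:
    """Toy repair (hypothetical): rename the old keyword to the new one."""
    return source.replace("evaluation_strategy=", "eval_strategy=")


def unified_diff(before: str, after: str, path: str = "train.py") -> str:
    """Render the change as a unified diff, the format the Repair Agent emits."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))


def uses_keyword(source: str, name: str) -> bool:
    """AST-level check: does any call in the source pass keyword `name`?"""
    return any(isinstance(node, ast.keyword) and node.arg == name
               for node in ast.walk(ast.parse(source)))


def combined_reward(exec_ok: bool, ast_ok: bool, heldout: float,
                    weights=(0.4, 0.2, 0.4)) -> float:
    """Weighted sum of three signals; the weights are assumed, not ForgeEnv's."""
    w_exec, w_ast, w_held = weights
    return w_exec * float(exec_ok) + w_ast * float(ast_ok) + w_held * heldout


FIXED = repair(BROKEN)
PATCH = unified_diff(BROKEN, FIXED)
AST_OK = not uses_keyword(FIXED, "evaluation_strategy")

print(PATCH)
# exec_ok and heldout are placeholder values standing in for the other two signals.
print(combined_reward(exec_ok=True, ast_ok=AST_OK, heldout=0.5))
```

A repair that only satisfies the AST check earns a small fraction of the total here, which is the intuition behind combining several components.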
API
The server uses openenv-core and
follows the Gym-style contract:
| Endpoint | Method | Purpose |
|---|---|---|
| /reset | POST | Sample a fresh task, return drift-gen observation |
| /step | POST | Apply a ForgeAction (breakage or repair) |
| /state | GET | Inspect the current internal state |
| /health | GET | Health probe (used by the container HEALTHCHECK) |
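For callers that do not want to install a client library, the endpoints in the table can be reached with the Python standard library alone. The sketch below is a minimal helper under two assumptions: that responses are JSON, and that `/reset` accepts a POST with no body (as in a plain `curl -X POST`). The `build_request` and `call` helpers are hypothetical names, and the `/step` payload schema is not shown because it is defined by `ForgeAction`.

```python
import json
import urllib.request

BASE_URL = "https://akhiilll-forgeenv.hf.space"


def build_request(path: str, method: str = "GET", payload=None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the endpoints in the table."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(path: str, method: str = "GET", payload=None, timeout: float = 30.0):
    """Send the request and decode the body, assuming the server answers with JSON."""
    request = build_request(path, method, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Space running, `call("/health")`, `call("/reset", method="POST")`, and `call("/state")` would exercise three of the four routes.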
ForgeAction is a discriminated union of BreakageAction (used in phase 1)
and RepairAction (used in phase 2). See
forgeenv/env/actions.py.
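For readers unfamiliar with the pattern, a discriminated union pairs each variant with a tag field that tells the receiver which variant it holds. The sketch below shows the idea with plain dataclasses. The field names (`kind`, `description`, `diff`) and the `parse_action` helper are assumptions made for illustration. The real definitions are in forgeenv/env/actions.py and may differ.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class BreakageAction:
    description: str                      # hypothetical field: the proposed breakage
    kind: Literal["breakage"] = "breakage"


@dataclass(frozen=True)
class RepairAction:
    diff: str                             # hypothetical field: unified diff text
    kind: Literal["repair"] = "repair"


# The union type: a ForgeAction is exactly one of the two variants.
ForgeAction = Union[BreakageAction, RepairAction]


def parse_action(payload: dict) -> ForgeAction:
    """Pick the variant from the tag field, rejecting unknown tags."""
    kind = payload.get("kind")
    if kind == "breakage":
        return BreakageAction(description=payload["description"])
    if kind == "repair":
        return RepairAction(diff=payload["diff"])
    raise ValueError(f"unknown action kind: {kind!r}")
```

Dispatching on a single tag keeps the `/step` handler simple: it can check that the variant matches the current phase before applying it.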
Quick test
curl -X POST https://akhiilll-forgeenv.hf.space/reset
curl https://akhiilll-forgeenv.hf.space/state
import asyncio

from openenv.core.env_client import EnvClient

async def main():
    async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client:
        obs = await client.reset()
        print(obs.observation.current_phase, obs.observation.task_id)

asyncio.run(main())  # in a notebook, use `await main()` instead
Project links
- Main repo / training notebooks / plots: https://github.com/akhiilll/forgeenv
- Repair Agent model (LoRA): https://huggingface.co/akhiilll/forgeenv-repair-agent
- Demo (Gradio + ZeroGPU): https://huggingface.co/spaces/akhiilll/forgeenv-demo
Citations
- Huang et al., R-Zero: Self-Evolving Reasoning LLM From Zero Data (2025)
- Zhao et al., Absolute Zero: Reinforced Self-play Reasoning with Zero Data (2025)
- Liu et al., SPIRAL: Self-Play on Zero-Sum Games (2025)
- arXiv:2408.10215 — Reward engineering & shaping
- arXiv:2601.19100 — Reward engineering for RL in software tasks