---
title: ForgeEnv
emoji: 🔧
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
tags:
  - openenv
  - self-play
  - self-improvement
  - code-repair
  - schema-drift
  - reinforcement-learning
  - huggingface
short_description: Self-improving RL env for HF library-drift repair
---

# ForgeEnv — OpenEnv Server

This Space hosts the ForgeEnv OpenEnv-compliant environment as a FastAPI service. It exposes the standard reset, step, and state endpoints and is the runtime that training notebooks (TRL + Unsloth) connect to.

**Theme:** Self-Improvement (Hackathon Theme #4) — Challenger/Solver co-evolution via R-Zero, SPIRAL, and Absolute Zero Reasoner techniques.

## What it does

ForgeEnv simulates HuggingFace library version drift. A Drift Generator proposes a realistic breakage to a working training script (renamed APIs, deprecated imports, changed argument signatures, etc.). A Repair Agent then emits a unified diff that should restore the script. Reward is computed by an execution simulator + AST checker + held-out evaluator (multi-component to resist reward hacking).
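The idea behind the multi-component reward can be pictured as follows. This is an illustrative sketch only — the component names, weights, and gating are assumptions, not ForgeEnv's actual code: execution acts as a hard gate (a repair that does not run earns nothing), and the AST and held-out scores are blended so that no single component can be gamed on its own.

```python
# Hypothetical sketch of a multi-component reward (weights are illustrative).
def combined_reward(exec_passed: bool, ast_score: float, holdout_score: float) -> float:
    """Combine execution, AST-check, and held-out-eval signals into one reward.

    Execution is a hard gate: a diff that does not run earns nothing, which
    removes the easiest reward-hacking path (emitting plausible-looking
    patches that never execute).
    """
    if not exec_passed:
        return 0.0
    # Blend the remaining components and clip to [0, 1].
    reward = 0.5 * ast_score + 0.5 * holdout_score
    return max(0.0, min(1.0, reward))
```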

## API

The server uses `openenv-core` and follows the Gym-style contract:

| Endpoint  | Method | Purpose                                            |
|-----------|--------|----------------------------------------------------|
| `/reset`  | POST   | Sample a fresh task, return drift-gen observation  |
| `/step`   | POST   | Apply a `ForgeAction` (breakage or repair)         |
| `/state`  | GET    | Inspect the current internal state                 |
| `/health` | GET    | Health probe (used by the container `HEALTHCHECK`) |

`ForgeAction` is a discriminated union of `BreakageAction` (used in phase 1) and `RepairAction` (used in phase 2). See `forgeenv/env/actions.py`.
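As a rough sketch of what such a two-variant discriminated union looks like (the field names below are hypothetical; the real definitions live in `forgeenv/env/actions.py`):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class BreakageAction:
    # Phase 1: the Challenger proposes a drift (field name is hypothetical).
    drift_description: str

@dataclass
class RepairAction:
    # Phase 2: the Solver emits a unified diff (field name is hypothetical).
    unified_diff: str

# The environment dispatches on whichever concrete variant it receives.
ForgeAction = Union[BreakageAction, RepairAction]

def phase_for(action: ForgeAction) -> int:
    """Map an action variant to the episode phase it belongs to."""
    return 1 if isinstance(action, BreakageAction) else 2
```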

## Quick test

```bash
curl -X POST https://akhiilll-forgeenv.hf.space/reset
curl https://akhiilll-forgeenv.hf.space/state
```
Or from Python, using the OpenEnv client (`async with` must run inside a coroutine):

```python
import asyncio

from openenv.core.env_client import EnvClient

async def main() -> None:
    async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client:
        obs = await client.reset()
        print(obs.observation.current_phase, obs.observation.task_id)

asyncio.run(main())
```

## Project links

## Citations

- Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025)
- Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025)
- Liu et al., *SPIRAL: Self-Play on Zero-Sum Games* (2025)
- arXiv:2408.10215 — Reward engineering & shaping
- arXiv:2601.19100 — Reward engineering for RL in software tasks