---
title: ForgeEnv
emoji: 🔧
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
tags:
  - openenv
  - self-play
  - self-improvement
  - code-repair
  - schema-drift
  - reinforcement-learning
  - huggingface
short_description: Self-improving RL env for HF library-drift repair
---

# ForgeEnv — OpenEnv Server

This Space hosts the ForgeEnv OpenEnv-compliant environment as a FastAPI service. It exposes the standard reset, step, and state endpoints and is the runtime that training notebooks (TRL + Unsloth) connect to.

**Theme:** Self-Improvement (Hackathon Theme #4) — Challenger/Solver co-evolution via R-Zero, SPIRAL, and Absolute Zero Reasoner techniques.

## What it does

ForgeEnv simulates HuggingFace library version drift. A Drift Generator proposes a realistic breakage to a working training script (renamed APIs, deprecated imports, changed argument signatures, etc.). A Repair Agent then emits a unified diff that should restore the script. Reward is computed by an execution simulator + AST checker + held-out evaluator (multi-component to resist reward hacking).
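The idea behind the multi-component reward can be pictured as follows. This is an illustrative sketch only — the component names, weights, and gating are assumptions, not ForgeEnv's actual code: execution acts as a hard gate (a repair that does not run earns nothing), and the AST and held-out scores are blended so that no single component can be gamed on its own.

```python
# Hypothetical sketch of a multi-component reward (weights are illustrative).
def combined_reward(exec_passed: bool, ast_score: float, holdout_score: float) -> float:
    """Combine execution, AST-check, and held-out-eval signals into one reward.

    Execution is a hard gate: a diff that does not run earns nothing, which
    removes the easiest reward-hacking path (emitting plausible-looking
    patches that never execute).
    """
    if not exec_passed:
        return 0.0
    # Blend the remaining components and clip to [0, 1].
    reward = 0.5 * ast_score + 0.5 * holdout_score
    return max(0.0, min(1.0, reward))
```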

## API

The server uses `openenv-core` and follows the Gym-style contract:

| Endpoint  | Method | Purpose                                            |
|-----------|--------|----------------------------------------------------|
| `/reset`  | POST   | Sample a fresh task, return drift-gen observation  |
| `/step`   | POST   | Apply a `ForgeAction` (breakage or repair)         |
| `/state`  | GET    | Inspect the current internal state                 |
| `/health` | GET    | Health probe (used by the container `HEALTHCHECK`) |

`ForgeAction` is a discriminated union of `BreakageAction` (used in phase 1) and `RepairAction` (used in phase 2). See `forgeenv/env/actions.py`.
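As a rough sketch of what such a two-variant discriminated union looks like (the field names below are hypothetical; the real definitions live in `forgeenv/env/actions.py`):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class BreakageAction:
    # Phase 1: the Challenger proposes a drift (field name is hypothetical).
    drift_description: str

@dataclass
class RepairAction:
    # Phase 2: the Solver emits a unified diff (field name is hypothetical).
    unified_diff: str

# The environment dispatches on whichever concrete variant it receives.
ForgeAction = Union[BreakageAction, RepairAction]

def phase_for(action: ForgeAction) -> int:
    """Map an action variant to the episode phase it belongs to."""
    return 1 if isinstance(action, BreakageAction) else 2
```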

## Quick test

```bash
curl -X POST https://akhiilll-forgeenv.hf.space/reset
curl https://akhiilll-forgeenv.hf.space/state
```
Or from Python, using the OpenEnv client (`async with` must run inside a coroutine):

```python
import asyncio

from openenv.core.env_client import EnvClient

async def main() -> None:
    async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client:
        obs = await client.reset()
        print(obs.observation.current_phase, obs.observation.task_id)

asyncio.run(main())
```

## Project links

## Citations

- Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025)
- Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025)
- Liu et al., *SPIRAL: Self-Play on Zero-Sum Games* (2025)
- arXiv:2408.10215 — Reward engineering & shaping
- arXiv:2601.19100 — Reward engineering for RL in software tasks