---
title: ForgeEnv
emoji: 🔧
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
tags:
  - openenv
  - self-play
  - self-improvement
  - code-repair
  - schema-drift
  - reinforcement-learning
  - huggingface
short_description: Self-improving RL env for HF library-drift repair
---

# ForgeEnv – OpenEnv Server
|
|
| This Space hosts the **ForgeEnv** OpenEnv-compliant environment as a FastAPI |
| service. It exposes the standard `reset`, `step`, and `state` endpoints and is |
| the runtime that training notebooks (TRL + Unsloth) connect to. |
|
|
> **Theme:** Self-Improvement (Hackathon Theme #4) – Challenger / Solver
> co-evolution via R-Zero, SPIRAL, and Absolute Zero Reasoner techniques.
|
|
| ## What it does |
|
|
| ForgeEnv simulates **HuggingFace library version drift**. A *Drift Generator* |
| proposes a realistic breakage to a working training script (renamed APIs, |
| deprecated imports, changed argument signatures, etc.). A *Repair Agent* then |
| emits a unified diff that should restore the script. Reward is computed by an |
| execution simulator + AST checker + held-out evaluator (multi-component to |
| resist reward hacking). |
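
The multi-component reward can be pictured as a weighted blend of the three checks. This is a minimal sketch only: the component names, clamping, and weights below are illustrative assumptions, not ForgeEnv's actual configuration.

```python
# Hedged sketch of a multi-component repair reward. Component
# weights are assumed values, not ForgeEnv's real ones.

def combined_reward(exec_passed: bool, ast_score: float, heldout_score: float) -> float:
    """Blend execution, AST, and held-out evaluator signals so no
    single component can be gamed in isolation (anti reward-hacking)."""
    W_EXEC, W_AST, W_HELDOUT = 0.5, 0.2, 0.3  # assumed weights
    clamp = lambda x: max(0.0, min(1.0, x))   # keep scores in [0, 1]
    return (
        W_EXEC * (1.0 if exec_passed else 0.0)
        + W_AST * clamp(ast_score)
        + W_HELDOUT * clamp(heldout_score)
    )
```

Splitting the signal this way means a repair that merely makes the script parse (high AST score) but still fails execution or the held-out check earns only partial reward.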
|
|
| ## API |
|
|
| The server uses [`openenv-core`](https://pypi.org/project/openenv-core/) and |
| follows the Gym-style contract: |
|
|
| | Endpoint | Method | Purpose | |
| | -------- | ------ | -------------------------------------------------- | |
| | `/reset` | POST | Sample a fresh task, return drift-gen observation | |
| | `/step` | POST | Apply a `ForgeAction` (breakage or repair) | |
| | `/state` | GET | Inspect the current internal state | |
| | `/health`| GET | Health probe (used by the container HEALTHCHECK) | |
|
|
| `ForgeAction` is a discriminated union of `BreakageAction` (used in phase 1) |
| and `RepairAction` (used in phase 2). See |
| [`forgeenv/env/actions.py`](forgeenv/env/actions.py). |
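
A discriminated union of this kind can be sketched with the standard library alone. The field names below are hypothetical stand-ins; the real schema lives in `forgeenv/env/actions.py`.

```python
# Hedged sketch of the two-phase action union; field names
# ("kind", "description", "unified_diff") are hypothetical.
from dataclasses import dataclass
from typing import Literal, Union

@dataclass
class BreakageAction:
    kind: Literal["breakage"]  # discriminator, phase 1
    description: str           # what the Drift Generator breaks

@dataclass
class RepairAction:
    kind: Literal["repair"]    # discriminator, phase 2
    unified_diff: str          # patch that should restore the script

ForgeAction = Union[BreakageAction, RepairAction]

def phase_of(action: ForgeAction) -> int:
    """Route on the discriminator field."""
    return 1 if action.kind == "breakage" else 2
```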
|
|
| ## Quick test |
|
|
| ```bash |
| curl -X POST https://akhiilll-forgeenv.hf.space/reset |
| curl https://akhiilll-forgeenv.hf.space/state |
| ``` |
|
|
| ```python |
| from openenv.core.env_client import EnvClient |
| |
| async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client: |
| obs = await client.reset() |
| print(obs.observation.current_phase, obs.observation.task_id) |
| ``` |
|
|
| ## Project links |
|
|
| - **Main repo / training notebooks / plots:** |
| <https://github.com/akhiilll/forgeenv> |
| - **Repair Agent model (LoRA):** |
| <https://huggingface.co/akhiilll/forgeenv-repair-agent> |
| - **Demo (Gradio + ZeroGPU):** |
| <https://huggingface.co/spaces/akhiilll/forgeenv-demo> |
|
|
| ## Citations |
|
|
| - Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025) |
| - Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025) |
| - Liu et al., *SPIRAL: Self-Play on Zero-Sum Games* (2025) |
- [arXiv:2408.10215](https://arxiv.org/abs/2408.10215) – Reward engineering & shaping
- [arXiv:2601.19100](https://arxiv.org/abs/2601.19100) – Reward engineering for RL in software tasks
|
|