---

title: ForgeEnv
emoji: 🔧
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: true
license: apache-2.0
tags:
  - openenv
  - self-play
  - self-improvement
  - code-repair
  - schema-drift
  - reinforcement-learning
  - huggingface
short_description: Self-improving RL env for HF library-drift repair
---


# ForgeEnv — OpenEnv Server

This Space hosts the **ForgeEnv** OpenEnv-compliant environment as a FastAPI
service. It exposes the standard `reset`, `step`, and `state` endpoints and is
the runtime that training notebooks (TRL + Unsloth) connect to.

> **Theme:** Self-Improvement (Hackathon Theme #4) — Challenger / Solver
> co-evolution via R-Zero, SPIRAL, and Absolute Zero Reasoner techniques.

## What it does

ForgeEnv simulates **Hugging Face library version drift**. A *Drift Generator*
proposes a realistic breakage to a working training script (renamed APIs,
deprecated imports, changed argument signatures, etc.). A *Repair Agent* then
emits a unified diff that should restore the script. The reward combines three
components: an execution simulator, an AST checker, and a held-out evaluator.
Scoring each repair with several independent checks makes the reward harder to
hack than any single signal would be.
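To make the breakage/repair loop concrete, here is a small standard-library sketch. It assumes a drift that renames a keyword argument (modelled on the `evaluation_strategy` to `eval_strategy` rename in `transformers.TrainingArguments`), builds the kind of unified diff a Repair Agent would emit with `difflib.unified_diff`, and runs a toy AST check in the spirit of the AST-checker reward component. The script, the `train.py` file name, and the `make_repair_diff` / `uses_kwarg` helpers are hypothetical illustrations, not ForgeEnv's actual implementation.

```python
import ast
import difflib

# Hypothetical broken script: it still passes a keyword argument that a newer
# transformers release renamed (evaluation_strategy -> eval_strategy).
BROKEN = """from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", evaluation_strategy="epoch")
"""

# The fixed script a Repair Agent is expected to arrive at.
REPAIRED = BROKEN.replace("evaluation_strategy=", "eval_strategy=")


def make_repair_diff(before: str, after: str, path: str = "train.py") -> str:
    """Return a unified diff from `before` to `after` (hypothetical helper)."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


def uses_kwarg(source: str, kwarg: str) -> bool:
    """Toy AST check: does any call in `source` pass `kwarg=...`?

    Only parses the code; nothing is imported or executed.
    """
    tree = ast.parse(source)
    return any(
        isinstance(node, ast.Call) and any(kw.arg == kwarg for kw in node.keywords)
        for node in ast.walk(tree)
    )


if __name__ == "__main__":
    print(make_repair_diff(BROKEN, REPAIRED))
    print("deprecated kwarg still present:", uses_kwarg(REPAIRED, "evaluation_strategy"))
```

A real checker would look at far more than one keyword argument; the point is only that structural checks like this can be run without executing the candidate script.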

## API

The server uses [`openenv-core`](https://pypi.org/project/openenv-core/) and
follows the Gym-style contract:

| Endpoint  | Method | Purpose                                                        |
| --------- | ------ | -------------------------------------------------------------- |
| `/reset`  | POST   | Sample a fresh task and return the Drift Generator observation |
| `/step`   | POST   | Apply a `ForgeAction` (breakage or repair)                     |
| `/state`  | GET    | Inspect the current internal state                             |
| `/health` | GET    | Health probe (used by the container HEALTHCHECK)               |
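For a dependency-free smoke test, the endpoints in the table can also be called with the Python standard library. This is a minimal sketch: the `build_request` and `call` helpers are hypothetical, and any `/step` body you pass (for example `{"action": {...}}`) is an assumption about how actions are serialized, so check it against `forgeenv/env/actions.py` and the server before relying on it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Endpoint -> HTTP method, mirroring the API table.
ENDPOINTS = {"reset": "POST", "step": "POST", "state": "GET", "health": "GET"}


def build_request(
    base_url: str, endpoint: str, payload: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) a request for one server endpoint."""
    method = ENDPOINTS[endpoint]
    url = f"{base_url.rstrip('/')}/{endpoint}"
    if method == "POST":
        # POST endpoints get a JSON body; an empty object if none is given.
        body = json.dumps(payload or {}).encode("utf-8")
        return urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method=method
        )
    return urllib.request.Request(url, method=method)


def call(
    base_url: str,
    endpoint: str,
    payload: Optional[Dict[str, Any]] = None,
    timeout: float = 30.0,
) -> Any:
    """Send the request and decode the reply (assumes the server returns JSON)."""
    request = build_request(base_url, endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call("https://akhiilll-forgeenv.hf.space", "state")` performs the same check as the `curl` state request.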

`ForgeAction` is a discriminated union of `BreakageAction` (used in phase 1)
and `RepairAction` (used in phase 2). See
[`forgeenv/env/actions.py`](forgeenv/env/actions.py).
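The idea behind a discriminated union can be sketched with standard-library dataclasses: a tag field in each payload selects the concrete action class during parsing. This is an illustration only. The field names (`kind`, `breakage_type`, `target`, `diff`) and `parse_action` are hypothetical, and the real models in `forgeenv/env/actions.py` may use a different tag name or Pydantic instead of dataclasses.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class BreakageAction:
    """Phase 1: the Drift Generator proposes a breakage (hypothetical fields)."""

    breakage_type: str  # e.g. "rename_kwarg"
    target: str  # e.g. "TrainingArguments.evaluation_strategy"
    kind: Literal["breakage"] = "breakage"


@dataclass
class RepairAction:
    """Phase 2: the Repair Agent submits a unified diff (hypothetical fields)."""

    diff: str
    kind: Literal["repair"] = "repair"


ForgeAction = Union[BreakageAction, RepairAction]

# Maps each tag value to the class it selects.
_BY_KIND = {"breakage": BreakageAction, "repair": RepairAction}


def parse_action(payload: dict) -> ForgeAction:
    """Choose the concrete action class from the `kind` tag, then build it."""
    fields = dict(payload)
    kind = fields.pop("kind", None)
    if kind not in _BY_KIND:
        raise ValueError(f"unknown action kind: {kind!r}")
    return _BY_KIND[kind](**fields)
```

Carrying the tag in every payload lets a single `/step` endpoint accept both phases; a server built this way could also reject an action whose kind does not match the current phase.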

## Quick test

```bash
# Start a new episode (sample a fresh task)
curl -X POST https://akhiilll-forgeenv.hf.space/reset

# Inspect the current internal state
curl https://akhiilll-forgeenv.hf.space/state
```

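```python
import asyncio

from openenv.core.env_client import EnvClient


async def main() -> None:
    # Open a client session against the hosted Space and start an episode.
    async with EnvClient(base_url="https://akhiilll-forgeenv.hf.space") as client:
        result = await client.reset()
        print(result.observation.current_phase, result.observation.task_id)


asyncio.run(main())
```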

## Project links

- **Main repo / training notebooks / plots:**
  <https://github.com/akhiilll/forgeenv>
- **Repair Agent model (LoRA):**
  <https://huggingface.co/akhiilll/forgeenv-repair-agent>
- **Demo (Gradio + ZeroGPU):**
  <https://huggingface.co/spaces/akhiilll/forgeenv-demo>

## Citations

- Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025)
- Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025)
- Liu et al., *SPIRAL: Self-Play on Zero-Sum Games* (2025)
- [arXiv:2408.10215](https://arxiv.org/abs/2408.10215) — Reward engineering & shaping
- [arXiv:2601.19100](https://arxiv.org/abs/2601.19100) — Reward engineering for RL in software tasks