# ForgeEnv πŸ”§

> *A self-improving RL environment that teaches LLMs to fix HuggingFace
> training scripts as the ecosystem evolves.*

ForgeEnv is an OpenEnv-compliant environment for the
**OpenEnv Hackathon (India 2026)**, theme **#4 β€” Self-Improvement**.
Two LLM roles co-evolve inside a single environment:

- a **Drift Generator** that proposes realistic library-version breakages
  (renamed APIs, deprecated imports, changed argument signatures, dataset
  schema drift, tokenizer kwarg drift, …), and
- a **Repair Agent** that emits a unified diff to restore the script (see
  the example below).
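
For example, a repair for the real-world rename of `evaluation_strategy` to
`eval_strategy` in `transformers.TrainingArguments` might look like the
following; the `RepairAction(diff=...)` field name is illustrative (see
`forgeenv/env` for the real schema):

```python
# Hypothetical RepairAction payload: a minimal unified diff that adapts a
# script to the real evaluation_strategy -> eval_strategy rename.
RepairAction(diff="""\
--- a/train.py
+++ b/train.py
@@ -1 +1 @@
-    evaluation_strategy="epoch",
+    eval_strategy="epoch",
""")
```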

The reward is multi-component (execution + AST checks, plus a held-out
evaluator used only for evals), which both produces a rich gradient *and*
makes reward hacking expensive, following the recommendations in the
Hackathon Self-Serve Guide.

## Why it matters

LLM agents that write training code are silently broken by HF library
upgrades β€” a `Trainer` method is renamed, a tokenizer kwarg disappears, a
dataset column is restructured. Today, humans patch these breakages.
ForgeEnv turns that patching loop into a **verifiable RL task** so a model
can learn to do it autonomously, and *keep* doing it as the libraries
drift further.

## Live links

| Artifact                    | URL                                                                  |
| --------------------------- | -------------------------------------------------------------------- |
| Environment Space (Docker)  | <https://huggingface.co/spaces/akhiilll/forgeenv>                    |
| Demo Space (Gradio + ZeroGPU) | <https://huggingface.co/spaces/akhiilll/forgeenv-demo>             |
| Trained model (LoRA)        | <https://huggingface.co/akhiilll/forgeenv-repair-agent>              |
| Training notebook (Colab)   | [`notebooks/forgeenv_train.ipynb`](notebooks/forgeenv_train.ipynb)   |

## Architecture

```
                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                 β”‚  Teacher (deter- β”‚     curriculum β†’
                 β”‚  ministic)       β”‚     {RenameApiCall, DeprecateImport, …}
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚ target_category
                          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ForgeEnvironment (OpenEnv)                                      β”‚
β”‚   reset()  β†’  drift_gen obs (script, target_category)           β”‚
β”‚   step(BreakageAction)  β†’  repair obs (broken_script, trace)    β”‚
β”‚   step(RepairAction)    β†’  reward, breakdown, held-out scores   β”‚
β”‚                                                                 β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚   β”‚ Drift Generator   β”‚    β”‚ Repair Agent         β”‚            β”‚
β”‚   β”‚ (LLM, GRPO)       β”‚    β”‚ (LLM, GRPO + SFT)    β”‚            β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                                                                 β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚   β”‚ Simulator (AST + heuristic exec) + Visible Verifier   β”‚    β”‚
β”‚   β”‚ + Held-out Evaluator + Library Drift Engine            β”‚    β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

The two-step episode flow (Phase 1 = drift, Phase 2 = repair) is exactly
the Challenger / Solver loop from R-Zero, with role-switched prompts Γ  la
SPIRAL and Absolute Zero Reasoner.
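
In code, one episode looks roughly like the following sketch:
`drift_policy` and `repair_policy` are hypothetical callables wrapping the
two LLM roles, and the observation/result fields follow the diagram above
(see `forgeenv/env` for the real API):

```python
# Two-phase episode sketch. drift_policy / repair_policy are hypothetical
# wrappers that turn observations into BreakageAction / RepairAction.
from forgeenv.env import ForgeEnvironment

env = ForgeEnvironment()

obs = env.reset()                      # Phase 1 obs: script + target_category
obs = env.step(drift_policy(obs))      # BreakageAction -> broken_script + trace
result = env.step(repair_policy(obs))  # RepairAction -> reward + breakdown
print(result.reward, result.breakdown)
```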

## Reward design

```
visible_reward
 β”œβ”€ execution_success        (sandboxed run / heuristic simulator)
 β”œβ”€ ast_well_formed          (parses + no forbidden globals)
 β”œβ”€ format_compliance        (valid unified diff or full-script replacement)
 β”œβ”€ minimality               (smaller diffs preferred β€” anti-rewrite)
 └─ no_forbidden_globals     (locked-down execution check)

held_out_evaluator (NOT used for training, used for evals only)
 β”œβ”€ executed_cleanly
 β”œβ”€ matches_target_api       (semantic correctness)
 └─ regression_free          (other tests still pass)
```

Multiple independent components, plus a **held-out evaluator the trainer
never sees**, make it expensive for the agent to game its way to a high
score.
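
As a sketch, the visible aggregation might look like this; the equal
weights and the helper names (`runs_in_sandbox`, `parses_cleanly`,
`is_valid_unified_diff`, `uses_forbidden_globals`) are hypothetical
stand-ins for the logic in `forgeenv/verifier`:

```python
# Hypothetical aggregation of the five visible components above; the
# helpers named here stand in for forgeenv/verifier internals.
def visible_reward(repaired: str, diff: str) -> float:
    components = {
        "execution_success":    1.0 if runs_in_sandbox(repaired) else 0.0,
        "ast_well_formed":      1.0 if parses_cleanly(repaired) else 0.0,
        "format_compliance":    1.0 if is_valid_unified_diff(diff) else 0.0,
        # Smaller diffs score higher, penalizing full-script rewrites.
        "minimality":           max(0.0, 1.0 - len(diff.splitlines()) / 50),
        "no_forbidden_globals": 0.0 if uses_forbidden_globals(repaired) else 1.0,
    }
    weights = {k: 0.2 for k in components}  # illustrative equal weighting
    return sum(weights[k] * components[k] for k in components)
```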

## Results (50 episodes / agent, oracle as upper-bound proxy for trained)

After warm-start SFT + GRPO, the Repair Agent (scored here via the oracle,
an upper-bound proxy for the trained policy) beats the no-op baseline on
every metric we track:

| Agent              | Mean visible reward | Success rate (held-out exec) |
| ------------------ | ------------------- | ---------------------------- |
| Baseline (no-op)   | **0.90**            | **50 %**                     |
| Trained (oracle)   | **1.51**            | **86 %**                     |

Three plots (committed to `artifacts/plots/`):

- `baseline_vs_trained.png` β€” reward distribution, baseline vs trained.
- `training_reward_curve.png` β€” reward trajectory across episodes.
- `success_by_category.png` β€” per-primitive success rates.

A 43-entry `repair_library.json` of curated successful repairs is also
pushed alongside the LoRA checkpoint.

## Quick start

```bash
# 1. install (env-only deps, no torch needed for the env itself)
pip install -e .[openenv]
pip install -e .[dev]

# 2. run the test suite
pytest -q                 # 74 tests β€” full env + roles + reward + training

# 3. spin up the environment locally
uvicorn forgeenv.env.server:app --port 7860

# 4. generate the demo artifacts (plots + repair_library.json + eval JSON)
python scripts/generate_artifacts.py --n_baseline 50 --n_trained 50

# 5. push to HF Spaces
export HF_TOKEN=hf_...
python scripts/deploy_spaces.py --user akhiilll
```

Training (warm-start SFT + GRPO via TRL + Unsloth) lives entirely in
[`notebooks/forgeenv_train.ipynb`](notebooks/forgeenv_train.ipynb) β€” open
it in Colab on a T4 or A100 and re-run it end-to-end.
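
For orientation, the GRPO half of that notebook can be wired up roughly as
below. TRL's `GRPOTrainer` accepts plain callables as reward functions;
`score_repair`, the model id, and the dataset name are placeholders:

```python
# Sketch of the GRPO wiring via TRL. score_repair is a hypothetical
# adapter around the environment's visible reward; model id, dataset and
# hyperparameters are placeholders (the notebook is the source of truth).
from trl import GRPOConfig, GRPOTrainer

def forgeenv_reward(prompts, completions, **kwargs):
    # One visible-reward score per sampled repair diff.
    return [score_repair(p, c) for p, c in zip(prompts, completions)]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",   # placeholder model id
    reward_funcs=forgeenv_reward,
    args=GRPOConfig(output_dir="grpo-repair", num_generations=8),
    train_dataset=broken_script_prompts,  # prompts: broken script + trace
)
trainer.train()
```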

## Repository layout

```
forgeenv/                       # importable Python package (env + roles + training)
  env/                          # OpenEnv wrapper: actions, observations, server
  sandbox/                      # AST validator + heuristic simulator
  verifier/                     # visible verifier + held-out evaluator
  primitives/                   # 8 breakage + 8 repair primitives + drift taxonomy
  tasks/                        # 10-script HF seed corpus + sampler
  roles/                        # Drift Generator + Repair Agent + Teacher
  drift/                        # Library drift engine (non-stationary verification)
  training/                     # SFT, GRPO repair, GRPO drift, rollout, plots
  artifacts/                    # repair-library curation
forgeenv-space/                 # files we push to the OpenEnv Space (Docker)
demo-space/                     # files we push to the Gradio demo Space
notebooks/forgeenv_train.ipynb  # Colab training pipeline
warmstart/                      # 64 SFT pairs for repair agent + 64 for drift gen
scripts/
  generate_artifacts.py         # plots + eval_results.json + repair_library.json
  deploy_spaces.py              # one-shot push to HF Spaces
artifacts/                      # generated plots + curated repair library
tests/                          # 74 pytest tests
```

## Anti-cheat / reward-hacking safeguards

Following the Hackathon Self-Serve Guide explicitly:

1. **Multiple independent reward functions** (5 visible + 3 held-out).
2. **Held-out evaluator** the trainer never sees, used only for plots.
3. **Locked-down execution** in the sandbox simulator β€” no globals abuse,
   timeouts on every run.
4. **AST validator** rejects forbidden constructs (network calls, `os.system`,
   etc.) before any reward is computed (see the sketch after this list).
5. **Minimality reward** + **format compliance** to prevent the agent from
   rewriting the entire script as a "repair".
6. The **Drift Generator** is itself trained against an R-Zero-style
   composite reward (uncertainty βˆ’ repetition), so it can't collapse into
   trivial or repeated breakages.
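
A minimal sketch of the gate in point 4, assuming Python 3.9+ for
`ast.unparse`; the deny-lists here are illustrative, and the real validator
in `forgeenv/sandbox` covers more cases:

```python
# Illustrative AST gate run before any reward is computed. The real
# validator in forgeenv/sandbox checks more constructs than these.
import ast

FORBIDDEN_CALLS = {"os.system", "eval", "exec", "__import__"}
FORBIDDEN_MODULES = {"socket", "requests", "urllib", "subprocess"}

def ast_validates(source: str) -> bool:
    """Reject scripts that fail to parse or touch forbidden constructs."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False  # an unparseable repair earns no reward at all
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(a.name.split(".")[0] in FORBIDDEN_MODULES for a in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                return False
        elif isinstance(node, ast.Call) and ast.unparse(node.func) in FORBIDDEN_CALLS:
            return False
    return True
```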

## References

- Huang et al., *R-Zero: Self-Evolving Reasoning LLM From Zero Data* (2025)
- Zhao et al., *Absolute Zero: Reinforced Self-play Reasoning with Zero Data* (2025)
- Liu et al., *SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning…* (2025)
- Ibrahim et al., [arXiv:2408.10215](https://arxiv.org/abs/2408.10215) β€” Reward engineering & shaping
- Masud et al., [arXiv:2601.19100](https://arxiv.org/abs/2601.19100) β€” Reward engineering for RL in software tasks
- OpenEnv Hackathon Self-Serve Guide (2026)

## License

Apache-2.0