---
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100x4
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
---
# CERNenv Trainer (Hugging Face Space, A100)
Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as
an LHC (Large Hadron Collider) physicist inside the **CERNenv** OpenEnv
environment using **GRPO** (Group-Relative Policy Optimization),
**Unsloth**, and **LoRA** (Low-Rank Adaptation).
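To make "group-relative" concrete, here is a minimal sketch of how GRPO-style methods typically score a group of rollouts sampled from the same prompt: each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the use of the population standard deviation are assumptions for illustration; the trainer in this Space may compute advantages differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and std.

    Illustrative only: `eps` and the population std are assumptions,
    not values taken from this Space's trainer.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout got the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, eight rollouts (the default NUM_GENERATIONS), two of which succeeded
group = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(group)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed. A group where every rollout earns the same reward yields all-zero advantages and therefore no learning signal.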
## Hardware
- Recommended: **4× A100 (`a100x4`, 320 GB VRAM, ~$10/hr)**
- Single GPU also supported: `a100-large` (slower; consider lowering `TOTAL_EPISODES`)
- Minimum: T4 / L4 (use the Colab notebook fallback)
## Required Space secrets
| Secret | Purpose |
| --- | --- |
| `HF_TOKEN` | Hugging Face token with `write` access for model push |
| `HF_USERNAME` | Hub username, used as the default model-repo owner |
## Optional environment variables
| Variable | Default | Notes |
| --- | --- | --- |
| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
| `TOTAL_EPISODES` | `1500` | Total rollouts (prompts × generations per prompt) |
| `DIFFICULTY` | `easy` | `easy` / `medium` / `hard` |
| `MAX_STEPS` | `18` | Max steps per episode |
| `NUM_GENERATIONS` | `8` | GRPO group size; larger groups give a less noisy advantage estimate but cost more per prompt |
| `NUM_GPUS` | auto-detected | `accelerate launch --num_processes` value |
| `CHECKPOINT_EVAL_STEPS` | `25` | Run a held-out eval every N updates |
| `CHECKPOINT_EVAL_EPISODES` | `8` | Episodes per mid-training eval |
| `EVAL_EPISODES` | `32` | Episodes for the pre- and post-training evals (more episodes give tighter estimates) |
| `OUTPUT_DIR` | `runs/unsloth-grpo` | LoRA adapter output |
| `EVIDENCE_DIR` | `evidence` | Where curves, CSVs, plots are written |
| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo for adapters + evidence |
| `AUTOSTART` | `0` | Set to `1` to start training on Space boot |
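As a rough illustration of how the defaults in the table resolve, the sketch below reads a subset of the variables from an environment mapping. `TrainerConfig` and `load_config` are hypothetical names, not the Space's actual code, and the validation of `DIFFICULTY` is an assumption about reasonable behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainerConfig:
    """Hypothetical container for a subset of the documented variables."""
    model_name: str
    total_episodes: int
    difficulty: str
    max_steps: int
    num_generations: int
    push_repo: str
    autostart: bool


def load_config(env=None):
    """Resolve settings from `env` (defaults to os.environ) using the documented defaults."""
    env = os.environ if env is None else env
    difficulty = env.get("DIFFICULTY", "easy")
    if difficulty not in {"easy", "medium", "hard"}:
        raise ValueError(f"DIFFICULTY must be easy, medium, or hard, got {difficulty!r}")
    username = env.get("HF_USERNAME", "")
    return TrainerConfig(
        model_name=env.get("MODEL_NAME", "unsloth/Qwen2.5-3B-Instruct"),
        total_episodes=int(env.get("TOTAL_EPISODES", "1500")),
        difficulty=difficulty,
        max_steps=int(env.get("MAX_STEPS", "18")),
        num_generations=int(env.get("NUM_GENERATIONS", "8")),
        # PUSH_REPO falls back to a repo under the HF_USERNAME secret
        push_repo=env.get("PUSH_REPO", f"{username}/cernenv-grpo-qwen2.5-3b"),
        autostart=env.get("AUTOSTART", "0") == "1",
    )


cfg = load_config({"HF_USERNAME": "alice", "DIFFICULTY": "medium"})
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check without touching the real environment.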
## How to use
This Space exposes a tiny FastAPI control panel:
- `GET /` — status + run info + **live training-progress evidence** (curves, before/after metrics, plots)
- `POST /train` — start / restart a training run
- `GET /logs?tail=N` — live tail of `training.log`
- `GET /metrics` — pre / post / Δ metrics JSON
- `GET /evidence` — list of evidence artifacts on disk
- `GET /evidence/{name}` — download an artifact (`training_curve.png`, `training_log.csv`, etc.)
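The endpoints above can be driven from a script. The following is a minimal standard-library sketch, assuming a public Space reachable at a base URL you supply (the `example.hf.space` host is a placeholder) and assuming `POST /train` accepts an empty body. A private Space would also need an authorization header, which is not shown.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def endpoint_url(base_url, path, **params):
    """Build the full URL for a control-panel endpoint, with optional query parameters."""
    url = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
    return f"{url}?{urlencode(params)}" if params else url


def start_training(base_url, timeout=30):
    """POST /train; returns the HTTP status code. The empty body is an assumption."""
    req = Request(endpoint_url(base_url, "/train"), data=b"", method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return resp.status


def tail_logs(base_url, n=50, timeout=30):
    """GET /logs?tail=N; returns the last N lines of training.log as text."""
    with urlopen(endpoint_url(base_url, "/logs", tail=n), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_metrics(base_url, timeout=30):
    """GET /metrics; returns the parsed pre / post / delta metrics JSON."""
    with urlopen(endpoint_url(base_url, "/metrics"), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (placeholder host; requires a running Space):
#   base = "https://example.hf.space"
#   start_training(base)
#   print(tail_logs(base, n=20))
```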
### Training-progress evidence saved (and pushed to Hub)
- `training_log.csv` — per-step reward, loss, KL, lr, grad-norm
- `training_curve.png` — reward + loss vs step
- `checkpoint_evals.csv` — held-out eval every `CHECKPOINT_EVAL_STEPS` updates
- `checkpoint_progression.png` — mean reward + success/mass/channel accuracy vs step
- `pre_eval.jsonl` / `post_eval.jsonl` — full per-episode rollouts before vs after
- `before_after_summary.png` — pre/post bar chart with Δ annotations
- `reward_distribution.png` — pre vs post reward histogram
- `before_after_metrics.json` — machine-readable metrics + deltas
- `sample_trajectories.md` — cherry-picked pre vs post agent traces
Click **"Start training"** in the UI, or set `AUTOSTART=1` in the Space variables to kick off immediately on boot.
When training finishes, the LoRA adapters are pushed to `PUSH_REPO`.
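For a quick offline check of the before/after comparison, the sketch below recomputes a mean-reward delta from the two JSONL rollout files. It assumes each line is a JSON object with a numeric `reward` field; that field name is a guess, so adjust `field` to whatever the files actually contain. `before_after_metrics.json` remains the authoritative summary.

```python
import json
from pathlib import Path
from statistics import mean


def mean_reward(jsonl_path, field="reward"):
    """Average one numeric field over all non-empty lines of a JSONL file.

    The default field name "reward" is an assumption about the file schema.
    """
    rewards = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                rewards.append(float(json.loads(line)[field]))
    return mean(rewards)


def reward_delta(evidence_dir="evidence", field="reward"):
    """Compare mean reward in pre_eval.jsonl and post_eval.jsonl."""
    root = Path(evidence_dir)
    pre = mean_reward(root / "pre_eval.jsonl", field)
    post = mean_reward(root / "post_eval.jsonl", field)
    return {"pre": pre, "post": post, "delta": post - pre}
```

Calling `reward_delta("evidence")` after a run returns a dict with the pre-training mean, the post-training mean, and their difference.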
## Local equivalent
The same training run is reproducible locally with:
```bash
# single GPU
PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 1500 --max_steps 18 \
  --num_generations 8 --output_dir runs/unsloth-grpo \
  --evidence_dir evidence

# multi-GPU (e.g. 4× A100)
PYTHONPATH=. accelerate launch --num_processes 4 --mixed_precision bf16 \
  -m training.training_unsloth \
  --total_episodes 1500 --num_generations 8 \
  --output_dir runs/unsloth-grpo --evidence_dir evidence
```