---
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100x4
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
---

# CERNenv Trainer (Hugging Face Space, A100)

Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as
an LHC (Large Hadron Collider) physicist inside the **CERNenv** OpenEnv
environment using **GRPO** (Group-Relative Policy Optimization),
**Unsloth**, and **LoRA** (Low-Rank Adaptation).

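GRPO does not use a learned value function. Each prompt is rolled out several times (the group size, `NUM_GENERATIONS`), and each completion is scored relative to the rest of its group. The awk-based sketch below illustrates that idea only and is not code from this repository. It uses the common normalization A_i = (r_i - mean) / std with a population standard deviation. The trainer's exact variant (sample std, an epsilon term, or no std scaling) may differ. The function name `grpo_advantages` is hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative sketch of GRPO's group-relative advantage, A_i = (r_i - mean) / std.
# Not taken from CERNenv; grpo_advantages is a hypothetical name.
# Usage: grpo_advantages r_1 r_2 ... r_G   (one reward per sampled completion)
grpo_advantages() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk '
    { r[NR] = $1; sum += $1 }
    END {
      mean = sum / NR
      for (i = 1; i <= NR; i++) var += (r[i] - mean) ^ 2
      std = sqrt(var / NR)            # population standard deviation
      for (i = 1; i <= NR; i++) {
        # Identical rewards give std = 0: no completion beats another,
        # so every advantage is zero and the group yields no signal.
        a = (std > 0) ? (r[i] - mean) / std : 0
        printf "%.4f%s", a, (i < NR ? " " : "\n")
      }
    }'
}

grpo_advantages 1 0 0 1   # -> 1.0000 -1.0000 -1.0000 1.0000
grpo_advantages 1 1 1 1   # -> 0.0000 0.0000 0.0000 0.0000
```

The second call shows why group size matters. When every completion in a group earns the same reward, all advantages are zero and that prompt contributes nothing to the update. Larger groups make that outcome less likely.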
## Hardware

- Recommended: **4× A100 (`a100x4`, 320 GB VRAM, ~$10/hr)**
- Single GPU also supported: `a100-large` (slower; fewer episodes recommended)
- Minimum: T4 / L4 (use the Colab notebook fallback)

## Required Space secrets

| Secret | Purpose |
| --- | --- |
| `HF_TOKEN` | Hugging Face token with `write` access, used to push the model |
| `HF_USERNAME` | Hub username, used as the default model-repo owner |

## Optional environment variables

| Variable | Default | Notes |
| --- | --- | --- |
| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
| `TOTAL_EPISODES` | `1500` | Total rollouts (prompts × generations) |
| `DIFFICULTY` | `easy` | `easy` / `medium` / `hard` |
| `MAX_STEPS` | `18` | Max steps per episode |
| `NUM_GENERATIONS` | `8` | GRPO group size (larger groups give a stronger relative-reward signal) |
| `NUM_GPUS` | auto-detected | Passed to `accelerate launch --num_processes` |
| `CHECKPOINT_EVAL_STEPS` | `25` | Run a held-out eval every N updates |
| `CHECKPOINT_EVAL_EPISODES` | `8` | Episodes per mid-training eval |
| `EVAL_EPISODES` | `32` | Episodes for the pre/post evals (more episodes, more statistical power) |
| `OUTPUT_DIR` | `runs/unsloth-grpo` | LoRA adapter output directory |
| `EVIDENCE_DIR` | `evidence` | Where curves, CSVs, and plots are written |
| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo for adapters + evidence |
| `AUTOSTART` | `0` | Set to `1` to start training on Space boot |

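The defaults above can be reproduced with ordinary shell parameter expansion. This is a hypothetical sketch, and `resolve_config` is an illustrative name; the Space's actual entrypoint is not shown in this README. The sketch does three things:

- Fills in each documented default.
- Derives `PUSH_REPO` from `HF_USERNAME` only when `PUSH_REPO` is not set explicitly.
- Rejects unknown `DIFFICULTY` values.

`NUM_GPUS` is left out because its default is detected at runtime.

```bash
#!/usr/bin/env bash
# Hypothetical default resolution mirroring the environment-variable table.
resolve_config() {
  # ${VAR:=default} assigns the default only when VAR is unset or empty.
  : "${MODEL_NAME:=unsloth/Qwen2.5-3B-Instruct}"
  : "${TOTAL_EPISODES:=1500}"
  : "${DIFFICULTY:=easy}"
  : "${MAX_STEPS:=18}"
  : "${NUM_GENERATIONS:=8}"
  : "${CHECKPOINT_EVAL_STEPS:=25}"
  : "${CHECKPOINT_EVAL_EPISODES:=8}"
  : "${EVAL_EPISODES:=32}"
  : "${OUTPUT_DIR:=runs/unsloth-grpo}"
  : "${EVIDENCE_DIR:=evidence}"
  : "${AUTOSTART:=0}"

  # Derive PUSH_REPO from the HF_USERNAME secret unless it was set explicitly.
  if [ -z "${PUSH_REPO:-}" ]; then
    if [ -z "${HF_USERNAME:-}" ]; then
      echo "HF_USERNAME must be set when PUSH_REPO is not" >&2
      return 1
    fi
    PUSH_REPO="${HF_USERNAME}/cernenv-grpo-qwen2.5-3b"
  fi

  case "$DIFFICULTY" in
    easy|medium|hard) ;;
    *) echo "DIFFICULTY must be easy, medium or hard (got: $DIFFICULTY)" >&2
       return 1 ;;
  esac
}

HF_USERNAME=alice
resolve_config && echo "$PUSH_REPO"   # -> alice/cernenv-grpo-qwen2.5-3b
```

Failing early on a missing `HF_USERNAME` avoids a run that trains for hours and then cannot push to a valid repo id.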
## How to use

This Space exposes a tiny FastAPI control panel:

- `GET /`: status, run info, and **live training-progress evidence** (curves, before/after metrics, plots)
- `POST /train`: start or restart a training run
- `GET /logs?tail=N`: live tail (last N lines) of `training.log`
- `GET /metrics`: pre-training, post-training, and Δ metrics as JSON
- `GET /evidence`: list of evidence artifacts on disk
- `GET /evidence/{name}`: download an artifact (`training_curve.png`, `training_log.csv`, etc.)

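The endpoints can be driven from a terminal with `curl`. The helper names below are hypothetical, and two parts are assumptions:

- `SPACE_URL` is a placeholder for the Space's base URL. Spaces are usually served at `https://<owner>-<space-name>.hf.space`.
- The `Authorization: Bearer` header is sent only when `HF_TOKEN` is set, which matters only if the Space is private.

```bash
#!/usr/bin/env bash
# Hypothetical curl helpers for the control-panel endpoints.
# SPACE_URL is a placeholder; point it at your own Space.
SPACE_URL="${SPACE_URL:-https://your-user-cernenv-trainer.hf.space}"

# space_curl PATH [extra curl args...]
space_curl() {
  local path="$1"; shift
  local auth=()
  # Send the token only if one is configured (private Spaces need it).
  if [ -n "${HF_TOKEN:-}" ]; then
    auth=(-H "Authorization: Bearer ${HF_TOKEN}")
  fi
  # -f: fail on HTTP errors, -sS: quiet but still print errors.
  # ${auth[@]+...} expands to nothing when the array is empty.
  curl -fsS ${auth[@]+"${auth[@]}"} "$@" "${SPACE_URL%/}${path}"
}

start_training() { space_curl /train -X POST; }
tail_logs()      { space_curl "/logs?tail=${1:-50}"; }
get_metrics()    { space_curl /metrics; }
list_evidence()  { space_curl /evidence; }
fetch_evidence() { space_curl "/evidence/$1" -o "$1"; }

# Example session (not executed here):
#   start_training
#   tail_logs 100
#   fetch_evidence training_curve.png
```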
### Training-progress evidence saved (and pushed to the Hub)

- `training_log.csv`: per-step reward, loss, KL, learning rate, grad norm
- `training_curve.png`: reward and loss vs. step
- `checkpoint_evals.csv`: held-out eval every `CHECKPOINT_EVAL_STEPS` updates
- `checkpoint_progression.png`: mean reward and success / mass / channel accuracy vs. step
- `pre_eval.jsonl` / `post_eval.jsonl`: full per-episode rollouts before vs. after training
- `before_after_summary.png`: pre/post bar chart with Δ annotations
- `reward_distribution.png`: pre vs. post reward histogram
- `before_after_metrics.json`: machine-readable metrics and deltas
- `sample_trajectories.md`: cherry-picked pre vs. post agent traces

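To check quickly whether reward is trending up, compare the mean of the first and last few rows of `training_log.csv`. This README does not document the exact column names. The awk sketch below therefore looks the column up by header name and defaults to a hypothetical `reward` column. It also assumes a plain comma-separated file with a header row and no quoted fields. The function name `reward_trend` is illustrative.

```bash
#!/usr/bin/env bash
# Compare the mean of one CSV column over the first and last K data rows.
# Assumes a header row and unquoted, comma-separated values; the default
# column name "reward" is a guess, not a documented header.
# Usage: reward_trend FILE [COLUMN] [K]
reward_trend() {
  local file="$1" column="${2:-reward}" k="${3:-20}"
  awk -F, -v col="$column" -v k="$k" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { v[++n] = $c }
    END {
      if (n == 0) exit 1
      k = k + 0
      if (k < 1 || k > n) k = n       # clamp window to the available rows
      for (i = 1; i <= k; i++) { first += v[i]; last += v[n - k + i] }
      printf "first %d mean: %.4f\nlast %d mean: %.4f\ndelta: %+.4f\n",
             k, first / k, k, last / k, (last - first) / k
    }' "$file"
}

# reward_trend evidence/training_log.csv reward 20
```

A positive `delta` is only a rough signal. Per-step GRPO rewards are noisy, so the held-out numbers in `checkpoint_evals.csv` are the better measure of progress.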
Click **Start training** in the UI, or set `AUTOSTART=1` in the Space variables to start training immediately on boot.
When training finishes, the LoRA adapters and the evidence artifacts are pushed to `PUSH_REPO`.

## Local equivalent

The same training run can be reproduced locally with:

```bash
# single GPU
PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 1500 --max_steps 18 \
  --num_generations 8 --output_dir runs/unsloth-grpo \
  --evidence_dir evidence

# multi-GPU (e.g. 4× A100)
PYTHONPATH=. accelerate launch --num_processes 4 --mixed_precision bf16 \
  -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 1500 --max_steps 18 \
  --num_generations 8 \
  --output_dir runs/unsloth-grpo --evidence_dir evidence
```

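`NUM_GPUS` is described as auto-detected and passed to `accelerate launch --num_processes`. The dispatcher sketched below is hypothetical: this README does not show how the Space actually detects GPUs, and `detect_gpus` / `launch_training` are illustrative names.

- When `NUM_GPUS` is unset, it counts GPUs by parsing `nvidia-smi -L`, which prints one `GPU <n>: …` line per device.
- It then runs either the single-GPU or the multi-GPU command, with flags copied from the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical launcher choosing between the single- and multi-GPU commands.
detect_gpus() {
  if [ -n "${NUM_GPUS:-}" ]; then
    echo "$NUM_GPUS"                        # explicit override wins
  elif command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi -L prints one "GPU <n>: ..." line per GPU.
    nvidia-smi -L | grep -c '^GPU '
  else
    echo 1                                  # no NVIDIA tooling: assume one device
  fi
}

# Extra arguments are forwarded to training.training_unsloth.
launch_training() {
  local gpus
  gpus="$(detect_gpus)"
  if [ "$gpus" -gt 1 ]; then
    PYTHONPATH=. accelerate launch --num_processes "$gpus" --mixed_precision bf16 \
      -m training.training_unsloth "$@"
  else
    PYTHONPATH=. python -m training.training_unsloth "$@"
  fi
}

# launch_training --total_episodes 1500 --num_generations 8 \
#   --output_dir runs/unsloth-grpo --evidence_dir evidence
```

Note that `nvidia-smi -L` lists every GPU on the machine and ignores `CUDA_VISIBLE_DEVICES`. When only a subset of devices should be used, set `NUM_GPUS` explicitly.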