---
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100x4
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
---
# CERNenv Trainer (Hugging Face Space, A100)
Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as an LHC (Large Hadron Collider) physicist inside the CERNenv OpenEnv environment using GRPO (Group-Relative Policy Optimization), Unsloth, and LoRA (Low-Rank Adaptation).
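GRPO does not learn a separate value baseline. Instead, it samples a group of rollouts for each prompt (the group size is `NUM_GENERATIONS`) and scores each rollout relative to the rest of its group. The sketch below shows that group-relative normalization: each reward is standardized against the group's mean and standard deviation, with a small epsilon to avoid division by zero. The helper name `group_relative_advantages` is hypothetical. The trainer's actual implementation may differ in details such as sample vs. population standard deviation or the epsilon value.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Standardize one group's rewards: (r - mean) / (std + eps).

    Hypothetical helper illustrating GRPO's group-relative advantage; uses the
    population standard deviation of the group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: 4 rollouts of the same prompt, two successful, two not.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

If every rollout in a group earns the same reward, all of its advantages are zero and that group contributes no policy-gradient signal. This is one reason larger groups tend to give a more informative signal.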
## Hardware

- Recommended: 4× A100 (`a100x4`, 320 GB VRAM, ~$10/hr)
- Single GPU also supported: `a100-large` (slower; fewer episodes recommended)
- Minimum: T4 / L4 (use the Colab notebook fallback)
## Required Space secrets

| Secret | Purpose |
|---|---|
| `HF_TOKEN` | Hugging Face token with write access, used to push the model |
| `HF_USERNAME` | Hub username, used as the default model-repo owner |
## Optional environment variables

| Variable | Default | Notes |
|---|---|---|
| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
| `TOTAL_EPISODES` | `1500` | Total rollouts (prompts × generations) |
| `DIFFICULTY` | `easy` | `easy` / `medium` / `hard` |
| `MAX_STEPS` | `18` | Maximum steps per episode |
| `NUM_GENERATIONS` | `8` | GRPO group size (bigger = better signal) |
| `NUM_GPUS` | auto-detected | Value passed to `accelerate launch --num_processes` |
| `CHECKPOINT_EVAL_STEPS` | `25` | Run a held-out eval every N updates |
| `CHECKPOINT_EVAL_EPISODES` | `8` | Episodes per mid-training eval |
| `EVAL_EPISODES` | `32` | Episodes for the pre- and post-training evals (more episodes = more statistical power) |
| `OUTPUT_DIR` | `runs/unsloth-grpo` | LoRA adapter output directory |
| `EVIDENCE_DIR` | `evidence` | Where curves, CSVs, and plots are written |
| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo for adapters + evidence |
| `AUTOSTART` | `0` | Set to `1` to start training on Space boot |
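The sketch below shows how these variables could be read with the defaults from the table. It is not the Space's actual server code. `TrainerConfig` and `load_config` are hypothetical names. An unset `NUM_GPUS` is represented as `None`, meaning "auto-detect".

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical config loader mirroring the variable table; the Space's real
# server code may parse these variables differently.
VALID_DIFFICULTIES = {"easy", "medium", "hard"}


@dataclass(frozen=True)
class TrainerConfig:
    model_name: str
    total_episodes: int
    difficulty: str
    max_steps: int
    num_generations: int
    num_gpus: Optional[int]  # None means "auto-detect"
    checkpoint_eval_steps: int
    checkpoint_eval_episodes: int
    eval_episodes: int
    output_dir: str
    evidence_dir: str
    push_repo: str
    autostart: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> TrainerConfig:
    """Read Space variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    difficulty = env.get("DIFFICULTY", "easy")
    if difficulty not in VALID_DIFFICULTIES:
        raise ValueError(f"DIFFICULTY must be one of {sorted(VALID_DIFFICULTIES)}, got {difficulty!r}")
    username = env.get("HF_USERNAME", "")
    num_gpus = env.get("NUM_GPUS")
    return TrainerConfig(
        model_name=env.get("MODEL_NAME", "unsloth/Qwen2.5-3B-Instruct"),
        total_episodes=int(env.get("TOTAL_EPISODES", "1500")),
        difficulty=difficulty,
        max_steps=int(env.get("MAX_STEPS", "18")),
        num_generations=int(env.get("NUM_GENERATIONS", "8")),
        num_gpus=int(num_gpus) if num_gpus else None,
        checkpoint_eval_steps=int(env.get("CHECKPOINT_EVAL_STEPS", "25")),
        checkpoint_eval_episodes=int(env.get("CHECKPOINT_EVAL_EPISODES", "8")),
        eval_episodes=int(env.get("EVAL_EPISODES", "32")),
        output_dir=env.get("OUTPUT_DIR", "runs/unsloth-grpo"),
        evidence_dir=env.get("EVIDENCE_DIR", "evidence"),
        # PUSH_REPO defaults to <HF_USERNAME>/cernenv-grpo-qwen2.5-3b.
        push_repo=env.get("PUSH_REPO", f"{username}/cernenv-grpo-qwen2.5-3b"),
        # Only the literal string "1" enables autostart.
        autostart=env.get("AUTOSTART", "0") == "1",
    )
```

Passing an explicit mapping instead of `os.environ` keeps the loader easy to test without touching the process environment.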
## How to use

This Space exposes a tiny FastAPI control panel:

- `GET /`: status, run info, and live training-progress evidence (curves, before/after metrics, plots)
- `POST /train`: start or restart a training run
- `GET /logs?tail=N`: live tail of `training.log`
- `GET /metrics`: pre / post / Δ metrics as JSON
- `GET /evidence`: list of evidence artifacts on disk
- `GET /evidence/{name}`: download an artifact (`training_curve.png`, `training_log.csv`, etc.)
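The endpoints above can also be driven from a script. The sketch below is a minimal standard-library client. The function names are hypothetical, as is the base URL you pass in (for example, your Space's `*.hf.space` address). It assumes two things: `/logs` returns plain text, and an `Authorization: Bearer` header with an HF token is accepted if the Space is private.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def endpoint(base_url: str, path: str, **params) -> str:
    """Join the Space base URL with an endpoint path and optional query string."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    return url


def _open(url: str, token: Optional[str] = None, method: str = "GET"):
    # Optional bearer token (assumption: needed only for private Spaces).
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urlopen(Request(url, headers=headers, method=method), timeout=30)


def get_metrics(base_url: str, token: Optional[str] = None) -> dict:
    # /metrics returns pre / post / delta metrics as JSON.
    with _open(endpoint(base_url, "/metrics"), token) as resp:
        return json.load(resp)


def tail_logs(base_url: str, n: int = 50, token: Optional[str] = None) -> str:
    # Assumes /logs returns the last n lines of training.log as plain text.
    with _open(endpoint(base_url, "/logs", tail=n), token) as resp:
        return resp.read().decode("utf-8")


def start_training(base_url: str, token: Optional[str] = None) -> int:
    # POST with an empty body; returns the HTTP status code.
    with _open(endpoint(base_url, "/train"), token, method="POST") as resp:
        return resp.status
```

For example, `print(tail_logs(base_url, n=20))` polls the log while a run is in progress.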
### Training-progress evidence saved (and pushed to Hub)

- `training_log.csv`: per-step reward, loss, KL, learning rate, grad norm
- `training_curve.png`: reward and loss vs. step
- `checkpoint_evals.csv`: held-out eval every `CHECKPOINT_EVAL_STEPS` updates
- `checkpoint_progression.png`: mean reward and success / mass / channel accuracy vs. step
- `pre_eval.jsonl` / `post_eval.jsonl`: full per-episode rollouts before vs. after training
- `before_after_summary.png`: pre/post bar chart with Δ annotations
- `reward_distribution.png`: pre vs. post reward histogram
- `before_after_metrics.json`: machine-readable metrics and deltas
- `sample_trajectories.md`: cherry-picked pre vs. post agent traces
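The pre/post Δ numbers can be recomputed from the two JSONL rollout files. The sketch below assumes each line is one episode record with numeric or boolean fields. The field names `reward` and `success` are hypothetical, so check the actual keys in `pre_eval.jsonl` first. The function names are also illustrative and not part of the Space.

```python
import json
from statistics import mean
from typing import Dict, List, Sequence


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(records: List[dict], keys: Sequence[str]) -> Dict[str, float]:
    # Booleans such as success flags average to a rate (True -> 1.0).
    return {k: mean(float(r[k]) for r in records) for k in keys}


def before_after(pre: List[dict], post: List[dict],
                 keys: Sequence[str] = ("reward", "success")) -> Dict[str, Dict[str, float]]:
    """Mean of each (hypothetical) field before and after training, plus the delta."""
    pre_s, post_s = summarize(pre, keys), summarize(post, keys)
    return {k: {"pre": pre_s[k], "post": post_s[k], "delta": post_s[k] - pre_s[k]} for k in keys}
```

Assuming both files are written to `EVIDENCE_DIR`, usage would be `before_after(load_jsonl("evidence/pre_eval.jsonl"), load_jsonl("evidence/post_eval.jsonl"))`.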
Click "Start training" in the UI, or set AUTOSTART=1 in the Space variables to kick off immediately on boot.
When training finishes, the LoRA adapters are pushed to PUSH_REPO.
## Local equivalent

The same training run is reproducible locally with:

```bash
# single GPU
PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 1500 --max_steps 18 \
  --num_generations 8 --output_dir runs/unsloth-grpo \
  --evidence_dir evidence

# multi-GPU (e.g. 4× A100)
PYTHONPATH=. accelerate launch --num_processes 4 --mixed_precision bf16 \
  -m training.training_unsloth \
  --total_episodes 1500 --num_generations 8 \
  --output_dir runs/unsloth-grpo --evidence_dir evidence
```
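The two commands differ only in the launcher. The sketch below shows one way to choose between them from a GPU count, such as the `NUM_GPUS` value: plain `python -m` for one GPU, `accelerate launch --num_processes N` otherwise. The function `build_launch_cmd` is hypothetical and does not describe how the Space actually decides. It uses only the flags that appear in the commands above.

```python
from typing import List

MODULE = "training.training_unsloth"


def build_launch_cmd(num_gpus: int, total_episodes: int = 1500, num_generations: int = 8,
                     output_dir: str = "runs/unsloth-grpo",
                     evidence_dir: str = "evidence") -> List[str]:
    """Return an argv list for the single- or multi-GPU launch (hypothetical helper)."""
    train_args = [
        "--total_episodes", str(total_episodes),
        "--num_generations", str(num_generations),
        "--output_dir", output_dir,
        "--evidence_dir", evidence_dir,
    ]
    if num_gpus <= 1:
        # Single GPU: plain module invocation.
        return ["python", "-m", MODULE, *train_args]
    # Multi-GPU: one process per GPU via accelerate, bf16 mixed precision.
    return ["accelerate", "launch", "--num_processes", str(num_gpus),
            "--mixed_precision", "bf16", "-m", MODULE, *train_args]
```

To run it, pass the list to `subprocess.run(cmd, env={**os.environ, "PYTHONPATH": "."}, check=True)`. Using an argv list rather than a shell string avoids quoting issues with paths.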