---
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100-large
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
---
# CERNenv Trainer (Hugging Face Space, A100)
Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as an LHC (Large Hadron Collider) physicist inside the CERNenv OpenEnv environment using GRPO (Group-Relative Policy Optimization), Unsloth, and LoRA (Low-Rank Adaptation).
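GRPO drops the learned value function. It samples a group of completions for each prompt (here, `NUM_GENERATIONS` rollouts) and scores each rollout relative to the rest of its group. The sketch below shows the standard group-normalized advantage, A_i = (r_i − mean(r)) / (std(r) + ε), as an illustration of the technique. It is not code taken from this Space, and `group_relative_advantages` is a hypothetical name.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each rollout's reward against its own group (one prompt, G rollouts).

    Standard GRPO form: A_i = (r_i - mean(r)) / (std(r) + eps).
    The epsilon avoids division by zero when every rollout in the group scores the same.
    """
    if not rewards:
        raise ValueError("a GRPO group needs at least one rollout")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation over the group
    return [(r - mean) / (std + eps) for r in rewards]
```

Rollouts that beat their group's average get a positive advantage and are reinforced. Rollouts below the average are pushed down. Implementations differ in small details, such as sample versus population standard deviation, or whether the std division is applied at all.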
## Hardware
- Recommended: A100 large (80 GB)
- Minimum: T4 / L4 (will use a smaller model + fewer episodes)
## Required Space secrets

| Secret | Purpose |
|---|---|
| `HF_TOKEN` | Hugging Face token with write access (needed to push the model) |
| `HF_USERNAME` | Hub username, used as the default model-repo owner |
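Hugging Face exposes Space secrets to the running container as environment variables. Below is a minimal fail-fast check that assumes the trainer reads them through `os.environ`. `require_secrets` is a hypothetical helper, not part of this Space's code.

```python
import os
from typing import Dict, Mapping, Optional

# Secrets from the table above; the Space runtime injects them as environment variables.
REQUIRED_SECRETS = ("HF_TOKEN", "HF_USERNAME")


def require_secrets(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required secrets, or raise an error naming every missing one."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Space secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Run a check like this at startup. A missing token then fails immediately, not at the Hub push after a long training run.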
## Optional environment variables

| Variable | Default | Notes |
|---|---|---|
| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
| `TOTAL_EPISODES` | `400` | Total rollouts (prompts × generations) |
| `DIFFICULTY` | `easy` | `easy`, `medium`, or `hard` |
| `MAX_STEPS` | `18` | Maximum steps per episode |
| `NUM_GENERATIONS` | `4` | GRPO group size (completions sampled per prompt) |
| `OUTPUT_DIR` | `runs/unsloth-grpo` | Output directory for the LoRA adapters |
| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo the adapters are pushed to |
| `AUTOSTART` | `0` | Set to `1` to start training when the Space boots |
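Below is a sketch of resolving these variables with the documented defaults. It assumes the values arrive as environment-variable strings. `TrainerConfig` and `load_config` are hypothetical names, not the Space's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DIFFICULTIES = {"easy", "medium", "hard"}


@dataclass(frozen=True)
class TrainerConfig:
    model_name: str
    total_episodes: int
    difficulty: str
    max_steps: int
    num_generations: int
    output_dir: str
    push_repo: str
    autostart: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> TrainerConfig:
    """Read the optional variables, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    difficulty = env.get("DIFFICULTY", "easy")
    if difficulty not in VALID_DIFFICULTIES:
        raise ValueError(f"DIFFICULTY must be one of {sorted(VALID_DIFFICULTIES)}, got {difficulty!r}")
    push_repo = env.get("PUSH_REPO")
    if not push_repo:
        username = env.get("HF_USERNAME")
        if not username:
            raise ValueError("set PUSH_REPO or the HF_USERNAME secret")
        # Documented default: ${HF_USERNAME}/cernenv-grpo-qwen2.5-3b
        push_repo = f"{username}/cernenv-grpo-qwen2.5-3b"
    return TrainerConfig(
        model_name=env.get("MODEL_NAME", "unsloth/Qwen2.5-3B-Instruct"),
        total_episodes=int(env.get("TOTAL_EPISODES", "400")),
        difficulty=difficulty,
        max_steps=int(env.get("MAX_STEPS", "18")),
        num_generations=int(env.get("NUM_GENERATIONS", "4")),
        output_dir=env.get("OUTPUT_DIR", "runs/unsloth-grpo"),
        push_repo=push_repo,
        autostart=env.get("AUTOSTART", "0") == "1",
    )
```

The default `PUSH_REPO` name hard-codes `qwen2.5-3b`. If you change `MODEL_NAME`, set `PUSH_REPO` explicitly as well, or the adapters will land in a misleadingly named repo.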
## How to use

This Space exposes a tiny FastAPI control panel:

- `GET /`: status and current run info
- `POST /train`: start or restart a training run
- `GET /logs`: live tail of `training.log`
- `GET /metrics`: reward and success-rate snapshots
Click **Start training** in the UI, or set `AUTOSTART=1` in the Space variables to start training as soon as the Space boots.
When training finishes, the LoRA adapters are pushed to `PUSH_REPO`.
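Below is a minimal standard-library client for the control-panel endpoints listed above. It rests on three assumptions:

- `BASE_URL` is a placeholder for your Space's direct `*.hf.space` address.
- A private Space accepts `HF_TOKEN` as a bearer token.
- Responses may be JSON or plain text (the formats are not documented here), so the helper returns whichever parses.

`build_request` and `call` are hypothetical names.

```python
import json
import urllib.request
from typing import Optional, Union

# Placeholder: replace with your Space's direct URL.
BASE_URL = "https://<owner>-<space-name>.hf.space"

# Control-panel endpoints: name -> (HTTP method, path).
ROUTES = {
    "status": ("GET", "/"),
    "train": ("POST", "/train"),
    "logs": ("GET", "/logs"),
    "metrics": ("GET", "/metrics"),
}


def build_request(base_url: str, endpoint: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one control-panel endpoint."""
    method, path = ROUTES[endpoint]
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(base_url.rstrip("/") + path, method=method, headers=headers)


def call(base_url: str, endpoint: str, token: Optional[str] = None,
         timeout: float = 30.0) -> Union[dict, list, str]:
    """Send the request; return parsed JSON when possible, otherwise the raw text."""
    with urllib.request.urlopen(build_request(base_url, endpoint, token), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

For example, `call(BASE_URL, "train", os.environ["HF_TOKEN"])` would start a run, and polling `call(BASE_URL, "metrics", token)` would fetch the latest reward snapshot. Separating `build_request` from `call` keeps the request construction testable without network access.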
## Local equivalent

The same training run is reproducible locally with:

```bash
PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 400 --max_steps 18 \
  --output_dir runs/unsloth-grpo
```
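The Space variables appear to correspond to the CLI flags in this command. The hypothetical helpers below rebuild the same command from the environment, so a local run can reuse a Space configuration. The variable-to-flag mapping is an assumption based on matching names. `NUM_GENERATIONS`, `PUSH_REPO`, and `AUTOSTART` are left out because the command above shows no flag for them.

```python
import os
import shlex
import subprocess
from typing import List, Mapping, Optional

# Assumed mapping: Space variable -> (CLI flag, documented default).
FLAG_FOR_VAR = {
    "MODEL_NAME": ("--model_name", "unsloth/Qwen2.5-3B-Instruct"),
    "DIFFICULTY": ("--difficulty", "easy"),
    "TOTAL_EPISODES": ("--total_episodes", "400"),
    "MAX_STEPS": ("--max_steps", "18"),
    "OUTPUT_DIR": ("--output_dir", "runs/unsloth-grpo"),
}


def build_local_command(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the argv for the local training run, filling in defaults."""
    env = os.environ if env is None else env
    cmd = ["python", "-m", "training.training_unsloth"]
    for var, (flag, default) in FLAG_FOR_VAR.items():
        cmd += [flag, env.get(var, default)]
    return cmd


def run_locally(env: Optional[Mapping[str, str]] = None) -> None:
    """Run the command with PYTHONPATH=., as in the shell example; raises on failure."""
    cmd = build_local_command(env)
    print("+", shlex.join(cmd))
    subprocess.run(cmd, env={**os.environ, "PYTHONPATH": "."}, check=True)
```

Call `run_locally()` from the repository root so that `PYTHONPATH=.` resolves the `training` package.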