---
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100-large
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
---

# CERNenv Trainer (Hugging Face Space, A100)

Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as an LHC (Large Hadron Collider) physicist inside the **CERNenv** OpenEnv environment using **GRPO** (Group-Relative Policy Optimization), **Unsloth**, and **LoRA** (Low-Rank Adaptation).

## Hardware

- Recommended: **A100 large (80 GB)**
- Minimum: T4 / L4 (will use a smaller model + fewer episodes)

## Required Space secrets

| Secret | Purpose |
| --- | --- |
| `HF_TOKEN` | Hugging Face token with `write` access for model push |
| `HF_USERNAME` | Hub username, used as the default model-repo owner |

## Optional environment variables

| Variable | Default | Notes |
| --- | --- | --- |
| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
| `TOTAL_EPISODES` | `400` | Prompts × generations rollouts |
| `DIFFICULTY` | `easy` | `easy` / `medium` / `hard` |
| `MAX_STEPS` | `18` | Steps per episode |
| `NUM_GENERATIONS` | `4` | GRPO group size |
| `OUTPUT_DIR` | `runs/unsloth-grpo` | LoRA adapter output |
| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo for adapters |
| `AUTOSTART` | `0` | Set to `1` to start training on Space boot |

## How to use

This Space exposes a tiny FastAPI control panel:

- `GET /`: status + current run info
- `POST /train`: start / restart a training run
- `GET /logs`: live tail of `training.log`
- `GET /metrics`: reward + success-rate snapshots

Click **"Start training"** in the UI, or set `AUTOSTART=1` in the Space variables to kick off immediately on boot. When training finishes, the LoRA adapters are pushed to `PUSH_REPO`.

## Local equivalent

The same training run is reproducible locally with:

```bash
PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 400 --max_steps 18 \
  --output_dir runs/unsloth-grpo
```
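
To show how the optional environment variables relate to the local command, here is a minimal sketch that resolves them against the documented defaults and builds the equivalent argument list. It is an illustration, not the Space's actual entrypoint: `build_train_command` and `resolve_push_repo` are hypothetical helpers, and only the flags that appear in the command above are mapped (no flag is assumed for `NUM_GENERATIONS`, `PUSH_REPO`, or `AUTOSTART`).

```python
import os
import shlex

# Defaults copied from the "Optional environment variables" table.
DEFAULTS = {
    "MODEL_NAME": "unsloth/Qwen2.5-3B-Instruct",
    "DIFFICULTY": "easy",
    "TOTAL_EPISODES": "400",
    "MAX_STEPS": "18",
    "OUTPUT_DIR": "runs/unsloth-grpo",
}

# Variable -> CLI flag, limited to flags shown in the local command.
FLAGS = {
    "MODEL_NAME": "--model_name",
    "DIFFICULTY": "--difficulty",
    "TOTAL_EPISODES": "--total_episodes",
    "MAX_STEPS": "--max_steps",
    "OUTPUT_DIR": "--output_dir",
}

VALID_DIFFICULTIES = ("easy", "medium", "hard")


def build_train_command(env=None):
    """Return the local training command as an argv list (hypothetical helper)."""
    env = os.environ if env is None else env
    # Empty or missing variables fall back to the documented defaults.
    values = {var: env.get(var) or default for var, default in DEFAULTS.items()}

    if values["DIFFICULTY"] not in VALID_DIFFICULTIES:
        raise ValueError(f"DIFFICULTY must be one of {VALID_DIFFICULTIES}")
    for var in ("TOTAL_EPISODES", "MAX_STEPS"):
        if int(values[var]) <= 0:  # int() also rejects non-numeric input
            raise ValueError(f"{var} must be a positive integer")

    argv = ["python", "-m", "training.training_unsloth"]
    for var, flag in FLAGS.items():
        argv += [flag, values[var]]
    return argv


def resolve_push_repo(env=None):
    """Resolve PUSH_REPO, defaulting to ${HF_USERNAME}/cernenv-grpo-qwen2.5-3b."""
    env = os.environ if env is None else env
    if env.get("PUSH_REPO"):
        return env["PUSH_REPO"]
    if not env.get("HF_USERNAME"):
        raise ValueError("set HF_USERNAME or PUSH_REPO")
    return f"{env['HF_USERNAME']}/cernenv-grpo-qwen2.5-3b"


# Example: override only the difficulty and print a shell-ready command.
print("PYTHONPATH=. " + shlex.join(build_train_command({"DIFFICULTY": "medium"})))
```

Passing an explicit dictionary instead of reading `os.environ` keeps the sketch easy to try without touching the real Space variables; calling `build_train_command()` with no argument reads the current process environment instead.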