---
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100-large
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
---

# CERNenv Trainer (Hugging Face Space, A100)

Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as an LHC (Large Hadron Collider) physicist inside the **CERNenv** OpenEnv environment using **GRPO** (Group-Relative Policy Optimization), **Unsloth**, and **LoRA** (Low-Rank Adaptation).

## Hardware

- Recommended: **A100 large (80 GB)**
- Minimum: T4 / L4 (the trainer falls back to a smaller model and fewer episodes)

## Required Space secrets

| Secret | Purpose |
| --- | --- |
| `HF_TOKEN` | Hugging Face token with `write` access for model push |
| `HF_USERNAME` | Hub username, used as the default model-repo owner |

## Optional environment variables

| Variable | Default | Notes |
| --- | --- | --- |
| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
| `TOTAL_EPISODES` | `400` | Total rollouts (prompts × generations) |
| `DIFFICULTY` | `easy` | `easy` / `medium` / `hard` |
| `MAX_STEPS` | `18` | Steps per episode |
| `NUM_GENERATIONS` | `4` | GRPO group size |
| `OUTPUT_DIR` | `runs/unsloth-grpo` | LoRA adapter output |
| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo for adapters |
| `AUTOSTART` | `0` | Set to `1` to start training on Space boot |
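
Because the Space uses the Docker SDK, the same image can also be run outside of Spaces with these variables overridden. A minimal sketch, assuming the Space's Dockerfile builds standalone, that the image tag, port mapping, and GPU flags below are your own choices (none of them are defined by this repo):

```bash
# Build the Space image locally (the tag `cernenv-trainer` is arbitrary).
docker build -t cernenv-trainer .

# Run with overridden training settings; 7860 is the usual Spaces port.
docker run --gpus all -p 7860:7860 \
  -e HF_TOKEN="hf_xxx" \
  -e HF_USERNAME="your-username" \
  -e DIFFICULTY=medium \
  -e TOTAL_EPISODES=800 \
  -e AUTOSTART=1 \
  cernenv-trainer
```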

## How to use

This Space exposes a tiny FastAPI control panel (see the `curl` sketch after the list):

- `GET /` — status + current run info
- `POST /train` — start / restart a training run
- `GET /logs` — live tail of `training.log`
- `GET /metrics` — reward + success-rate snapshots
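
A minimal sketch of driving the control panel over HTTP; the base URL is an assumption (substitute your Space's own `*.hf.space` address):

```bash
SPACE_URL="https://your-username-cernenv-trainer.hf.space"  # assumed URL

curl "$SPACE_URL/"                 # status + current run info
curl -X POST "$SPACE_URL/train"    # start / restart a training run
curl "$SPACE_URL/logs"             # live tail of training.log
curl "$SPACE_URL/metrics"          # reward + success-rate snapshots
```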

Click **"Start training"** in the UI, or set `AUTOSTART=1` in the Space variables to kick off training immediately on boot.

When training finishes, the LoRA adapters are pushed to `PUSH_REPO`.
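
To fetch the pushed adapters for local use, one option is the `huggingface_hub` CLI; the repo name below is just the default `PUSH_REPO` and should match whatever you configured:

```bash
# Download the adapter repo (requires `pip install huggingface_hub`).
huggingface-cli download "your-username/cernenv-grpo-qwen2.5-3b" \
  --local-dir adapters/cernenv-grpo
```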

## Local equivalent

The same training run is reproducible locally with:

```bash
PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 400 --max_steps 18 \
  --output_dir runs/unsloth-grpo
```