---
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100-large
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
---


# CERNenv Trainer (Hugging Face Space, A100)

Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as
an LHC (Large Hadron Collider) physicist inside the **CERNenv** OpenEnv
environment, using **GRPO** (Group Relative Policy Optimization) with
**Unsloth** and **LoRA** (Low-Rank Adaptation).
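The run wires three pieces together: Unsloth loads the base model in 4-bit, LoRA attaches a small set of trainable adapters, and GRPO samples a group of completions per prompt and scores each one relative to its group's mean reward. Below is a minimal sketch of that wiring, assuming TRL's `GRPOTrainer` API; the reward function and the prompt are placeholders for the real CERNenv episode logic in `training/training_unsloth.py`.

```python
# Illustrative wiring only; hyperparameters, the prompt, and the reward
# are placeholders, not the Space's actual training code.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

# Load the base chat model in 4-bit and attach LoRA adapters.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def cernenv_reward(completions, **kwargs):
    # Placeholder: the real reward is the CERNenv episode return.
    return [0.0 for _ in completions]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=cernenv_reward,
    processing_class=tokenizer,
    args=GRPOConfig(
        output_dir="runs/unsloth-grpo",
        num_generations=4,            # GRPO group size
        per_device_train_batch_size=4,
    ),
    train_dataset=Dataset.from_dict({"prompt": ["Calibrate the beam."]}),
)
trainer.train()
```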

## Hardware
- Recommended: **A100 large (80 GB)**
- Minimum: T4 / L4 (falls back to a smaller model and fewer episodes; see the sketch below)
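
The fallback on smaller GPUs can be as simple as checking available VRAM. A sketch of that idea follows; the 40 GiB cutoff and the smaller fallback model are assumptions for illustration, not necessarily what the Space does.

```python
import torch

def pick_model() -> str:
    """Pick a base model by available GPU memory (illustrative cutoff)."""
    vram_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
    if vram_gib >= 40:  # A100-class card
        return "unsloth/Qwen2.5-3B-Instruct"
    return "unsloth/Qwen2.5-1.5B-Instruct"  # hypothetical smaller fallback
```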

## Required Space secrets
| Secret | Purpose |
| --- | --- |
| `HF_TOKEN` | Hugging Face token with `write` access for model push |
| `HF_USERNAME` | Hub username, used as the default model-repo owner |

## Optional environment variables
| Variable | Default | Notes |
| --- | --- | --- |
| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
| `TOTAL_EPISODES` | `400` | Total rollouts (prompts × `NUM_GENERATIONS`) |
| `DIFFICULTY` | `easy` | `easy` / `medium` / `hard` |
| `MAX_STEPS` | `18` | Max environment steps per episode |
| `NUM_GENERATIONS` | `4` | GRPO group size |
| `OUTPUT_DIR` | `runs/unsloth-grpo` | LoRA adapter output |
| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo for adapters |
| `AUTOSTART` | `0` | Set to `1` to start training on Space boot |
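
One plausible way the trainer could consume these variables is shown below; the names match the table, but the parsing details are an assumption.

```python
import os

# Read configuration from the environment, falling back to the documented defaults.
MODEL_NAME      = os.environ.get("MODEL_NAME", "unsloth/Qwen2.5-3B-Instruct")
TOTAL_EPISODES  = int(os.environ.get("TOTAL_EPISODES", "400"))
DIFFICULTY      = os.environ.get("DIFFICULTY", "easy")
MAX_STEPS       = int(os.environ.get("MAX_STEPS", "18"))
NUM_GENERATIONS = int(os.environ.get("NUM_GENERATIONS", "4"))
OUTPUT_DIR      = os.environ.get("OUTPUT_DIR", "runs/unsloth-grpo")
PUSH_REPO       = os.environ.get(
    "PUSH_REPO", f"{os.environ.get('HF_USERNAME', '')}/cernenv-grpo-qwen2.5-3b"
)
AUTOSTART       = os.environ.get("AUTOSTART", "0") == "1"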

## How to use

This Space exposes a tiny FastAPI control panel (example calls below):
- `GET  /` — status + current run info
- `POST /train` — start / restart a training run
- `GET  /logs` — live tail of `training.log`
- `GET  /metrics` — reward + success-rate snapshots
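
For example, driving a run from Python; the base URL is a placeholder for your own Space's endpoint:

```python
import requests

BASE = "https://<your-space>.hf.space"  # placeholder URL

requests.post(f"{BASE}/train")                    # start / restart a run
print(requests.get(f"{BASE}/logs").text[-2000:])  # tail training.log
print(requests.get(f"{BASE}/metrics").json())     # reward / success-rate
```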

Click **"Start training"** in the UI, or set `AUTOSTART=1` in the Space variables to kick off immediately on boot.

When training finishes, the LoRA adapters are pushed to `PUSH_REPO`.
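
If you ever need to re-push a finished run by hand, the equivalent `huggingface_hub` calls look roughly like this, assuming the defaults from the tables above:

```python
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
repo = os.environ.get(
    "PUSH_REPO", f"{os.environ['HF_USERNAME']}/cernenv-grpo-qwen2.5-3b"
)
api.create_repo(repo, exist_ok=True)  # no-op if the repo already exists
api.upload_folder(folder_path="runs/unsloth-grpo", repo_id=repo)
```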

## Local equivalent

The same training run is reproducible locally with:

```bash
PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 400 --max_steps 18 \
  --output_dir runs/unsloth-grpo
```