---
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100-large
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
---

# CERNenv Trainer (Hugging Face Space, A100)

Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as an LHC (Large Hadron Collider) physicist inside the **CERNenv** OpenEnv environment using **GRPO** (Group-Relative Policy Optimization), **Unsloth**, and **LoRA** (Low-Rank Adaptation).

## Hardware

- Recommended: **A100 large (80 GB)**
- Minimum: T4 / L4 (the trainer falls back to a smaller model and fewer episodes)

## Required Space secrets

| Secret | Purpose |
| --- | --- |
| `HF_TOKEN` | Hugging Face token with `write` access for model push |
| `HF_USERNAME` | Hub username, used as the default model-repo owner |

## Optional environment variables

| Variable | Default | Notes |
| --- | --- | --- |
| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
| `TOTAL_EPISODES` | `400` | Total rollouts (prompts × generations) |
| `DIFFICULTY` | `easy` | `easy` / `medium` / `hard` |
| `MAX_STEPS` | `18` | Steps per episode |
| `NUM_GENERATIONS` | `4` | GRPO group size |
| `OUTPUT_DIR` | `runs/unsloth-grpo` | LoRA adapter output |
| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo for adapters |
| `AUTOSTART` | `0` | Set to `1` to start training on Space boot |
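
Because the Space uses the Docker SDK, the same image can also be run outside of Spaces with these variables overridden. A minimal sketch, assuming the Space's Dockerfile builds standalone, that the image tag, port mapping, and GPU flags below are your own choices (none of them are defined by this repo):

```bash
# Build the Space image locally (the tag `cernenv-trainer` is arbitrary).
docker build -t cernenv-trainer .

# Run with overridden training settings; 7860 is the usual Spaces port.
docker run --gpus all -p 7860:7860 \
  -e HF_TOKEN="hf_xxx" \
  -e HF_USERNAME="your-username" \
  -e DIFFICULTY=medium \
  -e TOTAL_EPISODES=800 \
  -e AUTOSTART=1 \
  cernenv-trainer
```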

## How to use

This Space exposes a tiny FastAPI control panel (see the `curl` sketch after the list):

- `GET /` — status + current run info
- `POST /train` — start / restart a training run
- `GET /logs` — live tail of `training.log`
- `GET /metrics` — reward + success-rate snapshots
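
A minimal sketch of driving the control panel over HTTP; the base URL is an assumption (substitute your Space's own `*.hf.space` address):

```bash
SPACE_URL="https://your-username-cernenv-trainer.hf.space"  # assumed URL

curl "$SPACE_URL/"                 # status + current run info
curl -X POST "$SPACE_URL/train"    # start / restart a training run
curl "$SPACE_URL/logs"             # live tail of training.log
curl "$SPACE_URL/metrics"          # reward + success-rate snapshots
```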

Click **"Start training"** in the UI, or set `AUTOSTART=1` in the Space variables to kick off training immediately on boot.

When training finishes, the LoRA adapters are pushed to `PUSH_REPO`.
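
To fetch the pushed adapters for local use, one option is the `huggingface_hub` CLI; the repo name below is just the default `PUSH_REPO` and should match whatever you configured:

```bash
# Download the adapter repo (requires `pip install huggingface_hub`).
huggingface-cli download "your-username/cernenv-grpo-qwen2.5-3b" \
  --local-dir adapters/cernenv-grpo
```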

## Local equivalent

The same training run is reproducible locally with:

```bash
PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 400 --max_steps 18 \
  --output_dir runs/unsloth-grpo
```