	---
	title: CERNenv Trainer
	emoji: ⚛️
	colorFrom: indigo
	colorTo: pink
	sdk: docker
	suggested_hardware: a100x4
	suggested_storage: medium
	pinned: false
	license: bsd-3-clause
	short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)
	---

	# CERNenv Trainer (Hugging Face Space, A100)

	Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as
	an LHC (Large Hadron Collider) physicist inside the CERNenv OpenEnv
	environment using GRPO (Group Relative Policy Optimization),
	Unsloth, and LoRA (Low-Rank Adaptation).
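
As a rough illustration of the "group relative" part of GRPO: each prompt is sampled several times, and every rollout's reward is scored against the other rollouts in its own group. The sketch below uses the commonly described mean and standard-deviation normalization. The function name `group_relative_advantages` and the `eps` constant are illustrative assumptions, not code taken from this Space's trainer.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one group's rewards to zero mean and roughly unit scale.

    `rewards` holds the scalar reward of every rollout sampled for the same
    prompt. Hypothetical helper, for illustration only.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division finite when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of one prompt: the best rollout gets a positive advantage,
# the worst a negative one, and the two average rollouts get zero.
advantages = group_relative_advantages([0.0, 1.0, 1.0, 2.0])
```

A group in which all rollouts score the same yields all-zero advantages, which is one reason a larger `NUM_GENERATIONS` tends to give a more informative signal.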

	## Hardware
	- Recommended: 4× A100 (`a100x4`, 320 GB VRAM, ~$10/hr)
	- Single GPU also supported: `a100-large` (slower, fewer episodes recommended)
	- Minimum: T4 / L4 (use the Colab notebook fallback)

	## Required Space secrets
	| Secret | Purpose |
	| --- | --- |
	| `HF_TOKEN` | Hugging Face token with `write` access for model push |
	| `HF_USERNAME` | Hub username, used as the default model-repo owner |

	## Optional environment variables
	| Variable | Default | Notes |
	| --- | --- | --- |
	| `MODEL_NAME` | `unsloth/Qwen2.5-3B-Instruct` | Any chat model Unsloth supports |
	| `TOTAL_EPISODES` | `1500` | Total rollouts (prompts × generations per prompt) |
	| `DIFFICULTY` | `easy` | `easy` / `medium` / `hard` |
	| `MAX_STEPS` | `18` | Max steps per episode |
	| `NUM_GENERATIONS` | `8` | GRPO group size (larger groups give a better signal) |
	| `NUM_GPUS` | auto-detected | `accelerate launch --num_processes` value |
	| `CHECKPOINT_EVAL_STEPS` | `25` | Run a held-out eval every N updates |
	| `CHECKPOINT_EVAL_EPISODES` | `8` | Episodes per mid-training eval |
	| `EVAL_EPISODES` | `32` | Episodes for pre/post eval (statistical power) |
	| `OUTPUT_DIR` | `runs/unsloth-grpo` | LoRA adapter output |
	| `EVIDENCE_DIR` | `evidence` | Where curves, CSVs, plots are written |
	| `PUSH_REPO` | `${HF_USERNAME}/cernenv-grpo-qwen2.5-3b` | Hub repo for adapters + evidence |
	| `AUTOSTART` | `0` | Set to `1` to start training on Space boot |
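
To make the defaults concrete, here is a minimal sketch of how a launcher might read a subset of these variables. `TrainerConfig` and `load_config` are hypothetical names, and the validation of `DIFFICULTY` is an assumption; the Space's actual parsing code may differ.

```python
import os
from dataclasses import dataclass


@dataclass
class TrainerConfig:
    """Hypothetical container for a subset of the variables in the table."""
    model_name: str
    total_episodes: int
    difficulty: str
    max_steps: int
    num_generations: int
    output_dir: str
    evidence_dir: str
    push_repo: str
    autostart: bool


def load_config(env=None):
    """Build a config from a mapping of variables, using the table's defaults."""
    env = os.environ if env is None else env
    difficulty = env.get("DIFFICULTY", "easy")
    if difficulty not in {"easy", "medium", "hard"}:
        raise ValueError(f"unknown DIFFICULTY: {difficulty!r}")
    username = env.get("HF_USERNAME", "")
    return TrainerConfig(
        model_name=env.get("MODEL_NAME", "unsloth/Qwen2.5-3B-Instruct"),
        total_episodes=int(env.get("TOTAL_EPISODES", "1500")),
        difficulty=difficulty,
        max_steps=int(env.get("MAX_STEPS", "18")),
        num_generations=int(env.get("NUM_GENERATIONS", "8")),
        output_dir=env.get("OUTPUT_DIR", "runs/unsloth-grpo"),
        evidence_dir=env.get("EVIDENCE_DIR", "evidence"),
        # PUSH_REPO falls back to a repo owned by HF_USERNAME.
        push_repo=env.get("PUSH_REPO", f"{username}/cernenv-grpo-qwen2.5-3b"),
        autostart=env.get("AUTOSTART", "0") == "1",
    )


# Example with an explicit mapping instead of the real process environment.
config = load_config({"HF_USERNAME": "alice", "NUM_GENERATIONS": "4"})
```

Passing a plain dictionary keeps the example independent of whatever happens to be set in the current shell.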

	## How to use

	This Space exposes a tiny FastAPI control panel:
	- `GET /` — status + run info + live training-progress evidence (curves, before/after metrics, plots)
	- `POST /train` — start / restart a training run
	- `GET /logs?tail=N` — live tail of `training.log`
	- `GET /metrics` — pre / post / Δ metrics JSON
	- `GET /evidence` — list of evidence artifacts on disk
	- `GET /evidence/{name}` — download an artifact (`training_curve.png`, `training_log.csv`, etc.)
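
The endpoints above can be driven from a script as well as from the browser. The sketch below only uses the Python standard library. `BASE_URL` is an assumption (substitute the direct URL of your own running Space), `build_request` and `send` are hypothetical helper names, and whether `POST /train` expects a request body is not stated here, so none is sent.

```python
import urllib.parse
import urllib.request

# Assumption: replace with the direct URL of your own running Space.
BASE_URL = "http://localhost:7860"


def build_request(path, method="GET", params=None, base_url=BASE_URL):
    """Create (but do not send) a request for one control-panel endpoint."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, method=method)


def send(request, timeout=30):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Prepared requests for three of the endpoints listed above.
start_training = build_request("/train", method="POST")
tail_logs = build_request("/logs", params={"tail": 50})
fetch_metrics = build_request("/metrics")
```

Calling `send(tail_logs)` against a running Space would return the last 50 lines of `training.log` as text; a private Space would additionally need an authorization header.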

	### Training-progress evidence saved (and pushed to Hub)
	- `training_log.csv` — per-step reward, loss, KL, lr, grad-norm
	- `training_curve.png` — reward + loss vs step
	- `checkpoint_evals.csv` — held-out eval every `CHECKPOINT_EVAL_STEPS` updates
	- `checkpoint_progression.png` — mean reward + success/mass/channel accuracy vs step
	- `pre_eval.jsonl` / `post_eval.jsonl` — full per-episode rollouts before vs after
	- `before_after_summary.png` — pre/post bar chart with Δ annotations
	- `reward_distribution.png` — pre vs post reward histogram
	- `before_after_metrics.json` — machine-readable metrics + deltas
	- `sample_trajectories.md` — cherry-picked pre vs post agent traces
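
Once `training_log.csv` has been downloaded, a smoothed reward curve is often easier to read than the raw per-step values. The sketch below assumes the file has a header row with a column named `reward`; the exact header names are not documented here, so adjust them to the real file. `read_column` and `moving_average` are hypothetical helpers, and the sample rows are made up for the example.

```python
import csv
import io


def read_column(csv_text, column):
    """Return one numeric column of a CSV document as a list of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader if row.get(column) not in (None, "")]


def moving_average(values, window):
    """Trailing moving average; early entries average over fewer points."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up sample rows, not real training output.
# "step" and "reward" are assumed header names.
sample = "step,reward\n1,0.0\n2,1.0\n3,2.0\n"
smoothed = moving_average(read_column(sample, "reward"), window=2)
```

For a real run, read the file contents with `open("evidence/training_log.csv").read()` (assuming the default `EVIDENCE_DIR`) and pass them in place of `sample`.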

	Click "Start training" in the UI, or set `AUTOSTART=1` in the Space variables to kick off immediately on boot.

	When training finishes, the LoRA adapters and the evidence artifacts are pushed to `PUSH_REPO`.

	## Local equivalent

	The same training run is reproducible locally with:

	```bash
	# single GPU
	PYTHONPATH=. python -m training.training_unsloth \
	  --model_name unsloth/Qwen2.5-3B-Instruct \
	  --difficulty easy --total_episodes 1500 --max_steps 18 \
	  --num_generations 8 --output_dir runs/unsloth-grpo \
	  --evidence_dir evidence

	# multi-GPU (e.g. 4× A100)
	PYTHONPATH=. accelerate launch --num_processes 4 --mixed_precision bf16 \
	  -m training.training_unsloth \
	  --total_episodes 1500 --num_generations 8 \
	  --output_dir runs/unsloth-grpo --evidence_dir evidence
	```
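
The two invocations above differ only in the launcher, so a wrapper script can pick one based on the GPU count. This is a minimal sketch that mirrors the flags shown in the commands; `build_train_command` is a hypothetical helper, not part of the repository.

```python
def build_train_command(
    num_gpus=1,
    total_episodes=1500,
    num_generations=8,
    output_dir="runs/unsloth-grpo",
    evidence_dir="evidence",
):
    """Return the argv list for a single-GPU or multi-GPU training run."""
    module = "training.training_unsloth"
    if num_gpus > 1:
        # Multi-GPU: one accelerate process per GPU, bf16 mixed precision.
        launcher = [
            "accelerate", "launch",
            "--num_processes", str(num_gpus),
            "--mixed_precision", "bf16",
            "-m", module,
        ]
    else:
        launcher = ["python", "-m", module]
    return launcher + [
        "--total_episodes", str(total_episodes),
        "--num_generations", str(num_generations),
        "--output_dir", output_dir,
        "--evidence_dir", evidence_dir,
    ]


single_gpu_cmd = build_train_command()
multi_gpu_cmd = build_train_command(num_gpus=4)
```

Either list can be passed to `subprocess.run(cmd, env={**os.environ, "PYTHONPATH": "."}, check=True)` from the repository root, which matches the `PYTHONPATH=.` prefix used in the shell commands.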