metadata
title: CERNenv Trainer
emoji: ⚛️
colorFrom: indigo
colorTo: pink
sdk: docker
suggested_hardware: a100-large
suggested_storage: medium
pinned: false
license: bsd-3-clause
short_description: GRPO trainer for CERNenv (Unsloth + LoRA, A100)

CERNenv Trainer (Hugging Face Space, A100)

Fine-tunes a small instruction-tuned LLM (Large Language Model) to act as an LHC (Large Hadron Collider) physicist inside the CERNenv OpenEnv environment using GRPO (Group-Relative Policy Optimization), Unsloth, and LoRA (Low-Rank Adaptation).
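To illustrate the "group-relative" part of GRPO, here is a minimal sketch of the commonly described advantage computation: each rollout's reward is normalized against the other rollouts sampled for the same prompt. The function name `group_relative_advantages` and the `eps` constant are illustrative assumptions; the trainer in this Space may compute advantages differently (for example, with a different standard-deviation convention).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and spread.

    Hypothetical helper for illustration only; not taken from the CERNenv code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled rollouts (the default NUM_GENERATIONS is 4).
group = [0.0, 1.0, 0.0, 1.0]
advantages = group_relative_advantages(group)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so a group with identical rewards contributes no learning signal.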

Hardware

  • Recommended: A100 large (80 GB)
  • Minimum: T4 / L4 (will use a smaller model + fewer episodes)

Required Space secrets

| Secret | Purpose |
| --- | --- |
| HF_TOKEN | Hugging Face token with write access for model push |
| HF_USERNAME | Hub username, used as the default model-repo owner |

Optional environment variables

| Variable | Default | Notes |
| --- | --- | --- |
| MODEL_NAME | unsloth/Qwen2.5-3B-Instruct | Any chat model Unsloth supports |
| TOTAL_EPISODES | 400 | Total rollouts (prompts × generations) |
| DIFFICULTY | easy | easy / medium / hard |
| MAX_STEPS | 18 | Steps per episode |
| NUM_GENERATIONS | 4 | GRPO group size |
| OUTPUT_DIR | runs/unsloth-grpo | LoRA adapter output |
| PUSH_REPO | ${HF_USERNAME}/cernenv-grpo-qwen2.5-3b | Hub repo for adapters |
| AUTOSTART | 0 | Set to 1 to start training on Space boot |
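As a rough sketch of how these variables and their defaults could be resolved at startup, the snippet below reads them from the environment. The `TrainerConfig` dataclass and `load_config` function are hypothetical names introduced for illustration; the Space's actual code may parse and validate these values differently.

```python
import os
from dataclasses import dataclass


@dataclass
class TrainerConfig:
    """Hypothetical container mirroring the documented variables."""
    model_name: str
    total_episodes: int
    difficulty: str
    max_steps: int
    num_generations: int
    output_dir: str
    push_repo: str
    autostart: bool


def load_config(env=None):
    """Resolve documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    difficulty = env.get("DIFFICULTY", "easy")
    if difficulty not in {"easy", "medium", "hard"}:
        raise ValueError(f"unsupported DIFFICULTY: {difficulty!r}")
    username = env.get("HF_USERNAME", "")
    return TrainerConfig(
        model_name=env.get("MODEL_NAME", "unsloth/Qwen2.5-3B-Instruct"),
        total_episodes=int(env.get("TOTAL_EPISODES", "400")),
        difficulty=difficulty,
        max_steps=int(env.get("MAX_STEPS", "18")),
        num_generations=int(env.get("NUM_GENERATIONS", "4")),
        output_dir=env.get("OUTPUT_DIR", "runs/unsloth-grpo"),
        # The default repo is owned by HF_USERNAME, as in the table above.
        push_repo=env.get("PUSH_REPO", f"{username}/cernenv-grpo-qwen2.5-3b"),
        autostart=env.get("AUTOSTART", "0") == "1",
    )


# Passing a plain dict instead of os.environ makes the defaults easy to inspect.
config = load_config({"HF_USERNAME": "alice"})
```

If TOTAL_EPISODES is read as prompts × generations, the defaults (400 episodes, group size 4) would correspond to 100 distinct prompts; that reading is an interpretation of the table note, not something confirmed by the source.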

How to use

This Space exposes a tiny FastAPI control panel:

  • GET / — status + current run info
  • POST /train — start / restart a training run
  • GET /logs — live tail of training.log
  • GET /metrics — reward + success-rate snapshots
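The endpoints above can also be called from a script. The sketch below uses only the Python standard library; the base URL is a placeholder, and the empty POST body, the plain-text decoding of responses, and the absence of an auth header are all assumptions, since the request and response formats are not documented here.

```python
import urllib.request

# Placeholder URL: substitute the address of your own Space.
BASE_URL = "https://example.hf.space"

# Method and path for each documented endpoint.
ROUTES = {
    "status": ("GET", "/"),
    "train": ("POST", "/train"),
    "logs": ("GET", "/logs"),
    "metrics": ("GET", "/metrics"),
}


def build_request(base_url, endpoint):
    """Build (but do not send) a request for one of the documented endpoints."""
    method, path = ROUTES[endpoint]
    # Assumption: POST /train accepts an empty body.
    data = b"" if method == "POST" else None
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, method=method)


def call(base_url, endpoint, timeout=30):
    """Send the request and return the response body decoded as text."""
    request = build_request(base_url, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `call(BASE_URL, "train")` would start a run and `call(BASE_URL, "logs")` would fetch the current log tail. A private Space would likely also need an authorization header.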

Click "Start training" in the UI, or set AUTOSTART=1 in the Space variables to kick off immediately on boot.

When training finishes, the LoRA adapters are pushed to PUSH_REPO.
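For reference, a push like this can be done with the `huggingface_hub` client. The snippet below is a minimal sketch, assuming the adapters are saved as a folder under OUTPUT_DIR and that HF_TOKEN has write access; the function name `push_adapters` is hypothetical, and the Space's own upload code may differ.

```python
from pathlib import Path


def push_adapters(output_dir, repo_id, token):
    """Upload a local adapter folder to a model repo on the Hub.

    Hypothetical helper; assumes `huggingface_hub` is installed.
    """
    folder = Path(output_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"adapter directory not found: {folder}")

    # Imported lazily so the directory check works without the dependency.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # Create the target repo if it does not exist yet.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    return api.upload_folder(
        folder_path=str(folder),
        repo_id=repo_id,
        repo_type="model",
    )
```

With the documented defaults, a call would look like `push_adapters("runs/unsloth-grpo", push_repo, hf_token)`, where `push_repo` and `hf_token` hold the PUSH_REPO and HF_TOKEN values.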

Local equivalent

The same training run is reproducible locally with:

PYTHONPATH=. python -m training.training_unsloth \
  --model_name unsloth/Qwen2.5-3B-Instruct \
  --difficulty easy --total_episodes 400 --max_steps 18 \
  --output_dir runs/unsloth-grpo
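The same invocation can be assembled from Python, which is convenient when sweeping over difficulties or episode counts. This is a sketch only: `build_local_command` and `run_local` are hypothetical helpers, and they use only the flags shown in the command above.

```python
import os
import subprocess
import sys


def build_local_command(
    model_name="unsloth/Qwen2.5-3B-Instruct",
    difficulty="easy",
    total_episodes=400,
    max_steps=18,
    output_dir="runs/unsloth-grpo",
):
    """Return the argv list for the documented local training command."""
    return [
        sys.executable, "-m", "training.training_unsloth",
        "--model_name", model_name,
        "--difficulty", difficulty,
        "--total_episodes", str(total_episodes),
        "--max_steps", str(max_steps),
        "--output_dir", output_dir,
    ]


def run_local(**overrides):
    """Run training from the repository root with PYTHONPATH=. set."""
    env = {**os.environ, "PYTHONPATH": "."}
    return subprocess.run(build_local_command(**overrides), env=env, check=True)
```

For instance, `run_local(difficulty="medium", total_episodes=100)` would launch a shorter run on the medium setting, assuming it is called from the repository root.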