{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CERNenv: Unsloth + LoRA + GRPO training\n", "\n", "Trains a small instruction-tuned LLM (Large Language Model) to act as an LHC (Large Hadron Collider) physicist inside the **CERNenv** OpenEnv environment, using **GRPO** (Group Relative Policy Optimization) with **Unsloth** + **LoRA** (Low-Rank Adaptation).\n", "\n", "Runs on:\n", "- a **Hugging Face Space** with an A100 GPU (recommended)\n", "- Google **Colab** (T4 / L4) as a fallback\n", "\n", "Outputs:\n", "- LoRA adapters at `training/runs/unsloth-grpo` (the default `OUTPUT_DIR`)\n", "- Reward / success-rate curves at `training/plots/`\n", "- Final adapters pushed to your Hugging Face Hub repo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Environment setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys, os\n", "IN_COLAB = 'google.colab' in sys.modules\n", "IN_HF_SPACE = os.environ.get('SPACE_ID') is not None\n", "print('Colab:', IN_COLAB, '| HF Space:', IN_HF_SPACE)\n", "\n", "if IN_COLAB:\n", "    !git clone https://github.com/YOUR_HF_USERNAME/CERNenv.git\n", "    %cd CERNenv\n", "elif IN_HF_SPACE:\n", "    %cd /home/user/app\n", "else:\n", "    pass  # local run: assumes the notebook was started from the repo root\n", "\n", "!pip install -q -r requirements-unsloth.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os, json, subprocess, sys\n", "from pathlib import Path\n", "import torch\n", "print('CUDA:', torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)\n", "Path('training/plots').mkdir(parents=True, exist_ok=True)\n", "Path('training/runs').mkdir(parents=True, exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Hugging Face authentication\n", "\n", "On a Space, set `HF_TOKEN` as a Space secret. Locally or on Colab, paste a token at the prompt below. The token must have **write** access to your model repo." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from huggingface_hub import login\n", "HF_TOKEN = os.environ.get('HF_TOKEN')\n", "if HF_TOKEN:\n", "    login(HF_TOKEN)\n", "    print('logged in via HF_TOKEN env var')\n", "else:\n", "    from getpass import getpass\n", "    login(getpass('Paste HF token: '))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Configure the run" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HF_USERNAME = os.environ.get('HF_USERNAME', 'YOUR_HF_USERNAME')\n", "MODEL_NAME = os.environ.get('MODEL_NAME', 'unsloth/Qwen2.5-3B-Instruct')\n", "TOTAL_EPISODES = int(os.environ.get('TOTAL_EPISODES', '400'))\n", "DIFFICULTY = os.environ.get('DIFFICULTY', 'easy')\n", "MAX_STEPS = int(os.environ.get('MAX_STEPS', '18'))\n", "OUTPUT_DIR = os.environ.get('OUTPUT_DIR', 'training/runs/unsloth-grpo')\n", "PUSH_REPO = os.environ.get('PUSH_REPO', f'{HF_USERNAME}/cernenv-grpo-qwen2.5-3b')\n", "print({'model': MODEL_NAME, 'episodes': TOTAL_EPISODES, 'difficulty': DIFFICULTY,\n", "       'max_steps': MAX_STEPS, 'out': OUTPUT_DIR, 'repo': PUSH_REPO})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Quick sanity check: random, heuristic, and oracle baselines\n", "\n", "Before training, confirm that the environment and reward signal work. With only 3 episodes per agent the numbers are noisy, but the oracle should generally score highest and the random agent lowest." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!PYTHONPATH=. python -m scripts.run_agent --agent random --difficulty $DIFFICULTY --episodes 3 --quiet\n", "!PYTHONPATH=. python -m scripts.run_agent --agent heuristic --difficulty $DIFFICULTY --episodes 3 --quiet\n", "!PYTHONPATH=. python -m scripts.run_agent --agent oracle --difficulty $DIFFICULTY --episodes 3 --quiet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
Pre-training evaluation (zero-shot LLM)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!PYTHONPATH=. python -m training.evaluate \\\n", "    --model_name $MODEL_NAME \\\n", "    --difficulty $DIFFICULTY \\\n", "    --episodes 16 \\\n", "    --max_steps $MAX_STEPS \\\n", "    --tag pre_train \\\n", "    --out training/runs/eval_pre_train.jsonl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Train with Unsloth + LoRA + GRPO" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!PYTHONPATH=. python -m training.training_unsloth \\\n", "    --model_name $MODEL_NAME \\\n", "    --difficulty $DIFFICULTY \\\n", "    --total_episodes $TOTAL_EPISODES \\\n", "    --max_steps $MAX_STEPS \\\n", "    --num_generations 4 \\\n", "    --output_dir $OUTPUT_DIR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Post-training evaluation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!PYTHONPATH=. python -m training.evaluate \\\n", "    --model_name $MODEL_NAME \\\n", "    --adapter_dir $OUTPUT_DIR \\\n", "    --difficulty $DIFFICULTY \\\n", "    --episodes 16 \\\n", "    --max_steps $MAX_STEPS \\\n", "    --tag post_train \\\n", "    --out training/runs/eval_post_train.jsonl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Plot before / after" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!PYTHONPATH=. python -m training.plots \\\n", "    --pre training/runs/eval_pre_train.jsonl \\\n", "    --post training/runs/eval_post_train.jsonl \\\n", "    --out_dir training/plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Push trained adapters to the Hugging Face Hub" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!PYTHONPATH=. 
python -m scripts.push_to_hub model \\\n", "    --adapter_dir $OUTPUT_DIR \\\n", "    --repo_id $PUSH_REPO \\\n", "    --base_model $MODEL_NAME" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Done. The reward and success-rate plots are in `training/plots/`, the LoRA adapters are in `OUTPUT_DIR`, and a copy of the adapters has been pushed to `PUSH_REPO`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.11" } }, "nbformat": 4, "nbformat_minor": 5 }