{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CERNenv — Unsloth + LoRA + GRPO training\n",
    "\n",
    "Trains a small instruction-tuned LLM (Large Language Model) to act as an LHC (Large Hadron Collider) physicist inside the **CERNenv** OpenEnv environment, using **GRPO** (Group-Relative Policy Optimization) with **Unsloth** + **LoRA** (Low-Rank Adaptation).\n",
    "\n",
    "Runs on:\n",
    "- a **Hugging Face Space** with an A100 GPU (recommended)\n",
    "- Google **Colab** (T4 / L4) as a fallback\n",
    "\n",
    "Outputs:\n",
    "- LoRA adapters at `training/runs/unsloth-grpo` (the default `OUTPUT_DIR`)\n",
    "- Reward / success-rate curves at `training/plots/`\n",
    "- Final adapters pushed to your Hugging Face Hub repo"
   ]
  },
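  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "LoRA keeps each pretrained weight matrix $W \\in \\mathbb{R}^{d \\times k}$ frozen and trains a low-rank update $\\Delta W = BA$, with $B \\in \\mathbb{R}^{d \\times r}$, $A \\in \\mathbb{R}^{r \\times k}$ and $r \\ll \\min(d, k)$. Each adapted matrix therefore adds only $r(d + k)$ trainable parameters instead of the $dk$ that full fine-tuning would update. The optional cell below is an illustrative sketch of that arithmetic; the $2048 \\times 2048$ shape and the ranks are hypothetical examples, not the settings used by `training/training_unsloth.py`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: trainable parameters of one LoRA-adapted matrix.\n",
    "# The shape and ranks below are hypothetical, not this run's configuration.\n",
    "\n",
    "def full_params(d: int, k: int) -> int:\n",
    "    \"\"\"Entries updated by full fine-tuning of a d x k weight matrix.\"\"\"\n",
    "    return d * k\n",
    "\n",
    "\n",
    "def lora_params(d: int, k: int, r: int) -> int:\n",
    "    \"\"\"Entries in the LoRA factors B (d x r) and A (r x k).\"\"\"\n",
    "    return r * (d + k)\n",
    "\n",
    "\n",
    "d, k = 2048, 2048  # hypothetical square projection matrix\n",
    "for r in (8, 16, 64):\n",
    "    share = lora_params(d, k, r) / full_params(d, k)\n",
    "    print(f'rank {r:>2}: {lora_params(d, k, r):,} vs {full_params(d, k):,} params ({share:.2%})')"
   ]
  },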
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Environment setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys, os\n",
    "IN_COLAB = 'google.colab' in sys.modules\n",
    "IN_HF_SPACE = os.environ.get('SPACE_ID') is not None\n",
    "print('Colab:', IN_COLAB, '| HF Space:', IN_HF_SPACE)\n",
    "\n",
    "if IN_COLAB:\n",
    "    !git clone https://github.com/YOUR_GITHUB_USERNAME/CERNenv.git\n",
    "    %cd CERNenv\n",
    "elif IN_HF_SPACE:\n",
    "    %cd /home/user/app\n",
    "else:\n",
    "    pass  # local run: assumes the kernel was started from the repo root\n",
    "\n",
    "!pip install -q -r requirements-unsloth.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from pathlib import Path\n",
    "import torch\n",
    "print('CUDA:', torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)\n",
    "Path('training/plots').mkdir(parents=True, exist_ok=True)\n",
    "Path('training/runs').mkdir(parents=True, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Hugging Face authentication\n",
    "\n",
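    "On a Space, add `HF_TOKEN` as a Space secret. Locally or on Colab, the cell below prompts you to paste a token if `HF_TOKEN` is not set in the environment. The token needs **write** access, because step 9 pushes the trained adapters to your model repo."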
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import login\n",
    "HF_TOKEN = os.environ.get('HF_TOKEN')\n",
    "if HF_TOKEN:\n",
    "    login(HF_TOKEN)\n",
    "    print('logged in via HF_TOKEN env var')\n",
    "else:\n",
    "    from getpass import getpass\n",
    "    login(getpass('Paste HF token: '))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Configure the run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "HF_USERNAME = os.environ.get('HF_USERNAME', 'YOUR_HF_USERNAME')\n",
    "MODEL_NAME = os.environ.get('MODEL_NAME', 'unsloth/Qwen2.5-3B-Instruct')\n",
    "TOTAL_EPISODES = int(os.environ.get('TOTAL_EPISODES', '400'))\n",
    "DIFFICULTY = os.environ.get('DIFFICULTY', 'easy')\n",
    "MAX_STEPS = int(os.environ.get('MAX_STEPS', '18'))\n",
    "OUTPUT_DIR = os.environ.get('OUTPUT_DIR', 'training/runs/unsloth-grpo')\n",
    "PUSH_REPO = os.environ.get('PUSH_REPO', f'{HF_USERNAME}/cernenv-grpo-qwen2.5-3b')\n",
    "print({'model': MODEL_NAME, 'episodes': TOTAL_EPISODES, 'difficulty': DIFFICULTY,\n",
    "       'max_steps': MAX_STEPS, 'out': OUTPUT_DIR, 'repo': PUSH_REPO})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
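    "## 4. Quick sanity check: random, heuristic and oracle baselines\n",
    "\n",
    "Before training, confirm that the environment and its reward signal work by running three scripted agents for 3 episodes each. If the reward is informative, the random agent should score lowest and the oracle highest, with the heuristic agent in between."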
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!PYTHONPATH=. python -m scripts.run_agent --agent random    --difficulty $DIFFICULTY --episodes 3 --quiet\n",
    "!PYTHONPATH=. python -m scripts.run_agent --agent heuristic --difficulty $DIFFICULTY --episodes 3 --quiet\n",
    "!PYTHONPATH=. python -m scripts.run_agent --agent oracle    --difficulty $DIFFICULTY --episodes 3 --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Pre-training evaluation (zero-shot LLM)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!PYTHONPATH=. python -m training.evaluate \\\n",
    "  --model_name $MODEL_NAME \\\n",
    "  --difficulty $DIFFICULTY \\\n",
    "  --episodes 16 \\\n",
    "  --max_steps $MAX_STEPS \\\n",
    "  --tag pre_train \\\n",
    "  --out training/runs/eval_pre_train.jsonl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Train with Unsloth + LoRA + GRPO"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!PYTHONPATH=. python -m training.training_unsloth \\\n",
    "  --model_name $MODEL_NAME \\\n",
    "  --difficulty $DIFFICULTY \\\n",
    "  --total_episodes $TOTAL_EPISODES \\\n",
    "  --max_steps $MAX_STEPS \\\n",
    "  --num_generations 4 \\\n",
    "  --output_dir $OUTPUT_DIR"
   ]
  },
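  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "GRPO scores a *group* of completions sampled for the same prompt (the `--num_generations 4` flag above presumably sets the group size) and uses the group's own statistics as the baseline instead of a learned value function: each completion's advantage is its reward minus the group mean, divided by the group standard deviation. The cell below is a minimal, framework-free sketch of that normalization. The rewards are made up, the `eps` constant is an assumption, and it is not necessarily how `training/training_unsloth.py` (or the library it wraps) computes advantages; implementations also differ on whether they use the population or the sample standard deviation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch of GRPO's group-relative advantage (illustrative only).\n",
    "from statistics import mean, pstdev\n",
    "\n",
    "\n",
    "def group_relative_advantages(rewards, eps=1e-4):\n",
    "    \"\"\"Return (r - mean) / (std + eps) for every reward in one group.\n",
    "\n",
    "    eps is an assumed small constant that avoids division by zero when\n",
    "    every completion in the group receives the same reward.\n",
    "    \"\"\"\n",
    "    mu = mean(rewards)\n",
    "    sigma = pstdev(rewards)  # population std over the group\n",
    "    return [(r - mu) / (sigma + eps) for r in rewards]\n",
    "\n",
    "\n",
    "# Hypothetical rewards for 4 completions of the same prompt.\n",
    "print([round(a, 3) for a in group_relative_advantages([1.0, 0.0, 0.5, 0.5])])\n",
    "\n",
    "# Identical rewards give zero advantage everywhere, i.e. no learning signal.\n",
    "print(group_relative_advantages([0.2, 0.2, 0.2, 0.2]))"
   ]
  },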
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Post-training evaluation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!PYTHONPATH=. python -m training.evaluate \\\n",
    "  --model_name $MODEL_NAME \\\n",
    "  --adapter_dir $OUTPUT_DIR \\\n",
    "  --difficulty $DIFFICULTY \\\n",
    "  --episodes 16 \\\n",
    "  --max_steps $MAX_STEPS \\\n",
    "  --tag post_train \\\n",
    "  --out training/runs/eval_post_train.jsonl"
   ]
  },
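  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plots in step 8 compare the two runs; the cell below also prints a numeric summary of each evaluation file. It is a sketch that assumes every JSONL line is one episode record with a numeric `total_reward` field and a boolean `success` field. Those field names are hypothetical, so check them against what `training.evaluate` actually writes and adjust `REWARD_KEY` / `SUCCESS_KEY` before relying on the numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: summarize per-episode evaluation records stored as JSONL.\n",
    "# ASSUMPTION: the field names below are hypothetical; match them to training.evaluate's output.\n",
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "REWARD_KEY = 'total_reward'  # hypothetical field name\n",
    "SUCCESS_KEY = 'success'      # hypothetical field name\n",
    "\n",
    "\n",
    "def summarize_jsonl(path):\n",
    "    \"\"\"Return episode count, mean reward and success rate for one JSONL file.\"\"\"\n",
    "    rewards, successes = [], []\n",
    "    with Path(path).open() as f:\n",
    "        for line in f:\n",
    "            line = line.strip()\n",
    "            if not line:\n",
    "                continue  # tolerate blank lines\n",
    "            record = json.loads(line)\n",
    "            rewards.append(float(record[REWARD_KEY]))\n",
    "            successes.append(bool(record[SUCCESS_KEY]))\n",
    "    n = len(rewards)\n",
    "    return {\n",
    "        'episodes': n,\n",
    "        'mean_reward': sum(rewards) / n if n else float('nan'),\n",
    "        'success_rate': sum(successes) / n if n else float('nan'),\n",
    "    }\n",
    "\n",
    "\n",
    "for tag in ('pre_train', 'post_train'):\n",
    "    path = Path(f'training/runs/eval_{tag}.jsonl')\n",
    "    print(tag, summarize_jsonl(path) if path.exists() else f'missing: {path}')"
   ]
  },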
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Plot before / after"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!PYTHONPATH=. python -m training.plots \\\n",
    "  --pre training/runs/eval_pre_train.jsonl \\\n",
    "  --post training/runs/eval_post_train.jsonl \\\n",
    "  --out_dir training/plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Push trained adapters to the Hugging Face Hub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!PYTHONPATH=. python -m scripts.push_to_hub model \\\n",
    "  --adapter_dir $OUTPUT_DIR \\\n",
    "  --repo_id $PUSH_REPO \\\n",
    "  --base_model $MODEL_NAME"
   ]
  },
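  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, confirm that the adapter files actually landed on the Hub. The cell below is a sketch that lists the repo with `huggingface_hub.HfApi.list_repo_files`. The expected file names assume the push script uploads a standard PEFT adapter (`adapter_config.json` plus `adapter_model.safetensors`), which may not match `scripts.push_to_hub` exactly, so adjust `EXPECTED_FILES` if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: check that the pushed Hub repo contains the LoRA adapter files.\n",
    "# ASSUMPTION: standard PEFT adapter file names; adjust if scripts.push_to_hub differs.\n",
    "EXPECTED_FILES = {'adapter_config.json', 'adapter_model.safetensors'}\n",
    "\n",
    "\n",
    "def missing_adapter_files(repo_files):\n",
    "    \"\"\"Return the expected adapter files that are absent, sorted by name.\"\"\"\n",
    "    return sorted(EXPECTED_FILES - set(repo_files))\n",
    "\n",
    "\n",
    "if 'PUSH_REPO' in globals():  # only query the Hub when the config cell has run\n",
    "    from huggingface_hub import HfApi\n",
    "    files = HfApi().list_repo_files(PUSH_REPO)\n",
    "    print('missing:', missing_adapter_files(files) or 'none')"
   ]
  },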
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Done. The reward and success-rate plots are in `training/plots/`, the LoRA adapters are in `OUTPUT_DIR`, and a copy of the adapters has been pushed to the Hub repo `PUSH_REPO`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}