TensorCat
/

TensorTalk

TensorCat committed 21 days ago

Commit

1c98f0e

1 Parent(s): 2244cf2

Delete UM_Handbook/(Demo Pilot)FineTune_QWEN3_UM_Handbook_en.ipynb

Files changed (1)

UM_Handbook/(Demo Pilot)FineTune_QWEN3_UM_Handbook_en.ipynb +0 -1531

UM_Handbook/(Demo Pilot)FineTune_QWEN3_UM_Handbook_en.ipynb DELETED

@@ -1,1531 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "ac09de66",
-   "metadata": {},
-   "source": [
-    "# Qwen3-8B UM Handbook LoRA / QLoRA Fine-tuning\n",
-    "\n",
-    "This notebook keeps the training logic from `finetune_qwen3_um_handbook_v3.py` and organizes it into separate notebook sections for DICC.\n",
-    "\n",
-    "## Workflow\n",
-    "1. Check the environment and available devices\n",
-    "2. Read and validate `SFT_QA_Training_Ready.jsonl`\n",
-    "3. Convert the QA data into prompt-completion format\n",
-    "4. Split the data into training and validation sets\n",
-    "5. Download the Qwen3-8B base model into a local directory\n",
-    "6. Select the backend automatically: **CUDA > MPS > CPU**\n",
-    "7. Use **4-bit QLoRA** on CUDA and standard LoRA on MPS / CPU\n",
-    "8. Train with `TRL SFTTrainer`\n",
-    "9. Evaluate with both loss-based and generation-based metrics\n",
-    "10. Save the LoRA adapter, merged model, `.pt` file, metrics, and predictions\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "71f10012",
-   "metadata": {},
-   "source": [
-    "## Part 1 - Install dependencies\n",
-    "\n",
-    "Run this cell only if the current DICC kernel does not already have the required packages.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d091473c",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# %pip install -U torch transformers accelerate datasets trl peft bitsandbytes sentencepiece evaluate rouge_score bert_score sacrebleu huggingface_hub"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f44d18ff",
-   "metadata": {},
-   "source": [
-    "## Part 2 - Import libraries\n",
-    "\n",
-    "This section imports the libraries used in the training pipeline.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 26,
-   "id": "47a8b3f7",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from __future__ import annotations\n",
-    "\n",
-    "import gc\n",
-    "import json\n",
-    "import math\n",
-    "import random\n",
-    "import re\n",
-    "import time\n",
-    "from pathlib import Path\n",
-    "from typing import Dict, List\n",
-    "\n",
-    "import numpy as np\n",
-    "import torch\n",
-    "from datasets import Dataset, DatasetDict\n",
-    "from huggingface_hub import snapshot_download\n",
-    "from peft import LoraConfig, PeftModel\n",
-    "import transformers\n",
-    "from transformers import (\n",
-    "    AutoModelForCausalLM,\n",
-    "    AutoTokenizer,\n",
-    "    BitsAndBytesConfig,\n",
-    "    set_seed,\n",
-    ")\n",
-    "from trl import SFTConfig, SFTTrainer"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "469e1466",
-   "metadata": {},
-   "source": [
-    "## Part 3 - Configuration\n",
-    "\n",
-    "This section defines project paths, model paths, output paths, and training hyperparameters.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "id": "59d68547",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "PROJECT_ROOT       = /scr/user/kevin2002/TensorCat/NLP/UM_Handbook\n",
-      "DATASET_ROOT       = /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/Dataset/SFT_Dataset\n",
-      "DATASET_PATH       = /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/Dataset/SFT_Dataset/SFT_QA_Training_Ready.jsonl\n",
-      "BASE_MODEL_DIR     = /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/models/Qwen3-8B\n",
-      "OUTPUT_ROOT        = /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook\n",
-      "USE_4BIT           = True\n",
-      "MAX_GRAD_NORM     = 1.0\n",
-      "PACKING           = False\n",
-      "GRADIENT_CHECKPOINTING = True\n",
-      "LOW_CPU_MEM_USAGE = True\n",
-      "USE_FLASH_ATTENTION_2_IF_AVAILABLE = False\n"
-     ]
-    }
-   ],
-   "source": [
-    "# ============================================================\n",
-    "# CONFIG\n",
-    "# ============================================================\n",
-    "\n",
-    "from pathlib import Path\n",
-    "\n",
-    "WARMUP_STEPS = 20\n",
-    "\n",
-    "# Project root\n",
-    "PROJECT_ROOT = Path(\"/scr/user/kevin2002/TensorCat/NLP/UM_Handbook\")\n",
-    "\n",
-    "# Dataset paths\n",
-    "DATASET_ROOT = PROJECT_ROOT / \"Dataset\" / \"SFT_Dataset\"\n",
-    "DATASET_PATH = DATASET_ROOT / \"SFT_QA_Training_Ready.jsonl\"\n",
-    "\n",
-    "# Base model selection\n",
-    "BASE_MODEL_NAME = \"Qwen/Qwen3-8B\"\n",
-    "BASE_MODEL_LOCAL_DIR = PROJECT_ROOT / \"models\" / \"Qwen3-8B\"\n",
-    "\n",
-    "# Output paths\n",
-    "OUTPUT_ROOT = PROJECT_ROOT / \"outputs\" / \"qwen3_um_handbook\"\n",
-    "ADAPTER_OUTPUT_DIR = OUTPUT_ROOT / \"lora_adapter\"\n",
-    "MERGED_MODEL_DIR = OUTPUT_ROOT / \"merged_model\"\n",
-    "FINAL_PT_PATH = OUTPUT_ROOT / \"Qwen3-8B-Instruct_UM_Handbook.pt\"\n",
-    "METRICS_JSON_PATH = OUTPUT_ROOT / \"final_metrics.json\"\n",
-    "PREDICTIONS_JSONL_PATH = OUTPUT_ROOT / \"validation_predictions.jsonl\"\n",
-    "TRAIN_VAL_SPLIT_JSON_PATH = OUTPUT_ROOT / \"dataset_split_summary.json\"\n",
-    "\n",
-    "# Data / prompt settings\n",
-    "SYSTEM_PROMPT = (\n",
-    "    \"You are an academic assistant for the Faculty of Computer Science and \"\n",
-    "    \"Information Technology, Universiti Malaya. Answer questions accurately \"\n",
-    "    \"and only using handbook-consistent information. If the handbook does not support \"\n",
-    "    \"a claim, avoid inventing details.\"\n",
-    ")\n",
-    "TRAIN_VAL_RATIO = 0.90\n",
-    "MAX_SEQ_LENGTH = 1024\n",
-    "RANDOM_SEED = 42\n",
-    "\n",
-    "# LoRA / QLoRA settings\n",
-    "USE_4BIT = True\n",
-    "LORA_R = 32\n",
-    "LORA_ALPHA = 64\n",
-    "LORA_DROPOUT = 0.05\n",
-    "LORA_TARGET_MODULES = [\n",
-    "    \"q_proj\",\n",
-    "    \"k_proj\",\n",
-    "    \"v_proj\",\n",
-    "    \"o_proj\",\n",
-    "    \"gate_proj\",\n",
-    "    \"up_proj\",\n",
-    "    \"down_proj\",\n",
-    "]\n",
-    "\n",
-    "# Training settings\n",
-    "NUM_TRAIN_EPOCHS = 6\n",
-    "PER_DEVICE_TRAIN_BATCH_SIZE = 2\n",
-    "PER_DEVICE_EVAL_BATCH_SIZE = 2\n",
-    "GRADIENT_ACCUMULATION_STEPS = 8\n",
-    "LEARNING_RATE = 2e-4\n",
-    "WEIGHT_DECAY = 0.01\n",
-    "WARMUP_RATIO = 0.05  # Not passed to SFTConfig; warmup is controlled by WARMUP_STEPS above\n",
-    "LOGGING_STEPS = 10\n",
-    "SAVE_STEPS = 50\n",
-    "EVAL_STEPS = 50\n",
-    "\n",
-    "MAX_GRAD_NORM = 1.0\n",
-    "PACKING = False\n",
-    "GRADIENT_CHECKPOINTING = True\n",
-    "LOW_CPU_MEM_USAGE = True\n",
-    "USE_FLASH_ATTENTION_2_IF_AVAILABLE = False\n",
-    "\n",
-    "# Save / display settings\n",
-    "SAVE_MERGED_MODEL = True\n",
-    "SAVE_TOKENIZER_WITH_MERGED = True\n",
-    "NUM_PRINTED_PREDICTIONS = 5\n",
-    "\n",
-    "\n",
-    "# Generation eval settings\n",
-    "MAX_NEW_TOKENS_EVAL = 192\n",
-    "NUM_EVAL_SAMPLES_FOR_GENERATION = None  # None = use full validation set\n",
-    "DO_SAMPLE_EVAL = False\n",
-    "TEMPERATURE_EVAL = 0.7\n",
-    "TOP_P_EVAL = 0.9\n",
-    "\n",
-    "# Final export settings\n",
-    "SAVE_SINGLE_PT = True\n",
-    "\n",
-    "# Create the main directories early\n",
-    "for path in [PROJECT_ROOT, DATASET_ROOT, BASE_MODEL_LOCAL_DIR.parent, OUTPUT_ROOT]:\n",
-    "    path.mkdir(parents=True, exist_ok=True)\n",
-    "\n",
-    "print(\"PROJECT_ROOT       =\", PROJECT_ROOT)\n",
-    "print(\"DATASET_ROOT       =\", DATASET_ROOT)\n",
-    "print(\"DATASET_PATH       =\", DATASET_PATH)\n",
-    "print(\"BASE_MODEL_DIR     =\", BASE_MODEL_LOCAL_DIR)\n",
-    "print(\"OUTPUT_ROOT        =\", OUTPUT_ROOT)\n",
-    "print(\"USE_4BIT           =\", USE_4BIT)\n",
-    "\n",
-    "\n",
-    "print(\"MAX_GRAD_NORM     =\", MAX_GRAD_NORM)\n",
-    "print(\"PACKING           =\", PACKING)\n",
-    "print(\"GRADIENT_CHECKPOINTING =\", GRADIENT_CHECKPOINTING)\n",
-    "print(\"LOW_CPU_MEM_USAGE =\", LOW_CPU_MEM_USAGE)\n",
-    "print(\"USE_FLASH_ATTENTION_2_IF_AVAILABLE =\", USE_FLASH_ATTENTION_2_IF_AVAILABLE)\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e9319138",
-   "metadata": {},
-   "source": [
-    "## Part 4 - Helper functions\n",
-    "\n",
-    "This section defines utility functions for paths, logging, text cleanup, device selection, dtype selection, and 4-bit control.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "id": "f4a0e4f4",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def ensure_dir(path: Path) -> None:\n",
-    "    path.mkdir(parents=True, exist_ok=True)\n",
-    "\n",
-    "def print_banner(title: str) -> None:\n",
-    "    print(\"\\n\" + \"=\" * 88)\n",
-    "    print(title)\n",
-    "    print(\"=\" * 88)\n",
-    "\n",
-    "def select_runtime_backend() -> str:\n",
-    "    if torch.cuda.is_available():\n",
-    "        return \"cuda\"\n",
-    "    if hasattr(torch.backends, \"mps\") and torch.backends.mps.is_available():\n",
-    "        return \"mps\"\n",
-    "    return \"cpu\"\n",
-    "\n",
-    "def detect_compute_dtype(backend: str) -> torch.dtype:\n",
-    "    if backend == \"cuda\":\n",
-    "        if torch.cuda.is_bf16_supported():\n",
-    "            return torch.bfloat16\n",
-    "        return torch.float16\n",
-    "    if backend == \"mps\":\n",
-    "        return torch.float16\n",
-    "    return torch.float32\n",
-    "\n",
-    "def should_use_4bit(backend: str) -> bool:\n",
-    "    return USE_4BIT and backend == \"cuda\"\n",
-    "\n",
-    "def normalize_text(text: str) -> str:\n",
-    "    text = text.strip()\n",
-    "    text = re.sub(r\"\\s+\", \" \", text)\n",
-    "    return text\n",
-    "\n",
-    "def normalize_for_exact(text: str) -> str:\n",
-    "    text = text.lower().strip()\n",
-    "    text = re.sub(r\"[^\\w\\s]\", \" \", text)\n",
-    "    text = re.sub(r\"\\s+\", \" \", text)\n",
-    "    return text\n",
-    "\n",
-    "def cleanup_memory() -> None:\n",
-    "    gc.collect()\n",
-    "    if torch.cuda.is_available():\n",
-    "        torch.cuda.empty_cache()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e8d9c423",
-   "metadata": {},
-   "source": [
-    "## Part 5 - Device detection and display\n",
-    "\n",
-    "This section prints the available devices, selected backend, selected dtype, and whether 4-bit loading is enabled.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "id": "64398701",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Device Detection\n",
-      "========================================================================================\n",
-      "{\n",
-      "  \"selected_backend\": \"cuda\",\n",
-      "  \"torch_dtype\": \"torch.bfloat16\",\n",
-      "  \"use_4bit_qlora\": true,\n",
-      "  \"cuda_available\": true,\n",
-      "  \"mps_available\": false,\n",
-      "  \"cuda_device_count\": 1,\n",
-      "  \"cuda_device_name\": \"NVIDIA A100-SXM4-80GB\"\n",
-      "}\n"
-     ]
-    }
-   ],
-   "source": [
-    "print_banner(\"Device Detection\")\n",
-    "\n",
-    "RUNTIME_DEVICE_BACKEND = select_runtime_backend()\n",
-    "effective_dtype = detect_compute_dtype(RUNTIME_DEVICE_BACKEND)\n",
-    "effective_use_4bit = should_use_4bit(RUNTIME_DEVICE_BACKEND)\n",
-    "\n",
-    "device_info = {\n",
-    "    \"selected_backend\": RUNTIME_DEVICE_BACKEND,\n",
-    "    \"torch_dtype\": str(effective_dtype),\n",
-    "    \"use_4bit_qlora\": effective_use_4bit,\n",
-    "    \"cuda_available\": torch.cuda.is_available(),\n",
-    "    \"mps_available\": hasattr(torch.backends, \"mps\") and torch.backends.mps.is_available(),\n",
-    "}\n",
-    "\n",
-    "if torch.cuda.is_available():\n",
-    "    device_info[\"cuda_device_count\"] = torch.cuda.device_count()\n",
-    "    try:\n",
-    "        device_info[\"cuda_device_name\"] = torch.cuda.get_device_name(0)\n",
-    "    except Exception:\n",
-    "        device_info[\"cuda_device_name\"] = \"Unavailable\"\n",
-    "\n",
-    "print(json.dumps(device_info, indent=2))\n",
-    "\n",
-    "if RUNTIME_DEVICE_BACKEND != \"cuda\":\n",
-    "    print(\n",
-    "        \"\\n[Info] Non-CUDA backend detected. 4-bit bitsandbytes QLoRA is disabled automatically, \"\n",
-    "        \"and the training path falls back to standard LoRA on the selected backend.\"\n",
-    "    )"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "69b02751",
-   "metadata": {},
-   "source": [
-    "## Part 6 - Evaluation functions\n",
-    "\n",
-    "This section defines exact-match and token-level F1 scoring functions.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "id": "379c9dd7",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def token_f1(prediction: str, reference: str) -> float:\n",
-    "    pred_tokens = normalize_for_exact(prediction).split()\n",
-    "    ref_tokens = normalize_for_exact(reference).split()\n",
-    "\n",
-    "    if not pred_tokens and not ref_tokens:\n",
-    "        return 1.0\n",
-    "    if not pred_tokens or not ref_tokens:\n",
-    "        return 0.0\n",
-    "\n",
-    "    pred_counts = {}\n",
-    "    for token in pred_tokens:\n",
-    "        pred_counts[token] = pred_counts.get(token, 0) + 1\n",
-    "\n",
-    "    ref_counts = {}\n",
-    "    for token in ref_tokens:\n",
-    "        ref_counts[token] = ref_counts.get(token, 0) + 1\n",
-    "\n",
-    "    # Overlap is the multiset intersection of predicted and reference tokens.\n",
-    "    overlap = 0\n",
-    "    for token, count in pred_counts.items():\n",
-    "        if token in ref_counts:\n",
-    "            overlap += min(count, ref_counts[token])\n",
-    "\n",
-    "    if overlap == 0:\n",
-    "        return 0.0\n",
-    "\n",
-    "    precision = overlap / len(pred_tokens)\n",
-    "    recall = overlap / len(ref_tokens)\n",
-    "    return 2 * precision * recall / (precision + recall)\n",
-    "\n",
-    "def exact_match(prediction: str, reference: str) -> float:\n",
-    "    return float(normalize_for_exact(prediction) == normalize_for_exact(reference))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d26e26fd",
-   "metadata": {},
-   "source": [
-    "## Part 7 - Dataset reading and validation functions\n",
-    "\n",
-    "This section reads the JSONL file, checks required fields, and prepares prompt-completion rows.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "id": "56287c34",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def load_jsonl(path: Path) -> List[Dict]:\n",
-    "    rows: List[Dict] = []\n",
-    "    with path.open(\"r\", encoding=\"utf-8\") as f:\n",
-    "        for line_number, line in enumerate(f, 1):\n",
-    "            line = line.strip()\n",
-    "            if not line:\n",
-    "                continue\n",
-    "            obj = json.loads(line)\n",
-    "            required = {\"question\", \"answer\"}\n",
-    "            missing = required - set(obj.keys())\n",
-    "            if missing:\n",
-    "                raise ValueError(f\"Line {line_number} is missing keys: {sorted(missing)}\")\n",
-    "            obj[\"question\"] = normalize_text(obj[\"question\"])\n",
-    "            obj[\"answer\"] = normalize_text(obj[\"answer\"])\n",
-    "            rows.append(obj)\n",
-    "\n",
-    "    if not rows:\n",
-    "        raise ValueError(f\"Dataset at {path} is empty.\")\n",
-    "    return rows\n",
-    "\n",
-    "def build_prompt_completion_rows(rows: List[Dict]) -> List[Dict]:\n",
-    "    converted: List[Dict] = []\n",
-    "    for row in rows:\n",
-    "        converted.append(\n",
-    "            {\n",
-    "                \"qa_id\": row.get(\"qa_id\", \"\"),\n",
-    "                \"index_id\": row.get(\"index_id\", \"\"),\n",
-    "                \"prompt\": [\n",
-    "                    {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
-    "                    {\"role\": \"user\", \"content\": row[\"question\"]},\n",
-    "                ],\n",
-    "                \"completion\": [\n",
-    "                    {\"role\": \"assistant\", \"content\": row[\"answer\"]},\n",
-    "                ],\n",
-    "                \"question\": row[\"question\"],\n",
-    "                \"answer\": row[\"answer\"],\n",
-    "            }\n",
-    "        )\n",
-    "    return converted\n",
-    "\n",
-    "def split_dataset(rows: List[Dict], train_ratio: float, seed: int) -> DatasetDict:\n",
-    "    rng = random.Random(seed)\n",
-    "    rows = rows.copy()\n",
-    "    rng.shuffle(rows)\n",
-    "\n",
-    "    split_idx = max(1, int(len(rows) * train_ratio))\n",
-    "    split_idx = min(split_idx, len(rows) - 1)\n",
-    "\n",
-    "    train_rows = rows[:split_idx]\n",
-    "    val_rows = rows[split_idx:]\n",
-    "\n",
-    "    ds = DatasetDict(\n",
-    "        {\n",
-    "            \"train\": Dataset.from_list(train_rows),\n",
-    "            \"validation\": Dataset.from_list(val_rows),\n",
-    "        }\n",
-    "    )\n",
-    "    return ds"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "53f03f17",
-   "metadata": {},
-   "source": [
-    "## Part 8 - Read and inspect the dataset\n",
-    "\n",
-    "This section loads the dataset, creates the train / validation split, and prints one sample.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "id": "943f8ae3",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Step 1 - Validate dataset\n",
-      "========================================================================================\n",
-      "{\n",
-      "  \"dataset_path\": \"/scr/user/kevin2002/TensorCat/NLP/UM_Handbook/Dataset/SFT_Dataset/SFT_QA_Training_Ready.jsonl\",\n",
-      "  \"total_examples\": 388,\n",
-      "  \"train_examples\": 349,\n",
-      "  \"validation_examples\": 39,\n",
-      "  \"seed\": 42,\n",
-      "  \"train_val_ratio\": 0.9\n",
-      "}\n",
-      "\\nSample example:\n",
-      "{\n",
-      "  \"qa_id\": \"qa_000001\",\n",
-      "  \"index_id\": \"UMI-0001\",\n",
-      "  \"prompt\": [\n",
-      "    {\n",
-      "      \"role\": \"system\",\n",
-      "      \"content\": \"You are an academic assistant for the Faculty of Computer Science and Information Technology, Universiti Malaya. Answer questions accurately and only using handbook-consistent information. If the handbook does not support a claim, avoid inventing details.\"\n",
-      "    },\n",
-      "    {\n",
-      "      \"role\": \"user\",\n",
-      "      \"content\": \"What are the faculty objectives?\"\n",
-      "    }\n",
-      "  ],\n",
-      "  \"completion\": [\n",
-      "    {\n",
-      "      \"role\": \"assistant\",\n",
-      "      \"content\": \"The faculty objectives are to sustain excellence in undergraduate and postgraduate teaching, learning, and research; contribute to national development through quality research and publications; provide innovative academic programmes that respond to societal needs; and produce quality graduates with advanced knowledge and skills in computer science and information technology.\"\n",
-      "    }\n",
-      "  ],\n",
-      "  \"question\": \"What are the faculty objectives?\",\n",
-      "  \"answer\": \"The faculty objectives are to sustain excellence in undergraduate and postgraduate teaching, learning, and research; contribute to national development through quality research and publications; provide innovative academic programmes that respond to societal needs; and produce quality graduates with advanced knowledge and skills in computer science and information technology.\"\n",
-      "}\n"
-     ]
-    }
-   ],
-   "source": [
-    "print_banner(\"Step 1 - Validate dataset\")\n",
-    "\n",
-    "if not DATASET_PATH.exists():\n",
-    "    raise FileNotFoundError(\n",
-    "        f\"Dataset not found: {DATASET_PATH}\\n\"\n",
-    "        f\"Place SFT_QA_Training_Ready.jsonl in DATASET_ROOT or update DATASET_PATH.\"\n",
-    "    )\n",
-    "\n",
-    "set_seed(RANDOM_SEED)\n",
-    "random.seed(RANDOM_SEED)\n",
-    "np.random.seed(RANDOM_SEED)\n",
-    "\n",
-    "raw_rows = load_jsonl(DATASET_PATH)\n",
-    "converted_rows = build_prompt_completion_rows(raw_rows)\n",
-    "dataset_dict = split_dataset(converted_rows, TRAIN_VAL_RATIO, RANDOM_SEED)\n",
-    "\n",
-    "split_summary = {\n",
-    "    \"dataset_path\": str(DATASET_PATH),\n",
-    "    \"total_examples\": len(converted_rows),\n",
-    "    \"train_examples\": len(dataset_dict[\"train\"]),\n",
-    "    \"validation_examples\": len(dataset_dict[\"validation\"]),\n",
-    "    \"seed\": RANDOM_SEED,\n",
-    "    \"train_val_ratio\": TRAIN_VAL_RATIO,\n",
-    "}\n",
-    "\n",
-    "print(json.dumps(split_summary, indent=2, ensure_ascii=False))\n",
-    "print(\"\\nSample example:\")\n",
-    "print(json.dumps(converted_rows[0], indent=2, ensure_ascii=False)[:1800])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1589862e",
-   "metadata": {},
-   "source": [
-    "## Part 9 - Save the dataset split summary\n",
-    "\n",
-    "This section writes the split summary to JSON.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "id": "61b5ae46",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Saved split summary to: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/dataset_split_summary.json\n"
-     ]
-    }
-   ],
-   "source": [
-    "def save_json(path: Path, obj: Dict) -> None:\n",
-    "    ensure_dir(path.parent)\n",
-    "    with path.open(\"w\", encoding=\"utf-8\") as f:\n",
-    "        json.dump(obj, f, indent=2, ensure_ascii=False)\n",
-    "\n",
-    "def save_predictions_jsonl(path: Path, rows: List[Dict]) -> None:\n",
-    "    ensure_dir(path.parent)\n",
-    "    with path.open(\"w\", encoding=\"utf-8\") as f:\n",
-    "        for row in rows:\n",
-    "            f.write(json.dumps(row, ensure_ascii=False) + \"\\n\")\n",
-    "\n",
-    "ensure_dir(OUTPUT_ROOT)\n",
-    "save_json(TRAIN_VAL_SPLIT_JSON_PATH, split_summary)\n",
-    "print(f\"Saved split summary to: {TRAIN_VAL_SPLIT_JSON_PATH}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "79e53130",
-   "metadata": {},
-   "source": [
-    "## Part 10 - Download the base model into a local directory\n",
-    "\n",
-    "This section reuses the local model if it already exists; otherwise it downloads the base model to the configured model directory.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "id": "73f9a585",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Base model already exists at: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/models/Qwen3-8B\n",
-      "Local model path: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/models/Qwen3-8B\n"
-     ]
-    }
-   ],
-   "source": [
-    "def download_base_model_if_needed() -> Path:\n",
-    "    ensure_dir(BASE_MODEL_LOCAL_DIR)\n",
-    "\n",
-    "    if (BASE_MODEL_LOCAL_DIR / \"config.json\").exists():\n",
-    "        print(f\"Base model already exists at: {BASE_MODEL_LOCAL_DIR}\")\n",
-    "        return BASE_MODEL_LOCAL_DIR\n",
-    "\n",
-    "    print_banner(\"Downloading base model snapshot\")\n",
-    "    local_path = snapshot_download(\n",
-    "        repo_id=BASE_MODEL_NAME,\n",
-    "        local_dir=str(BASE_MODEL_LOCAL_DIR),\n",
-    "    )\n",
-    "    print(f\"Downloaded base model to: {local_path}\")\n",
-    "    return BASE_MODEL_LOCAL_DIR\n",
-    "\n",
-    "local_model_path = download_base_model_if_needed()\n",
-    "print(f\"Local model path: {local_model_path}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "94be6680",
-   "metadata": {},
-   "source": [
-    "## Part 11 - Load the tokenizer and training model\n",
-    "\n",
-    "This section loads the tokenizer and model, enables 4-bit QLoRA on CUDA, and falls back to standard LoRA on MPS / CPU.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "id": "b6771a4e",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Step 4 - Load tokenizer and model\n",
-      "========================================================================================\n"
-     ]
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "c8e4cf58c2d243ddb9fbf786788b5037",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "Loading weights:   0%|          | 0/399 [00:00<?, ?it/s]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Tokenizer and model loaded successfully.\n",
-      "Model class: Qwen3ForCausalLM\n"
-     ]
-    }
-   ],
-   "source": [
-    "def load_tokenizer(model_path: Path):\n",
-    "    tokenizer = AutoTokenizer.from_pretrained(str(model_path), use_fast=True)\n",
-    "    if tokenizer.pad_token is None:\n",
-    "        tokenizer.pad_token = tokenizer.eos_token\n",
-    "    tokenizer.padding_side = \"right\"\n",
-    "    return tokenizer\n",
-    "\n",
-    "def load_model_for_training(model_path: Path, backend: str):\n",
-    "    compute_dtype = detect_compute_dtype(backend)\n",
-    "\n",
-    "    quantization_config = None\n",
-    "    if should_use_4bit(backend):\n",
-    "        quantization_config = BitsAndBytesConfig(\n",
-    "            load_in_4bit=True,\n",
-    "            bnb_4bit_use_double_quant=True,\n",
-    "            bnb_4bit_quant_type=\"nf4\",\n",
-    "            bnb_4bit_compute_dtype=compute_dtype,\n",
-    "        )\n",
-    "\n",
-    "    model_kwargs = {\n",
-    "        \"pretrained_model_name_or_path\": str(model_path),\n",
-    "        \"torch_dtype\": compute_dtype,\n",
-    "        \"low_cpu_mem_usage\": LOW_CPU_MEM_USAGE,\n",
-    "        \"trust_remote_code\": False,\n",
-    "    }\n",
-    "\n",
-    "    if backend == \"cuda\":\n",
-    "        model_kwargs[\"device_map\"] = \"auto\"\n",
-    "    if quantization_config is not None:\n",
-    "        model_kwargs[\"quantization_config\"] = quantization_config\n",
-    "    if USE_FLASH_ATTENTION_2_IF_AVAILABLE and backend == \"cuda\":\n",
-    "        model_kwargs[\"attn_implementation\"] = \"flash_attention_2\"\n",
-    "\n",
-    "    model = AutoModelForCausalLM.from_pretrained(**model_kwargs)\n",
-    "\n",
-    "    if backend in {\"mps\", \"cpu\"}:\n",
-    "        model = model.to(backend)\n",
-    "\n",
-    "    model.config.use_cache = not GRADIENT_CHECKPOINTING\n",
-    "    if GRADIENT_CHECKPOINTING:\n",
-    "        model.gradient_checkpointing_enable()\n",
-    "\n",
-    "    return model\n",
-    "\n",
-    "print_banner(\"Step 4 - Load tokenizer and model\")\n",
-    "tokenizer = load_tokenizer(local_model_path)\n",
-    "model = load_model_for_training(local_model_path, RUNTIME_DEVICE_BACKEND)\n",
-    "\n",
-    "print(\"Tokenizer and model loaded successfully.\")\n",
-    "print(\"Model class:\", model.__class__.__name__)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "192160bb",
-   "metadata": {},
-   "source": [
-    "## Part 12 - Build the LoRA configuration and training arguments\n",
-    "\n",
-    "This section defines the LoRA settings and trainer arguments.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "id": "38a4cc45",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Step 5 - Build trainer\n",
-      "========================================================================================\n"
-     ]
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "2c1da0d8fb454d9bb3ca83a8f871a8b5",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "Tokenizing train dataset (num_proc=1):   0%|          | 0/349 [00:00<?, ? examples/s]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "e0b96c79ace94af39e6018adea9da222",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "Tokenizing eval dataset (num_proc=1):   0%|          | 0/39 [00:00<?, ? examples/s]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Trainer built successfully.\n"
-     ]
-    }
-   ],
-   "source": [
-    "def build_peft_config() -> LoraConfig:\n",
-    "    return LoraConfig(\n",
-    "        r=LORA_R,\n",
-    "        lora_alpha=LORA_ALPHA,\n",
-    "        lora_dropout=LORA_DROPOUT,\n",
-    "        bias=\"none\",\n",
-    "        task_type=\"CAUSAL_LM\",\n",
-    "        target_modules=LORA_TARGET_MODULES,\n",
-    "    )\n",
-    "\n",
-    "def build_training_args(backend: str) -> SFTConfig:\n",
-    "    bf16 = backend == \"cuda\" and torch.cuda.is_bf16_supported()\n",
-    "    fp16 = backend in {\"cuda\", \"mps\"} and not bf16\n",
-    "\n",
-    "    return SFTConfig(\n",
-    "        output_dir=str(OUTPUT_ROOT / \"trainer_runs\"),\n",
-    "        num_train_epochs=NUM_TRAIN_EPOCHS,\n",
-    "        per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH_SIZE,\n",
-    "        per_device_eval_batch_size=PER_DEVICE_EVAL_BATCH_SIZE,\n",
-    "        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,\n",
-    "        learning_rate=LEARNING_RATE,\n",
-    "        weight_decay=WEIGHT_DECAY,\n",
-    "        warmup_steps=WARMUP_STEPS,\n",
-    "        logging_steps=LOGGING_STEPS,\n",
-    "        eval_strategy=\"steps\",\n",
-    "        eval_steps=EVAL_STEPS,\n",
-    "        save_strategy=\"steps\",\n",
-    "        save_steps=SAVE_STEPS,\n",
-    "        save_total_limit=2,\n",
-    "        load_best_model_at_end=True,\n",
-    "        metric_for_best_model=\"eval_loss\",\n",
-    "        greater_is_better=False,\n",
-    "        max_grad_norm=MAX_GRAD_NORM,\n",
-    "        lr_scheduler_type=\"cosine\",\n",
-    "        bf16=bf16,\n",
-    "        fp16=fp16,\n",
-    "        gradient_checkpointing=GRADIENT_CHECKPOINTING,\n",
-    "        max_length=MAX_SEQ_LENGTH,\n",
-    "        packing=PACKING,\n",
-    "        dataset_num_proc=1,\n",
-    "        completion_only_loss=True,\n",
-    "        remove_unused_columns=False,\n",
-    "        report_to=\"none\",\n",
-    "        seed=RANDOM_SEED,\n",
-    "        optim=\"paged_adamw_8bit\" if should_use_4bit(backend) else \"adamw_torch\",\n",
-    "    )\n",
-    "\n",
-    "def build_trainer(model, tokenizer, dataset_dict: DatasetDict, backend: str) -> SFTTrainer:\n",
-    "    peft_config = build_peft_config()\n",
-    "    training_args = build_training_args(backend)\n",
-    "\n",
-    "    trainer = SFTTrainer(\n",
-    "        model=model,\n",
-    "        processing_class=tokenizer,\n",
-    "        args=training_args,\n",
-    "        train_dataset=dataset_dict[\"train\"],\n",
-    "        eval_dataset=dataset_dict[\"validation\"],\n",
-    "        peft_config=peft_config,\n",
-    "    )\n",
-    "    return trainer\n",
-    "\n",
-    "print_banner(\"Step 5 - Build trainer\")\n",
-    "trainer = build_trainer(model, tokenizer, dataset_dict, RUNTIME_DEVICE_BACKEND)\n",
-    "print(\"Trainer built successfully.\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b7892f31",
-   "metadata": {},
-   "source": [
-    "## Part 13 - Start training\n",
-    "\n",
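-    "This section runs fine-tuning with the trainer built in the previous step, saves the LoRA adapter and tokenizer, and prints the training metrics.\n",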
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 24,
-   "id": "fee6890b",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Step 6 - Train\n",
-      "========================================================================================\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "\n",
-       "    <div>\n",
-       "      \n",
-       "      <progress value='132' max='132' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
-       "      [132/132 11:01, Epoch 6/6]\n",
-       "    </div>\n",
-       "    <table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       " <tr style=\"text-align: left;\">\n",
-       "      <th>Step</th>\n",
-       "      <th>Training Loss</th>\n",
-       "      <th>Validation Loss</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <td>50</td>\n",
-       "      <td>0.300081</td>\n",
-       "      <td>1.132511</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>100</td>\n",
-       "      <td>0.025233</td>\n",
-       "      <td>1.302579</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table><p>"
-      ],
-      "text/plain": [
-       "<IPython.core.display.HTML object>"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Train metrics:\n",
-      "{\n",
-      "  \"train_runtime\": 666.4624,\n",
-      "  \"train_samples_per_second\": 3.142,\n",
-      "  \"train_steps_per_second\": 0.198,\n",
-      "  \"total_flos\": 1.4419846776152064e+16,\n",
-      "  \"train_loss\": 0.26707774019715463\n",
-      "}\n",
-      "Adapter saved to: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/lora_adapter\n",
-      "Train stage minutes: 11.13\n"
-     ]
-    }
-   ],
-   "source": [
-    "print_banner(\"Step 6 - Train\")\n",
-    "train_start_time = time.time()\n",
-    "\n",
-    "train_result = trainer.train()\n",
-    "trainer.save_model(str(ADAPTER_OUTPUT_DIR))\n",
-    "tokenizer.save_pretrained(str(ADAPTER_OUTPUT_DIR))\n",
-    "\n",
-    "train_metrics = train_result.metrics\n",
-    "print(\"Train metrics:\")\n",
-    "print(json.dumps(train_metrics, indent=2, default=str))\n",
-    "\n",
-    "print(f\"Adapter saved to: {ADAPTER_OUTPUT_DIR}\")\n",
-    "print(f\"Train stage minutes: {(time.time() - train_start_time)/60:.2f}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1a64c5b7",
-   "metadata": {},
-   "source": [
-    "## Part 14 - Teacher-forced loss evaluation\n",
-    "\n",
-    "This section computes `eval_loss` on the validation set and derives `perplexity` as `exp(eval_loss)`, with the loss capped at 20 to avoid overflow.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 27,
-   "id": "7ceb96b4",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Step 7 - Evaluate teacher-forced loss\n",
-      "========================================================================================\n",
-      "Eval loss  : 1.132510781288147\n",
-      "Perplexity : 3.1034387822556253\n",
-      "{'eval_loss': 1.132510781288147, 'eval_runtime': 2.8459, 'eval_samples_per_second': 13.704, 'eval_steps_per_second': 7.028}\n"
-     ]
-    }
-   ],
-   "source": [
-    "print_banner(\"Step 7 - Evaluate teacher-forced loss\")\n",
-    "\n",
-    "# Remove notebook progress callback to avoid Jupyter evaluate callback error\n",
-    "trainer.remove_callback(transformers.utils.notebook.NotebookProgressCallback)\n",
-    "\n",
-    "eval_metrics = trainer.evaluate()\n",
-    "eval_loss = float(eval_metrics.get(\"eval_loss\", float(\"nan\")))\n",
-    "perplexity = float(math.exp(min(eval_loss, 20))) if math.isfinite(eval_loss) else float(\"nan\")\n",
-    "\n",
-    "print(\"Eval loss  :\", eval_loss)\n",
-    "print(\"Perplexity :\", perplexity)\n",
-    "print(eval_metrics)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0da67d71",
-   "metadata": {},
-   "source": [
-    "## Part 15 - Generation evaluation functions\n",
-    "\n",
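-    "This section defines the functions for generation-based evaluation: prompt formatting, greedy answer generation, and metric calculation (exact match, token F1, ROUGE, sacreBLEU, and chrF++).\n",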
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 33,
-   "id": "eff9034b",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def format_eval_prompt(tokenizer, question: str) -> str:\n",
-    "    messages = [\n",
-    "        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
-    "        {\"role\": \"user\", \"content\": question},\n",
-    "    ]\n",
-    "    return tokenizer.apply_chat_template(\n",
-    "        messages,\n",
-    "        tokenize=False,\n",
-    "        add_generation_prompt=True,\n",
-    "    )\n",
-    "\n",
-    "@torch.inference_mode()\n",
-    "def generate_answers(model, tokenizer, questions: List[str], max_new_tokens: int) -> List[str]:\n",
-    "    device = next(model.parameters()).device\n",
-    "    prompts = [format_eval_prompt(tokenizer, q) for q in questions]\n",
-    "    outputs: List[str] = []\n",
-    "\n",
-    "    for prompt in prompts:\n",
-    "        encoded = tokenizer(\n",
-    "            prompt,\n",
-    "            return_tensors=\"pt\",\n",
-    "            truncation=True,\n",
-    "            max_length=MAX_SEQ_LENGTH,\n",
-    "        )\n",
-    "        encoded = {k: v.to(device) for k, v in encoded.items()}\n",
-    "\n",
-    "        generated = model.generate(\n",
-    "            **encoded,\n",
-    "            max_new_tokens=max_new_tokens,\n",
-    "            do_sample=False,\n",
-    "            temperature=None,\n",
-    "            top_p=None,\n",
-    "            repetition_penalty=1.05,\n",
-    "            pad_token_id=tokenizer.pad_token_id,\n",
-    "            eos_token_id=tokenizer.eos_token_id,\n",
-    "        )\n",
-    "\n",
-    "        gen_only = generated[0][encoded[\"input_ids\"].shape[1]:]\n",
-    "        text = tokenizer.decode(gen_only, skip_special_tokens=True)\n",
-    "        outputs.append(normalize_text(text))\n",
-    "\n",
-    "    return outputs\n",
-    "\n",
-    "def compute_generation_metrics(predictions: List[str], references: List[str]) -> Dict[str, float]:\n",
-    "    import evaluate\n",
-    "    import sacrebleu\n",
-    "\n",
-    "    rouge = evaluate.load(\"rouge\")\n",
-    "\n",
-    "\n",
-    "    rouge_scores = rouge.compute(predictions=predictions, references=references)\n",
-    "\n",
-    "\n",
-    "    sacrebleu_score = sacrebleu.corpus_bleu(predictions, [references]).score\n",
-    "    chrf_score = sacrebleu.corpus_chrf(predictions, [references], word_order=2).score\n",
-    "\n",
-    "    em = float(np.mean([exact_match(p, r) for p, r in zip(predictions, references)]))\n",
-    "    tf1 = float(np.mean([token_f1(p, r) for p, r in zip(predictions, references)]))\n",
-    "    avg_pred_len = float(np.mean([len(p.split()) for p in predictions])) if predictions else 0.0\n",
-    "    avg_ref_len = float(np.mean([len(r.split()) for r in references])) if references else 0.0\n",
-    "\n",
-    "    metrics = {\n",
-    "        \"exact_match\": em,\n",
-    "        \"token_f1\": tf1,\n",
-    "        \"rouge1\": float(rouge_scores[\"rouge1\"]),\n",
-    "        \"rouge2\": float(rouge_scores[\"rouge2\"]),\n",
-    "        \"rougeL\": float(rouge_scores[\"rougeL\"]),\n",
-    "        \"bertscore_f1\": None,  # placeholder: BERTScore is not computed in this function\n",
-    "        \"sacrebleu\": float(sacrebleu_score),\n",
-    "        \"chrf_pp\": float(chrf_score),\n",
-    "        \"avg_prediction_words\": avg_pred_len,\n",
-    "        \"avg_reference_words\": avg_ref_len,\n",
-    "    }\n",
-    "    return metrics"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d9298597",
-   "metadata": {},
-   "source": [
-    "## Part 16 - Run final generation evaluation on the validation set\n",
-    "\n",
-    "This section generates answers on the validation set, computes metrics, saves predictions, and prints a few samples.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 35,
-   "id": "995cd0ee",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Step 8 - Final generation evaluation on validation split\n",
-      "========================================================================================\n",
-      "Generation metrics:\n",
-      "{\n",
-      "  \"exact_match\": 0.0,\n",
-      "  \"token_f1\": 0.5181110282406411,\n",
-      "  \"rouge1\": 0.5171361078676141,\n",
-      "  \"rouge2\": 0.33460021476687485,\n",
-      "  \"rougeL\": 0.45557456154376447,\n",
-      "  \"bertscore_f1\": null,\n",
-      "  \"sacrebleu\": 31.010919781258593,\n",
-      "  \"chrf_pp\": 49.920320261813664,\n",
-      "  \"avg_prediction_words\": 36.0,\n",
-      "  \"avg_reference_words\": 36.69230769230769\n",
-      "}\n",
-      "\n",
-      "========================================================================================\n",
-      "Sample validation predictions\n",
-      "========================================================================================\n",
-      "\n",
-      "[Q] What courses are listed under programme core courses?\n",
-      "[REF] The handbook presents programme core courses as a curriculum table listing the approved course codes, course titles, credit values, and semester arrangement for the programme core component.\n",
-      "[PRED] <think> </think> The handbook presents programme core courses as a curriculum table listing the approved course codes, course titles, credit values, and semester arrangement for that part of the programme.\n",
-      "[EM=0, TokenF1=0.8772]\n",
-      "\n",
-      "[Q] What responsibilities do supervisors have for guidance, feedback, and assessment in the academic project?\n",
-      "[REF] Supervisors are responsible for assigning or confirming project titles in the ilmiah system, supervising and coaching students, meeting them regularly, verifying the progress logbook, evaluating reports, and entering marks in the ilmiah system. More broadly, the handbook frames supervisors as one of the key parties responsible for guidance, monitoring, feedback, and assessment in the academic project process.\n",
-      "[PRED] <think> </think> Supervisors are responsible for providing regular guidance, giving feedback on progress and submissions, carrying out viva or panel-based assessment where applicable, and submitting the final report through the ilmiah system for administration to issue the mark.\n",
-      "[EM=0, TokenF1=0.3542]\n",
-      "\n",
-      "[Q] What information is given about the faculty cafeteria?\n",
-      "[REF] The cafeteria is located at the back of Block A.\n",
-      "[PRED] <think> </think> The handbook states that the faculty cafeteria serves both staff and students and that it is open to all UM students.\n",
-      "[EM=0, TokenF1=0.2424]\n",
-      "\n",
-      "[Q] What dress expectations are illustrated for official events?\n",
-      "[REF] For official events, the poster illustrates formal attire, including suit-style clothing and traditional formal wear, to convey a neat and official appearance appropriate for formal university occasions.\n",
-      "[PRED] <think> </think> For official events, men are expected to follow formal or semi-formal Western business attire, while women should also aim for formal or appropriate Western office or ceremonial clothing.\n",
-      "[EM=0, TokenF1=0.3729]\n",
-      "\n",
-      "[Q] What courses are listed under specialization elective courses - artificial intelligence?\n",
-      "[REF] The handbook presents the Artificial Intelligence specialization electives as a curriculum table listing the approved course codes, course titles, credit values, and semester arrangement for that specialization.\n",
-      "[PRED] <think> </think> The specialization elective section is intended to show the elective pool available for that track. Students should use it as a selection list of approved course codes they can choose from, following the shown curriculum structure and any stated university or faculty rules for that programme.\n",
-      "[EM=0, TokenF1=0.3467]\n"
-     ]
-    }
-   ],
-   "source": [
-    "print_banner(\"Step 8 - Final generation evaluation on validation split\")\n",
-    "\n",
-    "final_metrics = {\n",
-    "    \"teacher_forced_eval\": eval_metrics,\n",
-    "    \"perplexity\": perplexity,\n",
-    "}\n",
-    "\n",
-    "prediction_rows = []\n",
-    "\n",
-    "validation_questions = dataset_dict[\"validation\"][\"question\"]\n",
-    "validation_answers = dataset_dict[\"validation\"][\"answer\"]\n",
-    "\n",
-    "predictions = generate_answers(\n",
-    "    model=trainer.model,\n",
-    "    tokenizer=tokenizer,\n",
-    "    questions=validation_questions,\n",
-    "    max_new_tokens=MAX_NEW_TOKENS_EVAL,\n",
-    ")\n",
-    "\n",
-    "generation_metrics = compute_generation_metrics(predictions, validation_answers)\n",
-    "final_metrics[\"generation_metrics\"] = generation_metrics\n",
-    "\n",
-    "for i, (question, reference, prediction) in enumerate(\n",
-    "    zip(validation_questions, validation_answers, predictions)\n",
-    "):\n",
-    "    prediction_rows.append(\n",
-    "        {\n",
-    "            \"row_id\": i,\n",
-    "            \"question\": question,\n",
-    "            \"reference_answer\": reference,\n",
-    "            \"predicted_answer\": prediction,\n",
-    "            \"exact_match\": exact_match(prediction, reference),\n",
-    "            \"token_f1\": token_f1(prediction, reference),\n",
-    "        }\n",
-    "    )\n",
-    "\n",
-    "save_predictions_jsonl(PREDICTIONS_JSONL_PATH, prediction_rows)\n",
-    "\n",
-    "print(\"Generation metrics:\")\n",
-    "print(json.dumps(generation_metrics, indent=2, ensure_ascii=False))\n",
-    "\n",
-    "print_banner(\"Sample validation predictions\")\n",
-    "for row in prediction_rows[:NUM_PRINTED_PREDICTIONS]:\n",
-    "    print(f\"\\n[Q] {row['question']}\")\n",
-    "    print(f\"[REF] {row['reference_answer']}\")\n",
-    "    print(f\"[PRED] {row['predicted_answer']}\")\n",
-    "    print(f\"[EM={row['exact_match']:.0f}, TokenF1={row['token_f1']:.4f}]\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "255eb7de",
-   "metadata": {},
-   "source": [
-    "## Part 17 - Save metrics\n",
-    "\n",
-    "This section writes the current metrics to JSON.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 36,
-   "id": "ebd241d3",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Metrics saved to: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/final_metrics.json\n",
-      "Predictions saved to: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/validation_predictions.jsonl\n"
-     ]
-    }
-   ],
-   "source": [
-    "save_json(METRICS_JSON_PATH, final_metrics)\n",
-    "print(f\"Metrics saved to: {METRICS_JSON_PATH}\")\n",
-    "print(f\"Predictions saved to: {PREDICTIONS_JSONL_PATH}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "429b8fea",
-   "metadata": {},
-   "source": [
-    "## Part 18 - Merge the LoRA adapter and export the final model\n",
-    "\n",
-    "This section reloads the base model, merges the LoRA adapter, saves the merged model directory, and optionally exports a `.pt` file.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 37,
-   "id": "720f1089",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Step 9 - Save merged model\n",
-      "========================================================================================\n"
-     ]
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "00dc5873f54b4054853f7908bd366489",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "Loading weights:   0%|          | 0/399 [00:00<?, ?it/s]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "5c8f48a945a84fa5b75150c6cb3939d6",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "Writing model shards:   0%|          | 0/1 [00:00<?, ?it/s]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Merged model saved to: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/merged_model\n",
-      "\n",
-      "========================================================================================\n",
-      "Saving single .pt state_dict export\n",
-      "========================================================================================\n",
-      "Saved .pt file to: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/Qwen3-8B-Instruct_UM_Handbook.pt\n"
-     ]
-    }
-   ],
-   "source": [
-    "def load_base_model_for_merge(model_path: Path, backend: str):\n",
-    "    compute_dtype = detect_compute_dtype(backend)\n",
-    "    model_kwargs = {\n",
-    "        \"pretrained_model_name_or_path\": str(model_path),\n",
-    "        \"torch_dtype\": compute_dtype,\n",
-    "        \"low_cpu_mem_usage\": LOW_CPU_MEM_USAGE,\n",
-    "        \"trust_remote_code\": False,\n",
-    "    }\n",
-    "    if backend == \"cuda\":\n",
-    "        model_kwargs[\"device_map\"] = \"auto\"\n",
-    "    model = AutoModelForCausalLM.from_pretrained(**model_kwargs)\n",
-    "    if backend in {\"mps\", \"cpu\"}:\n",
-    "        model = model.to(backend)\n",
-    "    return model\n",
-    "\n",
-    "def save_single_pt_state_dict(model, path: Path) -> None:\n",
-    "    print_banner(\"Saving single .pt state_dict export\")\n",
-    "    ensure_dir(path.parent)\n",
-    "\n",
-    "    cpu_state_dict = {}\n",
-    "    for key, value in model.state_dict().items():\n",
-    "        cpu_state_dict[key] = value.detach().cpu()\n",
-    "\n",
-    "    torch.save(\n",
-    "        {\n",
-    "            \"model_state_dict\": cpu_state_dict,\n",
-    "            \"base_model_name\": BASE_MODEL_NAME,\n",
-    "            \"system_prompt\": SYSTEM_PROMPT,\n",
-    "            \"max_seq_length\": MAX_SEQ_LENGTH,\n",
-    "        },\n",
-    "        str(path),\n",
-    "    )\n",
-    "    print(f\"Saved .pt file to: {path}\")\n",
-    "\n",
-    "print_banner(\"Step 9 - Save merged model\")\n",
-    "cleanup_memory()\n",
-    "\n",
-    "if SAVE_MERGED_MODEL:\n",
-    "    base_model_for_merge = load_base_model_for_merge(local_model_path, RUNTIME_DEVICE_BACKEND)\n",
-    "    merged_model = PeftModel.from_pretrained(base_model_for_merge, str(ADAPTER_OUTPUT_DIR))\n",
-    "    merged_model = merged_model.merge_and_unload()\n",
-    "\n",
-    "    ensure_dir(MERGED_MODEL_DIR)\n",
-    "    merged_model.save_pretrained(str(MERGED_MODEL_DIR), safe_serialization=True)\n",
-    "\n",
-    "    if SAVE_TOKENIZER_WITH_MERGED:\n",
-    "        tokenizer.save_pretrained(str(MERGED_MODEL_DIR))\n",
-    "\n",
-    "    print(f\"Merged model saved to: {MERGED_MODEL_DIR}\")\n",
-    "\n",
-    "    if SAVE_SINGLE_PT:\n",
-    "        save_single_pt_state_dict(merged_model, FINAL_PT_PATH)\n",
-    "\n",
-    "    del merged_model\n",
-    "    del base_model_for_merge\n",
-    "    cleanup_memory()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f2973f9b",
-   "metadata": {},
-   "source": [
-    "## Part 19 - End-of-training summary\n",
-    "\n",
-    "This section adds a completion note to the metrics JSON, saves it again, and prints the selected backend and the final output paths.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 38,
-   "id": "6891902c",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\n",
-      "========================================================================================\n",
-      "Done\n",
-      "========================================================================================\n",
-      "Selected backend: cuda\n",
-      "Adapter directory: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/lora_adapter\n",
-      "Merged model directory: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/merged_model\n",
-      "Single .pt file: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/Qwen3-8B-Instruct_UM_Handbook.pt\n",
-      "Metrics JSON: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/final_metrics.json\n",
-      "Predictions JSONL: /scr/user/kevin2002/TensorCat/NLP/UM_Handbook/outputs/qwen3_um_handbook/validation_predictions.jsonl\n"
-     ]
-    }
-   ],
-   "source": [
-    "total_runtime_minutes = None\n",
-    "try:\n",
-    "    # If the notebook was run from the start, this variable exists\n",
-    "    total_runtime_minutes = \"See notebook runtime from execution order / timestamps\"\n",
-    "except Exception:\n",
-    "    pass\n",
-    "\n",
-    "final_metrics[\"completion_note\"] = \"Notebook execution completed.\"\n",
-    "save_json(METRICS_JSON_PATH, final_metrics)\n",
-    "\n",
-    "print_banner(\"Done\")\n",
-    "print(f\"Selected backend: {RUNTIME_DEVICE_BACKEND}\")\n",
-    "print(f\"Adapter directory: {ADAPTER_OUTPUT_DIR}\")\n",
-    "print(f\"Merged model directory: {MERGED_MODEL_DIR}\")\n",
-    "print(f\"Single .pt file: {FINAL_PT_PATH if SAVE_SINGLE_PT else 'disabled'}\")\n",
-    "print(f\"Metrics JSON: {METRICS_JSON_PATH}\")\n",
-    "print(f\"Predictions JSONL: {PREDICTIONS_JSONL_PATH}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e35a1ca8",
-   "metadata": {},
-   "source": [
-    "## Part 20 - Result inspection\n",
-    "\n",
-    "Check these files after training:\n",
-    "\n",
-    "### 1. `final_metrics.json`\n",
-    "Review the overall metrics.\n",
-    "\n",
-    "### 2. `validation_predictions.jsonl`\n",
-    "Inspect generated answers against the reference answers.\n",
-    "\n",
-    "### 3. `merged_model/`\n",
-    "Use this directory for standard Hugging Face loading.\n",
-    "\n",
-    "### 4. `Qwen3-8B-Instruct_UM_Handbook.pt`\n",
-    "This is the optional single-file export.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "91778773",
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python (TensorCat Py3.10)",
-   "language": "python",
-   "name": "tensorcat-py310"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.14"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
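
Two details of the evaluation cells above can be checked outside the notebook. The sample predictions printed in Part 16 all begin with an empty `<think> </think>` pair, which may count against exact match and token F1, and Part 14 derives perplexity as `exp(min(eval_loss, 20))`. The sketch below is a minimal illustration under stated assumptions, not the notebook's code: `strip_think_block`, `simple_token_f1`, and `perplexity_from_loss` are hypothetical helpers, and `simple_token_f1` is a plain whitespace-token F1 standing in for the notebook's `token_f1`, whose definition is not shown in this part of the diff. The reference and prediction strings are copied from the first sample in the Part 16 output.

```python
import math
import re
from collections import Counter

# Hypothetical helpers for illustration; none of these names exist in the notebook.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

# Strings copied from the first sample printed in Part 16.
REFERENCE = (
    "The handbook presents programme core courses as a curriculum table listing "
    "the approved course codes, course titles, credit values, and semester "
    "arrangement for the programme core component."
)
RAW_PREDICTION = (
    "<think> </think> The handbook presents programme core courses as a curriculum "
    "table listing the approved course codes, course titles, credit values, and "
    "semester arrangement for that part of the programme."
)


def strip_think_block(text: str) -> str:
    """Remove <think>...</think> spans and collapse the remaining whitespace."""
    return " ".join(THINK_BLOCK.sub(" ", text).split())


def simple_token_f1(prediction: str, reference: str) -> float:
    """Whitespace-token F1 (assumed stand-in for the notebook's token_f1)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def perplexity_from_loss(eval_loss: float, cap: float = 20.0) -> float:
    """Same formula as the Part 14 cell: exp(min(loss, cap)), NaN if not finite."""
    return math.exp(min(eval_loss, cap)) if math.isfinite(eval_loss) else float("nan")


if __name__ == "__main__":
    cleaned = strip_think_block(RAW_PREDICTION)
    print("F1 with think block   :", simple_token_f1(RAW_PREDICTION, REFERENCE))
    print("F1 without think block:", simple_token_f1(cleaned, REFERENCE))
    print("Perplexity            :", perplexity_from_loss(1.132510781288147))
```

Whether to strip the block before scoring is a design choice: it makes the scores reflect answer content only, but the saved predictions then no longer match the raw model output.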