{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
  "language_info": {"name": "python", "version": "3.10.0"},
  "accelerator": "GPU",
  "colab": {"provenance": [], "gpuType": "A100"}
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AgentDebuggerEnv — GRPO Training\n",
    "\n",
    "**Training Qwen2.5-Coder-7B-Instruct on structured hypothesis-driven debugging**\n",
    "\n",
    "- **Algorithm:** GRPO (Group Relative Policy Optimization, the RL algorithm used to train DeepSeek-R1) via Hugging Face TRL\n",
    "- **Dataset:** 90 hand-validated bugs across 3 difficulty tiers\n",
    "- **Curriculum:** Tier 1 (steps 0–150) → Tier 1+2 (150–350) → All tiers (350–500)\n",
    "- **Model:** Qwen2.5-Coder-7B-Instruct + LoRA adapters on an unquantized float16/bfloat16 base model\n",
    "\n",
    "> **Requirements:** GPU runtime. In Colab: Runtime → Change runtime type → **A100**."
   ]
  },
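  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "GRPO does without PPO's learned value model. For each prompt it samples a group of completions, scores each one with the reward function, and uses each completion's reward relative to the rest of its group as the advantage. The next cell is a minimal pure-Python sketch of that normalization, assuming a single group of four completions with made-up rewards. `group_relative_advantages` is a hypothetical helper for illustration only: TRL's `GRPOTrainer` performs a similar normalization internally, and its exact details (for example, the epsilon and the standard-deviation estimator) may differ."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Minimal sketch of GRPO's group-relative advantage (illustrative, not TRL's code)\n",
    "from statistics import fmean, pstdev\n",
    "\n",
    "def group_relative_advantages(rewards, eps=1e-4):\n",
    "    # Score every completion against the other completions sampled for the same prompt\n",
    "    mean = fmean(rewards)\n",
    "    std = pstdev(rewards)\n",
    "    # eps guards against division by zero when all rewards in the group are equal\n",
    "    return [(r - mean) / (std + eps) for r in rewards]\n",
    "\n",
    "# Made-up rewards for 4 completions sampled for one bug (illustrative values only)\n",
    "rewards = [1.0, 0.2, 0.2, 0.6]\n",
    "advantages = group_relative_advantages(rewards)\n",
    "for r, a in zip(rewards, advantages):\n",
    "    print(f\"reward={r:.1f}  advantage={a:+.3f}\")"
   ],
   "outputs": [],
   "execution_count": null
  },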
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Verify that a GPU is attached before installing anything\n",
    "import shutil, subprocess\n",
    "if shutil.which(\"nvidia-smi\") is None:\n",
    "    raise RuntimeError(\"No GPU detected. Go to Runtime → Change runtime type → GPU (A100 recommended)\")\n",
    "result = subprocess.run([\"nvidia-smi\"], capture_output=True, text=True)\n",
    "if result.returncode != 0:\n",
    "    raise RuntimeError(\"nvidia-smi failed:\\n\" + result.stderr)\n",
    "print(result.stdout[:600])"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Clone the environment repository\n",
    "!git clone https://huggingface.co/spaces/shashaank0707/AgentDebugger-env agentdebugger\n",
    "%cd agentdebugger"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Install CUDA-enabled PyTorch first (must run before torch is imported in this session)\n",
    "!pip install -q torch --index-url https://download.pytorch.org/whl/cu121\n",
    "\n",
    "# Install training dependencies\n",
    "!pip install -q \\\n",
    "    wandb==0.18.7 \\\n",
    "    datasets==3.0.2 \\\n",
    "    transformers==4.48.3 \\\n",
    "    accelerate==1.0.1 \\\n",
    "    \"trl==0.15.2\" \\\n",
    "    peft==0.13.2\n",
    "\n",
    "import torch\n",
    "print(f\"PyTorch:        {torch.__version__}\")\n",
    "print(f\"CUDA available: {torch.cuda.is_available()}\")\n",
    "if torch.cuda.is_available():\n",
    "    props = torch.cuda.get_device_properties(0)\n",
    "    print(f\"GPU:            {props.name}\")\n",
    "    print(f\"VRAM:           {props.total_memory / 1e9:.1f} GB\")"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import os\n",
    "\n",
    "# Weights & Biases — get a free API key at https://wandb.ai\n",
    "WANDB_API_KEY = \"\"  # @param {type:\"string\"}\n",
    "if WANDB_API_KEY:\n",
    "    os.environ[\"WANDB_API_KEY\"] = WANDB_API_KEY\n",
    "    import wandb; wandb.login(key=WANDB_API_KEY)\n",
    "    print(\"W&B login successful — training curves will be logged\")\n",
    "else:\n",
    "    print(\"No W&B key — set WANDB_API_KEY above to get loss/reward plots\")\n",
    "\n",
    "# Hugging Face token — needed to push the final model\n",
    "HF_TOKEN = \"\"  # @param {type:\"string\"}\n",
    "if HF_TOKEN:\n",
    "    os.environ[\"HF_TOKEN\"] = HF_TOKEN\n",
    "    from huggingface_hub import login; login(token=HF_TOKEN)\n",
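    "    print(\"HF login successful: trained model will be pushed to the Hub\")\n",
    "else:\n",
    "    print(\"No HF token: set HF_TOKEN above to push the trained model to the Hub\")"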
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1 — Sanity Check (10 steps, ~2 min)\n",
    "\n",
    "Runs 10 training steps to verify that the GPU, the dependencies, and the reward function all work before committing to the full run."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "!python training/train_grpo.py --test"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 — Full Training (500 steps, ~45 min on A100)\n",
    "\n",
    "Runs the complete curriculum:\n",
    "- **Steps 0–150:** Tier 1 only (easy bugs: off-by-one errors, simple logic errors)\n",
    "- **Steps 150–350:** Tier 1 + Tier 2 (adds auth bugs containing red herrings)\n",
    "- **Steps 350–500:** All tiers (adds concurrency race conditions)\n",
    "\n",
    "Checkpoints are saved every 50 steps. If `HF_TOKEN` is set, the final model is pushed to the HF Hub."
   ]
  },
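  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The curriculum can be read as a step-indexed filter over the bug pool. The next cell is a minimal sketch under stated assumptions: step ranges are half-open (step 150 already belongs to the Tier 1 + 2 phase), tiers are numbered 1 to 3, and each bug record carries a `tier` field. `CURRICULUM`, `tiers_for_step`, `bugs_for_step`, and the toy bug records are hypothetical names for illustration; `training/train_grpo.py` may switch tiers differently."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Hypothetical curriculum table: (first step of the phase, tiers enabled in that phase)\n",
    "# Assumes half-open ranges, so step 150 already belongs to the Tier 1 + 2 phase\n",
    "CURRICULUM = [(0, (1,)), (150, (1, 2)), (350, (1, 2, 3))]\n",
    "\n",
    "def tiers_for_step(step):\n",
    "    # Keep the last phase whose start step has been reached\n",
    "    if step < 0:\n",
    "        raise ValueError(\"step must be non-negative\")\n",
    "    active = CURRICULUM[0][1]\n",
    "    for start, tiers in CURRICULUM:\n",
    "        if step >= start:\n",
    "            active = tiers\n",
    "    return active\n",
    "\n",
    "def bugs_for_step(step, bugs):\n",
    "    # Restrict the sampling pool to bugs whose tier is unlocked at this step\n",
    "    allowed = tiers_for_step(step)\n",
    "    return [b for b in bugs if b[\"tier\"] in allowed]\n",
    "\n",
    "# Toy bug records (hypothetical schema; the real dataset has 90 bugs)\n",
    "bugs = [{\"id\": \"off_by_one\", \"tier\": 1}, {\"id\": \"auth_red_herring\", \"tier\": 2}, {\"id\": \"race_condition\", \"tier\": 3}]\n",
    "for step in (0, 149, 150, 349, 350, 499):\n",
    "    print(step, tiers_for_step(step), [b[\"id\"] for b in bugs_for_step(step, bugs)])"
   ],
   "outputs": [],
   "execution_count": null
  },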
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "!python training/train_grpo.py"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Results — Baseline vs Trained"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import json, os\n",
    "\n",
    "def load_results(path):\n",
    "    # Return the parsed results dict, or None if the file has not been written yet\n",
    "    if not os.path.exists(path):\n",
    "        print(f\"{path} not found yet; run training first\")\n",
    "        return None\n",
    "    with open(path) as f:\n",
    "        return json.load(f)\n",
    "\n",
    "baseline = load_results(\"baseline_results.json\")\n",
    "final = load_results(\"final_results.json\")\n",
    "\n",
    "for label, res in [(\"Baseline\", baseline), (\"Trained\", final)]:\n",
    "    if res is not None:\n",
    "        print(f\"{label:<9} | solve_rate: {res['solve_rate']:.1%} | avg_reward: {res['avg_reward']:.3f}\")\n",
    "\n",
    "if baseline is not None and final is not None:\n",
    "    delta = final['avg_reward'] - baseline['avg_reward']\n",
    "    msg = f\"\\nImprovement: {delta:+.3f}\"\n",
    "    # A relative change is only meaningful when the baseline reward is positive\n",
    "    if baseline['avg_reward'] > 0:\n",
    "        msg += f\" ({delta / baseline['avg_reward'] * 100:+.1f}% relative)\"\n",
    "    print(msg)"
   ],
   "outputs": [],
   "execution_count": null
  }
 ]
}