{ "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"name": "python", "version": "3.10.0"}, "accelerator": "GPU", "colab": {"provenance": [], "gpuType": "A100"} }, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# AgentDebuggerEnv — GRPO Training\n", "\n", "**Training Qwen2.5-Coder-7B-Instruct on structured hypothesis-driven debugging**\n", "\n", "- **Algorithm:** GRPO (the algorithm used to train DeepSeek-R1) via Hugging Face TRL\n", "- **Dataset:** 90 hand-validated bugs across 3 difficulty tiers\n", "- **Curriculum:** Tier 1 (steps 0–150) → Tier 1+2 (150–350) → All tiers (350–500)\n", "- **Model:** Qwen2.5-Coder-7B-Instruct + LoRA (float16/bfloat16, no quantization)\n", "\n", "> **Requirements:** GPU runtime. In Colab: Runtime → Change runtime type → **A100**." ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Verify GPU is available\n", "import shutil, subprocess\n", "\n", "# nvidia-smi may be absent on CPU-only runtimes, so locate it before calling it\n", "smi = shutil.which(\"nvidia-smi\")\n", "result = subprocess.run([smi], capture_output=True, text=True) if smi else None\n", "if result is None or result.returncode != 0:\n", "    raise RuntimeError(\"No GPU detected. 
Go to Runtime → Change runtime type → GPU (A100 recommended)\")\n", "print(result.stdout[:600])" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "metadata": {}, "source": [ "# Clone the environment repository\n", "!git clone https://huggingface.co/spaces/shashaank0707/AgentDebugger-env agentdebugger\n", "%cd agentdebugger" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "metadata": {}, "source": [ "# Install CUDA-enabled PyTorch first (must precede all other imports)\n", "!pip install -q torch --index-url https://download.pytorch.org/whl/cu121\n", "\n", "# Install training dependencies\n", "!pip install -q \\\n", "    wandb==0.18.7 \\\n", "    datasets==3.0.2 \\\n", "    transformers==4.48.3 \\\n", "    accelerate==1.0.1 \\\n", "    \"trl==0.15.2\" \\\n", "    peft==0.13.2\n", "\n", "import torch\n", "print(f\"PyTorch: {torch.__version__}\")\n", "print(f\"CUDA available: {torch.cuda.is_available()}\")\n", "if torch.cuda.is_available():\n", "    props = torch.cuda.get_device_properties(0)\n", "    print(f\"GPU: {props.name}\")\n", "    print(f\"VRAM: {props.total_memory / 1e9:.1f} GB\")" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "metadata": {}, "source": [ "import os\n", "\n", "# Weights & Biases — get a free API key at https://wandb.ai\n", "WANDB_API_KEY = \"\" # @param {type:\"string\"}\n", "if WANDB_API_KEY:\n", "    os.environ[\"WANDB_API_KEY\"] = WANDB_API_KEY\n", "    import wandb\n", "    wandb.login(key=WANDB_API_KEY)\n", "    print(\"W&B login successful — training curves will be logged\")\n", "else:\n", "    print(\"No W&B key — set WANDB_API_KEY above to get loss/reward plots\")\n", "\n", "# Hugging Face token — needed to push the final model\n", "HF_TOKEN = \"\" # @param {type:\"string\"}\n", "if HF_TOKEN:\n", "    os.environ[\"HF_TOKEN\"] = HF_TOKEN\n", "    from huggingface_hub import login\n", "    login(token=HF_TOKEN)\n", "    print(\"HF login successful — trained model will be pushed to Hub\")" ], "outputs": [], "execution_count": null }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 — Sanity Check (10 steps, ~2 min)\n", "\n", "Runs 10 training steps to verify GPU, dependencies, and reward function all work before the full run." ] }, { "cell_type": "code", "metadata": {}, "source": [ "!python training/train_grpo.py --test" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 — Full Training (500 steps, ~45 min on A100)\n", "\n", "Runs the complete curriculum:\n", "- **Steps 0–150:** Tier 1 only (easy bugs — off-by-one, simple logic)\n", "- **Steps 150–350:** Tier 1 + Tier 2 (adds red-herring auth bugs)\n", "- **Steps 350–500:** All tiers (adds concurrency race conditions)\n", "\n", "Checkpoints saved every 50 steps. Final model pushed to HF Hub if `HF_TOKEN` is set." ] }, { "cell_type": "code", "metadata": {}, "source": [ "!python training/train_grpo.py" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results — Baseline vs Trained" ] }, { "cell_type": "code", "metadata": {}, "source": [ "import json, os\n", "\n", "baseline, final = None, None\n", "\n", "if os.path.exists(\"baseline_results.json\"):\n", " with open(\"baseline_results.json\") as f:\n", " baseline = json.load(f)\n", " print(f\"Baseline | solve_rate: {baseline['solve_rate']:.1%} | avg_reward: {baseline['avg_reward']:.3f}\")\n", "\n", "if os.path.exists(\"final_results.json\"):\n", " with open(\"final_results.json\") as f:\n", " final = json.load(f)\n", " print(f\"Trained | solve_rate: {final['solve_rate']:.1%} | avg_reward: {final['avg_reward']:.3f}\")\n", " if baseline:\n", " delta = final['avg_reward'] - baseline['avg_reward']\n", " print(f\"\\nImprovement: {delta:+.3f} ({delta / baseline['avg_reward'] * 100:+.1f}% relative)\")\n", "else:\n", " print(\"final_results.json not written yet — run training first\")" ], "outputs": [], "execution_count": null } ] }