{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "gpuType": "T4" }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" }, "accelerator": "GPU" }, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# πŸ€– MCP-Agent-1.7B β€” Training Notebook\n", "\n", "**What we're building:** The first open-source small language model that natively speaks the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). It plans and executes multi-step tool chains with DAG dependencies.\n", "\n", "**Base model:** [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) (2B params, Apache 2.0)\n", "\n", "**Method:** LoRA SFT (rank=16, all linear layers)\n", "\n", "**Cost:** $0 (Google Colab free T4 GPU)\n", "\n", "**Time:** ~2 hours\n", "\n", "---\n", "\n", "## πŸŽ“ ML Concepts You'll Learn\n", "1. **LoRA** β€” How to fine-tune a 2B model by only training 2% of parameters\n", "2. **SFT** β€” Supervised Fine-Tuning: teaching a model with inputβ†’output examples\n", "3. **bf16** β€” Half-precision training to cut memory usage in half\n", "4. **Gradient Checkpointing** β€” Trading compute for memory\n", "5. **Cosine LR Schedule** β€” Why we slow down learning over time\n", "\n", "---\n", "\n", "⚑ **Before you start:** Go to `Runtime β†’ Change runtime type β†’ T4 GPU`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 0: Verify GPU & Install Dependencies\n", "\n", "πŸŽ“ **What's happening:** We check that Colab gave us a GPU, then install the ML libraries.\n", "- `transformers` β€” HuggingFace's core library for loading/using AI models\n", "- `trl` β€” Training library specifically for fine-tuning language models (SFT, RLHF, DPO)\n", "- `peft` β€” Parameter-Efficient Fine-Tuning (LoRA lives here)\n", "- `datasets` β€” For loading our training data from HuggingFace Hub\n", "- `accelerate` β€” Makes training work on any hardware (CPU, GPU, multi-GPU)\n", "- `bitsandbytes` β€” Memory-efficient optimizers and quantization" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check GPU β€” this MUST show \"Tesla T4\" or similar\n", "!nvidia-smi\n", "\n", "import torch\n", "print(f\"\\nβœ… PyTorch version: {torch.__version__}\")\n", "print(f\"βœ… CUDA available: {torch.cuda.is_available()}\")\n", "if torch.cuda.is_available():\n", " print(f\"βœ… GPU: {torch.cuda.get_device_name(0)}\")\n", " print(f\"βœ… VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB\")\n", "else:\n", " raise RuntimeError(\"❌ No GPU! Go to Runtime β†’ Change runtime type β†’ T4 GPU\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install all dependencies (takes ~2-3 minutes)\n", "!pip install -q transformers trl peft datasets accelerate bitsandbytes huggingface_hub\n", "print(\"\\nβœ… All packages installed!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Login to HuggingFace\n", "\n", "πŸŽ“ **Why?** We need to:\n", "1. Download Qwen3-1.7B from HuggingFace Hub\n", "2. 
**Push our trained model** back to your HuggingFace account\n", "\n", "Get your token at: https://huggingface.co/settings/tokens (needs **Write** permission)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from huggingface_hub import notebook_login\n", "notebook_login() # Paste your HF token when prompted" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Load Dataset\n", "\n", "πŸŽ“ **What's our data?** 16,520 conversations teaching the model to:\n", "- Call tools using MCP protocol (JSON-RPC format)\n", "- Plan multi-step tool chains with dependencies\n", "- Ask clarifying questions when info is missing\n", "- Refuse dangerous requests\n", "\n", "Each example is a conversation: `[{role: system, content: ...}, {role: user, content: ...}, {role: assistant, content: ...}]`\n", "\n", "The SFTTrainer automatically detects this `messages` format and applies the model's chat template." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset\n", "\n", "dataset = load_dataset(\"muhammadtlha944/mcp-agent-training-data\")\n", "\n", "print(f\"πŸ“Š Train examples: {len(dataset['train']):,}\")\n", "print(f\"πŸ“Š Validation examples: {len(dataset['validation']):,}\")\n", "print(f\"πŸ“Š Columns: {dataset['train'].column_names}\")\n", "\n", "# Let's peek at one example\n", "print(f\"\\nπŸ“ Sample conversation (first 2 messages):\")\n", "sample = dataset['train'][0]['messages']\n", "for msg in sample[:2]:\n", " role = msg['role']\n", " content = msg['content'][:200] + '...' if len(msg['content']) > 200 else msg['content']\n", " print(f\" [{role}]: {content}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Configure LoRA\n", "\n", "πŸŽ“ **LoRA (Low-Rank Adaptation) β€” The Key Idea:**\n", "\n", "Instead of updating all 2 billion parameters (which would need ~16GB+ VRAM just for optimizer states), we add tiny trainable matrices to each layer (a quick back-of-the-envelope sketch follows below).\n", "\n", "Think of it like this:\n", "- **Full fine-tuning** = Rewriting an entire textbook (expensive, slow)\n", "- **LoRA** = Adding sticky notes to key pages (cheap, fast, nearly as effective)\n", "\n", "**Parameters explained:**\n", "- `r=16` β€” Rank of the adapter matrices. Like resolution: higher = more detail but more memory. 16 is a solid default for a dataset of this size.\n", "- `lora_alpha=32` β€” Scaling factor (rule of thumb: 2Γ— rank). Controls how strongly LoRA affects output.\n", "- `target_modules=\"all-linear\"` β€” Apply LoRA to ALL linear layers, not just attention. The \"LoRA Without Regret\" study found that applying LoRA to all layers like this can match full fine-tuning quality for SFT.\n", "- `lora_dropout=0.05` β€” 5% dropout prevents overfitting (randomly drops part of the adapter's input during training)."
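, "\n", "To make the savings concrete, here is a back-of-the-envelope sketch for a single linear layer: a rank-`r` adapter adds `r * (d_in + d_out)` trainable weights next to the frozen `d_in * d_out` weight matrix. (The 2048x2048 shape below is only an illustrative assumption, not a value read from the model config.)\n", "\n", "```python\n", "# Illustrative only: LoRA parameter count for ONE linear layer.\n", "# The 2048x2048 shape is an assumed projection size, not read from the model.\n", "d_in, d_out, r = 2048, 2048, 16\n", "frozen_weights = d_in * d_out        # 4,194,304 weights stay frozen\n", "lora_weights = r * (d_in + d_out)    # 65,536 trainable adapter weights\n", "print(f'LoRA trains {lora_weights:,} of {frozen_weights:,} weights '\n", "      f'({100 * lora_weights / frozen_weights:.1f}% of this layer)')\n", "```\n", "\n", "Summed over every linear layer in the model, this is why only a few percent of parameters end up trainable; the exact count gets printed when the trainer is created in Step 6."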
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from peft import LoraConfig\n", "\n", "peft_config = LoraConfig(\n", " r=16, # Rank β€” 16 dimensions per adapter\n", " lora_alpha=32, # Scaling factor β€” 2x rank\n", " lora_dropout=0.05, # 5% dropout for regularization\n", " bias=\"none\", # No bias terms β€” saves memory, no quality loss\n", " task_type=\"CAUSAL_LM\", # This is a language model (predicts next token)\n", " target_modules=\"all-linear\", # Apply to ALL linear layers\n", ")\n", "\n", "print(\"βœ… LoRA config ready!\")\n", "print(f\" Rank: {peft_config.r}\")\n", "print(f\" Alpha: {peft_config.lora_alpha}\")\n", "print(f\" Targets: {peft_config.target_modules}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Configure Training\n", "\n", "πŸŽ“ **Hyperparameters β€” The Recipe:**\n", "\n", "Training a model is like cooking. The hyperparameters are your recipe:\n", "\n", "| Parameter | Value | Why |\n", "|-----------|-------|-----|\n", "| **Learning rate** | 2e-4 | 10Γ— higher than full fine-tuning because LoRA updates fewer params β€” each update needs more impact |\n", "| **Batch size** | 4 Γ— 4 = 16 effective | Process 4 examples at once, accumulate 4 times before updating weights |\n", "| **Epochs** | 3 | See the data 3 times. 1 = underfitting, 10 = overfitting, 3 = sweet spot |\n", "| **Warmup** | 10% of steps | Start with tiny learning rate, ramp up gradually. Prevents early instability |\n", "| **LR schedule** | Cosine | Learning rate follows a cosine curve: high in middle, low at end. Helps convergence |\n", "| **Max seq length** | 2048 tokens | Covers our examples while fitting in T4's 16GB VRAM |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from trl import SFTConfig\n", "\n", "training_args = SFTConfig(\n", " # === Output ===\n", " output_dir=\"./mcp-agent-checkpoints\",\n", "\n", " # === Core hyperparameters ===\n", " num_train_epochs=3,\n", " per_device_train_batch_size=4, # 4 examples per GPU step\n", " gradient_accumulation_steps=4, # Accumulate 4 steps β†’ effective batch = 16\n", " learning_rate=2e-4, # 10x base LR for LoRA\n", " weight_decay=0.01, # L2 regularization\n", " lr_scheduler_type=\"cosine\", # Cosine decay\n", " warmup_ratio=0.1, # 10% warmup\n", " max_grad_norm=1.0, # Gradient clipping\n", " max_seq_length=2048, # Max tokens per example\n", "\n", " # === Memory optimization (critical for T4 16GB!) 
===\n", " bf16=False, # T4 doesn't support bf16 well\n", " fp16=True, # Use fp16 instead β€” T4 is great at this\n", " gradient_checkpointing=True, # Trade compute for memory\n", " gradient_checkpointing_kwargs={\"use_reentrant\": False},\n", "\n", " # === Logging ===\n", " logging_steps=10,\n", " logging_first_step=True,\n", " logging_strategy=\"steps\",\n", "\n", " # === Evaluation ===\n", " eval_strategy=\"steps\",\n", " eval_steps=200,\n", " per_device_eval_batch_size=4,\n", "\n", " # === Checkpointing ===\n", " save_strategy=\"steps\",\n", " save_steps=200,\n", " save_total_limit=2, # Keep 2 checkpoints (save disk space)\n", " load_best_model_at_end=True,\n", " metric_for_best_model=\"eval_loss\",\n", "\n", " # === Push to HuggingFace Hub ===\n", " push_to_hub=True,\n", " hub_model_id=\"muhammadtlha944/MCP-Agent-1.7B\",\n", " hub_strategy=\"end\",\n", "\n", " # === Misc ===\n", " seed=42,\n", " dataloader_num_workers=2,\n", " optim=\"adamw_torch\",\n", ")\n", "\n", "# Print training stats\n", "steps_per_epoch = len(dataset['train']) // (4 * 4) # train_size // effective_batch\n", "total_steps = steps_per_epoch * 3\n", "print(f\"βœ… Training config ready!\")\n", "print(f\" Effective batch size: 16\")\n", "print(f\" Steps per epoch: {steps_per_epoch}\")\n", "print(f\" Total steps: {total_steps}\")\n", "print(f\" Warmup steps: {int(total_steps * 0.1)}\")\n", "print(f\" Estimated time: ~2 hours on T4\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Load Tokenizer\n", "\n", "πŸŽ“ **Tokenizer β€” Translating Words to Numbers:**\n", "\n", "AI models don't understand text β€” they work with numbers. The tokenizer converts:\n", "- `\"Hello world\"` β†’ `[9707, 1879]` (encoding)\n", "- `[9707, 1879]` β†’ `\"Hello world\"` (decoding)\n", "\n", "Qwen3 uses a **chat template** that wraps conversations in special tokens like `<|im_start|>user` and `<|im_end|>`. The SFTTrainer applies this automatically to our `messages` data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoTokenizer\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\n", " \"Qwen/Qwen3-1.7B\",\n", " trust_remote_code=True,\n", ")\n", "\n", "print(f\"βœ… Tokenizer loaded!\")\n", "print(f\" Vocab size: {tokenizer.vocab_size:,}\")\n", "\n", "# Demo: see how tokenization works\n", "demo_text = \"Call the GitHub search tool\"\n", "tokens = tokenizer.encode(demo_text)\n", "print(f\"\\nπŸ“ Demo: '{demo_text}'\")\n", "print(f\" β†’ Token IDs: {tokens}\")\n", "print(f\" β†’ Tokens: {[tokenizer.decode([t]) for t in tokens]}\")\n", "print(f\" β†’ {len(tokens)} tokens\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6: Create Trainer & Start Training! πŸš€\n", "\n", "πŸŽ“ **SFTTrainer does everything:**\n", "1. Loads the 2B parameter model onto the GPU\n", "2. Injects LoRA adapters into all linear layers (~40M trainable params out of 2B)\n", "3. Tokenizes all conversations using the chat template\n", "4. Runs the training loop for 3 epochs\n", "5. Evaluates on validation set every 200 steps\n", "6. Saves checkpoints and picks the best one\n", "7. Pushes the final model to HuggingFace Hub\n", "\n", "**What to watch:** The training `loss` should go DOWN over time. That means the model is learning. If the `eval_loss` (reported every 200 steps) starts climbing while the training loss keeps falling, that's overfitting (the model is memorizing instead of generalizing).\n", "\n", "⏱️ **This cell takes ~2 hours. 
Don't close the tab!**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from trl import SFTTrainer\n", "\n", "print(\"πŸ”§ Loading model and applying LoRA adapters...\")\n", "print(\" (This takes 2-3 minutes β€” downloading 2B parameters)\\n\")\n", "\n", "trainer = SFTTrainer(\n", " model=\"Qwen/Qwen3-1.7B\",\n", " args=training_args,\n", " train_dataset=dataset[\"train\"],\n", " eval_dataset=dataset[\"validation\"],\n", " peft_config=peft_config,\n", " processing_class=tokenizer,\n", ")\n", "\n", "# Print parameter stats\n", "trainable = sum(p.numel() for p in trainer.model.parameters() if p.requires_grad)\n", "total = sum(p.numel() for p in trainer.model.parameters())\n", "print(f\"\\nπŸ“Š Model loaded!\")\n", "print(f\" Total parameters: {total:,}\")\n", "print(f\" Trainable (LoRA): {trainable:,}\")\n", "print(f\" Trainable %: {100 * trainable / total:.2f}%\")\n", "print(f\" GPU memory used: {torch.cuda.memory_allocated() / 1e9:.1f} GB\")\n", "print(f\"\\nπŸš€ Starting training...\\n\")\n", "\n", "# TRAIN!\n", "train_result = trainer.train()\n", "\n", "print(f\"\\nβœ… Training complete!\")\n", "print(f\" Final loss: {train_result.metrics.get('train_loss', 'N/A')}\")\n", "print(f\" Runtime: {train_result.metrics.get('train_runtime', 0)/3600:.1f} hours\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 7: Evaluate & Push to Hub\n", "\n", "πŸŽ“ **Evaluation:** We run the model on the validation set (826 examples it has NEVER seen during training) to measure real performance. If eval loss is close to train loss = good generalization. If eval loss >> train loss = overfitting." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Final evaluation\n", "print(\"πŸ“Š Running final evaluation...\")\n", "eval_metrics = trainer.evaluate()\n", "print(f\" Eval loss: {eval_metrics['eval_loss']:.4f}\")\n", "\n", "# Save metrics\n", "trainer.log_metrics(\"train\", train_result.metrics)\n", "trainer.save_metrics(\"train\", train_result.metrics)\n", "trainer.log_metrics(\"eval\", eval_metrics)\n", "trainer.save_metrics(\"eval\", eval_metrics)\n", "\n", "# Push to HuggingFace Hub\n", "print(\"\\nπŸš€ Pushing model to HuggingFace Hub...\")\n", "trainer.push_to_hub(\n", " commit_message=\"MCP-Agent-1.7B: LoRA fine-tuned Qwen3-1.7B for MCP tool calling\",\n", " tags=[\"mcp\", \"tool-calling\", \"function-calling\", \"agent\", \"qwen3\", \"lora\"],\n", ")\n", "\n", "print(f\"\\n\" + \"=\"*60)\n", "print(f\"πŸŽ‰ MCP-Agent-1.7B is LIVE!\")\n", "print(f\"=\"*60)\n", "print(f\"πŸ“¦ Model: https://huggingface.co/muhammadtlha944/MCP-Agent-1.7B\")\n", "print(f\"πŸ“Š Train loss: {train_result.metrics.get('train_loss', 'N/A'):.4f}\")\n", "print(f\"πŸ“Š Eval loss: {eval_metrics['eval_loss']:.4f}\")\n", "print(f\"⏱️ Training time: {train_result.metrics.get('train_runtime', 0)/3600:.1f} hours\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 8: Test Your Model! πŸ§ͺ\n", "\n", "Let's see MCP-Agent-1.7B in action β€” give it a request and watch it plan tool calls!" 
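, "\n", "The next cell tests the model that is still in memory (`trainer.model`). If you come back later in a fresh Colab session, you can reload the adapter straight from the Hub instead. A minimal sketch, assuming the push in Step 7 succeeded and the repo id below matches yours:\n", "\n", "```python\n", "# Minimal reload sketch for a fresh session (assumes Step 7's push succeeded).\n", "from transformers import AutoModelForCausalLM, AutoTokenizer\n", "from peft import PeftModel\n", "\n", "base = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-1.7B', torch_dtype='auto', device_map='auto')\n", "model = PeftModel.from_pretrained(base, 'muhammadtlha944/MCP-Agent-1.7B')\n", "# The tokenizer is pushed alongside the adapter; fall back to 'Qwen/Qwen3-1.7B' if it's missing.\n", "tokenizer = AutoTokenizer.from_pretrained('muhammadtlha944/MCP-Agent-1.7B')\n", "```"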
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Quick test β€” see the model generate MCP tool calls\n", "from transformers import pipeline\n", "\n", "print(\"πŸ§ͺ Testing MCP-Agent-1.7B...\\n\")\n", "\n", "pipe = pipeline(\n", " \"text-generation\",\n", " model=trainer.model,\n", " tokenizer=tokenizer,\n", " max_new_tokens=512,\n", " do_sample=True,\n", " temperature=0.7,\n", ")\n", "\n", "test_prompts = [\n", " # Test 1: Simple tool call\n", " {\n", " \"messages\": [\n", " {\"role\": \"system\", \"content\": \"You are an MCP agent with access to tools: github_search, read_file, shell_exec. Use JSON-RPC format for tool calls.\"},\n", " {\"role\": \"user\", \"content\": \"Find all Python files in the src/ directory that import pandas\"}\n", " ]\n", " },\n", " # Test 2: Multi-step planning\n", " {\n", " \"messages\": [\n", " {\"role\": \"system\", \"content\": \"You are an MCP agent with access to tools: github_search, read_file, shell_exec, sqlite_query. Plan multi-step tool chains when needed.\"},\n", " {\"role\": \"user\", \"content\": \"Clone the repo https://github.com/example/app, find all TODO comments, and create a summary report\"}\n", " ]\n", " },\n", " # Test 3: Clarification (should ask for missing info)\n", " {\n", " \"messages\": [\n", " {\"role\": \"system\", \"content\": \"You are an MCP agent. Ask for clarification when the request is ambiguous or missing critical information.\"},\n", " {\"role\": \"user\", \"content\": \"Delete the database\"}\n", " ]\n", " },\n", "]\n", "\n", "for i, prompt in enumerate(test_prompts, 1):\n", " print(f\"{'='*60}\")\n", " print(f\"TEST {i}: {prompt['messages'][-1]['content']}\")\n", " print(f\"{'='*60}\")\n", " result = pipe(prompt['messages'])\n", " assistant_msg = result[0]['generated_text'][-1]['content']\n", " print(f\"\\nπŸ€– MCP-Agent Response:\\n{assistant_msg}\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## πŸŽ‰ Congratulations!\n", "\n", "You just trained an AI model! Here's what you accomplished:\n", "\n", "- βœ… Fine-tuned a 2 billion parameter model using LoRA\n", "- βœ… Trained on 16,520 MCP tool-calling examples\n", "- βœ… Published your model to HuggingFace Hub\n", "- βœ… Tested it on real MCP scenarios\n", "\n", "**Your model:** [muhammadtlha944/MCP-Agent-1.7B](https://huggingface.co/muhammadtlha944/MCP-Agent-1.7B)\n", "\n", "**Next steps:**\n", "1. Try more test prompts above\n", "2. Share on X/Twitter with #MCP-Agent\n", "3. Build a Gradio demo for interactive testing\n", "\n", "---\n", "*Built by Muhammad Talha β€” Learning ML by building real projects*" ] } ] }