muhammadtlha944
/

MCP-Agent-1.7B

Model card Files Files and versions

xet

Community

muhammadtlha944 commited on 10 days ago

Commit

1ff07c2

verified ·

1 Parent(s): 3b065fc

Upload docs/06-execution-plan.md

Browse files

Files changed (1) hide show

docs/06-execution-plan.md +350 -0

docs/06-execution-plan.md ADDED Viewed

	@@ -0,0 +1,350 @@

+# 06 — Execution Plan: What We'll Do When You Say "START"
+## 🚀 The Plan
+When you say **"START"**, here is the EXACT sequence of steps we'll follow.
+Each step has a clear goal, estimated time, and cost.
+---
+## Phase 1: Setup & Validation (15 minutes)
+### Step 1.1: Create Training Sandbox
+**What:** Set up a GPU sandbox with all dependencies installed
+**Why:** Test that everything works before spending money on a real training job
+**Time:** 5 minutes
+**Cost:** $0
+```bash
+pip install transformers trl peft datasets accelerate bitsandbytes torch trackio
+```
+### Step 1.2: Validate Dataset Format
+**What:** Load your dataset and verify it works with SFTTrainer
+**Why:** Catch format issues BEFORE training starts (saves hours of debugging)
+**Time:** 5 minutes
+**Cost:** $0
+```python
+from datasets import load_dataset
+dataset = load_dataset("muhammadtlha944/mcp-agent-training-data")
+print(dataset["train"][0])  # Peek at first example
+```
+### Step 1.3: Verify Model Compatibility
+**What:** Load Qwen3-1.7B tokenizer and test chat template
+**Why:** Make sure the model can process our messages format
+**Time:** 5 minutes
+**Cost:** $0
+```python
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
+print(tokenizer.chat_template)  # Should not be None
+```
+---
+## Phase 2: Training Script Development (30 minutes)
+### Step 2.1: Write Training Script
+**What:** Create `train.py` with full educational comments
+**Why:** Every line documented so you learn as we build
+**Time:** 15 minutes
+**Cost:** $0
+**What the script contains:**
+- LoRA configuration (r=16, all-linear, dropout=0.05)
+- SFTConfig with all hyperparameters documented
+- Trackio monitoring setup
+- push_to_hub configuration
+- Plain-text logging (no tqdm progress bars)
+### Step 2.2: Test Script in Sandbox
+**What:** Run the script for 10 steps to catch errors
+**Why:** Find bugs NOW before the expensive training job
+**Time:** 10 minutes
+**Cost:** $0 (sandbox GPU time)
+```python
+# Run just 10 steps as a smoke test
+training_args.max_steps = 10
+trainer.train()
+```
+### Step 2.3: Review & Fix Issues
+**What:** Fix any import errors, API mismatches, or config issues
+**Why:** Training jobs are expensive — we only launch when the script is solid
+**Time:** 5 minutes
+**Cost:** $0
+---
+## Phase 3: Model Training (2-3 hours)
+### Step 3.1: Launch Training Job
+**What:** Submit training to HF Jobs on T4 GPU
+**Why:** T4 is cheapest GPU that fits our model (16GB VRAM)
+**Time:** 2-3 hours (automated)
+**Cost:** ~$1.20-1.80
+**Pre-flight check before launch:**
+- ✅ Dataset format validated
+- ✅ Script tested in sandbox
+- ✅ push_to_hub=True and hub_model_id set
+- ✅ Timeout set to 4 hours (plenty of buffer)
+- ✅ Trackio monitoring enabled
+- ✅ disable_tqdm=True for clean logs
+### Step 3.2: Monitor Training
+**What:** Watch loss curves via Trackio dashboard
+**Why:** Make sure loss is going down (model is learning)
+**Time:** Check every 15 minutes
+**Cost:** $0 (just watching)
+**What to watch for:**
+```
+Good:    Step 100: loss=2.5 → Step 500: loss=1.2 → Step 2450: loss=0.9
+Warning: Step 100: loss=2.5 → Step 500: loss=2.4 → Step 1000: loss=2.3
+  (Learning very slowly — might need more epochs or higher LR)
+Bad:     Step 100: loss=2.5 → Step 500: loss=3.0 → Step 1000: loss=3.5
+  (Loss going UP — stop immediately, something is wrong)
+```
+### Step 3.3: Verify Model Pushed to Hub
+**What:** Check that the model appears in your HF repo
+**Why:** Job storage is ephemeral — if push_to_hub fails, model is LOST
+**Time:** 5 minutes
+**Cost:** $0
+**Check URL:** https://huggingface.co/muhammadtlha944/MCP-Agent-1.7B
+---
+## Phase 4: Testing & Evaluation (30 minutes)
+### Step 4.1: Load Trained Model
+**What:** Download the model from Hub and test inference
+**Why:** Verify the model actually works after training
+**Time:** 10 minutes
+**Cost:** $0
+```python
+from transformers import pipeline
+pipe = pipeline("text-generation", model="muhammadtlha944/MCP-Agent-1.7B")
+```
+### Step 4.2: Run Test Prompts
+**What:** Test the model on real tool-calling scenarios
+**Why:** See if training actually worked
+**Time:** 10 minutes
+**Cost:** $0
+**Test cases:**
+1. Simple tool call: "Find all Python files"
+2. Multi-step: "Clone a repo and find TODO comments"
+3. Clarification: "Book a flight" (missing info)
+4. Safety: "Delete all files" (should refuse)
+5. MCP format: "Use the github_search tool to find ML repos"
+### Step 4.3: Document Results
+**What:** Save test outputs and observations
+**Why:** Track what works and what needs improvement
+**Time:** 10 minutes
+**Cost:** $0
+---
+## Phase 5: Agent Harness App (1 hour)
+### Step 5.1: Write Agent App
+**What:** Create `app.py` with Gradio UI + ReAct loop + tool registry
+**Why:** Turn the model into an actual usable agent
+**Time:** 30 minutes
+**Cost:** $0
+**What the app contains:**
+- Gradio chat interface
+- Agent mode toggle (on/off)
+- Tool registry with 7 built-in tools
+- ReAct loop (think → act → observe → repeat)
+- Tool execution log
+- Safety filters (block dangerous commands)
+### Step 5.2: Test Agent Locally
+**What:** Run the app and test with real user queries
+**Why:** Make sure the whole system works end-to-end
+**Time:** 15 minutes
+**Cost:** $0
+### Step 5.3: Deploy to HF Space
+**What:** Upload app to a Gradio Space
+**Why:** Share with the world!
+**Time:** 15 minutes
+**Cost:** $0 (Spaces free tier)
+---
+## Phase 6: Documentation & Publication (30 minutes)
+### Step 6.1: Update Model README
+**What:** Write a compelling README for the model card
+**Why:** Model cards are how people discover and understand your model
+**Time:** 15 minutes
+**Cost:** $0
+**What to include:**
+- What the model does
+- How it was trained
+- How to use it
+- Benchmarks/results
+- Limitations
+- Citation info
+### Step 6.2: Create Dataset Card
+**What:** Document the training dataset
+**Why:** Transparency is valued in the ML community
+**Time:** 10 minutes
+**Cost:** $0
+### Step 6.3: Share Results
+**What:** Post on social media, share with community
+**Why:** Get feedback, attract collaborators
+**Time:** 5 minutes
+**Cost:** $0
+---
+## 📅 Timeline Summary
+| Phase | Steps | Time | Cost | Cumulative |
+|-------|-------|------|------|------------|
+| 1. Setup | 1.1-1.3 | 15 min | $0 | 15 min / $0 |
+| 2. Script | 2.1-2.3 | 30 min | $0 | 45 min / $0 |
+| 3. Training | 3.1-3.3 | 2-3 hrs | ~$1.50 | 3-4 hrs / $1.50 |
+| 4. Testing | 4.1-4.3 | 30 min | $0 | 3.5-4.5 hrs / $1.50 |
+| 5. App | 5.1-5.3 | 1 hr | $0 | 4.5-5.5 hrs / $1.50 |
+| 6. Publish | 6.1-6.3 | 30 min | $0 | 5-6 hrs / $1.50 |
+**Total time:** ~5-6 hours of active work
+**Total cost:** ~$1.50 (training only)
+**Total budget used:** ~15% of $10 budget ✅
+---
+## 🎯 Decision Points
+At each phase, we'll make decisions based on results:
+### After Phase 3 (Training):
+**If training loss < 1.5 and eval loss < 1.8:** ✅ Proceed to testing
+**If training loss > 2.0:** ⚠️ Consider more epochs or higher LR
+**If eval loss >> train loss:** ❌ Overfitting — need more data or lower rank
+**If model didn't push to Hub:** ❌ Stop and fix push_to_hub configuration
+### After Phase 4 (Testing):
+**If model generates tool calls correctly:** ✅ Proceed to app
+**If model generates text but not tool calls:** ⚠️ Need more MCP-specific training data
+**If model hallucinates tools:** ⚠️ Need more diverse tool schemas in data
+**If model refuses everything:** ⚠️ Too much safety data — need balance
+### After Phase 5 (App):
+**If app works end-to-end:** ✅ Publish and celebrate!
+**If tools fail to execute:** ⚠️ Fix tool implementations
+**If model runs out of context:** ⚠️ Reduce max_iterations or use sliding window
+---
+## 💡 What You'll Learn During Execution
+### During Phase 1:
+- How to set up a GPU environment
+- How to validate data formats
+- How model tokenizers work
+### During Phase 2:
+- How to write production training scripts
+- How LoRA configuration works
+- How SFTConfig parameters affect training
+### During Phase 3:
+- How to submit jobs to cloud GPUs
+- How to monitor training in real-time
+- How to read loss curves
+- How Trackio dashboards work
+### During Phase 4:
+- How to load fine-tuned models
+- How to test models systematically
+- How to identify model weaknesses
+### During Phase 5:
+- How to build agent applications
+- How the ReAct pattern works in practice
+- How tool registries function
+- How to deploy Gradio apps
+### During Phase 6:
+- How to write effective model cards
+- How to share research with the community
+---
+## 🚨 Contingency Plans
+### If Training Fails (OOM Error)
+**Symptom:** "CUDA out of memory" error
+**Fix:**
+1. Reduce batch_size from 4 to 2 (keep accumulation at 4 → effective batch = 8)
+2. Reduce max_seq_length from 2048 to 1024
+3. If still fails, use gradient checkpointing (already enabled)
+4. Last resort: upgrade to a10g-small (24GB VRAM, ~$1.20/hr)
+### If Training Is Too Slow
+**Symptom:** Loss barely moving after 1 hour
+**Fix:**
+1. Check learning rate — might be too low
+2. Increase warmup ratio from 0.1 to 0.2
+3. Reduce gradient accumulation from 4 to 2 (faster but less stable)
+### If Model Doesn't Generate Tool Calls
+**Symptom:** Model answers questions normally but doesn't use tools
+**Fix:**
+1. Add more MCP-specific training data
+2. Adjust system prompt to emphasize tool use
+3. Use higher temperature (0.9) to encourage creativity
+4. Add few-shot examples in the system prompt
+### If Push to Hub Fails
+**Symptom:** Model trained but not on Hub
+**Fix:**
+1. Check HF token has write permissions
+2. Manually upload: `trainer.push_to_hub()` after training
+3. Save locally first: `trainer.save_model("./local-save")`
+---
+## 🎉 Success Criteria
+We'll consider this project a success when:
+- ✅ Model trains without errors (loss < 1.5)
+- ✅ Model pushed to Hub successfully
+- ✅ Model generates structured tool calls on test prompts
+- ✅ Agent app runs locally with tool execution
+- ✅ App deployed to HF Space
+- ✅ Total cost under $10 (target: $1.50)
+---
+## 🚀 Ready?
+When you've read all the files and feel confident, just say:
+> **"START"**
+And we'll begin with Phase 1.
+---
+*Learning ML by building real things — one step at a time.*