
🤖 MCP-Agent-1.7B — Project Overview

Author: Muhammad Talha
Goal: Build a mini-Manus: a small language model fine-tuned for tool-calling, wrapped in an agent harness
Budget: ~$3 (fits well under $10)
Status: ✅ PLANNING COMPLETE — Waiting for your "START" signal


📚 What You'll Learn (A-to-Z)

This project is designed to teach you every concept from the ground up. Read these files in order — each builds on the previous:

| File | Topic | What You'll Learn | Read Time |
|------|-------|-------------------|-----------|
| 01-vision.md | The Vision | What Manus is, what we're building, why it matters | 10 min |
| 02-research.md | Research | Papers we found, datasets discovered, what works | 10 min |
| 03-architecture.md | Architecture | ReAct loop, MCP protocol, agent harness design | 15 min |
| 04-training.md | Training | LoRA, SFT, hyperparameters, why each matters | 15 min |
| 05-dataset.md | Dataset | What data we have, quality issues, how to improve | 10 min |
| 06-execution-plan.md | Execution | Exact step-by-step plan when you say START | 10 min |
| 07-tools-research.md | WOW Tools | Browser automation, image gen, RAG, data analysis, etc. | 15 min |
| 08-tool-ecosystem.md | Tool Ecosystem | How to add ANY tool dynamically, no retraining | 15 min |
| GUIDE_A_TO_Z.md | Master Guide | Complete reference combining all chapters | 30 min |

Total reading time: ~130 minutes
Total build time: ~5-6 hours
Total cost: ~$1.50 in compute (~$2 with contingency; see the budget breakdown below)


🎯 The Big Picture

You asked: "How does Manus do it, and how can we build something similar?"

What Is Manus?

Manus (acquired by Meta) is an AI agent with three specialized sub-agents:

  1. Planner — Breaks tasks into steps
  2. Executor — Runs code, browses web, uses tools
  3. Verifier — Checks results, fixes errors

It runs in a cloud VM, works while you sleep, and can browse 50+ websites simultaneously.

What We're Building: "Mini-Manus"

We use ONE model (Qwen3-1.7B, ~2B total parameters) that plays all three roles:

  • We fine-tune it to natively understand tool-calling (MCP protocol)
  • We wrap it in a ReAct loop (think → act → observe → repeat)
  • We give it real tools it can execute (shell, files, Python, web search)
  • We build a Gradio web app around it

The magic: The model doesn't call external MCP servers — it already KNOWS how to format tool calls because we trained it on 15,000 examples.
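To make the loop concrete, here is a minimal sketch of the think → act → observe cycle. The JSON call format, the tool names, and the `fake_model` stub are illustrative assumptions, not the exact format the model is trained on (see 03-architecture.md for the real harness design):

```python
import json

# Hypothetical tool registry; the real tools (shell, files, Python, web
# search) would execute actual commands instead of returning stub strings.
TOOLS = {
    "shell": lambda cmd: f"(ran `{cmd}`)",
    "read_file": lambda path: f"(contents of {path})",
}

def parse_tool_call(text):
    """Extract a JSON tool call like {"tool": ..., "args": {...}} from model output."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        call = json.loads(text[start:])
        return call if "tool" in call else None
    except json.JSONDecodeError:
        return None

def react_loop(model, task, max_steps=5):
    """Think -> act -> observe -> repeat until the model stops calling tools."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        output = model("\n".join(history))          # think
        call = parse_tool_call(output)
        if call is None:                            # no tool call => final answer
            return output
        fn = TOOLS.get(call["tool"])
        observation = fn(**call["args"]) if fn else "unknown tool"  # act
        history.append(output)
        history.append(f"Observation: {observation}")               # observe
    return "max steps reached"
```

A fine-tuned model slots in as `model`: it reads the history, and either emits a tool call (because it was trained on thousands of such examples) or a plain final answer.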

Why People Will Say "WOW"

  1. Runs locally — No API costs, no rate limits
  2. Actually DOES things — Not just chat, but real shell commands and file operations
  3. 60×+ smaller than Manus's models — 1.7B vs 100B+ parameters
  4. Costs $3 — Not thousands
  5. YOU built it — From research → data → training → app

💰 Budget Breakdown

| Item | Cost | Why |
|------|------|-----|
| Training (T4 GPU, ~2h) | ~$1.20 | Fine-tuning with LoRA |
| Inference testing | ~$0.30 | Testing the model |
| Gradio Space (Zero GPU) | $0 | Free tier |
| Contingency | ~$0.50 | Buffer for retries |
| Total | ~$2 | Well under $10! ✅ |

🔬 Research Highlights (From Our Deep Dive)

Papers That Back Our Approach

| Paper | Key Finding | How We Use It |
|-------|-------------|---------------|
| TinyAgent (arXiv:2409.00608) | 1.1B model ≈ GPT-4 at tool-calling | Proves small models work |
| STAR (arXiv:2602.03022) | Qwen3-1.7B beats Llama-3.1-8B | Chose Qwen3 as base |
| Agent-World (arXiv:2604.18292) | MCP-based training environments | MCP is the right protocol |
| LoRA Without Regret (2025) | all-linear LoRA ≈ full fine-tuning | Using `target_modules="all-linear"` |
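The "all-linear" finding translates into one line of PEFT configuration. A sketch of what that setup might look like; the rank, alpha, and dropout values here are common defaults for illustration, not this project's actual hyperparameters (those live in 04-training.md):

```python
from peft import LoraConfig

# Illustrative LoRA config: r/alpha/dropout are placeholder defaults.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # PEFT special value: adapt every linear layer
    task_type="CAUSAL_LM",
)
```

Passing the string `"all-linear"` (rather than an explicit module list like `["q_proj", "v_proj"]`) tells PEFT to attach adapters to every linear layer, which is what the "LoRA Without Regret" result recommends.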

Datasets We Discovered

  • glaiveai/glaive-function-calling-v2 — 100K examples, most popular
  • Salesforce/xlam-function-calling — 60K diverse examples
  • Our dataset — 16K examples, already prepared, needs some improvements

📖 Reading Guide

Start Here: 01-vision.md

Understand WHAT we're building and WHY. This answers your core question: "How does Manus work and what are we replicating?"

Then: 02-research.md

See the papers we found and WHY we made our choices. This teaches you how to do research for any ML project.

Then: 03-architecture.md

Learn HOW the agent harness works — the ReAct loop, MCP protocol, tool registry, and how Manus's multi-agent design compares to our simpler approach.
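As a taste of that chapter: MCP describes each tool with a name, a description, and a JSON Schema for its inputs. A hedged sketch of what one registry entry could look like; the `read_file` tool itself is invented for illustration, though the field names follow the shape MCP uses when listing tools:

```python
import json

# Illustrative MCP-style tool description (the tool is a made-up example).
read_file_tool = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the file"},
        },
        "required": ["path"],
    },
}

# The harness can inject these schemas into the system prompt, so the model
# knows which tools exist and how their arguments are shaped.
system_prompt = "You can call these tools:\n" + json.dumps([read_file_tool], indent=2)
```

Because tools are described as data rather than baked into the model, adding a tool means adding one more schema to this list.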

Then: 04-training.md

Understand HOW we train the model — LoRA, SFT, cross-entropy loss, backpropagation, and what each hyperparameter controls. This is the deepest technical chapter.

Then: 05-dataset.md

Review our training data — what's good, what's missing, and how we'd improve it. This teaches you data quality assessment.

Then: 06-execution-plan.md

See the EXACT step-by-step plan with timelines, costs, and decision points. This is our "project management" document.

Then: 07-tools-research.md

Discover the 12+ tools we can add — browser automation, image generation, RAG, data analysis, and more, ranked by wow factor and feasibility.

Then: 08-tool-ecosystem.md

Learn how to add ANY tool dynamically without retraining. The @tool decorator, MCP servers, and the tool marketplace concept.
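As a preview of the decorator idea: one plausible implementation is a plain function registry, where the docstring becomes the tool description the model sees. This is a sketch under that assumption; the actual decorator in 08-tool-ecosystem.md may differ:

```python
import inspect

TOOL_REGISTRY = {}  # tool name -> function, description, parameter names

def tool(fn):
    """Register a function as an agent tool; its docstring is the description."""
    TOOL_REGISTRY[fn.__name__] = {
        "fn": fn,
        "description": (fn.__doc__ or "").strip(),
        "params": list(inspect.signature(fn).parameters),
    }
    return fn

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# The agent discovers and calls the tool by name at runtime; no retraining,
# because the model was trained on the *format* of tool calls, not this tool.
result = TOOL_REGISTRY["word_count"]["fn"](text="tools added at runtime")
```

The key design choice: the model only needs to know how to emit a well-formed call, so any function you decorate today becomes usable immediately.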

Finally: GUIDE_A_TO_Z.md

The master reference combining all chapters into one document. Use this as a quick reference after reading the individual chapters.


🚀 When You're Ready

When you've read all the files and feel confident, just say:

"START"

And we'll begin building. Every step will be explained as we do it.


πŸ“ File Structure

```
/project/
├── 00-README.md           ← You are here
├── 01-vision.md           ← The Vision & Manus comparison
├── 02-research.md         ← Papers, datasets & findings
├── 03-architecture.md     ← Agent harness & MCP protocol
├── 04-training.md         ← LoRA, SFT & hyperparameters
├── 05-dataset.md          ← Dataset analysis & improvements
├── 06-execution-plan.md   ← Step-by-step build plan
├── 07-tools-research.md   ← WOW tools: browser, RAG, image gen, etc.
├── 08-tool-ecosystem.md   ← How to add ANY tool dynamically
├── GUIDE_A_TO_Z.md        ← Master guide combining all chapters
├── train.py               ← Training script (generated when you say START)
├── agent_app.py           ← Gradio app (generated when you say START)
└── datasets/              ← Training data & related files
    └── mcp-agent-training-data/
```

Learning ML by building real things — one step at a time. Built by Muhammad Talha