muhammadtlha944
/

MCP-Agent-1.7B

Model card Files Files and versions

xet

Community

muhammadtlha944 commited on 10 days ago

Commit

7d1f7d5

verified ·

1 Parent(s): 504482b

Upload docs/01-vision.md

Browse files

Files changed (1) hide show

docs/01-vision.md +189 -0

docs/01-vision.md ADDED Viewed

	@@ -0,0 +1,189 @@

+# 01 — The Vision: What We're Building & Why
+## 🎯 Your Question
+> "Manus is amazing. How do they do it? Can we build something like that?"
+**Short answer:** Yes! Not identical — Manus has hundreds of engineers and millions in funding. But we can build a **"child version"** that captures the core idea and teaches you every concept along the way.
+---
+## 🤖 What Is Manus AI?
+Manus (acquired by Meta) is an **AI agent** — not just a chatbot. Here's what makes it special:
+### 1. It Actually DOES Things (Not Just Talks)
+| ChatGPT/Claude | Manus |
+|---------------|-------|
+| "Here's how to find Python files..." | *Actually runs the command and shows you* |
+| "Here's a script idea..." | *Writes, tests, and deploys the code* |
+| "I can help you plan..." | *Plans, executes, and verifies* |
+### 2. Three Specialized Agents Working Together
+Manus uses **three sub-agents** that coordinate:
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│  PLANNER    │────▶│  EXECUTOR   │────▶│  VERIFIER   │
+│             │     │             │     │             │
+│ "Break this │     │ "Run shell  │     │ "Check if   │
+│  into steps"│     │  commands"  │     │  it worked"  │
+│             │     │             │     │             │
+│  Strategize  │     │  Navigate   │     │  Quality    │
+│  multi-step  │     │  web, write │     │  control    │
+│  path        │     │  code, use  │     │  & fix      │
+│              │     │  tools      │     │  errors     │
+└─────────────┘     └─────────────┘     └─────────────┘
+```
+### 3. Persistent Cloud Environment
+Manus runs in a **cloud VM** (virtual machine):
+- Files persist between sessions
+- Can install software (`pip install`, `npm install`)
+- Works while you sleep (asynchronous)
+### 4. Can Browse 50+ Websites Simultaneously
+For research tasks, Manus spawns many parallel agents to gather info.
+---
+## 🔬 What We're Building: "Mini-Manus"
+### Our Simpler Architecture
+Instead of three separate agents + cloud VM, we use **ONE model** with a loop:
+```
+User: "Find all Python files and count them"
+  │
+  ▼
+┌─────────────────────────────────────────┐
+│         MCP-Agent-1.7B (Our Model)        │
+│                                         │
+│  ┌─── THINK ───┐                       │
+│  │ "I need to   │                       │
+│  │  list .py    │                       │
+│  │  files"      │                       │
+│  └──────┬───────┘                       │
+│         │                               │
+│  ┌─── ACT ─────┐                       │
+│  │ shell_exec({│  ◀── ONE MODEL plays   │
+│  │  "command": │      ALL three roles   │
+│  │  "find .    │      (planner +        │
+│  │   -name     │       executor +       │
+│  │   '*.py'"   │       verifier)        │
+│  │ })          │                       │
+│  └──────┬───────┘                       │
+│         │                               │
+│  ▼ (Result: "main.py, test.py, utils.py")
+│                                         │
+│  ┌─── VERIFY ──┐                       │
+│  │ "Got 3      │                       │
+│  │  files. Now │                       │
+│  │  count."   │                       │
+│  └──────┬───────┘                       │
+│         │                               │
+│  ┌─── ACT ─────┐                       │
+│  │ python_exec({│                      │
+│  │  "code":    │                       │
+│  │  "print(3)"│                       │
+│  │ })         │                       │
+│  └──────┬───────┘                       │
+│         │                               │
+│  ▼ (Result: "3")                        │
+│                                         │
+│  ┌── RESPOND ──┐                       │
+│  │ "Found 3    │                       │
+│  │  Python     │                       │
+│  │  files! ✅" │                       │
+│  └─────────────┘                       │
+└─────────────────────────────────────────┘
+```
+### Key Differences from Manus
+| Feature | Manus | Mini-Manus (Ours) |
+|---------|-------|-------------------|
+| Agents | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles |
+| Environment | Cloud VM | Local/Gradio Space |
+| Parallelism | 50+ simultaneous | Sequential (one at a time) |
+| Cost | $$$/month | $3 one-time |
+| Model Size | GPT-4 class (100B+) | 1.7B (100× smaller!) |
+| Persistence | Files persist forever | Session-based |
+| Web Browsing | Real browser | DuckDuckGo search API |
+### Why This Still Impresses People
+1. **It runs LOCALLY** — No API keys, no cloud costs, no rate limits
+2. **It actually DOES things** — Not just text, but real shell commands, file operations, Python execution
+3. **It's 100× smaller** than Manus's models but still functional
+4. **It's OPEN SOURCE** — Anyone can use, modify, improve it
+5. **YOU trained it** — From base model to agent in one project
+---
+## 🧠 The Core Insight: Why Small Models CAN Work for Agents
+You might think: *"How can a 1.7B model compete with GPT-4?"*
+The secret is **FOCUS**.
+GPT-4 is a generalist — it knows about history, science, poetry, coding, everything.
+Our model is a **specialist** — it ONLY knows about tool-calling.
+Think of it like this:
+- GPT-4 = A professor who can teach any subject
+- Our model = A skilled technician who only knows how to use tools
+The **TinyAgent paper** proved this: a 1.1B model fine-tuned on tool-calling
+data matched GPT-4-Turbo at function-calling tasks. Not because it's smarter,
+but because it's **focused**.
+---
+## 📋 What Makes This a "WOW" Project
+When you show this to people, they'll be impressed because:
+### 1. "You trained your own AI agent?"
+Most people think you need a PhD and a supercomputer. You don't.
+### 2. "It runs on a laptop?"
+1.7B parameters = 4GB in memory. Runs on any gaming laptop.
+### 3. "It can actually modify files?"
+Not just text generation — real file system operations, shell commands, Python execution.
+### 4. "It costs $3?"
+Compared to Manus's pricing (or OpenAI API costs), this is almost free.
+### 5. "You built this yourself?"
+From research → data → training → app. Full pipeline.
+---
+## 🎓 What You'll Learn From This Project
+By the end, you'll understand:
+- ✅ How AI agents work (ReAct pattern)
+- ✅ What MCP is and why it matters
+- ✅ How to pick base models for different budgets
+- ✅ LoRA: the magic of cheap fine-tuning
+- ✅ SFT: supervised fine-tuning step-by-step
+- ✅ How to tune hyperparameters (learning rate, batch size, epochs)
+- ✅ How to build an agent harness
+- ✅ How to deploy ML models
+- ✅ How to read research papers and apply them
+**If you can train a 1.7B model, you can train a 70B model.**
+The concepts are identical — only the scale changes.
+---
+## 🔜 Next Step
+Read `02-research.md` to see what papers and datasets we found, and why we made the choices we did.