Upload docs/01-vision.md

7d1f7d5 verified 11 days ago

7.77 kB

	# 01 — The Vision: What We're Building & Why

	## 🎯 Your Question

	> "Manus is amazing. How do they do it? Can we build something like that?"

	Short answer: Yes! Not identical — Manus has hundreds of engineers and millions in funding. But we can build a "child version" that captures the core idea and teaches you every concept along the way.

	---

	## 🤖 What Is Manus AI?

	Manus (acquired by Meta) is an AI agent — not just a chatbot. Here's what makes it special:

	### 1. It Actually DOES Things (Not Just Talks)

	\| ChatGPT/Claude \| Manus \|
	\|---------------\|-------\|
	\| "Here's how to find Python files..." \| Actually runs the command and shows you \|
	\| "Here's a script idea..." \| Writes, tests, and deploys the code \|
	\| "I can help you plan..." \| Plans, executes, and verifies \|

	### 2. Three Specialized Agents Working Together

	Manus uses three sub-agents that coordinate:

	```
	┌─────────────┐ ┌─────────────┐ ┌─────────────┐
	│ PLANNER │────▶│ EXECUTOR │────▶│ VERIFIER │
	│ │ │ │ │ │
	│ "Break this │ │ "Run shell │ │ "Check if │
	│ into steps"│ │ commands" │ │ it worked" │
	│ │ │ │ │ │
	│ Strategize │ │ Navigate │ │ Quality │
	│ multi-step │ │ web, write │ │ control │
	│ path │ │ code, use │ │ & fix │
	│ │ │ tools │ │ errors │
	└─────────────┘ └─────────────┘ └─────────────┘
	```

	### 3. Persistent Cloud Environment

	Manus runs in a cloud VM (virtual machine):
	- Files persist between sessions
	- Can install software (`pip install`, `npm install`)
	- Works while you sleep (asynchronous)

	### 4. Can Browse 50+ Websites Simultaneously

	For research tasks, Manus spawns many parallel agents to gather info.

	---

	## 🔬 What We're Building: "Mini-Manus"

	### Our Simpler Architecture

	Instead of three separate agents + cloud VM, we use ONE model with a loop:

	```
	User: "Find all Python files and count them"
	│
	▼
	┌─────────────────────────────────────────┐
	│ MCP-Agent-1.7B (Our Model) │
	│ │
	│ ┌─── THINK ───┐ │
	│ │ "I need to │ │
	│ │ list .py │ │
	│ │ files" │ │
	│ └──────┬───────┘ │
	│ │ │
	│ ┌─── ACT ─────┐ │
	│ │ shell_exec({│ ◀── ONE MODEL plays │
	│ │ "command": │ ALL three roles │
	│ │ "find . │ (planner + │
	│ │ -name │ executor + │
	│ │ '*.py'" │ verifier) │
	│ │ }) │ │
	│ └──────┬───────┘ │
	│ │ │
	│ ▼ (Result: "main.py, test.py, utils.py")
	│ │
	│ ┌─── VERIFY ──┐ │
	│ │ "Got 3 │ │
	│ │ files. Now │ │
	│ │ count." │ │
	│ └──────┬───────┘ │
	│ │ │
	│ ┌─── ACT ─────┐ │
	│ │ python_exec({│ │
	│ │ "code": │ │
	│ │ "print(3)"│ │
	│ │ }) │ │
	│ └──────┬───────┘ │
	│ │ │
	│ ▼ (Result: "3") │
	│ │
	│ ┌── RESPOND ──┐ │
	│ │ "Found 3 │ │
	│ │ Python │ │
	│ │ files! ✅" │ │
	│ └─────────────┘ │
	└─────────────────────────────────────────┘
	```

	### Key Differences from Manus

	\| Feature \| Manus \| Mini-Manus (Ours) \|
	\|---------\|-------\|-------------------\|
	\| Agents \| 3 specialized (Planner/Executor/Verifier) \| 1 model, all roles \|
	\| Environment \| Cloud VM \| Local/Gradio Space \|
	\| Parallelism \| 50+ simultaneous \| Sequential (one at a time) \|
	\| Cost \| $$$/month \| $3 one-time \|
	\| Model Size \| GPT-4 class (100B+) \| 1.7B (100× smaller!) \|
	\| Persistence \| Files persist forever \| Session-based \|
	\| Web Browsing \| Real browser \| DuckDuckGo search API \|

	### Why This Still Impresses People

	1. It runs LOCALLY — No API keys, no cloud costs, no rate limits
	2. It actually DOES things — Not just text, but real shell commands, file operations, Python execution
	3. It's 100× smaller than Manus's models but still functional
	4. It's OPEN SOURCE — Anyone can use, modify, improve it
	5. YOU trained it — From base model to agent in one project

	---

	## 🧠 The Core Insight: Why Small Models CAN Work for Agents

	You might think: "How can a 1.7B model compete with GPT-4?"

	The secret is FOCUS.

	GPT-4 is a generalist — it knows about history, science, poetry, coding, everything.
	Our model is a specialist — it ONLY knows about tool-calling.

	Think of it like this:
	- GPT-4 = A professor who can teach any subject
	- Our model = A skilled technician who only knows how to use tools

	The TinyAgent paper proved this: a 1.1B model fine-tuned on tool-calling
	data matched GPT-4-Turbo at function-calling tasks. Not because it's smarter,
	but because it's focused.

	---

	## 📋 What Makes This a "WOW" Project

	When you show this to people, they'll be impressed because:

	### 1. "You trained your own AI agent?"
	Most people think you need a PhD and a supercomputer. You don't.

	### 2. "It runs on a laptop?"
	1.7B parameters = 4GB in memory. Runs on any gaming laptop.

	### 3. "It can actually modify files?"
	Not just text generation — real file system operations, shell commands, Python execution.

	### 4. "It costs $3?"
	Compared to Manus's pricing (or OpenAI API costs), this is almost free.

	### 5. "You built this yourself?"
	From research → data → training → app. Full pipeline.

	---

	## 🎓 What You'll Learn From This Project

	By the end, you'll understand:
	- ✅ How AI agents work (ReAct pattern)
	- ✅ What MCP is and why it matters
	- ✅ How to pick base models for different budgets
	- ✅ LoRA: the magic of cheap fine-tuning
	- ✅ SFT: supervised fine-tuning step-by-step
	- ✅ How to tune hyperparameters (learning rate, batch size, epochs)
	- ✅ How to build an agent harness
	- ✅ How to deploy ML models
	- ✅ How to read research papers and apply them

	If you can train a 1.7B model, you can train a 70B model.
	The concepts are identical — only the scale changes.

	---

	## 🔜 Next Step

	Read `02-research.md` to see what papers and datasets we found, and why we made the choices we did.