# 01 - The Vision: What We're Building & Why
## Your Question
"Manus is amazing. How do they do it? Can we build something like that?"
Short answer: yes. Not identical, of course: Manus has hundreds of engineers and millions in funding. But we can build a "child version" that captures the core idea and teaches you every concept along the way.
## What Is Manus AI?
Manus (acquired by Meta) is an AI agent, not just a chatbot. Here's what makes it special:
### 1. It Actually DOES Things (Not Just Talks)
| ChatGPT/Claude | Manus |
|---|---|
| "Here's how to find Python files..." | Actually runs the command and shows you |
| "Here's a script idea..." | Writes, tests, and deploys the code |
| "I can help you plan..." | Plans, executes, and verifies |
### 2. Three Specialized Agents Working Together
Manus uses three sub-agents that coordinate:
```
+---------------+      +---------------+      +---------------+
|    PLANNER    | ---> |   EXECUTOR    | ---> |   VERIFIER    |
|               |      |               |      |               |
|  "Break this  |      |  "Run shell   |      |  "Check if    |
|   into steps" |      |   commands"   |      |   it worked"  |
|               |      |               |      |               |
|  Strategize a |      |  Navigate the |      |  Quality      |
|  multi-step   |      |  web, write   |      |  control and  |
|  path         |      |  code, use    |      |  fix errors   |
|               |      |  tools        |      |               |
+---------------+      +---------------+      +---------------+
```
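The hand-off between the three roles can be sketched in a few lines. Manus's real agents are LLM-driven; the hard-coded functions below are hypothetical stand-ins that only illustrate the division of labor and the feedback loop:

```python
# Hypothetical sketch of the Planner -> Executor -> Verifier hand-off.
# Function names and return shapes are illustrative assumptions.

def planner(goal: str) -> list[str]:
    """Break the goal into ordered steps."""
    return [f"list files for: {goal}", f"count files for: {goal}"]

def executor(step: str) -> str:
    """Run one step (shell command, web action, code) and return its output."""
    return f"output of {step}"

def verifier(output: str) -> bool:
    """Check the output; False would trigger a retry or a re-plan."""
    return output.startswith("output of")

def run(goal: str) -> bool:
    for step in planner(goal):
        output = executor(step)
        if not verifier(output):
            return False  # a real system would re-plan or retry here
    return True

print(run("count Python files"))  # prints: True
```

The verifier's veto is the important part: it closes the loop, so failed steps get caught instead of silently propagating.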
### 3. Persistent Cloud Environment
Manus runs in a cloud VM (virtual machine):
- Files persist between sessions
- Can install software (`pip install`, `npm install`)
- Works while you sleep (asynchronous)
### 4. Can Browse 50+ Websites Simultaneously
For research tasks, Manus spawns many parallel agents to gather info.
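Fan-out research like this is just parallel task dispatch. A minimal sketch with a thread pool, where `fetch()` is a placeholder for a real browser or sub-agent call (no network involved here):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Stand-in for a real "browse this page and summarize it" agent call.
    return f"summary of {url}"

urls = [f"https://example.com/page{i}" for i in range(50)]

# One worker per page; results come back as a single gathered list.
with ThreadPoolExecutor(max_workers=50) as pool:
    summaries = list(pool.map(fetch, urls))

print(len(summaries))  # prints: 50
```

Our mini version skips this entirely and works sequentially (see the comparison table below), which is slower but far simpler.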
## What We're Building: "Mini-Manus"
### Our Simpler Architecture
Instead of three separate agents + cloud VM, we use ONE model with a loop:
User: "Find all Python files and count them"
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββ
β MCP-Agent-1.7B (Our Model) β
β β
β ββββ THINK ββββ β
β β "I need to β β
β β list .py β β
β β files" β β
β ββββββββ¬ββββββββ β
β β β
β ββββ ACT ββββββ β
β β shell_exec({β βββ ONE MODEL plays β
β β "command": β ALL three roles β
β β "find . β (planner + β
β β -name β executor + β
β β '*.py'" β verifier) β
β β }) β β
β ββββββββ¬ββββββββ β
β β β
β βΌ (Result: "main.py, test.py, utils.py")
β β
β ββββ VERIFY βββ β
β β "Got 3 β β
β β files. Now β β
β β count." β β
β ββββββββ¬ββββββββ β
β β β
β ββββ ACT ββββββ β
β β python_exec({β β
β β "code": β β
β β "print(3)"β β
β β }) β β
β ββββββββ¬ββββββββ β
β β β
β βΌ (Result: "3") β
β β
β βββ RESPOND βββ β
β β "Found 3 β β
β β Python β β
β β files! β
" β β
β βββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββ
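The single-model loop above can be written as plain Python. In the real harness the model emits either a tool call or a final answer, and each tool result is fed back into the context; here `call_model` is a hypothetical stub that scripts the exact trace from the diagram:

```python
# Minimal ReAct-style loop. call_model() is a hard-coded stand-in for
# MCP-Agent-1.7B; a real harness would call the LLM with the history.

def call_model(history: list[str]) -> dict:
    turn = len([h for h in history if h.startswith("RESULT")])
    if turn == 0:
        return {"tool": "shell_exec", "args": {"command": "find . -name '*.py'"}}
    if turn == 1:
        return {"tool": "python_exec", "args": {"code": "print(3)"}}
    return {"final": "Found 3 Python files!"}

def run_tool(name: str, args: dict) -> str:
    # Canned outputs; a real harness would actually execute the call.
    fake_outputs = {"shell_exec": "main.py, test.py, utils.py",
                    "python_exec": "3"}
    return fake_outputs[name]

history = ["USER: Find all Python files and count them"]
while True:
    step = call_model(history)
    if "final" in step:                    # RESPOND: the model is done
        print(step["final"])
        break
    result = run_tool(step["tool"], step["args"])  # ACT
    history.append(f"RESULT: {result}")            # feed back for THINK/VERIFY
```

This loop is the whole architecture: thinking, acting, and verifying are all just the same model being called again with more context.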
### Key Differences from Manus
| Feature | Manus | Mini-Manus (Ours) |
|---|---|---|
| Agents | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles |
| Environment | Cloud VM | Local/Gradio Space |
| Parallelism | 50+ simultaneous | Sequential (one at a time) |
| Cost | $$$/month | $3 one-time |
| Model Size | GPT-4 class (100B+) | 1.7B (about 100× smaller) |
| Persistence | Files persist forever | Session-based |
| Web Browsing | Real browser | DuckDuckGo search API |
### Why This Still Impresses People
- It runs LOCALLY: no API keys, no cloud costs, no rate limits
- It actually DOES things: not just text, but real shell commands, file operations, and Python execution
- It's 100× smaller than Manus's models but still functional
- It's OPEN SOURCE: anyone can use, modify, and improve it
- YOU trained it: from base model to agent in one project
## The Core Insight: Why Small Models CAN Work for Agents
You might think: "How can a 1.7B model compete with GPT-4?"
The secret is FOCUS.
GPT-4 is a generalist: it knows about history, science, poetry, coding, everything. Our model is a specialist: it ONLY knows about tool calling.
Think of it like this:
- GPT-4 = A professor who can teach any subject
- Our model = A skilled technician who only knows how to use tools
The TinyAgent paper proved this: a 1.1B model fine-tuned on tool-calling data matched GPT-4-Turbo at function-calling tasks. Not because it's smarter, but because it's focused.
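That focus comes from the training data: every example teaches the same narrow skill of mapping a request to a tool call. A hypothetical SFT sample in the common chat-messages format (the exact schema depends on the dataset and chat template you pick):

```python
import json

# One illustrative supervised fine-tuning example for tool calling.
# The tool name and argument schema are assumptions for this sketch.
sample = {
    "messages": [
        {"role": "user",
         "content": "How many .py files are in this repo?"},
        {"role": "assistant",
         "content": json.dumps({
             "tool": "shell_exec",
             "arguments": {"command": "find . -name '*.py' | wc -l"},
         })},
    ]
}

# Training teaches the model to emit exactly this parseable structure.
call = json.loads(sample["messages"][1]["content"])
print(call["tool"])  # prints: shell_exec
```

Thousands of examples shaped like this, and nothing else, is why a small model can be reliable at this one job.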
## What Makes This a "WOW" Project
When you show this to people, they'll be impressed because:
1. "You trained your own AI agent?"
Most people think you need a PhD and a supercomputer. You don't.
2. "It runs on a laptop?"
1.7B parameters is roughly 4 GB in memory at fp16. Runs on any gaming laptop.
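The 4 GB figure is simple arithmetic: 1.7 billion parameters at 2 bytes each (fp16/bf16) is about 3.4 GB of weights, and runtime overhead (activations, KV cache) brings it near 4 GB:

```python
params = 1.7e9       # model parameters
bytes_per_param = 2  # fp16 / bf16 weights

weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.1f} GB of weights")  # prints: 3.4 GB of weights
```

Quantizing to 4 bits would roughly halve that again, at some quality cost.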
3. "It can actually modify files?"
Not just text generation: real file system operations, shell commands, and Python execution.
4. "It costs $3?"
Compared to Manus's pricing (or OpenAI API costs), this is almost free.
5. "You built this yourself?"
From research → data → training → app. Full pipeline.
## What You'll Learn From This Project
By the end, you'll understand:
- How AI agents work (the ReAct pattern)
- What MCP (Model Context Protocol) is and why it matters
- How to pick base models for different budgets
- LoRA: the magic of cheap fine-tuning
- SFT: supervised fine-tuning, step by step
- How to tune hyperparameters (learning rate, batch size, epochs)
- How to build an agent harness
- How to deploy ML models
- How to read research papers and apply them
If you can train a 1.7B model, you can train a 70B model. The concepts are identical; only the scale changes.
## Next Step
Read `02-research.md` to see what papers and datasets we found, and why we made the choices we did.