# 01 — The Vision: What We're Building & Why
|
|
## 🎯 Your Question
|
|
| > "Manus is amazing. How do they do it? Can we build something like that?" |
|
|
**Short answer:** Yes! Not identical — Manus has hundreds of engineers and millions in funding. But we can build a **"child version"** that captures the core idea and teaches you every concept along the way.
|
|
| --- |
|
|
## 🤔 What Is Manus AI?
|
|
Manus (acquired by Meta) is an **AI agent** — not just a chatbot. Here's what makes it special:
|
|
| ### 1. It Actually DOES Things (Not Just Talks) |
|
|
| | ChatGPT/Claude | Manus | |
| |---------------|-------| |
| | "Here's how to find Python files..." | *Actually runs the command and shows you* | |
| | "Here's a script idea..." | *Writes, tests, and deploys the code* | |
| | "I can help you plan..." | *Plans, executes, and verifies* | |
|
|
| ### 2. Three Specialized Agents Working Together |
|
|
| Manus uses **three sub-agents** that coordinate: |
|
|
| ``` |
| βββββββββββββββ βββββββββββββββ βββββββββββββββ |
| β PLANNER ββββββΆβ EXECUTOR ββββββΆβ VERIFIER β |
| β β β β β β |
| β "Break this β β "Run shell β β "Check if β |
| β into steps"β β commands" β β it worked" β |
| β β β β β β |
| β Strategize β β Navigate β β Quality β |
| β multi-step β β web, write β β control β |
| β path β β code, use β β & fix β |
| β β β tools β β errors β |
| βββββββββββββββ βββββββββββββββ βββββββββββββββ |
| ``` |
|
|
| ### 3. Persistent Cloud Environment |
|
|
| Manus runs in a **cloud VM** (virtual machine): |
| - Files persist between sessions |
| - Can install software (`pip install`, `npm install`) |
| - Works while you sleep (asynchronous) |
|
|
| ### 4. Can Browse 50+ Websites Simultaneously |
|
|
| For research tasks, Manus spawns many parallel agents to gather info. |
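That fan-out pattern can be sketched with Python's `asyncio`. This is a toy illustration of the idea only: the `research` coroutine is a stand-in for a real browser session, not anything Manus actually exposes.

```python
import asyncio

# Toy stand-in for one research agent: pretend to browse a site, return notes.
async def research(topic: str) -> str:
    await asyncio.sleep(0.01)          # simulates network latency
    return f"notes on {topic}"

async def main():
    topics = [f"site-{i}" for i in range(50)]
    # gather() runs all 50 lookups concurrently instead of one at a time
    return await asyncio.gather(*(research(t) for t in topics))

notes = asyncio.run(main())
print(len(notes))
```

With real I/O, the concurrent version finishes in roughly the time of the slowest single lookup, which is the whole point of spawning many agents.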
|
|
| --- |
|
|
## 🔬 What We're Building: "Mini-Manus"
|
|
| ### Our Simpler Architecture |
|
|
| Instead of three separate agents + cloud VM, we use **ONE model** with a loop: |
|
|
| ``` |
| User: "Find all Python files and count them" |
| β |
| βΌ |
| βββββββββββββββββββββββββββββββββββββββββββ |
| β MCP-Agent-1.7B (Our Model) β |
| β β |
| β ββββ THINK ββββ β |
| β β "I need to β β |
| β β list .py β β |
| β β files" β β |
| β ββββββββ¬ββββββββ β |
| β β β |
| β ββββ ACT ββββββ β |
| β β shell_exec({β βββ ONE MODEL plays β |
| β β "command": β ALL three roles β |
| β β "find . β (planner + β |
| β β -name β executor + β |
| β β '*.py'" β verifier) β |
| β β }) β β |
| β ββββββββ¬ββββββββ β |
| β β β |
| β βΌ (Result: "main.py, test.py, utils.py") |
| β β |
| β ββββ VERIFY βββ β |
| β β "Got 3 β β |
| β β files. Now β β |
| β β count." β β |
| β ββββββββ¬ββββββββ β |
| β β β |
| β ββββ ACT ββββββ β |
| β β python_exec({β β |
| β β "code": β β |
| β β "print(3)"β β |
| β β }) β β |
| β ββββββββ¬ββββββββ β |
| β β β |
| β βΌ (Result: "3") β |
| β β |
| β βββ RESPOND βββ β |
| β β "Found 3 β β |
| β β Python β β |
| β β files! β
" β β |
| β βββββββββββββββ β |
| βββββββββββββββββββββββββββββββββββββββββββ |
| ``` |
|
|
| ### Key Differences from Manus |
|
|
| | Feature | Manus | Mini-Manus (Ours) | |
| |---------|-------|-------------------| |
| | Agents | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles | |
| | Environment | Cloud VM | Local/Gradio Space | |
| | Parallelism | 50+ simultaneous | Sequential (one at a time) | |
| | Cost | $$$/month | $3 one-time | |
| Model Size | GPT-4 class (100B+) | 1.7B (100× smaller!) |
| | Persistence | Files persist forever | Session-based | |
| | Web Browsing | Real browser | DuckDuckGo search API | |
|
|
| ### Why This Still Impresses People |
|
|
1. **It runs LOCALLY** — No API keys, no cloud costs, no rate limits
2. **It actually DOES things** — Not just text, but real shell commands, file operations, Python execution
3. **It's 100× smaller** than Manus's models but still functional
4. **It's OPEN SOURCE** — Anyone can use, modify, improve it
5. **YOU trained it** — From base model to agent in one project
|
|
| --- |
|
|
## 🧠 The Core Insight: Why Small Models CAN Work for Agents
|
|
| You might think: *"How can a 1.7B model compete with GPT-4?"* |
|
|
| The secret is **FOCUS**. |
|
|
GPT-4 is a generalist — it knows about history, science, poetry, coding, everything.
Our model is a **specialist** — it ONLY knows about tool-calling.
|
|
| Think of it like this: |
| - GPT-4 = A professor who can teach any subject |
| - Our model = A skilled technician who only knows how to use tools |
|
|
| The **TinyAgent paper** proved this: a 1.1B model fine-tuned on tool-calling |
| data matched GPT-4-Turbo at function-calling tasks. Not because it's smarter, |
| but because it's **focused**. |
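To make "focused" concrete, here is what one tool-calling training example might look like. The schema below is illustrative, not the exact TinyAgent format: the point is that every sample drills the same narrow skill.

```python
import json

# One hypothetical SFT example for tool-calling. The model sees thousands of
# these and nothing else, so it learns this one skill deeply.
example = {
    "messages": [
        {"role": "user",
         "content": "How many .py files are in this project?"},
        {"role": "assistant",   # the behavior we train: emit a tool call
         "content": '<tool_call>{"name": "shell_exec", "arguments": '
                    '{"command": "find . -name \'*.py\' | wc -l"}}</tool_call>'},
        {"role": "tool", "content": "3"},
        {"role": "assistant",   # and then ground the answer in the result
         "content": "There are 3 Python files in this project."},
    ]
}

print(json.dumps(example, indent=2)[:60])
```

A generalist dataset spreads the model's capacity across everything; a dataset of examples like this concentrates all of it on one task.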
|
|
| --- |
|
|
## 🌟 What Makes This a "WOW" Project
|
|
| When you show this to people, they'll be impressed because: |
|
|
| ### 1. "You trained your own AI agent?" |
| Most people think you need a PhD and a supercomputer. You don't. |
|
|
| ### 2. "It runs on a laptop?" |
| 1.7B parameters = 4GB in memory. Runs on any gaming laptop. |
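The 4GB figure comes from simple arithmetic, assuming 2-byte (fp16/bf16) weights:

```python
params = 1.7e9
bytes_per_param = 2                      # fp16/bf16 stores each weight in 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weights_gb:.1f} GB")
# KV cache and runtime overhead push actual usage toward the ~4 GB figure.
```

Quantizing to 4-bit would roughly halve that again, at some quality cost.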
|
|
| ### 3. "It can actually modify files?" |
| Not just text generation β real file system operations, shell commands, Python execution. |
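A real harness wraps those operations in small tool functions. Here is a sketch of what a `shell_exec` tool might look like, with a timeout and output truncation as basic guardrails; the name, signature, and limits are illustrative choices, not a fixed API.

```python
import subprocess

def shell_exec(command: str, timeout: int = 10, max_chars: int = 2000) -> str:
    """Run a shell command; cap runtime and output so a bad command can't
    hang the loop or flood the model's context window."""
    try:
        proc = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"ERROR: command timed out after {timeout}s"
    output = (proc.stdout + proc.stderr).strip()
    return output[:max_chars] or f"(no output, exit code {proc.returncode})"

print(shell_exec("echo hello"))
```

Returning errors as strings (instead of raising) matters: the model can read the failure and try a different command on the next loop iteration.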
|
|
| ### 4. "It costs $3?" |
Compared to Manus's subscription pricing (or ongoing OpenAI API costs), a one-time $3 training run is almost free.
|
|
| ### 5. "You built this yourself?" |
From research → data → training → app. Full pipeline.
|
|
| --- |
|
|
## 📚 What You'll Learn From This Project
|
|
| By the end, you'll understand: |
- ✅ How AI agents work (ReAct pattern)
- ✅ What MCP is and why it matters
- ✅ How to pick base models for different budgets
- ✅ LoRA: the magic of cheap fine-tuning
- ✅ SFT: supervised fine-tuning step-by-step
- ✅ How to tune hyperparameters (learning rate, batch size, epochs)
- ✅ How to build an agent harness
- ✅ How to deploy ML models
- ✅ How to read research papers and apply them
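As a taste of why LoRA fine-tuning is cheap, the claim reduces to arithmetic. The dimensions below are illustrative ballpark numbers for a ~1.7B model, not exact specs:

```python
# Back-of-envelope LoRA parameter count (illustrative dimensions).
hidden = 2048           # hidden size
layers = 24             # transformer layers
matrices_per_layer = 4  # q, k, v, o projections get LoRA adapters
rank = 16               # LoRA rank r

# Each adapted matrix adds two low-rank factors: A (hidden x r) and B (r x hidden).
lora_params = layers * matrices_per_layer * 2 * hidden * rank
full_params = 1.7e9

print(f"LoRA trainable params: {lora_params / 1e6:.1f}M "
      f"({100 * lora_params / full_params:.2f}% of the full model)")
```

Training well under 1% of the weights is what turns "fine-tuning" from a GPU-cluster job into a $3 one.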
|
|
| **If you can train a 1.7B model, you can train a 70B model.** |
The concepts are identical — only the scale changes.
|
|
| --- |
|
|
## 👉 Next Step
|
|
| Read `02-research.md` to see what papers and datasets we found, and why we made the choices we did. |
|
|