# 01 — The Vision: What We're Building & Why

## 🎯 Your Question

> "Manus is amazing. How do they do it? Can we build something like that?"

**Short answer:** Yes! Not identical — Manus has hundreds of engineers and millions in funding. But we can build a **"child version"** that captures the core idea and teaches you every concept along the way.

---

## 🤖 What Is Manus AI?

Manus (acquired by Meta) is an **AI agent** — not just a chatbot. Here's what makes it special:

### 1. It Actually DOES Things (Not Just Talks)

| ChatGPT/Claude | Manus |
|----------------|-------|
| "Here's how to find Python files..." | *Actually runs the command and shows you* |
| "Here's a script idea..." | *Writes, tests, and deploys the code* |
| "I can help you plan..." | *Plans, executes, and verifies* |

### 2. Three Specialized Agents Working Together

Manus uses **three sub-agents** that coordinate:

```
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   PLANNER   │─────▶│  EXECUTOR   │─────▶│  VERIFIER   │
│             │      │             │      │             │
│ "Break this │      │ "Run shell  │      │ "Check if   │
│  into steps"│      │  commands"  │      │  it worked" │
│             │      │             │      │             │
│ Strategize  │      │ Navigate    │      │ Quality     │
│ multi-step  │      │ web, write  │      │ control     │
│ path        │      │ code, use   │      │ & fix       │
│             │      │ tools       │      │ errors      │
└─────────────┘      └─────────────┘      └─────────────┘
```

### 3. Persistent Cloud Environment

Manus runs in a **cloud VM** (virtual machine):

- Files persist between sessions
- Can install software (`pip install`, `npm install`)
- Works while you sleep (asynchronous)

### 4. Can Browse 50+ Websites Simultaneously

For research tasks, Manus spawns many parallel agents to gather info.
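The Planner → Executor → Verifier handoff above can be sketched as a small Python loop. This is purely illustrative — the three functions below are hypothetical stand-ins (with a hardcoded plan), not Manus internals:

```python
import subprocess

# Illustrative sketch only: planner/executor/verifier are hypothetical
# stand-ins for the three coordinating sub-agents described above.

def planner(goal: str) -> list[str]:
    """PLANNER: break a goal into shell-command steps (hardcoded demo plan)."""
    if goal == "count Python files":
        return ["find . -name '*.py' | wc -l"]
    return []

def executor(command: str) -> tuple[int, str]:
    """EXECUTOR: run one step and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode, result.stdout.strip()

def verifier(returncode: int, output: str) -> bool:
    """VERIFIER: check whether the step actually worked."""
    return returncode == 0 and output != ""

def run(goal: str) -> list[str]:
    observations = []
    for step in planner(goal):          # strategize the multi-step path
        code, out = executor(step)      # execute the command
        if not verifier(code, out):     # quality control: stop on failure
            observations.append(f"FAILED: {step}")
            break
        observations.append(out)
    return observations

print(run("count Python files"))
```

The point of the separation is that each role has one job: the planner never touches the shell, and the verifier decides whether the loop continues — the same division of labor our single-model version will collapse into one network.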
---

## 🔬 What We're Building: "Mini-Manus"

### Our Simpler Architecture

Instead of three separate agents + cloud VM, we use **ONE model** with a loop:

```
User: "Find all Python files and count them"
          │
          ▼
┌──────────────────────────────────────────┐
│        MCP-Agent-1.7B (Our Model)        │
│                                          │
│  ┌─── THINK ───┐                         │
│  │ "I need to  │                         │
│  │  list .py   │                         │
│  │  files"     │                         │
│  └──────┬──────┘                         │
│         │                                │
│  ┌─── ACT ─────┐                         │
│  │ shell_exec({│  ◀── ONE MODEL plays    │
│  │  "command": │      ALL three roles    │
│  │  "find .    │      (planner +         │
│  │   -name     │       executor +        │
│  │   '*.py'"   │       verifier)         │
│  │ })          │                         │
│  └──────┬──────┘                         │
│         │                                │
│         ▼ (Result: "main.py, test.py,    │
│            utils.py")                    │
│                                          │
│  ┌─── VERIFY ──┐                         │
│  │ "Got 3      │                         │
│  │  files. Now │                         │
│  │  count."    │                         │
│  └──────┬──────┘                         │
│         │                                │
│  ┌─── ACT ─────┐                         │
│  │ python_exec({                         │
│  │  "code":    │                         │
│  │  "print(3)" │                         │
│  │ })          │                         │
│  └──────┬──────┘                         │
│         │                                │
│         ▼ (Result: "3")                  │
│                                          │
│  ┌── RESPOND ──┐                         │
│  │ "Found 3    │                         │
│  │  Python     │                         │
│  │  files! ✅" │                         │
│  └─────────────┘                         │
└──────────────────────────────────────────┘
```

### Key Differences from Manus

| Feature | Manus | Mini-Manus (Ours) |
|---------|-------|-------------------|
| Agents | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles |
| Environment | Cloud VM | Local/Gradio Space |
| Parallelism | 50+ simultaneous | Sequential (one at a time) |
| Cost | $$$/month | $3 one-time |
| Model Size | GPT-4 class (100B+) | 1.7B (100× smaller!) |
| Persistence | Files persist forever | Session-based |
| Web Browsing | Real browser | DuckDuckGo search API |

### Why This Still Impresses People

1. **It runs LOCALLY** — No API keys, no cloud costs, no rate limits
2. **It actually DOES things** — Not just text, but real shell commands, file operations, Python execution
3. **It's 100× smaller** than Manus's models but still functional
4. **It's OPEN SOURCE** — Anyone can use, modify, improve it
5. **YOU trained it** — From base model to agent in one project

---

## 🧠 The Core Insight: Why Small Models CAN Work for Agents

You might think: *"How can a 1.7B model compete with GPT-4?"*

The secret is **FOCUS**. GPT-4 is a generalist — it knows about history, science, poetry, coding, everything. Our model is a **specialist** — it ONLY knows about tool-calling.

Think of it like this:

- GPT-4 = A professor who can teach any subject
- Our model = A skilled technician who only knows how to use tools

The **TinyAgent paper** demonstrated this: a 1.1B model fine-tuned on tool-calling data matched GPT-4-Turbo on function-calling tasks. Not because it's smarter, but because it's **focused**.

---

## 📋 What Makes This a "WOW" Project

When you show this to people, they'll be impressed because:

### 1. "You trained your own AI agent?"

Most people think you need a PhD and a supercomputer. You don't.

### 2. "It runs on a laptop?"
1.7B parameters at 16-bit precision is about 3.4 GB of weights — roughly 4 GB in memory with overhead. Runs on any gaming laptop.

### 3. "It can actually modify files?"

Not just text generation — real file system operations, shell commands, Python execution.

### 4. "It costs $3?"

Compared to Manus's pricing (or OpenAI API costs), this is almost free.

### 5. "You built this yourself?"

From research → data → training → app. Full pipeline.

---

## 🎓 What You'll Learn From This Project

By the end, you'll understand:

- ✅ How AI agents work (ReAct pattern)
- ✅ What MCP is and why it matters
- ✅ How to pick base models for different budgets
- ✅ LoRA: the magic of cheap fine-tuning
- ✅ SFT: supervised fine-tuning, step by step
- ✅ How to tune hyperparameters (learning rate, batch size, epochs)
- ✅ How to build an agent harness
- ✅ How to deploy ML models
- ✅ How to read research papers and apply them

**If you can train a 1.7B model, you can train a 70B model.** The concepts are identical — only the scale changes.

---

## 🔜 Next Step

Read `02-research.md` to see what papers and datasets we found, and why we made the choices we did.