01 — The Vision: What We're Building & Why

🎯 Your Question

"Manus is amazing. How do they do it? Can we build something like that?"

Short answer: Yes! Not identical — Manus has hundreds of engineers and millions in funding. But we can build a "child version" that captures the core idea and teaches you every concept along the way.

🤖 What Is Manus AI?

Manus (acquired by Meta) is an AI agent — not just a chatbot. Here's what makes it special:

1. It Actually DOES Things (Not Just Talks)

ChatGPT/Claude	Manus
"Here's how to find Python files..."	Actually runs the command and shows you
"Here's a script idea..."	Writes, tests, and deploys the code
"I can help you plan..."	Plans, executes, and verifies

2. Three Specialized Agents Working Together

Manus uses three sub-agents that coordinate:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  PLANNER    │────▶│  EXECUTOR   │────▶│  VERIFIER   │
│             │     │             │     │             │
│ "Break this │     │ "Run shell  │     │ "Check if   │
│  into steps"│     │  commands"  │     │  it worked"  │
│             │     │             │     │             │
│  Strategize  │     │  Navigate   │     │  Quality    │
│  multi-step  │     │  web, write │     │  control    │
│  path        │     │  code, use  │     │  & fix      │
│              │     │  tools      │     │  errors     │
└─────────────┘     └─────────────┘     └─────────────┘

3. Persistent Cloud Environment

Manus runs in a cloud VM (virtual machine):

Files persist between sessions
Can install software (pip install, npm install)
Works while you sleep (asynchronous)

4. Can Browse 50+ Websites Simultaneously

For research tasks, Manus spawns many parallel agents to gather info.

🔬 What We're Building: "Mini-Manus"

Our Simpler Architecture

Instead of three separate agents + cloud VM, we use ONE model with a loop:

User: "Find all Python files and count them"
  │
  ▼
┌─────────────────────────────────────────┐
│         MCP-Agent-1.7B (Our Model)        │
│                                         │
│  ┌─── THINK ───┐                       │
│  │ "I need to   │                       │
│  │  list .py    │                       │
│  │  files"      │                       │
│  └──────┬───────┘                       │
│         │                               │
│  ┌─── ACT ─────┐                       │
│  │ shell_exec({│  ◀── ONE MODEL plays   │
│  │  "command": │      ALL three roles   │
│  │  "find .    │      (planner +        │
│  │   -name     │       executor +       │
│  │   '*.py'"   │       verifier)        │
│  │ })          │                       │
│  └──────┬───────┘                       │
│         │                               │
│  ▼ (Result: "main.py, test.py, utils.py")
│                                         │
│  ┌─── VERIFY ──┐                       │
│  │ "Got 3      │                       │
│  │  files. Now │                       │
│  │  count."   │                       │
│  └──────┬───────┘                       │
│         │                               │
│  ┌─── ACT ─────┐                       │
│  │ python_exec({│                      │
│  │  "code":    │                       │
│  │  "print(3)"│                       │
│  │ })         │                       │
│  └──────┬───────┘                       │
│         │                               │
│  ▼ (Result: "3")                        │
│                                         │
│  ┌── RESPOND ──┐                       │
│  │ "Found 3    │                       │
│  │  Python     │                       │
│  │  files! ✅" │                       │
│  └─────────────┘                       │
└─────────────────────────────────────────┘

Key Differences from Manus

Feature	Manus	Mini-Manus (Ours)
Agents	3 specialized (Planner/Executor/Verifier)	1 model, all roles
Environment	Cloud VM	Local/Gradio Space
Parallelism	50+ simultaneous	Sequential (one at a time)
Cost	$$$/month	$3 one-time
Model Size	GPT-4 class (100B+)	1.7B (100× smaller!)
Persistence	Files persist forever	Session-based
Web Browsing	Real browser	DuckDuckGo search API

Why This Still Impresses People

It runs LOCALLY — No API keys, no cloud costs, no rate limits
It actually DOES things — Not just text, but real shell commands, file operations, Python execution
It's 100× smaller than Manus's models but still functional
It's OPEN SOURCE — Anyone can use, modify, improve it
YOU trained it — From base model to agent in one project

🧠 The Core Insight: Why Small Models CAN Work for Agents

You might think: "How can a 1.7B model compete with GPT-4?"

The secret is FOCUS.

GPT-4 is a generalist — it knows about history, science, poetry, coding, everything. Our model is a specialist — it ONLY knows about tool-calling.

Think of it like this:

GPT-4 = A professor who can teach any subject
Our model = A skilled technician who only knows how to use tools

The TinyAgent paper proved this: a 1.1B model fine-tuned on tool-calling data matched GPT-4-Turbo at function-calling tasks. Not because it's smarter, but because it's focused.

📋 What Makes This a "WOW" Project

When you show this to people, they'll be impressed because:

1. "You trained your own AI agent?"

Most people think you need a PhD and a supercomputer. You don't.

2. "It runs on a laptop?"

1.7B parameters = 4GB in memory. Runs on any gaming laptop.

3. "It can actually modify files?"

Not just text generation — real file system operations, shell commands, Python execution.

4. "It costs $3?"

Compared to Manus's pricing (or OpenAI API costs), this is almost free.

5. "You built this yourself?"

From research → data → training → app. Full pipeline.

🎓 What You'll Learn From This Project

By the end, you'll understand:

✅ How AI agents work (ReAct pattern)
✅ What MCP is and why it matters
✅ How to pick base models for different budgets
✅ LoRA: the magic of cheap fine-tuning
✅ SFT: supervised fine-tuning step-by-step
✅ How to tune hyperparameters (learning rate, batch size, epochs)
✅ How to build an agent harness
✅ How to deploy ML models
✅ How to read research papers and apply them

If you can train a 1.7B model, you can train a 70B model. The concepts are identical — only the scale changes.

🔜 Next Step

Read 02-research.md to see what papers and datasets we found, and why we made the choices we did.