MCP-Agent-1.7B / docs /01-vision.md
muhammadtlha944's picture
Upload docs/01-vision.md
7d1f7d5 verified

01 β€” The Vision: What We're Building & Why

🎯 Your Question

"Manus is amazing. How do they do it? Can we build something like that?"

Short answer: Yes! Not identical β€” Manus has hundreds of engineers and millions in funding. But we can build a "child version" that captures the core idea and teaches you every concept along the way.


πŸ€– What Is Manus AI?

Manus (acquired by Meta) is an AI agent β€” not just a chatbot. Here's what makes it special:

1. It Actually DOES Things (Not Just Talks)

ChatGPT/Claude Manus
"Here's how to find Python files..." Actually runs the command and shows you
"Here's a script idea..." Writes, tests, and deploys the code
"I can help you plan..." Plans, executes, and verifies

2. Three Specialized Agents Working Together

Manus uses three sub-agents that coordinate:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PLANNER    │────▢│  EXECUTOR   │────▢│  VERIFIER   β”‚
β”‚             β”‚     β”‚             β”‚     β”‚             β”‚
β”‚ "Break this β”‚     β”‚ "Run shell  β”‚     β”‚ "Check if   β”‚
β”‚  into steps"β”‚     β”‚  commands"  β”‚     β”‚  it worked"  β”‚
β”‚             β”‚     β”‚             β”‚     β”‚             β”‚
β”‚  Strategize  β”‚     β”‚  Navigate   β”‚     β”‚  Quality    β”‚
β”‚  multi-step  β”‚     β”‚  web, write β”‚     β”‚  control    β”‚
β”‚  path        β”‚     β”‚  code, use  β”‚     β”‚  & fix      β”‚
β”‚              β”‚     β”‚  tools      β”‚     β”‚  errors     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3. Persistent Cloud Environment

Manus runs in a cloud VM (virtual machine):

  • Files persist between sessions
  • Can install software (pip install, npm install)
  • Works while you sleep (asynchronous)

4. Can Browse 50+ Websites Simultaneously

For research tasks, Manus spawns many parallel agents to gather info.


πŸ”¬ What We're Building: "Mini-Manus"

Our Simpler Architecture

Instead of three separate agents + cloud VM, we use ONE model with a loop:

User: "Find all Python files and count them"
  β”‚
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         MCP-Agent-1.7B (Our Model)        β”‚
β”‚                                         β”‚
β”‚  β”Œβ”€β”€β”€ THINK ───┐                       β”‚
β”‚  β”‚ "I need to   β”‚                       β”‚
β”‚  β”‚  list .py    β”‚                       β”‚
β”‚  β”‚  files"      β”‚                       β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                       β”‚
β”‚         β”‚                               β”‚
β”‚  β”Œβ”€β”€β”€ ACT ─────┐                       β”‚
β”‚  β”‚ shell_exec({β”‚  ◀── ONE MODEL plays   β”‚
β”‚  β”‚  "command": β”‚      ALL three roles   β”‚
β”‚  β”‚  "find .    β”‚      (planner +        β”‚
β”‚  β”‚   -name     β”‚       executor +       β”‚
β”‚  β”‚   '*.py'"   β”‚       verifier)        β”‚
β”‚  β”‚ })          β”‚                       β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                       β”‚
β”‚         β”‚                               β”‚
β”‚  β–Ό (Result: "main.py, test.py, utils.py")
β”‚                                         β”‚
β”‚  β”Œβ”€β”€β”€ VERIFY ──┐                       β”‚
β”‚  β”‚ "Got 3      β”‚                       β”‚
β”‚  β”‚  files. Now β”‚                       β”‚
β”‚  β”‚  count."   β”‚                       β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                       β”‚
β”‚         β”‚                               β”‚
β”‚  β”Œβ”€β”€β”€ ACT ─────┐                       β”‚
β”‚  β”‚ python_exec({β”‚                      β”‚
β”‚  β”‚  "code":    β”‚                       β”‚
β”‚  β”‚  "print(3)"β”‚                       β”‚
β”‚  β”‚ })         β”‚                       β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                       β”‚
β”‚         β”‚                               β”‚
β”‚  β–Ό (Result: "3")                        β”‚
β”‚                                         β”‚
β”‚  β”Œβ”€β”€ RESPOND ──┐                       β”‚
β”‚  β”‚ "Found 3    β”‚                       β”‚
β”‚  β”‚  Python     β”‚                       β”‚
β”‚  β”‚  files! βœ…" β”‚                       β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Differences from Manus

Feature Manus Mini-Manus (Ours)
Agents 3 specialized (Planner/Executor/Verifier) 1 model, all roles
Environment Cloud VM Local/Gradio Space
Parallelism 50+ simultaneous Sequential (one at a time)
Cost $$$/month $3 one-time
Model Size GPT-4 class (100B+) 1.7B (100Γ— smaller!)
Persistence Files persist forever Session-based
Web Browsing Real browser DuckDuckGo search API

Why This Still Impresses People

  1. It runs LOCALLY β€” No API keys, no cloud costs, no rate limits
  2. It actually DOES things β€” Not just text, but real shell commands, file operations, Python execution
  3. It's 100Γ— smaller than Manus's models but still functional
  4. It's OPEN SOURCE β€” Anyone can use, modify, improve it
  5. YOU trained it β€” From base model to agent in one project

🧠 The Core Insight: Why Small Models CAN Work for Agents

You might think: "How can a 1.7B model compete with GPT-4?"

The secret is FOCUS.

GPT-4 is a generalist β€” it knows about history, science, poetry, coding, everything. Our model is a specialist β€” it ONLY knows about tool-calling.

Think of it like this:

  • GPT-4 = A professor who can teach any subject
  • Our model = A skilled technician who only knows how to use tools

The TinyAgent paper proved this: a 1.1B model fine-tuned on tool-calling data matched GPT-4-Turbo at function-calling tasks. Not because it's smarter, but because it's focused.


πŸ“‹ What Makes This a "WOW" Project

When you show this to people, they'll be impressed because:

1. "You trained your own AI agent?"

Most people think you need a PhD and a supercomputer. You don't.

2. "It runs on a laptop?"

1.7B parameters = 4GB in memory. Runs on any gaming laptop.

3. "It can actually modify files?"

Not just text generation β€” real file system operations, shell commands, Python execution.

4. "It costs $3?"

Compared to Manus's pricing (or OpenAI API costs), this is almost free.

5. "You built this yourself?"

From research β†’ data β†’ training β†’ app. Full pipeline.


πŸŽ“ What You'll Learn From This Project

By the end, you'll understand:

  • βœ… How AI agents work (ReAct pattern)
  • βœ… What MCP is and why it matters
  • βœ… How to pick base models for different budgets
  • βœ… LoRA: the magic of cheap fine-tuning
  • βœ… SFT: supervised fine-tuning step-by-step
  • βœ… How to tune hyperparameters (learning rate, batch size, epochs)
  • βœ… How to build an agent harness
  • βœ… How to deploy ML models
  • βœ… How to read research papers and apply them

If you can train a 1.7B model, you can train a 70B model. The concepts are identical β€” only the scale changes.


πŸ”œ Next Step

Read 02-research.md to see what papers and datasets we found, and why we made the choices we did.