muhammadtlha944 commited on
Commit
7d1f7d5
Β·
verified Β·
1 Parent(s): 504482b

Upload docs/01-vision.md

Browse files
Files changed (1) hide show
  1. docs/01-vision.md +189 -0
docs/01-vision.md ADDED
@@ -0,0 +1,189 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 01 β€” The Vision: What We're Building & Why
2
+
3
+ ## 🎯 Your Question
4
+
5
+ > "Manus is amazing. How do they do it? Can we build something like that?"
6
+
7
+ **Short answer:** Yes! Not identical β€” Manus has hundreds of engineers and millions in funding. But we can build a **"child version"** that captures the core idea and teaches you every concept along the way.
8
+
9
+ ---
10
+
11
+ ## πŸ€– What Is Manus AI?
12
+
13
+ Manus (acquired by Meta) is an **AI agent** β€” not just a chatbot. Here's what makes it special:
14
+
15
+ ### 1. It Actually DOES Things (Not Just Talks)
16
+
17
+ | ChatGPT/Claude | Manus |
18
+ |---------------|-------|
19
+ | "Here's how to find Python files..." | *Actually runs the command and shows you* |
20
+ | "Here's a script idea..." | *Writes, tests, and deploys the code* |
21
+ | "I can help you plan..." | *Plans, executes, and verifies* |
22
+
23
+ ### 2. Three Specialized Agents Working Together
24
+
25
+ Manus uses **three sub-agents** that coordinate:
26
+
27
+ ```
28
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
29
+ β”‚ PLANNER │────▢│ EXECUTOR │────▢│ VERIFIER β”‚
30
+ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
31
+ β”‚ "Break this β”‚ β”‚ "Run shell β”‚ β”‚ "Check if β”‚
32
+ β”‚ into steps"β”‚ β”‚ commands" β”‚ β”‚ it worked" β”‚
33
+ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
34
+ β”‚ Strategize β”‚ β”‚ Navigate β”‚ β”‚ Quality β”‚
35
+ β”‚ multi-step β”‚ β”‚ web, write β”‚ β”‚ control β”‚
36
+ β”‚ path β”‚ β”‚ code, use β”‚ β”‚ & fix β”‚
37
+ β”‚ β”‚ β”‚ tools β”‚ β”‚ errors β”‚
38
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
39
+ ```
40
+
41
+ ### 3. Persistent Cloud Environment
42
+
43
+ Manus runs in a **cloud VM** (virtual machine):
44
+ - Files persist between sessions
45
+ - Can install software (`pip install`, `npm install`)
46
+ - Works while you sleep (asynchronous)
47
+
48
+ ### 4. Can Browse 50+ Websites Simultaneously
49
+
50
+ For research tasks, Manus spawns many parallel agents to gather info.
51
+
52
+ ---
53
+
54
+ ## πŸ”¬ What We're Building: "Mini-Manus"
55
+
56
+ ### Our Simpler Architecture
57
+
58
+ Instead of three separate agents + cloud VM, we use **ONE model** with a loop:
59
+
60
+ ```
61
+ User: "Find all Python files and count them"
62
+ β”‚
63
+ β–Ό
64
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
65
+ β”‚ MCP-Agent-1.7B (Our Model) β”‚
66
+ β”‚ β”‚
67
+ β”‚ β”Œβ”€β”€β”€ THINK ───┐ β”‚
68
+ β”‚ β”‚ "I need to β”‚ β”‚
69
+ β”‚ β”‚ list .py β”‚ β”‚
70
+ β”‚ β”‚ files" β”‚ β”‚
71
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
72
+ β”‚ β”‚ β”‚
73
+ β”‚ β”Œβ”€β”€β”€ ACT ─────┐ β”‚
74
+ β”‚ β”‚ shell_exec({β”‚ ◀── ONE MODEL plays β”‚
75
+ β”‚ β”‚ "command": β”‚ ALL three roles β”‚
76
+ β”‚ β”‚ "find . β”‚ (planner + β”‚
77
+ β”‚ β”‚ -name β”‚ executor + β”‚
78
+ β”‚ β”‚ '*.py'" β”‚ verifier) β”‚
79
+ β”‚ β”‚ }) β”‚ β”‚
80
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
81
+ β”‚ β”‚ β”‚
82
+ β”‚ β–Ό (Result: "main.py, test.py, utils.py")
83
+ β”‚ β”‚
84
+ β”‚ β”Œβ”€β”€β”€ VERIFY ──┐ β”‚
85
+ β”‚ β”‚ "Got 3 β”‚ β”‚
86
+ β”‚ β”‚ files. Now β”‚ β”‚
87
+ β”‚ β”‚ count." β”‚ β”‚
88
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
89
+ β”‚ β”‚ β”‚
90
+ β”‚ β”Œβ”€β”€β”€ ACT ─────┐ β”‚
91
+ β”‚ β”‚ python_exec({β”‚ β”‚
92
+ β”‚ β”‚ "code": β”‚ β”‚
93
+ β”‚ β”‚ "print(3)"β”‚ β”‚
94
+ β”‚ β”‚ }) β”‚ β”‚
95
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
96
+ β”‚ β”‚ β”‚
97
+ β”‚ β–Ό (Result: "3") β”‚
98
+ β”‚ β”‚
99
+ β”‚ β”Œβ”€β”€ RESPOND ──┐ β”‚
100
+ β”‚ β”‚ "Found 3 β”‚ β”‚
101
+ β”‚ β”‚ Python β”‚ β”‚
102
+ β”‚ β”‚ files! βœ…" β”‚ β”‚
103
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
104
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
105
+ ```
106
+
107
+ ### Key Differences from Manus
108
+
109
+ | Feature | Manus | Mini-Manus (Ours) |
110
+ |---------|-------|-------------------|
111
+ | Agents | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles |
112
+ | Environment | Cloud VM | Local/Gradio Space |
113
+ | Parallelism | 50+ simultaneous | Sequential (one at a time) |
114
+ | Cost | $$$/month | $3 one-time |
115
+ | Model Size | GPT-4 class (100B+) | 1.7B (100Γ— smaller!) |
116
+ | Persistence | Files persist forever | Session-based |
117
+ | Web Browsing | Real browser | DuckDuckGo search API |
118
+
119
+ ### Why This Still Impresses People
120
+
121
+ 1. **It runs LOCALLY** β€” No API keys, no cloud costs, no rate limits
122
+ 2. **It actually DOES things** β€” Not just text, but real shell commands, file operations, Python execution
123
+ 3. **It's 100Γ— smaller** than Manus's models but still functional
124
+ 4. **It's OPEN SOURCE** β€” Anyone can use, modify, improve it
125
+ 5. **YOU trained it** β€” From base model to agent in one project
126
+
127
+ ---
128
+
129
+ ## 🧠 The Core Insight: Why Small Models CAN Work for Agents
130
+
131
+ You might think: *"How can a 1.7B model compete with GPT-4?"*
132
+
133
+ The secret is **FOCUS**.
134
+
135
+ GPT-4 is a generalist β€” it knows about history, science, poetry, coding, everything.
136
+ Our model is a **specialist** β€” it ONLY knows about tool-calling.
137
+
138
+ Think of it like this:
139
+ - GPT-4 = A professor who can teach any subject
140
+ - Our model = A skilled technician who only knows how to use tools
141
+
142
+ The **TinyAgent paper** proved this: a 1.1B model fine-tuned on tool-calling
143
+ data matched GPT-4-Turbo at function-calling tasks. Not because it's smarter,
144
+ but because it's **focused**.
145
+
146
+ ---
147
+
148
+ ## πŸ“‹ What Makes This a "WOW" Project
149
+
150
+ When you show this to people, they'll be impressed because:
151
+
152
+ ### 1. "You trained your own AI agent?"
153
+ Most people think you need a PhD and a supercomputer. You don't.
154
+
155
+ ### 2. "It runs on a laptop?"
156
+ 1.7B parameters = 4GB in memory. Runs on any gaming laptop.
157
+
158
+ ### 3. "It can actually modify files?"
159
+ Not just text generation β€” real file system operations, shell commands, Python execution.
160
+
161
+ ### 4. "It costs $3?"
162
+ Compared to Manus's pricing (or OpenAI API costs), this is almost free.
163
+
164
+ ### 5. "You built this yourself?"
165
+ From research β†’ data β†’ training β†’ app. Full pipeline.
166
+
167
+ ---
168
+
169
+ ## πŸŽ“ What You'll Learn From This Project
170
+
171
+ By the end, you'll understand:
172
+ - βœ… How AI agents work (ReAct pattern)
173
+ - βœ… What MCP is and why it matters
174
+ - βœ… How to pick base models for different budgets
175
+ - βœ… LoRA: the magic of cheap fine-tuning
176
+ - βœ… SFT: supervised fine-tuning step-by-step
177
+ - βœ… How to tune hyperparameters (learning rate, batch size, epochs)
178
+ - βœ… How to build an agent harness
179
+ - βœ… How to deploy ML models
180
+ - βœ… How to read research papers and apply them
181
+
182
+ **If you can train a 1.7B model, you can train a 70B model.**
183
+ The concepts are identical β€” only the scale changes.
184
+
185
+ ---
186
+
187
+ ## πŸ”œ Next Step
188
+
189
+ Read `02-research.md` to see what papers and datasets we found, and why we made the choices we did.