| # 03 - Architecture: How the Agent Harness Works |
|
|
| ## The Big Picture |
|
|
| An "agent harness" is the software that wraps around an AI model and gives it the ability to actually **do things** in the real world. Manus has a sophisticated harness. We're building a simpler but functional one. |
|
|
| --- |
|
|
| ## The ReAct Pattern (Reasoning + Acting) |
|
|
| Most tool-using agents, from Manus to AutoGPT to ours, follow this pattern: |
|
|
| ``` |
| User: "Find all Python files and count them" |
|                   │ |
|                   ▼ |
| ┌─────────────────────────────────────┐ |
| │             ReAct Loop              │ |
| │                                     │ |
| │  ┌─── 1. REASON (Think) ────────┐   │ |
| │  │ User wants me to find        │   │ |
| │  │ Python files. I should       │   │ |
| │  │ use the shell_exec tool      │   │ |
| │  │ with a find command.         │   │ |
| │  └──────────────┬───────────────┘   │ |
| │                 ▼                   │ |
| │  ┌─── 2. ACT (Do) ──────────────┐   │ |
| │  │ Execute:                     │   │ |
| │  │ shell_exec({                 │   │ |
| │  │   "command":                 │   │ |
| │  │     "find . -name '*.py'"    │   │ |
| │  │ })                           │   │ |
| │  └──────────────┬───────────────┘   │ |
| │                 ▼                   │ |
| │  ┌─── 3. OBSERVE (See) ─────────┐   │ |
| │  │ Result:                      │   │ |
| │  │ "main.py, test.py"           │   │ |
| │  └──────────────┬───────────────┘   │ |
| │                 ▼                   │ |
| │  ┌─── 4. REASON (Think) ────────┐   │ |
| │  │ Found 2 files. Now I         │   │ |
| │  │ should count them and        │   │ |
| │  │ report to the user.          │   │ |
| │  └──────────────┬───────────────┘   │ |
| │                 ▼                   │ |
| │  ┌─── 5. ACT (Respond) ─────────┐   │ |
| │  │ "I found 2 Python            │   │ |
| │  │ files!"                      │   │ |
| │  └──────────────────────────────┘   │ |
| │                                     │ |
| └─────────────────────────────────────┘ |
| ``` |
|
|
| This loop continues until the task is complete, the maximum number of iterations is reached, or a tool fails. |
|
|
| **Why this works:** The model SEES the results of its actions and can adjust. |
| It's not just making one guess; it's in a conversation with the environment. |
|
|
| --- |
|
|
| ## The Three Components |
|
|
| ### 1. The Model (The Brain) |
|
|
| **What it does:** Decides WHAT to do |
|
|
| Our fine-tuned Qwen3-1.7B model has been trained to: |
| - Parse tool schemas ("Here's what tools are available") |
| - Analyze user requests ("User wants to find files") |
| - Generate tool calls in the correct format (JSON-RPC for MCP) |
| - Plan multi-step sequences ("First list files, then read them") |
| - Ask for clarification ("Which directory?") |
| - Refuse dangerous requests ("No, I won't delete everything") |
|
|
| **Memory usage:** ~3.4GB for the weights (1.7B params × 2 bytes each in fp16) + ~100MB (LoRA adapters) |
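
The weight figure is simple arithmetic: parameter count times bytes per parameter. Below is a rough sketch that assumes the 1.7B count from the model name and ignores activations, KV cache, and framework overhead. Rounding the model up to 2B parameters gives about 4 GB instead. The helper name `weights_gb` is illustrative only.

```python
def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for model weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


fp16_gb = weights_gb(1.7e9, 2)      # fp16 stores each parameter in 2 bytes
rounded_gb = weights_gb(2.0e9, 2)   # same model if the count is rounded up to 2B
fp32_gb = weights_gb(1.7e9, 4)      # fp32 would need twice the fp16 figure

print(f"fp16 (1.7B): {fp16_gb:.1f} GB")
print(f"fp16 (2.0B): {rounded_gb:.1f} GB")
print(f"fp32 (1.7B): {fp32_gb:.1f} GB")
```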
|
|
| --- |
|
|
| ### 2. The Tool Registry (The Hands) |
|
|
| **What it does:** Defines WHAT the model CAN do |
|
|
| A tool registry is a dictionary mapping tool names to their implementations: |
|
|
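| ```python |
| import subprocess |
| from pathlib import Path |
|
|
| # Minimal placeholder implementations (assumed; the real tools may differ). |
| def shell_exec_function(command: str) -> str: |
|     done = subprocess.run(command, shell=True, capture_output=True, text=True) |
|     return done.stdout + done.stderr |
|
|
| def read_file_function(path: str) -> str: |
|     return Path(path).read_text() |
|
|
| # Each entry pairs the schema shown to the model with the Python callable. |
| TOOL_REGISTRY = { |
|     "shell_exec": { |
|         "name": "shell_exec", |
|         "description": "Execute shell commands", |
|         "parameters": {"type": "object", "properties": {"command": {"type": "string"}}}, |
|         "function": shell_exec_function, |
|     }, |
|     "read_file": { |
|         "name": "read_file", |
|         "description": "Read file contents", |
|         "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}, |
|         "function": read_file_function, |
|     }, |
|     # ... more tools |
| } |
| ``` |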
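
With a registry in place, executing a tool call comes down to a dictionary lookup followed by a function call. The sketch below uses a small stand-in registry with one hypothetical `echo` tool. The name `execute_tool` matches the loop shown later, but the body is an assumption, not the project's actual implementation.

```python
def echo_function(text: str) -> str:
    """Hypothetical tool used only for this illustration."""
    return text.upper()


DEMO_REGISTRY = {
    "echo": {
        "name": "echo",
        "description": "Return the text in upper case",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "function": echo_function,
    },
}


def execute_tool(tool_call: dict, registry: dict = DEMO_REGISTRY) -> str:
    """Look up the tool, check required arguments, and run it.

    Errors are returned as strings so the model can read them and retry.
    """
    entry = registry.get(tool_call.get("tool"))
    if entry is None:
        return f"Error: unknown tool {tool_call.get('tool')!r}"
    arguments = tool_call.get("arguments", {})
    required = entry["parameters"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        return f"Error: missing arguments {missing}"
    try:
        return str(entry["function"](**arguments))
    except Exception as exc:  # report the failure instead of crashing the loop
        return f"Error: {type(exc).__name__}: {exc}"


print(execute_tool({"tool": "echo", "arguments": {"text": "hi"}}))
print(execute_tool({"tool": "nope", "arguments": {}}))
```

Returning errors as text rather than raising is one design choice among several; it keeps the agent loop alive so the model can see what went wrong.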
|
|
| **The system prompt tells the model about available tools:** |
|
|
| ``` |
| You are MCP-Agent with access to these tools: |
| [JSON schema of all tools] |
| |
| When you need to use a tool, respond with: |
| {"tool": "tool_name", "arguments": {"param": "value"}} |
| ``` |
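
One way to produce the `[JSON schema of all tools]` part is to serialize every registry entry except its Python callable. This is a minimal sketch that assumes registry entries shaped like the ones above; `build_system_prompt` and the `noop` tool are hypothetical names.

```python
import json


def build_system_prompt(registry: dict) -> str:
    """Render the tool schemas into the system prompt text."""
    # Drop the "function" key: callables are not JSON-serializable,
    # and the model only needs names, descriptions, and parameters.
    schemas = [
        {key: value for key, value in entry.items() if key != "function"}
        for entry in registry.values()
    ]
    return (
        "You are MCP-Agent with access to these tools:\n"
        f"{json.dumps(schemas, indent=2)}\n\n"
        "When you need to use a tool, respond with:\n"
        '{"tool": "tool_name", "arguments": {"param": "value"}}'
    )


demo_registry = {
    "noop": {
        "name": "noop",
        "description": "Do nothing",
        "parameters": {"type": "object", "properties": {}},
        "function": lambda: "",
    },
}
prompt = build_system_prompt(demo_registry)
print(prompt)
```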
|
|
| --- |
|
|
| ### 3. The Execution Loop (The Orchestrator) |
|
|
| **What it does:** Runs the conversation between user, model, and tools |
|
|
| ```python |
| # Assumes `model`, `system_prompt_with_tools`, `parse_tool_call`, and |
| # `execute_tool` are defined elsewhere in the harness. |
| def agent_loop(user_message, max_iterations=5): |
|     messages = [system_prompt_with_tools] |
|     messages.append({"role": "user", "content": user_message}) |
|
|     for i in range(max_iterations): |
|         # 1. Model THINKS and generates response |
|         response = model.generate(messages) |
|
|         # 2. Check if response contains a tool call |
|         tool_call = parse_tool_call(response) |
|
|         if tool_call is None: |
|             return response  # Done! |
|
|         # 3. EXECUTE the tool |
|         result = execute_tool(tool_call) |
|
|         # 4. Add tool result to conversation context |
|         messages.append({"role": "assistant", "content": response}) |
|         messages.append({"role": "user", "content": f"Tool result: {result}"}) |
|
|         # 5. Loop back: model sees result and decides next step |
|
|     return "Max iterations reached" |
| ``` |
|
|
| **Why this works:** The model sees the FULL context including tool results. |
| It's reacting to real information, not just guessing. |
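
The loop above leaves `parse_tool_call` undefined. Below is one possible version, plus a scripted stand-in for the model so the control flow can be run without loading any weights. `ScriptedModel`, the `run` helper, the `add` tool, and the parsing strategy (treat the whole reply as JSON) are all assumptions for illustration. A real model may wrap the JSON in extra text and need more forgiving parsing.

```python
import json


def parse_tool_call(response: str):
    """Return the tool-call dict if the reply is one, otherwise None."""
    try:
        data = json.loads(response.strip())
    except json.JSONDecodeError:
        return None  # plain text means the model is answering the user
    if isinstance(data, dict) and "tool" in data and "arguments" in data:
        return data
    return None


class ScriptedModel:
    """Stand-in for the LLM: replays canned replies in order."""

    def __init__(self, replies):
        self._replies = iter(replies)

    def generate(self, messages):
        return next(self._replies)


def run(model, tools, user_message, max_iterations=5):
    """Same control flow as agent_loop, with dependencies passed in."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = model.generate(messages)
        tool_call = parse_tool_call(response)
        if tool_call is None:
            return response, messages
        result = tools[tool_call["tool"]](**tool_call["arguments"])
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "Max iterations reached", messages


model = ScriptedModel([
    '{"tool": "add", "arguments": {"a": 2, "b": 3}}',
    "The sum is 5.",
])
answer, history = run(model, {"add": lambda a, b: a + b}, "What is 2 + 3?")
print(answer)
print(history[-1]["content"])
```

The first scripted reply parses as a tool call, so the harness runs `add` and appends the result; the second reply is plain text, which ends the loop.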
|
|
| --- |
|
|
| ## MCP: Model Context Protocol Explained |
|
|
| ### What Is MCP? |
|
|
| MCP is a standard for how AI models communicate with tools. Think of it as |
| "HTTP for AI tools": a common language that any model and any tool can speak. |
|
|
| ### With MCP (The Solution) |
|
|
| One standard format using JSON-RPC: |
|
|
| ```json |
| { |
|   "jsonrpc": "2.0", |
|   "id": 1, |
|   "method": "tools/call", |
|   "params": { |
|     "name": "github_search", |
|     "arguments": { |
|       "query": "machine learning", |
|       "language": "python" |
|     } |
|   } |
| } |
| ``` |
|
|
| **Result:** Any model that speaks MCP can use any MCP-compatible tool. |
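
The short `{"tool": ..., "arguments": ...}` form used in the system prompt is not itself a JSON-RPC message. If the harness ever needs to talk to an external MCP server, it can wrap the short form in a `tools/call` request. JSON-RPC requests also carry an `id` so a response can be matched to its request. This sketch assumes the request shape shown above; `to_mcp_request` and `from_mcp_request` are hypothetical helper names.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests need an id to match replies


def to_mcp_request(tool_call: dict) -> dict:
    """Wrap a short-form tool call in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": tool_call["tool"],
            "arguments": tool_call.get("arguments", {}),
        },
    }


def from_mcp_request(request: dict) -> dict:
    """Convert a tools/call request back to the short form."""
    if request.get("method") != "tools/call":
        raise ValueError(f"unsupported method: {request.get('method')!r}")
    params = request["params"]
    return {"tool": params["name"], "arguments": params.get("arguments", {})}


short = {"tool": "github_search", "arguments": {"query": "machine learning"}}
request = to_mcp_request(short)
print(json.dumps(request, indent=2))
```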
|
|
| ### Why We Embed MCP INTO the Model |
|
|
| **Standard approach:** Model → Calls MCP Server → Server calls Tool → Result back |
|
|
| **Our approach:** Model **already knows** MCP patterns from training |
|
|
| Benefits: |
| - Faster (no network calls) |
| - Works offline |
| - No dependency on external MCP servers |
| - Can run on edge devices |
|
|
| --- |
|
|
| ## How Manus Uses Multiple Agents (And How We Simplify) |
|
|
| ### Manus Architecture |
|
|
| Manus uses **three separate LLM instances** with different system prompts: |
| - **Planner:** Breaks tasks into steps, creates DAG |
| - **Executor:** Runs each step (shell, browser, code) |
| - **Verifier:** Checks results, flags errors |
|
|
| ### Our Simplified Architecture |
|
|
| We use **ONE model** that plays all three roles via a single system prompt: |
|
|
| ```python |
| SYSTEM_PROMPT = """You are MCP-Agent, an autonomous AI assistant that uses |
| tools to help users accomplish tasks. |
| |
| ## Your Identity |
| - You are a tool-calling specialist |
| - You understand the Model Context Protocol (MCP) |
| - You plan multi-step operations when needed |
| - You ask for clarification when information is missing |
| - You refuse dangerous or harmful requests |
| |
| ## How You Work |
| 1. THINK about what the user needs |
| 2. Use tools when they would help (generate JSON tool calls) |
| 3. OBSERVE results and decide next steps |
| 4. Repeat until task is complete |
| 5. Respond clearly when done |
| |
| ## Tool Call Format |
| When using a tool, respond with: |
| {"tool": "tool_name", "arguments": {"param": "value"}} |
| """ |
| ``` |
|
|
| **Trade-off:** Our approach is simpler but less powerful. Manus's separation |
| allows specialization. Our single model might mix roles. But for our use cases, it's sufficient. |
|
|
| --- |
|
|
| ## How Adding New Tools Works |
|
|
| The model doesn't need to know SPECIFIC tools. It needs to know the PATTERN of using tools. |
|
|
| **Adding a new tool is just writing a Python function:** |
|
|
| ```python |
| # Assumes the harness provides a `register_tool` decorator. |
| @register_tool( |
|     name="my_new_tool", |
|     description="What this tool does", |
|     parameters={ |
|         "type": "object", |
|         "properties": { |
|             "param1": {"type": "string", "description": "..."} |
|         }, |
|         "required": ["param1"] |
|     } |
| ) |
| def my_new_tool(param1: str) -> str: |
|     # Your code here |
|     return "result" |
| ``` |
|
|
| The decorator adds it to the registry, and the system prompt automatically |
| includes it. **No retraining needed.** |
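
The `register_tool` decorator itself is not shown in this section. One plausible implementation, assuming the registry layout from earlier, is sketched here. The `word_count` tool is hypothetical.

```python
TOOL_REGISTRY = {}


def register_tool(name: str, description: str, parameters: dict):
    """Decorator factory that records a function and its schema in the registry."""
    def decorator(func):
        TOOL_REGISTRY[name] = {
            "name": name,
            "description": description,
            "parameters": parameters,
            "function": func,
        }
        return func  # leave the function usable as a normal Python callable
    return decorator


@register_tool(
    name="word_count",
    description="Count whitespace-separated words in a string",
    parameters={
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count"}},
        "required": ["text"],
    },
)
def word_count(text: str) -> str:
    return str(len(text.split()))


print(sorted(TOOL_REGISTRY))
print(TOOL_REGISTRY["word_count"]["function"]("one two three"))
```

Registration happens at import time, so any system prompt built from the registry after the module loads will include the new tool.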
|
|
| --- |
|
|
| ## Comparison: Manus vs Mini-Manus |
|
|
| | Aspect | Manus | Mini-Manus (Ours) | |
| |--------|-------|-------------------| |
| | **Agents** | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles | |
| | **Environment** | Cloud VM (persistent) | Local/Gradio Space | |
| | **Parallelism** | 50+ simultaneous | Sequential | |
| **Model Size** | GPT-4 class (100B+) | 1.7B (roughly 100× smaller) | |
| | **Cost** | $$$/month | $3 one-time | |
| | **Web Browsing** | Real browser | DuckDuckGo search API | |
| | **File System** | Full VM access | Working directory only | |
| | **Custom Tools** | Via MCP servers | Python decorators | |
| | **Learning Curve** | Complex setup | pip install + python app.py | |
| | **Ownership** | Proprietary (Meta) | Fully open source | |
|
|
| --- |
|
|
| ## Key Concepts You Should Understand |
|
|
| 1. **ReAct Pattern:** Think → Act → Observe → Loop |
| 2. **Tool Registry:** Dictionary of available tools with schemas |
| 3. **MCP:** Standard JSON-RPC format for tool calls |
| 4. **System Prompt:** Tells the model WHO it is and WHAT tools it has |
| 5. **Context Window:** The model sees all previous messages + tool results |
| 6. **Max Iterations:** Safety limit to prevent infinite loops |
|
|
| --- |
|
|
| ## Next Step |
|
|
| Read `04-training.md` to understand HOW we train the model: LoRA, SFT, hyperparameters, and why each matters. |
|
|