03 - Architecture: How the Agent Harness Works
The Big Picture
An "agent harness" is the software that wraps around an AI model and gives it the ability to actually do things in the real world. Manus has a sophisticated harness. We're building a simpler but functional one.
The ReAct Pattern (Reasoning + Acting)
Every agent, from Manus to AutoGPT to ours, follows this pattern:
User: "Find all Python files and count them"
ReAct Loop:
1. REASON (Think): The user wants me to find Python files. I should use the shell_exec tool with a find command.
2. ACT (Do): Execute shell_exec({"command": "find . -name '*.py'"})
3. OBSERVE (See): Result: "main.py, test.py"
4. REASON (Think): Found 2 files. Now I should count them and report to the user.
5. ACT (Respond): "I found 2 Python files!"
This loop continues until the task is complete, the maximum number of iterations is reached, or a tool fails.
Why this works: The model SEES the results of its actions and can adjust. It is not making a single guess; it is in a conversation with the environment.
The Three Components
1. The Model (The Brain)
What it does: Decides WHAT to do
Our fine-tuned Qwen3-1.7B model has been trained to:
- Parse tool schemas ("Here's what tools are available")
- Analyze user requests ("User wants to find files")
- Generate tool calls in correct format (JSON-RPC for MCP)
- Plan multi-step sequences ("First list files, then read them")
- Ask for clarification ("Which directory?")
- Refuse dangerous requests ("No, I won't delete everything")
Memory usage: ~4GB (1.7B params in fp16, about 3.4GB of weights, rounded up) + ~100MB (LoRA adapters)
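As a rough check on that figure, weight memory is the parameter count times the bytes per parameter. The sketch below assumes 1.7 billion parameters (read off the model name) and 2 bytes per fp16 value. It counts weights only, so activations, KV cache, and framework overhead would come on top.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: 1.7e9 parameters, taken from the "1.7B" in the model name.
fp16_gb = weight_memory_gb(1.7e9)
print(f"fp16 weights: ~{fp16_gb:.1f} GB")
```

The same helper with `bytes_per_param=4` gives the fp32 footprint, which is one reason half precision is the usual choice for a model this size on consumer hardware.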
2. The Tool Registry (The Hands)
What it does: Defines WHAT the model CAN do
A tool registry is a dictionary mapping tool names to their implementations:
TOOL_REGISTRY = {
    "shell_exec": {
        "name": "shell_exec",
        "description": "Execute shell commands",
        "parameters": {"type": "object", "properties": {"command": {"type": "string"}}},
        "function": shell_exec_function,
    },
    "read_file": {
        "name": "read_file",
        "description": "Read file contents",
        "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
        "function": read_file_function,
    },
    # ... more tools
}
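One way a registry like this might be used at run time is a small dispatcher that looks up the tool, checks required arguments, and returns errors as text so the model can react to them. This is a sketch under assumptions: `execute_tool`, its error strings, the `"required"` list, and the `read_file_function` body are illustrative, not the project's actual code.

```python
from pathlib import Path
from typing import Any


def read_file_function(path: str) -> str:
    """Hypothetical implementation of the read_file tool."""
    return Path(path).read_text(encoding="utf-8")


TOOL_REGISTRY: dict[str, dict[str, Any]] = {
    "read_file": {
        "name": "read_file",
        "description": "Read file contents",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],  # assumption: schemas list required params
        },
        "function": read_file_function,
    },
}


def execute_tool(tool_call: dict[str, Any]) -> str:
    """Look up the tool by name and call it; report failures as text."""
    name = tool_call.get("tool", "")
    entry = TOOL_REGISTRY.get(name)
    if entry is None:
        return f"Error: unknown tool {name!r}"
    arguments = tool_call.get("arguments", {})
    required = entry["parameters"].get("required", [])
    missing = [param for param in required if param not in arguments]
    if missing:
        return f"Error: missing arguments {missing}"
    try:
        return str(entry["function"](**arguments))
    except Exception as exc:  # hand the failure to the model instead of crashing
        return f"Error: {type(exc).__name__}: {exc}"
```

Returning error strings rather than raising keeps the loop alive, so the model can see what went wrong and try a different call.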
The system prompt tells the model about available tools:
You are MCP-Agent with access to these tools:
[JSON schema of all tools]
When you need to use a tool, respond with:
{"tool": "tool_name", "arguments": {"param": "value"}}
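The prompt above could be rendered directly from the registry, so it always matches the tools that are actually available. A minimal sketch, assuming a hypothetical `build_system_prompt` helper; the `"function"` entries are dropped because callables cannot be serialized to JSON.

```python
import json


def build_system_prompt(registry: dict) -> str:
    """Render tool schemas into the system prompt, omitting the callables."""
    schemas = [
        {key: value for key, value in entry.items() if key != "function"}
        for entry in registry.values()
    ]
    return (
        "You are MCP-Agent with access to these tools:\n"
        f"{json.dumps(schemas, indent=2)}\n"
        "When you need to use a tool, respond with:\n"
        '{"tool": "tool_name", "arguments": {"param": "value"}}'
    )


# Hypothetical one-tool registry, just to show the rendered prompt.
demo_registry = {
    "read_file": {
        "name": "read_file",
        "description": "Read file contents",
        "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
        "function": open,  # stand-in callable; never serialized
    }
}
prompt = build_system_prompt(demo_registry)
print(prompt)
```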
3. The Execution Loop (The Orchestrator)
What it does: Runs the conversation between user, model, and tools
def agent_loop(user_message, max_iterations=5):
    messages = [system_prompt_with_tools]
    messages.append({"role": "user", "content": user_message})
    for i in range(max_iterations):
        # 1. Model THINKS and generates response
        response = model.generate(messages)
        # 2. Check if response contains a tool call
        tool_call = parse_tool_call(response)
        if tool_call is None:
            return response  # Done!
        # 3. EXECUTE the tool
        result = execute_tool(tool_call)
        # 4. Add tool result to conversation context
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": f"Tool result: {result}"})
        # 5. Loop back: the model sees the result and decides the next step
    return "Max iterations reached"
Why this works: The model sees the FULL context including tool results. It's reacting to real information, not just guessing.
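To try the loop without loading a real model, it can be wired to a scripted stand-in that replays canned responses. Everything named here is an assumption made for illustration: `ScriptedModel`, this particular `parse_tool_call` (which only accepts a response that is a single JSON object), and the `list_python_files` tool with its fixed output.

```python
import json


class ScriptedModel:
    """Stand-in for the real model: replays canned responses in order."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def generate(self, messages):
        return next(self._responses)


def parse_tool_call(response):
    """Return the tool-call dict if the response is one, else None."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and "tool" in data and isinstance(data.get("arguments"), dict):
        return data
    return None


def execute_tool(tool_call, tools):
    """Call the named tool, or describe the problem if it does not exist."""
    func = tools.get(tool_call["tool"])
    if func is None:
        return f"Error: unknown tool {tool_call['tool']!r}"
    return func(**tool_call["arguments"])


def agent_loop(model, tools, user_message, max_iterations=5):
    messages = [
        {"role": "system", "content": "You are MCP-Agent."},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_iterations):
        response = model.generate(messages)
        tool_call = parse_tool_call(response)
        if tool_call is None:
            return response, messages  # plain text means the model is done
        result = execute_tool(tool_call, tools)
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "Max iterations reached", messages


tools = {"list_python_files": lambda: "main.py, test.py"}  # hypothetical tool
model = ScriptedModel([
    '{"tool": "list_python_files", "arguments": {}}',
    "I found 2 Python files!",
])
answer, history = agent_loop(model, tools, "Find all Python files and count them")
print(answer)
```

Passing the model and tools in as arguments, rather than reading globals, is a choice made here so the loop can be exercised in isolation.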
MCP: Model Context Protocol Explained
What Is MCP?
MCP is a standard for how AI models communicate with tools. Think of it as "HTTP for AI tools": a common language that any model and any tool can speak.
With MCP (The Solution)
One standard format using JSON-RPC:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "github_search",
    "arguments": {
      "query": "machine learning",
      "language": "python"
    }
  }
}
Result: Any model that speaks MCP can use any MCP-compatible tool.
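The short `{"tool": ..., "arguments": ...}` form used in the system prompt maps onto this envelope mechanically. The sketch below shows one possible conversion; `to_mcp_request` and the counter-based ids are assumptions. JSON-RPC 2.0 requires an `id` on any request that expects a response, so the helper adds one.

```python
import itertools
import json

_request_ids = itertools.count(1)  # assumption: simple incrementing request ids


def to_mcp_request(tool_call: dict) -> dict:
    """Wrap a {"tool", "arguments"} call in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": tool_call["tool"],
            "arguments": tool_call.get("arguments", {}),
        },
    }


request = to_mcp_request(
    {"tool": "github_search", "arguments": {"query": "machine learning", "language": "python"}}
)
print(json.dumps(request, indent=2))
```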
Why We Embed MCP INTO the Model
Standard approach: Model → calls MCP server → server calls tool → result back
Our approach: Model already knows MCP patterns from training
Benefits:
- Faster (no network calls)
- Works offline
- No dependency on external MCP servers
- Can run on edge devices
How Manus Uses Multiple Agents (And How We Simplify)
Manus Architecture
Manus uses three separate LLM instances with different system prompts:
- Planner: Breaks tasks into steps, creates DAG
- Executor: Runs each step (shell, browser, code)
- Verifier: Checks results, flags errors
Our Simplified Architecture
We use ONE model that plays all three roles via a single system prompt:
SYSTEM_PROMPT = """You are MCP-Agent, an autonomous AI assistant that uses
tools to help users accomplish tasks.
## Your Identity
- You are a tool-calling specialist
- You understand the Model Context Protocol (MCP)
- You plan multi-step operations when needed
- You ask for clarification when information is missing
- You refuse dangerous or harmful requests
## How You Work
1. THINK about what the user needs
2. Use tools when they would help (generate JSON tool calls)
3. OBSERVE results and decide next steps
4. Repeat until task is complete
5. Respond clearly when done
## Tool Call Format
When using a tool, respond with:
{"tool": "tool_name", "arguments": {"param": "value"}}
"""
Trade-off: Our approach is simpler but less powerful. Manus's separation allows specialization. Our single model might mix roles. But for our use cases, it's sufficient.
How Adding New Tools Works
The model doesn't need to know SPECIFIC tools. It needs to know the PATTERN of using tools.
Adding a new tool is just writing a Python function:
@register_tool(
    name="my_new_tool",
    description="What this tool does",
    parameters={
        "type": "object",
        "properties": {
            "param1": {"type": "string", "description": "..."}
        },
        "required": ["param1"]
    }
)
def my_new_tool(param1: str) -> str:
    # Your code here
    return "result"
The decorator adds it to the registry, and the system prompt automatically includes it. No retraining needed.
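For readers curious how such a decorator could work, here is a minimal sketch. It assumes the registry is a module-level dict with the same entry shape as `TOOL_REGISTRY` shown earlier. The duplicate-name check and the `word_count` example tool are additions for illustration, not part of the project.

```python
from typing import Any, Callable

TOOL_REGISTRY: dict[str, dict[str, Any]] = {}


def register_tool(name: str, description: str, parameters: dict[str, Any]) -> Callable:
    """Return a decorator that records the function and its schema in TOOL_REGISTRY."""

    def decorator(func: Callable) -> Callable:
        if name in TOOL_REGISTRY:
            raise ValueError(f"Tool {name!r} is already registered")
        TOOL_REGISTRY[name] = {
            "name": name,
            "description": description,
            "parameters": parameters,
            "function": func,
        }
        return func  # the function itself stays unchanged and directly callable

    return decorator


@register_tool(
    name="word_count",  # hypothetical example tool
    description="Count whitespace-separated words in a string",
    parameters={
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count"}},
        "required": ["text"],
    },
)
def word_count(text: str) -> str:
    return str(len(text.split()))
```

Because registration happens when the module is imported, any prompt built from the registry after that point would pick up the new tool.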
Comparison: Manus vs Mini-Manus
| Aspect | Manus | Mini-Manus (Ours) |
|---|---|---|
| Agents | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles |
| Environment | Cloud VM (persistent) | Local/Gradio Space |
| Parallelism | 50+ simultaneous | Sequential |
| Model Size | GPT-4 class (100B+) | 1.7B (100× smaller) |
| Cost | $$$/month | $3 one-time |
| Web Browsing | Real browser | DuckDuckGo search API |
| File System | Full VM access | Working directory only |
| Custom Tools | Via MCP servers | Python decorators |
| Learning Curve | Complex setup | pip install + python app.py |
| Ownership | Proprietary (Meta) | Fully open source |
Key Concepts You Should Understand
- ReAct Pattern: Think → Act → Observe → Loop
- Tool Registry: Dictionary of available tools with schemas
- MCP: Standard JSON-RPC format for tool calls
- System Prompt: Tells the model WHO it is and WHAT tools it has
- Context Window: The model sees all previous messages + tool results
- Max Iterations: Safety limit to prevent infinite loops
Next Step
Read 04-training.md to understand HOW we train the model: LoRA, SFT, hyperparameters, and why each matters.