	# 03 — Architecture: How the Agent Harness Works

	## 🏗️ The Big Picture

	An "agent harness" is the software that wraps around an AI model and gives it the ability to actually do things in the real world. Manus has a sophisticated harness. We're building a simpler but functional one.

	---

	## 🔄 The ReAct Pattern (Reasoning + Acting)

	Most tool-using agents, from Manus to AutoGPT to ours, follow some version of this pattern:

	```
	User: "Find all Python files and count them"
	                │
	                ▼
	┌─────────────────────────────────────────┐
	│               ReAct Loop                │
	│                                         │
	│   ┌─── 1. REASON (Think) ───┐           │
	│   │ User wants me to find   │           │
	│   │ Python files. I should  │           │
	│   │ use the shell_exec tool │           │
	│   │ with a find command.    │           │
	│   └───────────┬─────────────┘           │
	│               │                         │
	│   ┌─── 2. ACT (Do) ─────────┐           │
	│   │ Execute:                │           │
	│   │ shell_exec({            │           │
	│   │   "command":            │           │
	│   │     "find . -name       │           │
	│   │      '*.py'"            │           │
	│   │ })                      │           │
	│   └───────────┬─────────────┘           │
	│               │                         │
	│   ┌─── 3. OBSERVE (See) ────┐           │
	│   │ Result:                 │           │
	│   │ "main.py, test.py"      │           │
	│   └───────────┬─────────────┘           │
	│               │                         │
	│   ┌─── 4. REASON (Think) ───┐           │
	│   │ Found 2 files. Now I    │           │
	│   │ should count them and   │           │
	│   │ report to the user.     │           │
	│   └───────────┬─────────────┘           │
	│               │                         │
	│   ┌─── 5. ACT (Respond) ────┐           │
	│   │ "I found 2 Python       │           │
	│   │ files!"                 │           │
	│   └─────────────────────────┘           │
	│                                         │
	└─────────────────────────────────────────┘
	```

	This loop continues until the task is complete, the maximum number of iterations is reached, or a tool fails.

	Why this works: The model SEES the results of its actions and can adjust.
	It's not just making one guess — it's in a conversation with the environment.

	---

	## 🛠️ The Three Components

	### 1. The Model (The Brain)

	What it does: Decides WHAT to do

	Our fine-tuned Qwen3-1.7B model has been trained to:
	- Parse tool schemas ("Here's what tools are available")
	- Analyze user requests ("User wants to find files")
	- Generate tool calls in the correct format (JSON-RPC for MCP)
	- Plan multi-step sequences ("First list files, then read them")
	- Ask for clarification ("Which directory?")
	- Refuse dangerous requests ("No, I won't delete everything")

	Memory usage: roughly 4 GB for the base weights (~2B parameters × 2 bytes each in fp16) plus ~100 MB for the LoRA adapters
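The weight figure above is simple arithmetic: parameter count times bytes per parameter. A minimal sketch follows, where `estimate_weight_memory_gb` is a hypothetical helper (not part of the project) and the estimate covers weights only; activations and the KV cache add more at inference time.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GB (10**9 bytes).

    Hypothetical helper for illustration. fp16 stores each parameter
    in 2 bytes, which is the default here.
    """
    return num_params * bytes_per_param / 1e9


# ~2B parameters in fp16, the figure quoted in this section
base_gb = estimate_weight_memory_gb(2e9)
print(f"base weights: ~{base_gb:.1f} GB")
```

Changing `bytes_per_param` shows how the estimate scales with precision, for example 1 byte per parameter for 8-bit weights.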

	---

	### 2. The Tool Registry (The Hands)

	What it does: Defines WHAT the model CAN do

	A tool registry is a dictionary mapping each tool name to its schema and implementation:

	```python
	TOOL_REGISTRY = {
	    "shell_exec": {
	        "name": "shell_exec",
	        "description": "Execute shell commands",
	        "parameters": {"type": "object", "properties": {"command": {"type": "string"}}},
	        "function": shell_exec_function,
	    },
	    "read_file": {
	        "name": "read_file",
	        "description": "Read file contents",
	        "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
	        "function": read_file_function,
	    },
	    # ... more tools
	}
	```
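To make the lookup concrete, here is a minimal sketch of how a parsed tool call could be dispatched through such a registry. The `word_count` tool, `DEMO_REGISTRY`, and this particular `execute_tool` body are assumptions made for illustration, not the project's actual implementation. Errors are returned as text so the model can see them and react.

```python
from typing import Any


def word_count(text: str) -> str:
    """Toy tool used only for this illustration."""
    return str(len(text.split()))


# Hypothetical registry with the same shape as TOOL_REGISTRY
DEMO_REGISTRY: dict[str, dict[str, Any]] = {
    "word_count": {
        "name": "word_count",
        "description": "Count whitespace-separated words",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "function": word_count,
    },
}


def execute_tool(
    tool_call: dict[str, Any],
    registry: dict[str, dict[str, Any]] = DEMO_REGISTRY,
) -> str:
    """Look up the requested tool and run it, reporting failures as text."""
    name = tool_call.get("tool")
    entry = registry.get(name)
    if entry is None:
        return f"Error: unknown tool '{name}'"
    arguments = tool_call.get("arguments", {})
    try:
        return str(entry["function"](**arguments))
    except Exception as exc:  # surface the failure instead of crashing the loop
        return f"Error: {type(exc).__name__}: {exc}"


print(execute_tool({"tool": "word_count", "arguments": {"text": "find all python files"}}))
```

Returning an error string rather than raising is one possible design choice: it keeps the loop alive and gives the model a chance to correct a bad call.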

	The system prompt tells the model about available tools:

	```
	You are MCP-Agent with access to these tools:
	[JSON schema of all tools]

	When you need to use a tool, respond with:
	{"tool": "tool_name", "arguments": {"param": "value"}}
	```
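One way the `[JSON schema of all tools]` placeholder could be filled in is by serializing every registry entry except its Python callable. This is a sketch under assumptions: `build_system_prompt` and `demo_registry` are hypothetical names, and the real project may format the prompt differently.

```python
import json


def build_system_prompt(registry: dict) -> str:
    """Render the tool list into the system prompt (illustrative helper)."""
    # Keep name, description, and parameters; drop the callable,
    # which is not JSON-serializable and is of no use to the model.
    schemas = [
        {key: value for key, value in entry.items() if key != "function"}
        for entry in registry.values()
    ]
    return (
        "You are MCP-Agent with access to these tools:\n"
        f"{json.dumps(schemas, indent=2)}\n\n"
        "When you need to use a tool, respond with:\n"
        '{"tool": "tool_name", "arguments": {"param": "value"}}'
    )


def read_file_function(path: str) -> str:
    """Placeholder implementation so the example is self-contained."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


demo_registry = {
    "read_file": {
        "name": "read_file",
        "description": "Read file contents",
        "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
        "function": read_file_function,
    },
}

prompt = build_system_prompt(demo_registry)
print(prompt)
```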

	---

	### 3. The Execution Loop (The Orchestrator)

	What it does: Runs the conversation between user, model, and tools

	```python
	def agent_loop(user_message, max_iterations=5):
	    messages = [system_prompt_with_tools]
	    messages.append({"role": "user", "content": user_message})

	    for i in range(max_iterations):
	        # 1. Model THINKS and generates response
	        response = model.generate(messages)

	        # 2. Check if response contains a tool call
	        tool_call = parse_tool_call(response)

	        if tool_call is None:
	            return response  # Done!

	        # 3. EXECUTE the tool
	        result = execute_tool(tool_call)

	        # 4. Add tool result to conversation context
	        messages.append({"role": "assistant", "content": response})
	        messages.append({"role": "user", "content": f"Tool result: {result}"})

	        # 5. Loop back — model sees result and decides next step

	    return "Max iterations reached"
	```

	Why this works: The model sees the FULL context including tool results.
	It's reacting to real information, not just guessing.
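The loop depends on a `parse_tool_call` helper whose body is not shown in this document. A minimal sketch of one possible implementation follows, assuming the model emits the `{"tool": ..., "arguments": {...}}` format, possibly surrounded by free text. It scans for the first JSON object that has the expected keys and returns `None` otherwise, which is what lets the loop treat a plain-text reply as the final answer.

```python
import json
from typing import Optional


def parse_tool_call(response: str) -> Optional[dict]:
    """Return the first tool call embedded in `response`, or None.

    Illustrative implementation; the project's real parser may differ.
    """
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            candidate, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            candidate = None
        if (
            isinstance(candidate, dict)
            and "tool" in candidate
            and isinstance(candidate.get("arguments"), dict)
        ):
            return candidate
        start = response.find("{", start + 1)
    return None


reply = 'I should list the files. {"tool": "shell_exec", "arguments": {"command": "ls"}}'
print(parse_tool_call(reply))
print(parse_tool_call("I found 2 Python files!"))
```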

	---

	## 🌐 MCP: Model Context Protocol Explained

	### What Is MCP?

	MCP is a standard for how AI models communicate with tools. Think of it as
	"HTTP for AI tools" — a common language that any model and any tool can speak.

	### With MCP (The Solution)

	One standard format using JSON-RPC:

	```json
	{
	  "jsonrpc": "2.0",
	  "id": 1,
	  "method": "tools/call",
	  "params": {
	    "name": "github_search",
	    "arguments": {
	      "query": "machine learning",
	      "language": "python"
	    }
	  }
	}
	```

	Result: Any model that speaks MCP can use any MCP-compatible tool.
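The compact `{"tool": ..., "arguments": ...}` format used in the system prompt maps directly onto a `tools/call` request. The sketch below shows one way to wrap it. The helper name `to_mcp_request` and the counter-based ids are assumptions for illustration. In JSON-RPC 2.0, a request that expects a response carries an `id`, so the helper assigns one.

```python
import itertools
import json

# Monotonic ids so each request can be matched to its response
_request_ids = itertools.count(1)


def to_mcp_request(tool_call: dict) -> dict:
    """Wrap a compact tool call in a JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": tool_call["tool"],
            "arguments": tool_call.get("arguments", {}),
        },
    }


request = to_mcp_request(
    {"tool": "github_search", "arguments": {"query": "machine learning", "language": "python"}}
)
print(json.dumps(request, indent=2))
```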

	### Why We Embed MCP INTO the Model

	Standard approach: Model → Calls MCP Server → Server calls Tool → Result back

	Our approach: the model already knows MCP call patterns from training, so the harness can run tools in-process instead of going through an external MCP server

	Benefits:
	- Faster (no network calls)
	- Works offline
	- No dependency on external MCP servers
	- Can run on edge devices

	---

	## 🎭 How Manus Uses Multiple Agents (And How We Simplify)

	### Manus Architecture

	Manus uses three separate LLM instances with different system prompts:
	- Planner: Breaks tasks into steps, creates DAG
	- Executor: Runs each step (shell, browser, code)
	- Verifier: Checks results, flags errors

	### Our Simplified Architecture

	We use ONE model that plays all three roles via a single system prompt:

	```python
	SYSTEM_PROMPT = """You are MCP-Agent, an autonomous AI assistant that uses
	tools to help users accomplish tasks.

	## Your Identity
	- You are a tool-calling specialist
	- You understand the Model Context Protocol (MCP)
	- You plan multi-step operations when needed
	- You ask for clarification when information is missing
	- You refuse dangerous or harmful requests

	## How You Work
	1. THINK about what the user needs
	2. Use tools when they would help (generate JSON tool calls)
	3. OBSERVE results and decide next steps
	4. Repeat until task is complete
	5. Respond clearly when done

	## Tool Call Format
	When using a tool, respond with:
	{"tool": "tool_name", "arguments": {"param": "value"}}
	"""
	```

	Trade-off: our approach is simpler but less powerful. Separating the roles lets each Manus agent specialize, whereas our single model may blur planning, execution, and verification. For our use cases, that is sufficient.

	---

	## 🧩 How Adding New Tools Works

	The model doesn't need to know SPECIFIC tools. It needs to know the PATTERN of using tools.

	Adding a new tool is just writing a Python function:

	```python
	@register_tool(
	    name="my_new_tool",
	    description="What this tool does",
	    parameters={
	        "type": "object",
	        "properties": {
	            "param1": {"type": "string", "description": "..."}
	        },
	        "required": ["param1"]
	    }
	)
	def my_new_tool(param1: str) -> str:
	    # Your code here
	    return "result"
	```

	The decorator adds it to the registry, and the system prompt automatically
	includes it. No retraining needed.
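The decorator itself is not shown in this document. A minimal sketch of how `register_tool` could work follows, assuming a module-level `TOOL_REGISTRY` dictionary with the same entry shape as the registry described earlier. The `reverse_text` tool is a made-up example.

```python
from typing import Any, Callable

TOOL_REGISTRY: dict[str, dict[str, Any]] = {}


def register_tool(name: str, description: str, parameters: dict[str, Any]) -> Callable:
    """Decorator factory that records a function and its schema (illustrative)."""

    def decorator(func: Callable) -> Callable:
        if name in TOOL_REGISTRY:
            raise ValueError(f"Tool '{name}' is already registered")
        TOOL_REGISTRY[name] = {
            "name": name,
            "description": description,
            "parameters": parameters,
            "function": func,
        }
        return func  # return the function unchanged so it stays directly callable

    return decorator


@register_tool(
    name="reverse_text",
    description="Reverse a string",
    parameters={
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to reverse"}},
        "required": ["text"],
    },
)
def reverse_text(text: str) -> str:
    return text[::-1]


print(sorted(TOOL_REGISTRY))
print(TOOL_REGISTRY["reverse_text"]["function"]("abc"))
```

Because registration happens at import time, any prompt built from the registry after the tool module is imported will list the new tool.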

	---

	## 📊 Comparison: Manus vs Mini-Manus

	| Aspect | Manus | Mini-Manus (Ours) |
	|--------|-------|-------------------|
	| Agents | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles |
	| Environment | Cloud VM (persistent) | Local/Gradio Space |
	| Parallelism | 50+ simultaneous | Sequential |
	| Model Size | GPT-4 class (100B+) | 1.7B (100× smaller) |
	| Cost | $$$/month | $3 one-time |
	| Web Browsing | Real browser | DuckDuckGo search API |
	| File System | Full VM access | Working directory only |
	| Custom Tools | Via MCP servers | Python decorators |
	| Learning Curve | Complex setup | pip install + python app.py |
	| Ownership | Proprietary (Meta) | Fully open source |

	---

	## 🎓 Key Concepts You Should Understand

	1. ReAct Pattern: Think → Act → Observe → Loop
	2. Tool Registry: Dictionary of available tools with schemas
	3. MCP (Model Context Protocol): Standard JSON-RPC format for tool calls
	4. System Prompt: Tells the model WHO it is and WHAT tools it has
	5. Context Window: The model sees all previous messages + tool results
	6. Max Iterations: Safety limit to prevent infinite loops

	---

	## 🔜 Next Step

	Read `04-training.md` to understand HOW we train the model — LoRA, SFT, hyperparameters, and why each matters.