03 — Architecture: How the Agent Harness Works

🏗️ The Big Picture

An "agent harness" is the software that wraps around an AI model and gives it the ability to actually do things in the real world. Manus has a sophisticated harness. We're building a simpler but functional one.

🔄 The ReAct Pattern (Reasoning + Acting)

Most tool-using agents, from Manus to AutoGPT to ours, follow this pattern:

User: "Find all Python files and count them"
  │
  ▼
┌─────────────────────────────────────────┐
│              ReAct Loop                 │
│                                         │
│  ┌─── 1. REASON (Think) ───┐            │
│  │ User wants me to find   │            │
│  │ Python files. I should  │            │
│  │ use the shell_exec tool │            │
│  │ with a find command.    │            │
│  └───────────┬─────────────┘            │
│              │                          │
│  ┌─── 2. ACT (Do) ─────────┐            │
│  │ Execute:                │            │
│  │ shell_exec({            │            │
│  │   "command":            │            │
│  │   "find . -name '*.py'" │            │
│  │ })                      │            │
│  └───────────┬─────────────┘            │
│              │                          │
│  ┌─── 3. OBSERVE (See) ────┐            │
│  │ Result:                 │            │
│  │ "main.py, test.py"      │            │
│  └───────────┬─────────────┘            │
│              │                          │
│  ┌─── 4. REASON (Think) ───┐            │
│  │ Found 2 files. Now I    │            │
│  │ should count them and   │            │
│  │ report to the user.     │            │
│  └───────────┬─────────────┘            │
│              │                          │
│  ┌─── 5. ACT (Respond) ────┐            │
│  │ "I found 2 Python       │            │
│  │  files!"                │            │
│  └─────────────────────────┘            │
│                                         │
└─────────────────────────────────────────┘

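This loop continues until the model answers without calling a tool (the task is complete), the maximum number of iterations is reached, or a tool fails (a harness may instead feed the error back to the model as an observation).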

Why this works: The model SEES the results of its actions and can adjust. It's not just making one guess — it's in a conversation with the environment.
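To make "the model sees the results" concrete, here is roughly what the conversation history holds after the example above. This is an illustrative sketch that assumes the chat-style message list and the `{"tool": ..., "arguments": ...}` call format used later in this document; the message contents are made up for the example.

```python
import json

# Illustrative conversation history after one ReAct cycle (contents are hypothetical).
messages = [
    {"role": "system", "content": "You are MCP-Agent with access to these tools: ..."},
    {"role": "user", "content": "Find all Python files and count them"},
    # REASON + ACT: the model emits a tool call as JSON
    {"role": "assistant",
     "content": json.dumps({"tool": "shell_exec",
                            "arguments": {"command": "find . -name '*.py'"}})},
    # OBSERVE: the harness feeds the tool output back as a new message
    {"role": "user", "content": "Tool result: ./main.py\n./test.py"},
    # REASON + RESPOND: no tool call this time, so the loop ends
    {"role": "assistant", "content": "I found 2 Python files!"},
]

# Each model call sees every message above it, including earlier tool results.
for m in messages:
    print(f"{m['role']:>9}: {m['content'][:60]}")
```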

🛠️ The Three Components

1. The Model (The Brain)

What it does: Decides WHAT to do

Our fine-tuned Qwen3-1.7B model has been trained to:

Parse tool schemas ("Here's what tools are available")
Analyze user requests ("User wants to find files")
Generate tool calls in the correct JSON format (a tool name plus arguments, mirroring MCP's tools/call)
Plan multi-step sequences ("First list files, then read them")
Ask for clarification ("Which directory?")
Refuse dangerous requests ("No, I won't delete everything")

Memory usage: ~4 GB for the base weights (about 2B stored parameters × 2 bytes each in fp16) + ~100 MB for the LoRA adapters, before activations and the KV cache
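The weight figure is plain arithmetic: parameters × bytes per parameter. The sketch below shows that calculation, plus the standard LoRA parameter count for one adapted matrix (A is rank × d_in, B is d_out × rank). The hidden size of 2048 and rank of 16 are assumptions for illustration, not this project's actual LoRA configuration; the real adapter size depends on the rank and on which layers are adapted.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in decimal GB (fp16/bf16 = 2 bytes)."""
    return num_params * bytes_per_param / 1e9

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)

print(weight_memory_gb(2e9))        # → 4.0  (fp16 base weights)
print(weight_memory_gb(2e9, 1))     # → 2.0  (same weights at 8 bits)
print(lora_params(2048, 2048, 16))  # → 65536 (one square projection at rank 16; assumed sizes)
```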

2. The Tool Registry (The Hands)

What it does: Defines WHAT the model CAN do

A tool registry is a dictionary mapping tool names to their schemas (what the model sees) and implementations (what the harness runs):

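import subprocess
from pathlib import Path

def shell_exec_function(command: str) -> str:
    # Minimal stand-in: runs the command and returns stdout + stderr (no sandboxing!)
    proc = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
    return proc.stdout + proc.stderr

def read_file_function(path: str) -> str:
    # Minimal stand-in: returns the file's text contents
    return Path(path).read_text()

TOOL_REGISTRY = {
    "shell_exec": {
        "name": "shell_exec",
        "description": "Execute shell commands",
        "parameters": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]},
        "function": shell_exec_function,
    },
    "read_file": {
        "name": "read_file",
        "description": "Read file contents",
        "parameters": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]},
        "function": read_file_function,
    },
    # ... more tools
}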

The system prompt tells the model about available tools:

You are MCP-Agent with access to these tools:
[JSON schema of all tools]

When you need to use a tool, respond with:
{"tool": "tool_name", "arguments": {"param": "value"}}
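The `[JSON schema of all tools]` placeholder can be generated from the registry instead of written by hand. A minimal sketch, assuming the registry layout shown above; `build_system_prompt` is a hypothetical helper name, and the placeholder tool only echoes its input. The `function` entry is dropped because a Python callable isn't JSON-serializable, and the model only needs the schema.

```python
import json

def shell_exec_function(command: str) -> str:
    # Placeholder only: echoes the command instead of running it.
    return f"(would run: {command})"

TOOL_REGISTRY = {
    "shell_exec": {
        "name": "shell_exec",
        "description": "Execute shell commands",
        "parameters": {"type": "object", "properties": {"command": {"type": "string"}}},
        "function": shell_exec_function,
    },
}

def build_system_prompt(registry: dict) -> str:
    # Keep only the JSON-serializable schema fields; the callable stays in the harness.
    schemas = [{k: v for k, v in tool.items() if k != "function"} for tool in registry.values()]
    return (
        "You are MCP-Agent with access to these tools:\n"
        + json.dumps(schemas, indent=2)
        + "\n\nWhen you need to use a tool, respond with:\n"
        + '{"tool": "tool_name", "arguments": {"param": "value"}}'
    )

print(build_system_prompt(TOOL_REGISTRY))
```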

3. The Execution Loop (The Orchestrator)

What it does: Runs the conversation between user, model, and tools

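def agent_loop(user_message, max_iterations=5):
    # Assumes model, system_prompt_with_tools, parse_tool_call, and execute_tool exist elsewhere
    messages = [{"role": "system", "content": system_prompt_with_tools}]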
    messages.append({"role": "user", "content": user_message})
    
    for i in range(max_iterations):
        # 1. Model THINKS and generates response
        response = model.generate(messages)
        
        # 2. Check if response contains a tool call
        tool_call = parse_tool_call(response)
        
        if tool_call is None:
            return response  # Done!
        
        # 3. EXECUTE the tool
        result = execute_tool(tool_call)
        
        # 4. Add tool result to conversation context
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": f"Tool result: {result}"})
        
        # 5. Loop back — model sees result and decides next step
    
    return "Max iterations reached"

Why this works: The model sees the FULL context including tool results. It's reacting to real information, not just guessing.
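The loop above relies on `model`, `parse_tool_call`, and `execute_tool`, which this document doesn't define. Below is a self-contained sketch with one possible version of each, assuming the `{"tool": ..., "arguments": ...}` format from the system prompt. `fake_generate` is a hypothetical scripted stand-in for `model.generate` so the loop runs without a model, and the tool returns canned output. Passing the model and registry in as arguments, and returning tool errors as text, are choices made for this sketch, not necessarily the project's.

```python
import json

def parse_tool_call(response: str):
    """Return the first JSON object in `response` that has a "tool" key, else None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(response):
        if ch != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            continue  # not valid JSON starting here; keep scanning
        if isinstance(obj, dict) and "tool" in obj:
            return obj
    return None

def execute_tool(tool_call: dict, registry: dict) -> str:
    """Run the named tool; failures come back as text so the model can react to them."""
    tool = registry.get(tool_call.get("tool"))
    if tool is None:
        return f"Error: unknown tool {tool_call.get('tool')!r}"
    try:
        return str(tool["function"](**tool_call.get("arguments", {})))
    except Exception as exc:
        return f"Error: {exc}"

def agent_loop(user_message, generate, registry, system_prompt, max_iterations=5):
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = generate(messages)               # REASON
        tool_call = parse_tool_call(response)
        if tool_call is None:
            return response                         # no tool call: final answer
        result = execute_tool(tool_call, registry)  # ACT
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": f"Tool result: {result}"})  # OBSERVE
    return "Max iterations reached"

# Hypothetical scripted stand-in for model.generate: one tool call, then an answer.
def fake_generate(messages):
    last = messages[-1]["content"]
    if last.startswith("Tool result: "):
        files = last.removeprefix("Tool result: ").splitlines()
        return f"I found {len(files)} Python files!"
    return json.dumps({"tool": "shell_exec", "arguments": {"command": "find . -name '*.py'"}})

# Canned tool output instead of a real shell.
registry = {"shell_exec": {"function": lambda command: "./main.py\n./test.py"}}
print(agent_loop("Find all Python files and count them", fake_generate, registry, "You are MCP-Agent."))
# → I found 2 Python files!
```

Because `execute_tool` returns errors as text, a failing tool becomes one more observation for the model rather than ending the loop.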

🌐 MCP: Model Context Protocol Explained

What Is MCP?

MCP (Model Context Protocol) is an open standard for how AI applications connect models to tools and data sources. Think of it as "HTTP for AI tools": a common language that any MCP client and any MCP server can speak.

With MCP (The Solution)

One standard format, built on JSON-RPC 2.0. A tool call request looks like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "github_search",
    "arguments": {
      "query": "machine learning",
      "language": "python"
    }
  }
}

Result: Any MCP-compatible client (the application running the model) can use any MCP-compatible tool server.
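The model's compact `{"tool": ..., "arguments": ...}` output maps directly onto the `params` of an MCP `tools/call` request, so a harness that does talk to a real MCP server only needs a thin wrapper. A sketch under that assumption; `to_mcp_request` is a hypothetical helper, and it leaves out the transport (stdio or HTTP) and the `initialize` handshake a real MCP client performs before calling tools.

```python
import itertools
import json

_request_ids = itertools.count(1)

def to_mcp_request(tool_call: dict) -> dict:
    """Wrap a {"tool": ..., "arguments": ...} call in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),  # a request needs an id so its response can be matched
        "method": "tools/call",
        "params": {
            "name": tool_call["tool"],
            "arguments": tool_call.get("arguments", {}),
        },
    }

call = {"tool": "github_search", "arguments": {"query": "machine learning", "language": "python"}}
print(json.dumps(to_mcp_request(call), indent=2))
```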

Why We Embed MCP INTO the Model

Standard approach: Model → Calls MCP Server → Server calls Tool → Result back

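Our approach: The model already knows the tool-calling pattern from training, and the harness runs tools as in-process Python functions instead of routing calls through a separate MCP server.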

Benefits:

Faster (no round trip to a separate server process)
Works offline (for tools that don't need the internet)
No dependency on external MCP servers
Can run on edge devices

🎭 How Manus Uses Multiple Agents (And How We Simplify)

Manus Architecture

Manus uses three separate LLM instances with different system prompts:

Planner: Breaks tasks into steps, creates DAG
Executor: Runs each step (shell, browser, code)
Verifier: Checks results, flags errors

Our Simplified Architecture

We use ONE model that plays all three roles via a single system prompt:

SYSTEM_PROMPT = """You are MCP-Agent, an autonomous AI assistant that uses 
tools to help users accomplish tasks.

## Your Identity
- You are a tool-calling specialist
- You understand the Model Context Protocol (MCP)
- You plan multi-step operations when needed
- You ask for clarification when information is missing
- You refuse dangerous or harmful requests

## How You Work
1. THINK about what the user needs
2. Use tools when they would help (generate JSON tool calls)
3. OBSERVE results and decide next steps
4. Repeat until task is complete
5. Respond clearly when done

## Tool Call Format
When using a tool, respond with:
{"tool": "tool_name", "arguments": {"param": "value"}}
"""

Trade-off: Our approach is simpler but less powerful. Manus's separation allows specialization. Our single model might mix roles. But for our use cases, it's sufficient.

🧩 How Adding New Tools Works

The model doesn't need to know SPECIFIC tools. It needs to know the PATTERN of using tools.

Adding a new tool is just writing a Python function:

@register_tool(
    name="my_new_tool",
    description="What this tool does",
    parameters={
        "type": "object",
        "properties": {
            "param1": {"type": "string", "description": "..."}
        },
        "required": ["param1"]
    }
)
def my_new_tool(param1: str) -> str:
    # Your code here
    return "result"

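The decorator adds the function to the registry, and since the system prompt is built from the registry, the new tool appears in it automatically. No retraining is needed, because the model learned the general tool-calling pattern rather than a fixed set of tools.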
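`register_tool` itself isn't shown in this document. One minimal way to write it, assuming it stores entries in a module-level `TOOL_REGISTRY` dict with the same shape as in the Tool Registry section: the decorator records the schema and returns the function unchanged. It does not validate arguments against the schema; a fuller version might.

```python
TOOL_REGISTRY: dict = {}

def register_tool(name: str, description: str, parameters: dict):
    """Decorator factory: records a function and its JSON schema in TOOL_REGISTRY."""
    def decorator(func):
        TOOL_REGISTRY[name] = {
            "name": name,
            "description": description,
            "parameters": parameters,
            "function": func,
        }
        return func  # the function is unchanged and still callable directly
    return decorator

@register_tool(
    name="my_new_tool",
    description="What this tool does",
    parameters={
        "type": "object",
        "properties": {"param1": {"type": "string", "description": "..."}},
        "required": ["param1"],
    },
)
def my_new_tool(param1: str) -> str:
    return f"result for {param1}"

print(TOOL_REGISTRY["my_new_tool"]["function"](param1="demo"))
# → result for demo
```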

📊 Comparison: Manus vs Mini-Manus

| Aspect | Manus | Mini-Manus (Ours) |
|---|---|---|
| Agents | 3 specialized (Planner/Executor/Verifier) | 1 model, all roles |
| Environment | Cloud VM (persistent) | Local/Gradio Space |
| Parallelism | 50+ simultaneous | Sequential |
| Model Size | GPT-4 class (100B+) | 1.7B (~60× smaller or more) |
| Cost | $$$/month | $3 one-time |
| Web Browsing | Real browser | DuckDuckGo search API |
| File System | Full VM access | Working directory only |
| Custom Tools | Via MCP servers | Python decorators |
| Learning Curve | Complex setup | pip install + python app.py |
| Ownership | Proprietary (Meta) | Fully open source |

🎓 Key Concepts You Should Understand

ReAct Pattern: Think → Act → Observe → Loop
Tool Registry: Dictionary of available tools with schemas
MCP: Open protocol that carries tool calls as JSON-RPC 2.0 messages
System Prompt: Tells the model WHO it is and WHAT tools it has
Context Window: The model sees all previous messages + tool results
Max Iterations: Safety limit to prevent infinite loops

🔜 Next Step

Read 04-training.md to understand HOW we train the model — LoRA, SFT, hyperparameters, and why each matters.