# Purpose Agent — Architecture Documentation

> For developers building on the framework, researchers understanding the theory, and anyone curious about how self-improving agents work.

---

## Table of Contents

1. [What Is Purpose Agent?](#1-what-is-purpose-agent)
2. [The Big Idea (No Jargon)](#2-the-big-idea)
3. [How It Works — Step by Step](#3-how-it-works)
4. [Architecture Map](#4-architecture-map)
5. [The Core Engine](#5-the-core-engine)
6. [The V2 Safety Kernel](#6-the-v2-safety-kernel)
7. [Research Implementations](#7-research-implementations)
8. [Breakthroughs](#8-breakthroughs)
9. [User-Facing Layers](#9-user-facing-layers)
10. [How Models Are Handled](#10-how-models-are-handled)
11. [The Research Behind It](#11-the-research)
12. [For Contributors](#12-for-contributors)

---

## 1. What Is Purpose Agent?

Purpose Agent is a Python framework that builds AI agents that **get better with experience** — without retraining the underlying AI model.

Traditional AI agents run the same way every time. Purpose Agent is different: after each task, it extracts lessons from what worked and what didn't, tests those lessons for safety, and uses them to perform better next time.

**Think of it like this:** A new employee follows the company handbook. After their first week, they have personal notes — shortcuts they discovered, mistakes they won't repeat, tips from colleagues. Those notes make them better at their job without changing who they are. Purpose Agent does this for AI.

---

## 2. The Big Idea

### For Non-Technical Readers

```
You give it a purpose → It builds a team → It does the work → It learns → Next time is better
```

**You say:** "Help me write Python code."

**It builds:** An architect (plans), a coder (writes), and a tester (reviews).

**It runs:** The coder writes fibonacci. The tester checks it. A critic scores the work.

**It learns:** "When writing recursive functions, check base cases first." This lesson is saved.

**Next time:** The coder starts by checking base cases. It's faster and more reliable.

### For Technical Readers

The framework implements a **Purpose-MDP** — a Markov Decision Process where:

- A **Purpose Function Φ(s)** evaluates every state transition on a 0-10 scale
- An **Optimizer** distills successful trajectories into reusable heuristics
- Heuristics are ranked by **Q-values** (how often they helped) and selected via **Mixture-of-Heuristics** (sparse activation, like MoE)
- An **immune system** scans every new heuristic for prompt injection, score manipulation, and other threats
- A **Memory CI pipeline** quarantines, tests, and promotes heuristics before they affect agent behavior

This is **Potential-Based Reward Shaping** (Ng et al., 1999) applied to LLM agents, with formal convergence guarantees. See [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
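To make the shaping term concrete, here is a minimal sketch of the ΔΦ computation. The function name `shaped_reward` is illustrative, not the framework's actual API; `purpose_function.py` adds the anti-gaming rules on top of this core idea:

```python
# Minimal sketch of potential-based reward shaping (Ng et al., 1999).
# `shaped_reward` is an illustrative name, not the real purpose_function.py API.

def shaped_reward(phi_before: float, phi_after: float, gamma: float = 1.0) -> float:
    """Dense per-step signal: how much closer did this step move us to the purpose?

    Because the signal is a difference of potentials, shaping with it
    provably leaves the optimal policy unchanged (Ng et al., 1999).
    """
    return gamma * phi_after - phi_before

# The fibonacci example from Section 3: tests flip from 0/4 to 4/4,
# so Φ jumps from 0.0 to 10.0 and the step earns a +10.0 delta.
delta = shaped_reward(phi_before=0.0, phi_after=10.0)  # -> 10.0
```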
---

## 3. How It Works — Step by Step

Here's what happens when you run `team.run("Write a fibonacci function")`:

### Step 1: The Actor Decides

The Actor module receives:

- The **purpose** ("Write a fibonacci function")
- The **current state** (empty — no code written yet)
- Any **learned heuristics** from past runs

It generates a thought process and an action:

> "I should write a function that handles base cases fib(0)=0 and fib(1)=1, then use iteration for the general case."
> → Action: `submit_code` with the Python implementation.

### Step 2: The Environment Executes

The code is run against test cases. The environment returns a new state:

> "Tests: 4/4 ALL PASSED"

### Step 3: The Purpose Function Scores

A **separate LLM call** (not the same as the actor) evaluates the transition:

- Φ(state_before) = 0.0 (nothing done)
- Φ(state_after) = 10.0 (all tests pass)
- Delta = +10.0 (huge improvement)
- Evidence: "Tests changed from 0/4 to 4/4"

The Purpose Function has **7 anti-gaming rules** that prevent the agent from tricking itself into thinking it's doing well when it isn't.

### Step 4: The Optimizer Extracts Heuristics

After the task, the Optimizer looks at the trajectory and extracts reusable patterns:

- **Strategic:** "When writing {function_type} functions, handle edge cases first, then iterate."
- **Procedural:** "1. Read test cases. 2. Handle base cases. 3. Implement general case. 4. Submit."
- **Tool tip:** "When submitting code, check boundary conditions: 0, 1, empty, negative."

### Step 5: Safety Checks

Every new heuristic goes through the **immune system**:

- Is it a prompt injection? ("Ignore all previous instructions") → **REJECTED**
- Does it try to manipulate scores? ("Always score 10") → **REJECTED**
- Does it contain secrets? (API keys, passwords) → **REJECTED**
- Is it safe? ("Check base cases first") → **QUARANTINED** (pending replay test)

After passing replay testing → **PROMOTED** (active in future runs).

### Step 6: Next Run Benefits

When the agent runs again, the **Prompt Compiler** selects the top-K heuristics by:

- **Relevance** to the current task (embedding similarity)
- **Trust** (immune-scanned and verified)
- **Utility** (Q-value — how often it helped before)

These are injected into the prompt. The agent is now better without any model retraining (a selection sketch follows below).
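A minimal sketch of that top-K selection, using the blended weighting that Section 6's Prompt Compiler describes. The `Heuristic` dataclass and its field names are illustrative stand-ins, not the real `compiler.py` types:

```python
# Sketch of top-K heuristic selection. Weights mirror the compiler
# formula in Section 6; the class shape is illustrative.
from dataclasses import dataclass

@dataclass
class Heuristic:
    text: str
    relevance: float  # embedding similarity to the current task, 0..1
    trust: float      # immune-scan / promotion status, 0..1
    utility: float    # normalized Q-value from past retrievals, 0..1

def select_top_k(heuristics: list[Heuristic], k: int = 5) -> list[Heuristic]:
    """Rank by the blended score and keep the K best for the prompt."""
    def score(h: Heuristic) -> float:
        return 0.4 * h.relevance + 0.3 * h.trust + 0.3 * h.utility
    return sorted(heuristics, key=score, reverse=True)[:k]
```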
---

## 4. Architecture Map

```
                         PURPOSE AGENT

┌── USER LAYER ────────────────────────────────────────────────┐
│  pa.purpose("...") → Team → team.run("...")                  │
│  pa.Agent()  pa.Graph()  pa.parallel()  pa.Conversation()    │
└──────────────────────────────┬───────────────────────────────┘
                               ▼
┌── CORE ENGINE ───────────────────────────────────────────────┐
│                                                              │
│   Actor ──→ Environment ──→ Purpose Function (Φ)             │
│     ↑            │                  │                        │
│     │            ▼                  ▼                        │
│     │        State s'          Φ(s) → Φ(s')                  │
│     │            │                  │                        │
│     │            ▼                  ▼                        │
│     │    Experience Replay      Optimizer                    │
│     │                               │                        │
│     └──── heuristics ◄──────────────┘                        │
│                                                              │
└──────────────────────────────┬───────────────────────────────┘
                               ▼
┌── V2 SAFETY KERNEL ──────────────────────────────────────────┐
│                                                              │
│  Immune System ──→ Memory CI ──→ Memory Store                │
│  (scan threats)    (quarantine)  (7 types × 5 statuses)      │
│                                                              │
│  Prompt Compiler ──→ Token Budget ──→ Credit Assignment      │
│  Trace System ──→ JSONL logs ──→ Offline analysis            │
│  RunMode ──→ EVAL_TEST blocks all writes                     │
│                                                              │
└──────────────────────────────┬───────────────────────────────┘
                               ▼
┌── INFRASTRUCTURE ────────────────────────────────────────────┐
│                                                              │
│  LLM Backends: OpenRouter │ Groq │ OpenAI │ Ollama │ HF │ ...│
│  Robust Parser: TOML → JSON → field extraction → regex       │
│  Tools: Calculator │ PythonExec │ ReadFile │ WriteFile       │
│  Streaming │ Observability │ Cost Tracking │ Registry        │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

---

## 5. The Core Engine

### Actor (`actor.py`)

The decision-maker. Given the current state and purpose, it decides what action to take.

**Key design:** The Actor doesn't evaluate itself. That's the Purpose Function's job. This separation prevents self-confirmation bias (you wouldn't let a student grade their own exam).

The Actor's prompt is **dynamically composed** from three tiers of memory:

- **Strategic:** High-level rules ("When coding, handle edge cases first")
- **Procedural:** Step-by-step procedures ("1. Read tests. 2. Handle bases. 3. Implement.")
- **Tool tips:** Action-specific advice ("When using submit_code, check boundaries")

### Purpose Function (`purpose_function.py`)

The critic. A separate LLM call that scores every state transition on a 0-10 scale.

**Seven anti-gaming rules:**

1. Evidence required — cite specific state changes
2. No credit for intentions — score actual results, not plans
3. No sycophancy — don't inflate scores to be encouraging
4. Monotonic scale — 0 = nothing done, 10 = task complete
5. Anti-gaming — flag superficial state manipulation
6. Consistency — same state gets same score (enforced by cache)
7. Confidence — uncertain evaluations get reduced weight

### Experience Replay (`experience_replay.py`)

Stores completed trajectories and retrieves relevant ones for future tasks.

**Two-phase retrieval** (from MemRL, arxiv:2601.03192):

1. **Recall:** Find trajectories similar to the current task (embedding similarity)
2. **Re-rank:** Order by Q-value utility (how useful was this memory when retrieved before?); see the sketch below
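A minimal sketch of the two-phase pass, assuming each stored trajectory is a dict carrying a precomputed `embedding` and a `q_value`. The names are illustrative, not the real `experience_replay.py` API:

```python
# Sketch of two-phase retrieval: recall by similarity, re-rank by utility.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def retrieve(task_embedding: np.ndarray, trajectories: list[dict],
             recall_n: int = 20, top_k: int = 3) -> list[dict]:
    # Phase 1 (recall): keep the trajectories most similar to the task.
    recalled = sorted(
        trajectories,
        key=lambda t: cosine(task_embedding, t["embedding"]),
        reverse=True,
    )[:recall_n]
    # Phase 2 (re-rank): among those, prefer memories that actually
    # helped when they were retrieved in the past.
    return sorted(recalled, key=lambda t: t["q_value"], reverse=True)[:top_k]
```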
### Optimizer (`optimizer.py`)

Extracts reusable heuristics from successful trajectories. Uses the **CER distillation pattern** (arxiv:2506.06698): abstract away specific details with `{variable}` placeholders so heuristics generalize across tasks.

### Orchestrator (`orchestrator.py`)

The main loop that ties everything together. For each step:

1. Actor decides
2. Environment executes
3. Critic scores
4. Step recorded
5. Check termination

After each task: store trajectory → optimize → sync heuristics to Actor memory.

---

## 6. The V2 Safety Kernel

V1 let the agent learn freely. V2 adds guardrails.

### Memory System (`memory.py`)

Seven memory types, each with different trust priors:

| Type | Example | Trust |
|------|---------|-------|
| `purpose_contract` | "Build a web scraper" | High (user-defined) |
| `user_preference` | "Always cite sources" | High (human-taught) |
| `skill_card` | "When coding, test edges first" | Medium (learned) |
| `episodic_case` | "fib(0)=0 was a tricky case" | Medium (observed) |
| `failure_pattern` | "Don't use recursion for large n" | Medium (learned from failure) |
| `critic_calibration` | "Score 7 for 3/4 tests passing" | Low (meta-learned) |
| `tool_policy` | "search: only use at target location" | Medium (learned) |

Five statuses: `candidate` → `quarantined` → `promoted` (or `rejected`) → `archived`.

### Immune System (`immune.py`)

Scans every candidate memory for 5 threat categories:

- **Prompt injection** — "Ignore previous instructions..."
- **Score manipulation** — "Always score 10..."
- **Tool misuse** — "subprocess.call('rm -rf /')..."
- **Privacy leaks** — API keys, emails, file paths
- **Scope overreach** — memory tries to affect all agents when it should be scoped

### Memory CI (`memory_ci.py`)

The promotion pipeline:

```
candidate → immune_scan() → quarantined → replay_test → promote/reject
```

No memory reaches the agent's prompt without passing every gate.

### Prompt Compiler (`compiler.py`)

Selects which memories to include under a token budget. Ranked by:

`score = 0.4 × relevance + 0.3 × trust + 0.3 × utility`

Returns `included_memory_ids` for credit assignment — only memories that were in the prompt get Q-value updates after the step.

### Trace System (`trace.py`)

Every run produces a JSONL trace — the raw material for debugging, evaluation, and memory extraction. Traces are append-only and immutable.

### RunMode (`v2_types.py`)

Three modes with strict enforcement:

- `LEARNING_TRAIN` — full read/write
- `LEARNING_VALIDATION` — read + staging writes
- `EVAL_TEST` — **no writes of any kind** (the only mode whose numbers you can report; sketched below)
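A minimal sketch of how that enforcement can look. `MemoryStore` is a hypothetical stand-in here; `v2_types.py` defines the real `RunMode`:

```python
# Sketch of RunMode write-gating. MemoryStore is illustrative only.
from enum import Enum

class RunMode(Enum):
    LEARNING_TRAIN = "learning_train"
    LEARNING_VALIDATION = "learning_validation"
    EVAL_TEST = "eval_test"

class MemoryStore:
    def __init__(self, mode: RunMode):
        self.mode = mode
        self._items: list[dict] = []

    def write(self, item: dict, staging: bool = False) -> None:
        # EVAL_TEST must never mutate memory, so eval numbers stay honest.
        if self.mode is RunMode.EVAL_TEST:
            raise PermissionError("EVAL_TEST blocks all memory writes")
        # Validation runs may only stage writes for later review.
        if self.mode is RunMode.LEARNING_VALIDATION and not staging:
            raise PermissionError("LEARNING_VALIDATION allows staging writes only")
        self._items.append(item)
```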
---

## 7. Research Implementations

Five papers implemented as standalone modules:

### Meta-Rewarding (`meta_rewarding.py`)

*From: arxiv:2407.19594 — Llama-3-8B: 22.9% → 39.4% on AlpacaEval*

A meta-judge evaluates the Purpose Function's own judgments. Good judgments become calibration examples in memory. The critic improves through in-context learning.

### Self-Taught Evaluators (`self_taught.py`)

*From: arxiv:2408.02666*

Generates synthetic contrast pairs (correct vs. wrong evaluation) from traces. Creates an automatic curriculum: as the critic improves, the contrast pairs get harder.

### Prompt Optimizer (`prompt_optimizer.py`)

*From DSPy: arxiv:2310.03714 — +8% on GSM8K, +50% on BBH*

Instead of hand-crafting prompts, define signatures (`state, action → score, reasoning`) and let the optimizer bootstrap effective few-shot demonstrations by trial and error.

### LLM Compiler (`llm_compiler.py`)

*From: arxiv:2312.04511 — up to 3.7× latency speedup*

Instead of sequential tool calls (ReAct), plan ALL calls upfront as a DAG and execute independent ones in parallel.

### Retroformer (`retroformer.py`)

*From: arxiv:2308.02151*

Structured reflection on completed traces → extracts four types of memories (skills, failures, policies, observations). Replaces raw heuristic distillation with typed, safety-scanned memory extraction.

---

## 8. Breakthroughs

Six features that go beyond existing frameworks:

### B1: Self-Improving Critic

The Purpose Function's own quality improves over time. Meta-judging after each task generates calibration examples that make future scoring more accurate.

### B2: Mixture-of-Heuristics (MoH)

Like DeepSeek's Mixture-of-Experts: out of 100+ heuristics, only K=5 are activated per step. **Shared heuristics** (always active, like "check edge cases") + **routed heuristics** (task-specific, selected by Q×similarity). Knowledge grows; compute stays flat.

### B3: Hindsight Heuristic Relabeling

From HER (arxiv:1707.01495): when a task fails, instead of discarding the trajectory, ask "what DID this accomplish?" and extract heuristics for what was achieved. Learn from failures, not just successes.

### B4: Heuristic Evolution

Periodically generalize specific heuristics into abstract patterns:

- Before: "When fibonacci fails on 0, return 0"
- After: "When {function} fails on {boundary_value}, add an explicit base case"

Creates an automatic curriculum: specific → general → abstract.

### B5: Cross-Domain Transfer

Heuristics learned on one set of coding tasks can help with different, unseen coding tasks. The `test_cross_domain_transfer()` function measures this: train on [fibonacci, factorial], test on [palindrome, fizzbuzz].

### B6: Adversarial Robustness

The `AdversarialHardener` generates 30 adversarial inputs (prompt injections, score hacks, API key leaks) and 10 benign inputs, then tests the immune system against all of them. Current results: **93% catch rate, 0% false positives.**

---

## 9. User-Facing Layers

### Easy API (`easy.py`)

The `purpose()` function analyzes your description and builds the right team:

| You say | It builds |
|---------|-----------|
| "Write Python code" | architect + coder + tester |
| "Research papers" | researcher + analyst |
| "Write blog posts" | writer + editor |
| "Analyze data" | analyst + reporter |
| "Help me" | general assistant |

### Unified Capabilities (`unified.py`)

Five competing framework philosophies in one composable layer:

| Capability | Inspired By | Usage |
|-----------|-------------|-------|
| `Agent()` | OpenAI Agents SDK | One-liner agent creation |
| `Graph()` | LangGraph | Conditional branching, cycles, fan-out |
| `parallel()` | CrewAI | Concurrent task execution |
| `Conversation()` | AutoGen | Agent-to-agent message passing |
| `KnowledgeStore` | LlamaIndex | RAG as a tool |

### Robust Parser (`robust_parser.py`)

The universal solution to "LLMs can't reliably produce JSON":

- Tries TOML first (fewer tokens than JSON)
- Falls back to JSON
- Falls back to field extraction by regex
- Never crashes. Always returns something usable (see the sketch below).
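A minimal sketch of that fallback chain, assuming Python 3.11+ for the stdlib `tomllib`. The function and regex are illustrative, not the real `robust_parser.py`:

```python
# Sketch of the tiered parsing strategy: TOML → JSON → regex extraction.
import json
import re
import tomllib  # stdlib in Python 3.11+

def robust_parse(raw: str) -> dict:
    """Try the cheapest format first; never raise, always return a dict."""
    try:
        return tomllib.loads(raw)
    except tomllib.TOMLDecodeError:
        pass
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Last resort: pull key/value pairs out of whatever the model
    # produced, e.g. `score = "8"` or `"action": "submit_code"`.
    pairs = re.findall(r'["\']?(\w+)["\']?\s*[:=]\s*["\']([^"\']*)["\']', raw)
    return dict(pairs)
```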
---

## 10. How Models Are Handled

### resolve_backend()

One function routes to any provider:

```python
resolve_backend("openrouter:meta-llama/llama-3.3-70b-instruct")
resolve_backend("groq:llama-3.3-70b-versatile")
resolve_backend("openai:gpt-4o")
resolve_backend("ollama:qwen3:1.7b")  # Local, free
resolve_backend("hf:Qwen/Qwen3-32B")
resolve_backend("together:meta-llama/Llama-3.3-70B-Instruct-Turbo")
```

### SLM-Native Design

The framework was designed for small models (0.6B-3B params):

- **Grammar-constrained output** via Ollama (forces valid structure from any model)
- **Prompt compression** for small context windows (8K-32K)
- **Tool RAG** — only load relevant tools into the prompt (saves tokens)
- **TOML format** — fewer tokens than JSON

### _strip_thinking()

Handles reasoning models (Qwen3, DeepSeek-R1) that wrap output in `<think>` tags. Automatically strips the thinking and returns only the answer.

---

## 11. The Research

Every design decision traces to a published paper. The full list with citations, methodology sections, and implementation mapping is in [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md).

The formal framework — **Purpose-MDP** with 5 axioms, 3 theorems, and convergence proofs — is in [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).

**Key theoretical result:** The self-improvement is a form of Potential-Based Reward Shaping (Ng et al., 1999). Our ΔΦ = Φ(s') - Φ(s) preserves the optimal policy while providing dense per-step feedback. The heuristic library converges to a fixed point under bounded capacity.

---

## 12. For Contributors

### File Structure

```
purpose_agent/
├── types.py             # State, Action, Trajectory, Heuristic, PurposeScore
├── llm_backend.py       # LLMBackend ABC + HF, OpenAI, Mock + resolve_backend
├── slm_backends.py      # Ollama, llama-cpp, prompt compression, SLM registry
├── robust_parser.py     # Universal parser: TOML → JSON → regex (never crashes)
├── actor.py             # ReAct agent with 3-tier memory prompts
├── purpose_function.py  # Φ(s) critic with 7 anti-gaming rules
├── experience_replay.py # Two-phase retrieval (similarity → Q-value)
├── optimizer.py         # Trajectory → heuristic distillation
├── orchestrator.py      # Main step loop
├── v2_types.py          # RunMode, MemoryScope, PurposeScoreV2
├── trace.py             # JSONL execution traces
├── memory.py            # 7 MemoryKinds × 5 MemoryStatuses
├── compiler.py          # Token-budgeted prompt compilation
├── immune.py            # 5 threat scanners
├── memory_ci.py         # Quarantine → scan → test → promote/reject
├── evalport.py          # Pluggable evaluation protocol
├── benchmark_v2.py      # Train/val/test splits with ablation
├── meta_rewarding.py    # Self-improving critic (arxiv:2407.19594)
├── self_taught.py       # Synthetic critic training (arxiv:2408.02666)
├── prompt_optimizer.py  # DSPy-style bootstrap (arxiv:2310.03714)
├── llm_compiler.py      # Parallel tool DAG (arxiv:2312.04511)
├── retroformer.py       # Structured reflection (arxiv:2308.02151)
├── breakthroughs.py     # MoH, hindsight relabeling, heuristic evolution, etc.
├── unified.py           # Agent, Graph, parallel, Conversation, KnowledgeStore
├── easy.py              # purpose(), Team, quickstart wizard
├── tools.py             # Secure built-in tools
├── streaming.py         # Async + event streaming
├── observability.py     # Cost tracking, callbacks
├── multi_agent.py       # Agent teams with shared learning
├── hitl.py              # Human-in-the-loop + checkpointing
├── evaluation.py        # V1 benchmark runner
├── registry.py          # Plugin system
├── __init__.py          # 103 exports
└── __main__.py          # CLI entry point
```

### Adding a New LLM Provider

```python
# In your code (no core edits needed):
from purpose_agent import backend_registry, OpenAICompatibleBackend

backend_registry.register("my_provider", lambda model, api_key: OpenAICompatibleBackend(
    model=model,
    base_url="https://api.myprovider.com/v1",
    api_key=api_key
))
```

### Adding a New Tool

```python
from purpose_agent import FunctionTool

def my_search(query: str) -> str:
    """Search my database."""
    return db.search(query)

tool = FunctionTool.from_function(my_search)
```

### Running Tests

```bash
python tests/test_core.py              # 21 unit tests
python tests/launch_readiness.py       # 119 comprehensive tests
python benchmarks/validate.py          # Mock benchmark suite
python benchmarks/validate.py --quick  # Fast smoke test
```