Purpose Agent: Architecture Documentation
For developers building on the framework, researchers understanding the theory, and anyone curious about how self-improving agents work.
Table of Contents
- What Is Purpose Agent?
- The Big Idea (No Jargon)
- How It Works: Step by Step
- Architecture Map
- The Core Engine
- The V2 Safety Kernel
- Research Implementations
- Breakthroughs
- User-Facing Layers
- How Models Are Handled
- The Research Behind It
- For Contributors
1. What Is Purpose Agent?
Purpose Agent is a Python framework that builds AI agents that get better with experience, without retraining the underlying AI model.
Traditional AI agents run the same way every time. Purpose Agent is different: after each task, it extracts lessons from what worked and what didn't, tests those lessons for safety, and uses them to perform better next time.
Think of it like this: A new employee follows the company handbook. After their first week, they have personal notes: shortcuts they discovered, mistakes they won't repeat, tips from colleagues. Those notes make them better at their job without changing who they are. Purpose Agent does this for AI.
2. The Big Idea
For Non-Technical Readers
You give it a purpose → It builds a team → It does the work → It learns → Next time is better
You say: "Help me write Python code." It builds: An architect (plans), a coder (writes), and a tester (reviews). It runs: The coder writes fibonacci. The tester checks it. A critic scores the work. It learns: "When writing recursive functions, check base cases first." This lesson is saved. Next time: The coder starts by checking base cases. It's faster and more reliable.
For Technical Readers
The framework implements a Purpose-MDP, a Markov Decision Process where:
- A Purpose Function Φ(s) evaluates every state transition on a 0-10 scale
- An Optimizer distills successful trajectories into reusable heuristics
- Heuristics are ranked by Q-values (how often they helped) and selected via Mixture-of-Heuristics (sparse activation, like MoE)
- An immune system scans every new heuristic for prompt injection, score manipulation, and other threats
- A Memory CI pipeline quarantines, tests, and promotes heuristics before they affect agent behavior
This is Potential-Based Reward Shaping (Ng et al., 1999) applied to LLM agents, with formal convergence guarantees. See PURPOSE_LEARNING.md.
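A minimal sketch of that shaping signal, using hypothetical names (phi, Transition) rather than the framework's actual API:

```python
# Minimal sketch of the Purpose-MDP shaping signal (hypothetical names,
# not the framework's API). The per-step signal is the potential
# difference Phi(s') - Phi(s), as in potential-based reward shaping.
from dataclasses import dataclass

@dataclass
class Transition:
    state_before: str
    state_after: str
    action: str

def phi(state: str) -> float:
    """Stand-in for the Purpose Function: 0 = nothing done, 10 = task complete."""
    return 10.0 if "ALL PASSED" in state else 0.0

def shaped_reward(t: Transition, gamma: float = 1.0) -> float:
    # Potential-based shaping leaves the optimal policy unchanged (Ng et al., 1999).
    return gamma * phi(t.state_after) - phi(t.state_before)

print(shaped_reward(Transition("Tests: 0/4", "Tests: 4/4 ALL PASSED", "submit_code")))  # 10.0
```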
3. How It Works: Step by Step
Here's what happens when you run team.run("Write a fibonacci function"):
Step 1: The Actor Decides
The Actor module receives:
- The purpose ("Write a fibonacci function")
- The current state (empty: no code written yet)
- Any learned heuristics from past runs
It generates a thought process and an action:
"I should write a function that handles base cases fib(0)=0 and fib(1)=1, then use iteration for the general case." β Action:
submit_codewith the Python implementation.
Step 2: The Environment Executes
The code is run against test cases. The environment returns a new state:
"Tests: 4/4 ALL PASSED"
Step 3: The Purpose Function Scores
A separate LLM call (not the same as the actor) evaluates the transition:
- Φ(state_before) = 0.0 (nothing done)
- Φ(state_after) = 10.0 (all tests pass)
- Delta = +10.0 (huge improvement)
- Evidence: "Tests changed from 0/4 to 4/4"
The Purpose Function has 7 anti-gaming rules that prevent the agent from tricking itself into thinking it's doing well when it isn't.
Step 4: The Optimizer Extracts Heuristics
After the task, the Optimizer looks at the trajectory and extracts reusable patterns:
- Strategic: "When writing {function_type} functions, handle edge cases first, then iterate."
- Procedural: "1. Read test cases. 2. Handle base cases. 3. Implement general case. 4. Submit."
- Tool tip: "When submitting code, check boundary conditions: 0, 1, empty, negative."
Step 5: Safety Checks
Every new heuristic goes through the immune system:
- Is it a prompt injection? ("Ignore all previous instructions") → REJECTED
- Does it try to manipulate scores? ("Always score 10") → REJECTED
- Does it contain secrets? (API keys, passwords) → REJECTED
- Is it safe? ("Check base cases first") → QUARANTINED (pending replay test)
After passing replay testing → PROMOTED (active in future runs).
Step 6: Next Run Benefits
When the agent runs again, the Prompt Compiler selects the top-K heuristics by:
- Relevance to the current task (embedding similarity)
- Trust (immune-scanned and verified)
- Utility (Q-value: how often it helped before)
These are injected into the prompt. The agent is now better without any model retraining.
4. Architecture Map
The system is organized in four layers. Control flows top-down; within the core engine, distilled heuristics flow from the Optimizer back to the Actor.

- USER LAYER: pa.purpose("...") → Team → team.run("..."); plus pa.Agent(), pa.Graph(), pa.parallel(), pa.Conversation()
- CORE ENGINE: Actor → Environment → Purpose Function (Φ). The critic compares Φ(s) → Φ(s'); transitions feed Experience Replay and the Optimizer, and distilled heuristics flow back to the Actor.
- V2 SAFETY KERNEL: Immune System (scan threats) → Memory CI (quarantine) → Memory Store (7 types × 5 statuses); Prompt Compiler → Token Budget → Credit Assignment; Trace System → JSONL logs → Offline analysis; RunMode (EVAL_TEST blocks all writes)
- INFRASTRUCTURE: LLM Backends (OpenRouter, Groq, OpenAI, Ollama, HF, ...); Robust Parser (TOML → JSON → field extraction → regex); Tools (Calculator, PythonExec, ReadFile, WriteFile); Streaming, Observability, Cost Tracking, Registry
5. The Core Engine
Actor (actor.py)
The decision-maker. Given the current state and purpose, it decides what action to take.
Key design: The Actor doesn't evaluate itself. That's the Purpose Function's job. This separation prevents self-confirmation bias (you wouldn't let a student grade their own exam).
The Actor's prompt is dynamically composed from three tiers of memory (sketched after this list):
- Strategic: High-level rules ("When coding, handle edge cases first")
- Procedural: Step-by-step procedures ("1. Read tests. 2. Handle bases. 3. Implement.")
- Tool tips: Action-specific advice ("When using submit_code, check boundaries")
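A rough sketch of how such a three-tier prompt could be assembled; the section labels and function name are illustrative, not actor.py's actual layout:

```python
# Illustrative only: composing an Actor prompt from the three memory tiers.
# Section labels and formatting are assumptions, not actor.py's exact layout.
def compose_actor_prompt(purpose: str, state: str,
                         strategic: list[str],
                         procedural: list[str],
                         tool_tips: list[str]) -> str:
    sections = [
        f"PURPOSE: {purpose}",
        f"CURRENT STATE: {state}",
        "STRATEGIC HEURISTICS:\n" + "\n".join(f"- {h}" for h in strategic),
        "PROCEDURES:\n" + "\n".join(f"- {h}" for h in procedural),
        "TOOL TIPS:\n" + "\n".join(f"- {h}" for h in tool_tips),
        "Decide the next action. Reply with your reasoning and the action.",
    ]
    return "\n\n".join(sections)
```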
Purpose Function (purpose_function.py)
The critic. A separate LLM call that scores every state transition on a 0-10 scale.
Seven anti-gaming rules:
- Evidence required: cite specific state changes
- No credit for intentions: score actual results, not plans
- No sycophancy: don't inflate scores to be encouraging
- Monotonic scale: 0 = nothing done, 10 = task complete
- Anti-gaming: flag superficial state manipulation
- Consistency: the same state gets the same score (enforced by a cache)
- Confidence: uncertain evaluations get reduced weight
Experience Replay (experience_replay.py)
Stores completed trajectories and retrieves relevant ones for future tasks.
Two-phase retrieval (from MemRL, arxiv:2601.03192), sketched after this list:
- Recall: Find trajectories similar to the current task (embedding similarity)
- Re-rank: Order by Q-value utility (how useful was this memory when retrieved before?)
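A sketch of the two-phase idea, assuming a plain-Python record with embedding and q_value fields (the real experience_replay.py differs in detail):

```python
# Sketch of two-phase retrieval: recall by similarity, then re-rank by Q-value.
# The record shape ({"embedding": ..., "q_value": ...}) is an assumption.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-9)

def retrieve(task_embedding: list[float], trajectories: list[dict],
             k_recall: int = 20, k_final: int = 5) -> list[dict]:
    # Phase 1 (recall): nearest trajectories by embedding similarity.
    recalled = sorted(trajectories,
                      key=lambda t: cosine(task_embedding, t["embedding"]),
                      reverse=True)[:k_recall]
    # Phase 2 (re-rank): order the recalled set by learned utility.
    return sorted(recalled, key=lambda t: t["q_value"], reverse=True)[:k_final]
```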
Optimizer (optimizer.py)
Extracts reusable heuristics from successful trajectories.
Uses the CER distillation pattern (arxiv:2506.06698): abstract away specific details with {variable} placeholders so heuristics generalize across tasks.
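For illustration, a distilled heuristic might look like this; the field names and placeholder tokens are assumptions, not the optimizer's exact schema:

```python
# Illustration of CER-style distillation: task-specific tokens become
# {placeholders} so the heuristic transfers. Field names here are made up.
concrete = "When writing fibonacci, handle fib(0) and fib(1) before the loop."
distilled = {
    "text": "When writing {function_type} functions, handle {base_cases} before the general case.",
    "kind": "strategic",
    "q_value": 0.0,                    # updated by credit assignment after each use
    "source_trajectory": "run-0001",   # hypothetical trajectory id
}
```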
Orchestrator (orchestrator.py)
The main loop that ties everything together. For each step:
1. Actor decides → 2. Environment executes → 3. Critic scores → 4. Step recorded → 5. Check termination
After each task: store trajectory → optimize → sync heuristics to Actor memory.
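A skeleton of that loop; actor, env, critic, replay, and optimizer are hypothetical stand-ins for the real modules, shown only to make the control flow concrete:

```python
# Skeleton of the loop above; all interfaces are illustrative stand-ins.
def run_task(actor, env, critic, replay, optimizer, purpose, max_steps=10):
    state, steps = env.reset(purpose), []
    for _ in range(max_steps):
        action = actor.decide(purpose, state)             # 1. Actor decides
        next_state, done = env.execute(action)            # 2. Environment executes
        score = critic.score(state, next_state, action)   # 3. Critic scores
        steps.append((state, action, next_state, score))  # 4. Step recorded
        state = next_state
        if done:                                          # 5. Check termination
            break
    replay.store(purpose, steps)           # store trajectory
    heuristics = optimizer.distill(steps)  # optimize
    actor.memory.add(heuristics)           # sync heuristics to Actor memory
    return state
```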
6. The V2 Safety Kernel
V1 let the agent learn freely. V2 adds guardrails.
Memory System (memory.py)
Seven memory types, each with different trust priors:
| Type | Example | Trust |
|---|---|---|
| `purpose_contract` | "Build a web scraper" | High (user-defined) |
| `user_preference` | "Always cite sources" | High (human-taught) |
| `skill_card` | "When coding, test edges first" | Medium (learned) |
| `episodic_case` | "fib(0)=0 was a tricky case" | Medium (observed) |
| `failure_pattern` | "Don't use recursion for large n" | Medium (learned from failure) |
| `critic_calibration` | "Score 7 for 3/4 tests passing" | Low (meta-learned) |
| `tool_policy` | "search: only use at target location" | Medium (learned) |
Five statuses: candidate → quarantined → promoted (or rejected) → archived.
Immune System (immune.py)
Scans every candidate memory for 5 threat categories (a scanning sketch follows the list):
- Prompt injection: "Ignore previous instructions..."
- Score manipulation: "Always score 10..."
- Tool misuse: "subprocess.call('rm -rf /')..."
- Privacy leaks: API keys, emails, file paths
- Scope overreach: a memory tries to affect all agents when it should be scoped
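A pattern-based scan over these five categories might look like this; the regexes and return shape are illustrative, not immune.py's actual rules:

```python
# Sketch of a pattern scan over the five threat categories (illustrative rules).
import re

THREAT_PATTERNS = {
    "prompt_injection": r"ignore (all )?previous instructions",
    "score_manipulation": r"always score \d+",
    "tool_misuse": r"rm -rf|subprocess\.call",
    "privacy_leak": r"sk-[A-Za-z0-9]{20,}|[\w.+-]+@[\w-]+\.\w+",
    "scope_overreach": r"\ball agents\b",
}

def scan(candidate_text: str) -> list[str]:
    """Return the names of any threat categories the candidate memory trips."""
    return [name for name, pattern in THREAT_PATTERNS.items()
            if re.search(pattern, candidate_text, re.IGNORECASE)]
```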
Memory CI (memory_ci.py)
The promotion pipeline:
candidate → immune_scan() → quarantined → replay_test → promote/reject
No memory reaches the agent's prompt without passing every gate.
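A minimal sketch of that pipeline; `immune_scan` and `replay_test` are injected callables, and the status strings follow the memory.py vocabulary:

```python
# Minimal sketch of the promotion pipeline (logic illustrative).
def run_memory_ci(candidate: dict, immune_scan, replay_test) -> dict:
    if immune_scan(candidate["text"]):        # any threat -> reject immediately
        candidate["status"] = "rejected"
        return candidate
    candidate["status"] = "quarantined"       # clean scan -> quarantine
    passed = replay_test(candidate)           # replay against held-out traces
    candidate["status"] = "promoted" if passed else "rejected"
    return candidate
```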
Prompt Compiler (compiler.py)
Selects which memories to include under a token budget. Ranked by:
score = 0.4 × relevance + 0.3 × trust + 0.3 × utility
Returns included_memory_ids for credit assignment: only memories that were in the prompt get Q-value updates after the step.
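A sketch of token-budgeted selection using the stated weights; the memory record fields (trust, utility, tokens, id) and the greedy packing are assumptions:

```python
# Sketch of token-budgeted prompt compilation with the stated ranking weights.
def compile_prompt(memories: list[dict], relevance, token_budget: int):
    def rank(m: dict) -> float:
        return 0.4 * relevance(m) + 0.3 * m["trust"] + 0.3 * m["utility"]
    included, used = [], 0
    for m in sorted(memories, key=rank, reverse=True):
        if used + m["tokens"] <= token_budget:
            included.append(m)
            used += m["tokens"]
    # Only these ids receive Q-value updates after the step (credit assignment).
    return included, [m["id"] for m in included]
```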
Trace System (trace.py)
Every run produces a JSONL trace β the raw material for debugging, evaluation, and memory extraction. Traces are append-only and immutable.
RunMode (v2_types.py)
Three modes with strict enforcement (a guard sketch follows the list):
- LEARNING_TRAIN: full read/write
- LEARNING_VALIDATION: read + staging writes
- EVAL_TEST: no writes of any kind (the only mode whose numbers you can report)
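A sketch of how the enforcement could look; the enum members mirror the doc, while the guard logic is illustrative:

```python
# Sketch of RunMode enforcement (guard logic is an assumption).
from enum import Enum

class RunMode(Enum):
    LEARNING_TRAIN = "learning_train"
    LEARNING_VALIDATION = "learning_validation"
    EVAL_TEST = "eval_test"

def guard_write(mode: RunMode, staging: bool = False) -> None:
    if mode is RunMode.EVAL_TEST:
        raise PermissionError("EVAL_TEST blocks all memory writes")
    if mode is RunMode.LEARNING_VALIDATION and not staging:
        raise PermissionError("LEARNING_VALIDATION allows staging writes only")
```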
7. Research Implementations
Five papers implemented as standalone modules:
Meta-Rewarding (meta_rewarding.py)
From: arxiv:2407.19594 (Llama-3-8B: 22.9% → 39.4% on AlpacaEval)
A meta-judge evaluates the Purpose Function's own judgments. Good judgments become calibration examples in memory. The critic improves through in-context learning.
Self-Taught Evaluators (self_taught.py)
From: arxiv:2408.02666
Generates synthetic contrast pairs (correct vs wrong evaluation) from traces. Creates an automatic curriculum: as the critic improves, the contrast pairs get harder.
Prompt Optimizer (prompt_optimizer.py)
From DSPy: arxiv:2310.03714 (+8% on GSM8K, +50% on BBH)
Instead of hand-crafting prompts, define signatures (state, action → score, reasoning) and let the optimizer bootstrap effective few-shot demonstrations by trial and error.
LLM Compiler (llm_compiler.py)
From: arxiv:2312.04511 (up to 3.7× latency speedup)
Instead of sequential tool calls (ReAct), plan ALL calls upfront as a DAG and execute independent ones in parallel.
Retroformer (retroformer.py)
From: arxiv:2308.02151
Structured reflection on completed traces: extracts four types of memories (skills, failures, policies, observations). Replaces raw heuristic distillation with typed, safety-scanned memory extraction.
8. Breakthroughs
Six features that go beyond existing frameworks:
B1: Self-Improving Critic
The Purpose Function's own quality improves over time. Meta-judging after each task generates calibration examples that make future scoring more accurate.
B2: Mixture-of-Heuristics (MoH)
Like DeepSeek's Mixture-of-Experts: out of 100+ heuristics, only K=5 are activated per step. Shared heuristics (always active, like "check edge cases") + routed heuristics (task-specific, selected by Q × similarity). Knowledge grows; compute stays flat.
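A sketch of the sparse selection, assuming each routed heuristic carries an embedding and a Q-value (record fields are assumptions):

```python
# Sketch of sparse activation: shared heuristics are always included, routed
# heuristics are picked by Q-value x similarity.
def select_heuristics(task_embedding, shared: list[dict], routed: list[dict],
                      similarity, k: int = 5) -> list[dict]:
    ranked = sorted(routed,
                    key=lambda h: h["q_value"] * similarity(task_embedding, h["embedding"]),
                    reverse=True)
    return shared + ranked[:k]   # compute stays flat as the library grows
```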
B3: Hindsight Heuristic Relabeling
From HER (arxiv:1707.01495): when a task fails, instead of discarding the trajectory, ask "what DID this accomplish?" and extract heuristics for what was achieved. Learn from failures, not just successes.
B4: Heuristic Evolution
Periodically generalize specific heuristics into abstract patterns:
- Before: "When fibonacci fails on 0, return 0"
- After: "When {function} fails on {boundary_value}, add an explicit base case"
Creates an automatic curriculum: specific → general → abstract.
B5: Cross-Domain Transfer
Heuristics learned on one set of coding tasks can help with different coding tasks. The test_cross_domain_transfer() function measures this: train on [fibonacci, factorial], test on [palindrome, fizzbuzz].
B6: Adversarial Robustness
The AdversarialHardener generates 30 adversarial inputs (prompt injections, score hacks, API key leaks) and 10 benign inputs, then tests the immune system against all of them. Current results: 93% catch rate, 0% false positives.
9. User-Facing Layers
Easy API (easy.py)
The purpose() function analyzes your description and builds the right team:
| You say | It builds |
|---|---|
| "Write Python code" | architect + coder + tester |
| "Research papers" | researcher + analyst |
| "Write blog posts" | writer + editor |
| "Analyze data" | analyst + reporter |
| "Help me" | general assistant |
Unified Capabilities (unified.py)
Five competing framework philosophies in one composable layer:
| Capability | Inspired By | Usage |
|---|---|---|
| `Agent()` | OpenAI Agents SDK | One-liner agent creation |
| `Graph()` | LangGraph | Conditional branching, cycles, fan-out |
| `parallel()` | CrewAI | Concurrent task execution |
| `Conversation()` | AutoGen | Agent-to-agent message passing |
| `KnowledgeStore` | LlamaIndex | RAG as a tool |
Robust Parser (robust_parser.py)
The universal solution to "LLMs can't reliably produce JSON" (a fallback sketch follows the list):
- Tries TOML first (fewer tokens than JSON)
- Falls back to JSON
- Falls back to field extraction by regex
- Never crashes. Always returns something usable.
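A sketch of that fallback chain using only the standard library (tomllib requires Python 3.11+); the field list and regex are assumptions, not robust_parser.py's code:

```python
# Sketch of the fallback chain (illustrative, not the real parser).
import json
import re
import tomllib

def parse_llm_output(text: str, fields=("action", "reasoning")) -> dict:
    try:
        return tomllib.loads(text)      # 1. TOML first (fewer tokens than JSON)
    except Exception:
        pass
    try:
        return json.loads(text)         # 2. Fall back to JSON
    except Exception:
        pass
    extracted = {}                      # 3. Fall back to per-field regex extraction
    for field in fields:
        m = re.search(rf'{field}\s*[:=]\s*"?(.+?)"?\s*$', text,
                      re.IGNORECASE | re.MULTILINE)
        if m:
            extracted[field] = m.group(1)
    return extracted or {"raw": text}   # 4. Never crash; always return something
```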
10. How Models Are Handled
resolve_backend()
One function routes to any provider:
resolve_backend("openrouter:meta-llama/llama-3.3-70b-instruct")
resolve_backend("groq:llama-3.3-70b-versatile")
resolve_backend("openai:gpt-4o")
resolve_backend("ollama:qwen3:1.7b") # Local, free
resolve_backend("hf:Qwen/Qwen3-32B")
resolve_backend("together:meta-llama/Llama-3.3-70B-Instruct-Turbo")
SLM-Native Design
The framework was designed for small models (0.6B-3B params):
- Grammar-constrained output via Ollama (forces valid structure from any model)
- Prompt compression for small context windows (8K-32K)
- Tool RAG: only load relevant tools into the prompt (saves tokens)
- TOML format: fewer tokens than JSON
_strip_thinking()
Handles reasoning models (Qwen3, DeepSeek-R1) that wrap output in <think> tags. Automatically strips the thinking and returns only the answer.
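One possible implementation of the idea (the real _strip_thinking may differ):

```python
# Possible sketch of stripping <think>...</think> blocks from model output.
import re

def strip_thinking(text: str) -> str:
    """Drop <think>...</think> blocks emitted by reasoning models, keep the answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```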
11. The Research Behind It
Every design decision traces to a published paper. The full list with citations, methodology sections, and implementation mapping is in COMPILED_RESEARCH.md.
The formal framework β Purpose-MDP with 5 axioms, 3 theorems, and convergence proofs β is in PURPOSE_LEARNING.md.
Key theoretical result: The self-improvement is a form of Potential-Based Reward Shaping (Ng et al., 1999). Our ΔΦ = Φ(s') - Φ(s) preserves the optimal policy while providing dense per-step feedback. The heuristic library converges to a fixed point under bounded capacity.
12. For Contributors
File Structure
purpose_agent/
├── types.py              # State, Action, Trajectory, Heuristic, PurposeScore
├── llm_backend.py        # LLMBackend ABC + HF, OpenAI, Mock + resolve_backend
├── slm_backends.py       # Ollama, llama-cpp, prompt compression, SLM registry
├── robust_parser.py      # Universal parser: TOML → JSON → regex (never crashes)
├── actor.py              # ReAct agent with 3-tier memory prompts
├── purpose_function.py   # Φ(s) critic with 7 anti-gaming rules
├── experience_replay.py  # Two-phase retrieval (similarity → Q-value)
├── optimizer.py          # Trajectory → heuristic distillation
├── orchestrator.py       # Main step loop
├── v2_types.py           # RunMode, MemoryScope, PurposeScoreV2
├── trace.py              # JSONL execution traces
├── memory.py             # 7 MemoryKinds × 5 MemoryStatuses
├── compiler.py           # Token-budgeted prompt compilation
├── immune.py             # 5 threat scanners
├── memory_ci.py          # Quarantine → scan → test → promote/reject
├── evalport.py           # Pluggable evaluation protocol
├── benchmark_v2.py       # Train/val/test splits with ablation
├── meta_rewarding.py     # Self-improving critic (arxiv:2407.19594)
├── self_taught.py        # Synthetic critic training (arxiv:2408.02666)
├── prompt_optimizer.py   # DSPy-style bootstrap (arxiv:2310.03714)
├── llm_compiler.py       # Parallel tool DAG (arxiv:2312.04511)
├── retroformer.py        # Structured reflection (arxiv:2308.02151)
├── breakthroughs.py      # MoH, hindsight relabeling, heuristic evolution, etc.
├── unified.py            # Agent, Graph, parallel, Conversation, KnowledgeStore
├── easy.py               # purpose(), Team, quickstart wizard
├── tools.py              # Secure built-in tools
├── streaming.py          # Async + event streaming
├── observability.py      # Cost tracking, callbacks
├── multi_agent.py        # Agent teams with shared learning
├── hitl.py               # Human-in-the-loop + checkpointing
├── evaluation.py         # V1 benchmark runner
├── registry.py           # Plugin system
├── __init__.py           # 103 exports
└── __main__.py           # CLI entry point
Adding a New LLM Provider
# In your code (no core edits needed):
from purpose_agent import backend_registry, OpenAICompatibleBackend

backend_registry.register(
    "my_provider",
    lambda model, api_key: OpenAICompatibleBackend(
        model=model, base_url="https://api.myprovider.com/v1", api_key=api_key
    ),
)
Adding a New Tool
from purpose_agent import FunctionTool

def my_search(query: str) -> str:
    """Search my database."""
    return db.search(query)  # `db` stands in for your own data-access object

tool = FunctionTool.from_function(my_search)
Running Tests
python tests/test_core.py # 21 unit tests
python tests/launch_readiness.py # 119 comprehensive tests
python benchmarks/validate.py # Mock benchmark suite
python benchmarks/validate.py --quick # Fast smoke test