# Purpose Agent: Architecture Documentation
|
|
| > For developers building on the framework, researchers understanding the theory, and anyone curious about how self-improving agents work. |
|
|
| --- |
|
|
| ## Table of Contents |
|
|
| 1. [What Is Purpose Agent?](#1-what-is-purpose-agent) |
| 2. [The Big Idea (No Jargon)](#2-the-big-idea) |
3. [How It Works: Step by Step](#3-how-it-works)
| 4. [Architecture Map](#4-architecture-map) |
| 5. [The Core Engine](#5-the-core-engine) |
| 6. [The V2 Safety Kernel](#6-the-v2-safety-kernel) |
| 7. [Research Implementations](#7-research-implementations) |
| 8. [Breakthroughs](#8-breakthroughs) |
| 9. [User-Facing Layers](#9-user-facing-layers) |
| 10. [How Models Are Handled](#10-how-models-are-handled) |
| 11. [The Research Behind It](#11-the-research) |
| 12. [For Contributors](#12-for-contributors) |
|
|
| --- |
|
|
| ## 1. What Is Purpose Agent? |
|
|
Purpose Agent is a Python framework for building AI agents that **get better with experience**, without retraining the underlying AI model.
|
|
| Traditional AI agents run the same way every time. Purpose Agent is different: after each task, it extracts lessons from what worked and what didn't, tests those lessons for safety, and uses them to perform better next time. |
|
|
**Think of it like this:** A new employee follows the company handbook. After their first week, they have personal notes: shortcuts they discovered, mistakes they won't repeat, tips from colleagues. Those notes make them better at their job without changing who they are. Purpose Agent does this for AI.
|
|
| --- |
|
|
| ## 2. The Big Idea |
|
|
| ### For Non-Technical Readers |
|
|
| ``` |
You give it a purpose → It builds a team → It does the work → It learns → Next time is better
| ``` |
|
|
| **You say:** "Help me write Python code." |
| **It builds:** An architect (plans), a coder (writes), and a tester (reviews). |
| **It runs:** The coder writes fibonacci. The tester checks it. A critic scores the work. |
| **It learns:** "When writing recursive functions, check base cases first." This lesson is saved. |
| **Next time:** The coder starts by checking base cases. It's faster and more reliable. |
|
|
| ### For Technical Readers |
|
|
The framework implements a **Purpose-MDP**, a Markov Decision Process where:
|
|
- A **Purpose Function Φ(s)** evaluates every state transition on a 0-10 scale
| - An **Optimizer** distills successful trajectories into reusable heuristics |
| - Heuristics are ranked by **Q-values** (how often they helped) and selected via **Mixture-of-Heuristics** (sparse activation, like MoE) |
| - An **immune system** scans every new heuristic for prompt injection, score manipulation, and other threats |
- A **Memory CI pipeline** quarantines, tests, and promotes heuristics before they affect agent behavior
|
|
| This is **Potential-Based Reward Shaping** (Ng et al., 1999) applied to LLM agents, with formal convergence guarantees. See [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md). |
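
As a minimal sketch, the shaping signal is just the difference between two potential evaluations. The `phi` stub below stands in for the real LLM-backed Purpose Function; all names here are illustrative, not the framework's actual API:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    state_before: str
    state_after: str

def phi(state: str) -> float:
    """Stand-in for the Purpose Function: an LLM call scoring a state 0-10."""
    return 10.0 if "ALL PASSED" in state else 0.0

def shaping_signal(t: Transition) -> float:
    # Potential-based shaping: positive only when the agent made real progress.
    return phi(t.state_after) - phi(t.state_before)

print(shaping_signal(Transition("no code yet", "Tests: 4/4 ALL PASSED")))  # 10.0
```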
|
|
| --- |
|
|
## 3. How It Works: Step by Step
|
|
| Here's what happens when you run `team.run("Write a fibonacci function")`: |
|
|
| ### Step 1: The Actor Decides |
|
|
| The Actor module receives: |
| - The **purpose** ("Write a fibonacci function") |
- The **current state** (empty; no code written yet)
| - Any **learned heuristics** from past runs |
|
|
| It generates a thought process and an action: |
| > "I should write a function that handles base cases fib(0)=0 and fib(1)=1, then use iteration for the general case." |
> → Action: `submit_code` with the Python implementation.
| |
| ### Step 2: The Environment Executes |
| |
| The code is run against test cases. The environment returns a new state: |
| > "Tests: 4/4 ALL PASSED" |
| |
| ### Step 3: The Purpose Function Scores |
| |
| A **separate LLM call** (not the same as the actor) evaluates the transition: |
- Φ(state_before) = 0.0 (nothing done)
- Φ(state_after) = 10.0 (all tests pass)
| - Delta = +10.0 (huge improvement) |
| - Evidence: "Tests changed from 0/4 to 4/4" |
| |
| The Purpose Function has **7 anti-gaming rules** that prevent the agent from tricking itself into thinking it's doing well when it isn't. |
| |
| ### Step 4: The Optimizer Extracts Heuristics |
| |
| After the task, the Optimizer looks at the trajectory and extracts reusable patterns: |
| - **Strategic:** "When writing {function_type} functions, handle edge cases first, then iterate." |
| - **Procedural:** "1. Read test cases. 2. Handle base cases. 3. Implement general case. 4. Submit." |
| - **Tool tip:** "When submitting code, check boundary conditions: 0, 1, empty, negative." |
|
|
| ### Step 5: Safety Checks |
|
|
| Every new heuristic goes through the **immune system**: |
- Is it a prompt injection? ("Ignore all previous instructions") → **REJECTED**
- Does it try to manipulate scores? ("Always score 10") → **REJECTED**
- Does it contain secrets? (API keys, passwords) → **REJECTED**
- Is it safe? ("Check base cases first") → **QUARANTINED** (pending replay test)
|
|
After passing replay testing → **PROMOTED** (active in future runs).
|
|
| ### Step 6: Next Run Benefits |
|
|
| When the agent runs again, the **Prompt Compiler** selects the top-K heuristics by: |
| - **Relevance** to the current task (embedding similarity) |
| - **Trust** (immune-scanned and verified) |
- **Utility** (Q-value: how often it helped before)
|
|
| These are injected into the prompt. The agent is now better without any model retraining. |
|
|
| --- |
|
|
| ## 4. Architecture Map |
|
|
| ``` |
                          PURPOSE AGENT

┌─ USER LAYER ───────────────────────────────────────────────────┐
│  pa.purpose("...") → Team → team.run("...")                    │
│  pa.Agent()   pa.Graph()   pa.parallel()   pa.Conversation()   │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌─ CORE ENGINE ─────────────────┴────────────────────────────────┐
│                                                                │
│   Actor ──► Environment ──► Purpose Function (Φ)               │
│     │            │                  │                          │
│     │        State s'         Φ(s) → Φ(s')                     │
│     │            │                  │                          │
│     │            ▼                  ▼                          │
│     │     Experience Replay     Optimizer                      │
│     │            │                  │                          │
│     └──── heuristics ◄──────────────┘                          │
│                                                                │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌─ V2 SAFETY KERNEL ────────────┴────────────────────────────────┐
│                                                                │
│   Immune System ──► Memory CI ──► Memory Store                 │
│   (scan threats)    (quarantine)  (7 types × 5 statuses)       │
│                                                                │
│   Prompt Compiler ─── Token Budget ─── Credit Assignment       │
│   Trace System ─── JSONL logs ─── Offline analysis             │
│   RunMode ─── EVAL_TEST blocks all writes                      │
│                                                                │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌─ INFRASTRUCTURE ──────────────┴────────────────────────────────┐
│                                                                │
│   LLM Backends: OpenRouter · Groq · OpenAI · Ollama · HF · ... │
│   Robust Parser: TOML → JSON → field extraction → regex        │
│   Tools: Calculator · PythonExec · ReadFile · WriteFile        │
│   Streaming · Observability · Cost Tracking · Registry         │
│                                                                │
└────────────────────────────────────────────────────────────────┘
| ``` |
|
|
| --- |
|
|
| ## 5. The Core Engine |
|
|
| ### Actor (`actor.py`) |
| The decision-maker. Given the current state and purpose, it decides what action to take. |
|
|
| **Key design:** The Actor doesn't evaluate itself. That's the Purpose Function's job. This separation prevents self-confirmation bias (you wouldn't let a student grade their own exam). |
|
|
| The Actor's prompt is **dynamically composed** from three tiers of memory: |
| - **Strategic:** High-level rules ("When coding, handle edge cases first") |
| - **Procedural:** Step-by-step procedures ("1. Read tests. 2. Handle bases. 3. Implement.") |
| - **Tool tips:** Action-specific advice ("When using submit_code, check boundaries") |
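
A minimal sketch of how such a three-tier prompt might be assembled (function and section names are illustrative, not the actual `actor.py` API):

```python
def compose_actor_prompt(purpose: str, state: str,
                         strategic: list[str], procedural: list[str],
                         tool_tips: list[str]) -> str:
    """Stitch the three memory tiers into a single prompt."""
    def block(title: str, items: list[str]) -> str:
        return title + "\n" + "\n".join(f"- {h}" for h in items)

    return "\n\n".join([
        f"PURPOSE: {purpose}",
        f"CURRENT STATE: {state}",
        block("STRATEGIC RULES:", strategic),
        block("PROCEDURES:", procedural),
        block("TOOL TIPS:", tool_tips),
    ])
```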
| |
| ### Purpose Function (`purpose_function.py`) |
| The critic. A separate LLM call that scores every state transition on a 0-10 scale. |
|
|
| **Seven anti-gaming rules:** |
1. Evidence required: cite specific state changes
2. No credit for intentions: score actual results, not plans
3. No sycophancy: don't inflate scores to be encouraging
4. Monotonic scale: 0 = nothing done, 10 = task complete
5. Anti-gaming: flag superficial state manipulation
6. Consistency: same state gets same score (enforced by cache)
7. Confidence: uncertain evaluations get reduced weight
|
|
| ### Experience Replay (`experience_replay.py`) |
| Stores completed trajectories and retrieves relevant ones for future tasks. |
| |
| **Two-phase retrieval** (from MemRL, arxiv:2601.03192): |
| 1. **Recall:** Find trajectories similar to the current task (embedding similarity) |
| 2. **Re-rank:** Order by Q-value utility (how useful was this memory when retrieved before?) |
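
In sketch form (memory records carrying `emb` and `q_value` fields are an assumption for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(task_emb: list[float], memories: list[dict],
             recall_k: int = 20, final_k: int = 5) -> list[dict]:
    # Phase 1: recall a broad candidate set by embedding similarity.
    recalled = sorted(memories, key=lambda m: cosine(task_emb, m["emb"]),
                      reverse=True)[:recall_k]
    # Phase 2: re-rank the survivors by learned Q-value utility.
    return sorted(recalled, key=lambda m: m["q_value"], reverse=True)[:final_k]
```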
| |
| ### Optimizer (`optimizer.py`) |
| Extracts reusable heuristics from successful trajectories. |
| |
| Uses the **CER distillation pattern** (arxiv:2506.06698): abstract away specific details with `{variable}` placeholders so heuristics generalize across tasks. |
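
For example, a distilled heuristic keeps `{placeholder}` slots where the source trajectory had task-specific details (the template text here is illustrative):

```python
# Distilled with placeholders so it generalizes beyond the source trajectory:
heuristic = "When writing {function_type} functions, handle {edge_case} first."

# Instantiated at prompt time for the task at hand:
print(heuristic.format(function_type="recursive", edge_case="base cases"))
# -> When writing recursive functions, handle base cases first.
```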
| |
| ### Orchestrator (`orchestrator.py`) |
| The main loop that ties everything together. For each step: |
1. Actor decides → 2. Environment executes → 3. Critic scores → 4. Step recorded → 5. Check termination
| |
After each task: store trajectory → optimize → sync heuristics to Actor memory.
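
A sketch of that loop, with the `actor`/`env`/`critic` interfaces simplified for illustration:

```python
def run_episode(actor, env, critic, max_steps: int = 10) -> list[tuple]:
    state, trajectory = env.reset(), []
    for _ in range(max_steps):
        action = actor.decide(state)                # 1. Actor decides
        state_next, done = env.execute(action)      # 2. Environment executes
        score = critic.evaluate(state, state_next)  # 3. Critic scores
        trajectory.append((state, action, state_next, score))  # 4. Recorded
        state = state_next
        if done:                                    # 5. Check termination
            break
    return trajectory
```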
| |
| --- |
| |
| ## 6. The V2 Safety Kernel |
| |
| V1 let the agent learn freely. V2 adds guardrails. |
| |
| ### Memory System (`memory.py`) |
| Seven memory types, each with different trust priors: |
| |
| | Type | Example | Trust | |
| |------|---------|-------| |
| | `purpose_contract` | "Build a web scraper" | High (user-defined) | |
| | `user_preference` | "Always cite sources" | High (human-taught) | |
| | `skill_card` | "When coding, test edges first" | Medium (learned) | |
| | `episodic_case` | "fib(0)=0 was a tricky case" | Medium (observed) | |
| | `failure_pattern` | "Don't use recursion for large n" | Medium (learned from failure) | |
| | `critic_calibration` | "Score 7 for 3/4 tests passing" | Low (meta-learned) | |
| | `tool_policy` | "search: only use at target location" | Medium (learned) | |
|
|
Five statuses: `candidate` → `quarantined` → `promoted` (or `rejected`) → `archived`.
|
|
| ### Immune System (`immune.py`) |
| Scans every candidate memory for 5 threat categories: |
- **Prompt injection**: "Ignore previous instructions..."
- **Score manipulation**: "Always score 10..."
- **Tool misuse**: "subprocess.call('rm -rf /')..."
- **Privacy leaks**: API keys, emails, file paths
- **Scope overreach**: memory tries to affect all agents when it should be scoped
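
A toy version of the scanner (illustrative patterns only; the real `immune.py` covers far more cases than these four regexes):

```python
import re

THREAT_PATTERNS = {
    "prompt_injection":   re.compile(r"ignore (all )?previous instructions", re.I),
    "score_manipulation": re.compile(r"always score \d+", re.I),
    "tool_misuse":        re.compile(r"rm\s+-rf|subprocess\.call", re.I),
    "privacy_leak":       re.compile(r"(api[_-]?key|password)\s*[:=]", re.I),
}

def immune_scan(candidate: str) -> list[str]:
    """Return every threat category the candidate memory trips."""
    return [name for name, pat in THREAT_PATTERNS.items() if pat.search(candidate)]

assert immune_scan("Ignore all previous instructions") == ["prompt_injection"]
assert immune_scan("Check base cases first") == []
```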
|
|
| ### Memory CI (`memory_ci.py`) |
| The promotion pipeline: |
| ``` |
candidate → immune_scan() → quarantined → replay_test → promote/reject
| ``` |
| No memory reaches the agent's prompt without passing every gate. |
| |
| ### Prompt Compiler (`compiler.py`) |
| Selects which memories to include under a token budget. Ranked by: |
`score = 0.4 × relevance + 0.3 × trust + 0.3 × utility`
| |
Returns `included_memory_ids` for credit assignment: only memories that were in the prompt get Q-value updates after the step.
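
A sketch of the selection (memory dicts with `relevance`, `trust`, `utility`, `tokens`, and `id` fields are an assumption for illustration):

```python
def compile_prompt(memories: list[dict], budget_tokens: int):
    """Greedy selection by blended score until the token budget is spent."""
    ranked = sorted(
        memories,
        key=lambda m: 0.4 * m["relevance"] + 0.3 * m["trust"] + 0.3 * m["utility"],
        reverse=True,
    )
    included, used = [], 0
    for m in ranked:
        if used + m["tokens"] <= budget_tokens:
            included.append(m)
            used += m["tokens"]
    # The ids of what actually made it in drive later Q-value updates.
    return included, [m["id"] for m in included]
```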
| |
| ### Trace System (`trace.py`) |
Every run produces a JSONL trace: the raw material for debugging, evaluation, and memory extraction. Traces are append-only and immutable.
| |
| ### RunMode (`v2_types.py`) |
| Three modes with strict enforcement: |
- `LEARNING_TRAIN`: full read/write
- `LEARNING_VALIDATION`: read + staging writes
- `EVAL_TEST`: **no writes of any kind** (the only mode whose numbers you can report)
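
A minimal sketch of the enforcement idea (the actual `v2_types.py` surface and error type may differ):

```python
from enum import Enum

class RunMode(Enum):
    LEARNING_TRAIN = "learning_train"
    LEARNING_VALIDATION = "learning_validation"
    EVAL_TEST = "eval_test"

def assert_writable(mode: RunMode, staging: bool = False) -> None:
    """Call before any memory write; raises if the mode forbids it."""
    if mode is RunMode.EVAL_TEST:
        raise PermissionError("EVAL_TEST runs must never write memory")
    if mode is RunMode.LEARNING_VALIDATION and not staging:
        raise PermissionError("LEARNING_VALIDATION allows staging writes only")
```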
|
|
| --- |
|
|
| ## 7. Research Implementations |
|
|
| Five papers implemented as standalone modules: |
|
|
| ### Meta-Rewarding (`meta_rewarding.py`) |
*From: arxiv:2407.19594 (Llama-3-8B: 22.9% → 39.4% on AlpacaEval)*
| |
| A meta-judge evaluates the Purpose Function's own judgments. Good judgments become calibration examples in memory. The critic improves through in-context learning. |
| |
| ### Self-Taught Evaluators (`self_taught.py`) |
| *From: arxiv:2408.02666* |
|
|
| Generates synthetic contrast pairs (correct vs wrong evaluation) from traces. Creates an automatic curriculum: as the critic improves, the contrast pairs get harder. |
|
|
| ### Prompt Optimizer (`prompt_optimizer.py`) |
*From DSPy: arxiv:2310.03714 (+8% on GSM8K, +50% on BBH)*
| |
Instead of hand-crafting prompts, define signatures (`state, action → score, reasoning`) and let the optimizer bootstrap effective few-shot demonstrations by trial and error.
| |
| ### LLM Compiler (`llm_compiler.py`) |
*From: arxiv:2312.04511 (up to 3.7× latency speedup)*
|
|
| Instead of sequential tool calls (ReAct), plan ALL calls upfront as a DAG and execute independent ones in parallel. |
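
A self-contained sketch of the idea using `asyncio` (the task names and dependency map are illustrative, not the module's API):

```python
import asyncio

async def execute_dag(tasks: dict, deps: dict) -> dict:
    """Run each call as soon as its dependencies finish; siblings run in parallel."""
    events = {name: asyncio.Event() for name in tasks}
    results: dict = {}

    async def run(name: str) -> None:
        for dep in deps.get(name, ()):
            await events[dep].wait()  # wait only on direct dependencies
        results[name] = await tasks[name](results)
        events[name].set()

    await asyncio.gather(*(run(name) for name in tasks))
    return results

async def search(_):
    return "docs"

async def fetch(_):
    return "page"

async def merge(r):
    return f"{r['search']} + {r['fetch']}"

# search and fetch have no dependencies, so they execute concurrently.
plan = {"search": search, "fetch": fetch, "merge": merge}
print(asyncio.run(execute_dag(plan, {"merge": ["search", "fetch"]})))
```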
|
|
| ### Retroformer (`retroformer.py`) |
| *From: arxiv:2308.02151* |
|
|
Structured reflection on completed traces extracts four types of memories (skills, failures, policies, observations). This replaces raw heuristic distillation with typed, safety-scanned memory extraction.
|
|
| --- |
|
|
| ## 8. Breakthroughs |
|
|
| Six features that go beyond existing frameworks: |
|
|
| ### B1: Self-Improving Critic |
| The Purpose Function's own quality improves over time. Meta-judging after each task generates calibration examples that make future scoring more accurate. |
|
|
| ### B2: Mixture-of-Heuristics (MoH) |
Like DeepSeek's Mixture-of-Experts: out of 100+ heuristics, only K=5 are activated per step. **Shared heuristics** (always active, like "check edge cases") + **routed heuristics** (task-specific, selected by Q × similarity). Knowledge grows; compute stays flat.
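
A sketch of the routing step (heuristic records with `q` and `emb` fields are an assumption for illustration):

```python
def select_heuristics(task_emb, shared, routed, k=5, similarity=None):
    """MoH routing: shared heuristics always fire; routed ones compete on Q × sim."""
    similarity = similarity or (lambda a, b: sum(x * y for x, y in zip(a, b)))
    ranked = sorted(routed, key=lambda h: h["q"] * similarity(task_emb, h["emb"]),
                    reverse=True)
    return list(shared) + ranked[:k]  # prompt size stays O(len(shared) + k)
```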
|
|
| ### B3: Hindsight Heuristic Relabeling |
| From HER (arxiv:1707.01495): when a task fails, instead of discarding the trajectory, ask "what DID this accomplish?" and extract heuristics for what was achieved. Learn from failures, not just successes. |
|
|
| ### B4: Heuristic Evolution |
| Periodically generalize specific heuristics into abstract patterns: |
| - Before: "When fibonacci fails on 0, return 0" |
| - After: "When {function} fails on {boundary_value}, add an explicit base case" |
| |
Creates an automatic curriculum: specific → general → abstract.
| |
| ### B5: Cross-Domain Transfer |
Heuristics learned on some coding tasks can help with different, unseen coding tasks. The `test_cross_domain_transfer()` function measures this: train on [fibonacci, factorial], test on [palindrome, fizzbuzz].
|
|
| ### B6: Adversarial Robustness |
The `AdversarialHardener` generates 30 adversarial inputs (prompt injections, score hacks, API key leaks) and 10 benign inputs, then tests the immune system against all of them. Current results: **93% catch rate, 0% false positives.**
|
|
| --- |
|
|
| ## 9. User-Facing Layers |
|
|
| ### Easy API (`easy.py`) |
| The `purpose()` function analyzes your description and builds the right team: |
|
|
| | You say | It builds | |
| |---------|-----------| |
| | "Write Python code" | architect + coder + tester | |
| | "Research papers" | researcher + analyst | |
| | "Write blog posts" | writer + editor | |
| | "Analyze data" | analyst + reporter | |
| | "Help me" | general assistant | |
|
|
| ### Unified Capabilities (`unified.py`) |
| Five competing framework philosophies in one composable layer: |
|
|
| | Capability | Inspired By | Usage | |
| |-----------|-------------|-------| |
| | `Agent()` | OpenAI Agents SDK | One-liner agent creation | |
| | `Graph()` | LangGraph | Conditional branching, cycles, fan-out | |
| | `parallel()` | CrewAI | Concurrent task execution | |
| | `Conversation()` | AutoGen | Agent-to-agent message passing | |
| | `KnowledgeStore` | LlamaIndex | RAG as a tool | |
|
|
| ### Robust Parser (`robust_parser.py`) |
| The universal solution to "LLMs can't reliably produce JSON": |
| - Tries TOML first (fewer tokens than JSON) |
| - Falls back to JSON |
| - Falls back to field extraction by regex |
| - Never crashes. Always returns something usable. |
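
A condensed sketch of the fallback chain (the real `robust_parser.py` is more thorough; `tomllib` is stdlib from Python 3.11 onward):

```python
import json
import re
import tomllib

def robust_parse(text: str) -> dict:
    try:
        return tomllib.loads(text)  # 1. TOML first: fewer tokens, forgiving
    except Exception:
        pass
    try:
        return json.loads(text)     # 2. then strict JSON
    except Exception:
        pass
    # 3. last resort: scrape key = "value" pairs by regex; never raises
    return dict(re.findall(r'(\w+)\s*[:=]\s*"([^"]*)"', text))
```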
| |
| --- |
| |
| ## 10. How Models Are Handled |
| |
| ### resolve_backend() |
| One function routes to any provider: |
|
|
| ```python |
| resolve_backend("openrouter:meta-llama/llama-3.3-70b-instruct") |
| resolve_backend("groq:llama-3.3-70b-versatile") |
| resolve_backend("openai:gpt-4o") |
| resolve_backend("ollama:qwen3:1.7b") # Local, free |
| resolve_backend("hf:Qwen/Qwen3-32B") |
| resolve_backend("together:meta-llama/Llama-3.3-70B-Instruct-Turbo") |
| ``` |
|
|
| ### SLM-Native Design |
| The framework was designed for small models (0.6B-3B params): |
| - **Grammar-constrained output** via Ollama (forces valid structure from any model) |
| - **Prompt compression** for small context windows (8K-32K) |
- **Tool RAG**: only load relevant tools into the prompt (saves tokens)
- **TOML format**: fewer tokens than JSON
|
|
| ### _strip_thinking() |
| Handles reasoning models (Qwen3, DeepSeek-R1) that wrap output in `<think>` tags. Automatically strips the thinking and returns only the answer. |
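
A sketch of the core regex (the shipped helper is private; the public name here is illustrative):

```python
import re

def strip_thinking(text: str) -> str:
    """Drop <think>...</think> blocks that reasoning models emit before the answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

assert strip_thinking("<think>2+2 is 4...</think>4") == "4"
```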
|
|
| --- |
|
|
| ## 11. The Research |
|
|
| Every design decision traces to a published paper. The full list with citations, methodology sections, and implementation mapping is in [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). |
|
|
The formal framework (a **Purpose-MDP** with 5 axioms, 3 theorems, and convergence proofs) is in [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
|
|
**Key theoretical result:** The self-improvement is a form of Potential-Based Reward Shaping (Ng et al., 1999). Our ΔΦ = Φ(s') - Φ(s) preserves the optimal policy while providing dense per-step feedback. The heuristic library converges to a fixed point under bounded capacity.
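
For reference, the standard shaping form from Ng et al. (1999), of which the ΔΦ above is the undiscounted (γ = 1) case:

```latex
F(s, a, s') = \gamma \, \Phi(s') - \Phi(s)
\qquad
R'(s, a, s') = R(s, a, s') + F(s, a, s')
```

Because F is a telescoping difference of potentials, it changes returns only by a state-dependent offset, so the set of optimal policies is unchanged.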
|
|
| --- |
|
|
| ## 12. For Contributors |
|
|
| ### File Structure |
|
|
| ``` |
| purpose_agent/ |
├── types.py              # State, Action, Trajectory, Heuristic, PurposeScore
├── llm_backend.py        # LLMBackend ABC + HF, OpenAI, Mock + resolve_backend
├── slm_backends.py       # Ollama, llama-cpp, prompt compression, SLM registry
├── robust_parser.py      # Universal parser: TOML → JSON → regex (never crashes)
├── actor.py              # ReAct agent with 3-tier memory prompts
├── purpose_function.py   # Φ(s) critic with 7 anti-gaming rules
├── experience_replay.py  # Two-phase retrieval (similarity → Q-value)
├── optimizer.py          # Trajectory → heuristic distillation
├── orchestrator.py       # Main step loop
├── v2_types.py           # RunMode, MemoryScope, PurposeScoreV2
├── trace.py              # JSONL execution traces
├── memory.py             # 7 MemoryKinds × 5 MemoryStatuses
├── compiler.py           # Token-budgeted prompt compilation
├── immune.py             # 5 threat scanners
├── memory_ci.py          # Quarantine → scan → test → promote/reject
├── evalport.py           # Pluggable evaluation protocol
├── benchmark_v2.py       # Train/val/test splits with ablation
├── meta_rewarding.py     # Self-improving critic (arxiv:2407.19594)
├── self_taught.py        # Synthetic critic training (arxiv:2408.02666)
├── prompt_optimizer.py   # DSPy-style bootstrap (arxiv:2310.03714)
├── llm_compiler.py       # Parallel tool DAG (arxiv:2312.04511)
├── retroformer.py        # Structured reflection (arxiv:2308.02151)
├── breakthroughs.py      # MoH, hindsight relabeling, heuristic evolution, etc.
├── unified.py            # Agent, Graph, parallel, Conversation, KnowledgeStore
├── easy.py               # purpose(), Team, quickstart wizard
├── tools.py              # Secure built-in tools
├── streaming.py          # Async + event streaming
├── observability.py      # Cost tracking, callbacks
├── multi_agent.py        # Agent teams with shared learning
├── hitl.py               # Human-in-the-loop + checkpointing
├── evaluation.py         # V1 benchmark runner
├── registry.py           # Plugin system
├── __init__.py           # 103 exports
└── __main__.py           # CLI entry point
| ``` |
|
|
| ### Adding a New LLM Provider |
|
|
| ```python |
| # In your code (no core edits needed): |
| from purpose_agent import backend_registry, OpenAICompatibleBackend |
| |
| backend_registry.register("my_provider", |
| lambda model, api_key: OpenAICompatibleBackend( |
| model=model, base_url="https://api.myprovider.com/v1", api_key=api_key |
| )) |
| ``` |
|
|
| ### Adding a New Tool |
|
|
| ```python |
| from purpose_agent import FunctionTool |
| |
| def my_search(query: str) -> str: |
| """Search my database.""" |
| return db.search(query) |
| |
| tool = FunctionTool.from_function(my_search) |
| ``` |
|
|
| ### Running Tests |
|
|
| ```bash |
| python tests/test_core.py # 21 unit tests |
| python tests/launch_readiness.py # 119 comprehensive tests |
| python benchmarks/validate.py # Mock benchmark suite |
| python benchmarks/validate.py --quick # Fast smoke test |
| ``` |
|
|