Rohan03 committed · verified · Commit 67678c5 · Parent(s): f28a638

docs: Complete architecture documentation for technical and non-technical readers

ARCHITECTURE.md ADDED
# Purpose Agent — Architecture Documentation

> For developers building on the framework, researchers understanding the theory, and anyone curious about how self-improving agents work.

---

## Table of Contents

1. [What Is Purpose Agent?](#1-what-is-purpose-agent)
2. [The Big Idea (No Jargon)](#2-the-big-idea)
3. [How It Works — Step by Step](#3-how-it-works)
4. [Architecture Map](#4-architecture-map)
5. [The Core Engine](#5-the-core-engine)
6. [The V2 Safety Kernel](#6-the-v2-safety-kernel)
7. [Research Implementations](#7-research-implementations)
8. [Breakthroughs](#8-breakthroughs)
9. [User-Facing Layers](#9-user-facing-layers)
10. [How Models Are Handled](#10-how-models-are-handled)
11. [The Research Behind It](#11-the-research)
12. [For Contributors](#12-for-contributors)

---

## 1. What Is Purpose Agent?

Purpose Agent is a Python framework for building AI agents that **get better with experience** — without retraining the underlying model.

Traditional AI agents run the same way every time. Purpose Agent is different: after each task, it extracts lessons from what worked and what didn't, tests those lessons for safety, and uses them to perform better next time.

**Think of it like this:** A new employee follows the company handbook. After their first week, they have personal notes — shortcuts they discovered, mistakes they won't repeat, tips from colleagues. Those notes make them better at their job without changing who they are. Purpose Agent does this for AI.

---

## 2. The Big Idea

### For Non-Technical Readers

```
You give it a purpose → It builds a team → It does the work → It learns → Next time is better
```

**You say:** "Help me write Python code."
**It builds:** An architect (plans), a coder (writes), and a tester (reviews).
**It runs:** The coder writes fibonacci. The tester checks it. A critic scores the work.
**It learns:** "When writing recursive functions, check base cases first." This lesson is saved.
**Next time:** The coder starts by checking base cases. It's faster and more reliable.

### For Technical Readers

The framework implements a **Purpose-MDP** — a Markov Decision Process where:

- A **Purpose Function Φ(s)** evaluates every state transition on a 0-10 scale
- An **Optimizer** distills successful trajectories into reusable heuristics
- Heuristics are ranked by **Q-values** (how often they helped) and selected via **Mixture-of-Heuristics** (sparse activation, like MoE)
- An **immune system** scans every new heuristic for prompt injection, score manipulation, and other threats
- A **memory CI pipeline** quarantines, tests, and promotes heuristics before they affect agent behavior

This is **Potential-Based Reward Shaping** (Ng et al., 1999) applied to LLM agents, with formal convergence guarantees. See [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
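
In code, the per-step signal is just a potential difference. A minimal hand-written sketch of the Ng et al. formulation (not framework code; the potential values below are illustrative):

```python
# Potential-based reward shaping: adding F(s, s') = gamma * phi(s') - phi(s)
# to the base reward provably preserves the optimal policy (Ng et al., 1999)
# while giving the agent dense per-step feedback.
def shaped_reward(base_reward: float, phi_s: float, phi_s_next: float,
                  gamma: float = 1.0) -> float:
    return base_reward + gamma * phi_s_next - phi_s

# The critic rated the before-state 0.0 and the after-state 10.0
# (all tests pass), so this single step yields a dense +10.0 signal.
delta = shaped_reward(0.0, phi_s=0.0, phi_s_next=10.0)
```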

---

## 3. How It Works — Step by Step

Here's what happens when you run `team.run("Write a fibonacci function")`:

### Step 1: The Actor Decides

The Actor module receives:
- The **purpose** ("Write a fibonacci function")
- The **current state** (empty — no code written yet)
- Any **learned heuristics** from past runs

It generates a thought process and an action:
> "I should write a function that handles base cases fib(0)=0 and fib(1)=1, then use iteration for the general case."
> → Action: `submit_code` with the Python implementation.

### Step 2: The Environment Executes

The code is run against test cases. The environment returns a new state:
> "Tests: 4/4 ALL PASSED"

### Step 3: The Purpose Function Scores

A **separate LLM call** (not the same as the actor) evaluates the transition:
- Φ(state_before) = 0.0 (nothing done)
- Φ(state_after) = 10.0 (all tests pass)
- Delta = +10.0 (huge improvement)
- Evidence: "Tests changed from 0/4 to 4/4"

The Purpose Function has **7 anti-gaming rules** that prevent the agent from tricking itself into thinking it's doing well when it isn't.

### Step 4: The Optimizer Extracts Heuristics

After the task, the Optimizer looks at the trajectory and extracts reusable patterns:
- **Strategic:** "When writing {function_type} functions, handle edge cases first, then iterate."
- **Procedural:** "1. Read test cases. 2. Handle base cases. 3. Implement general case. 4. Submit."
- **Tool tip:** "When submitting code, check boundary conditions: 0, 1, empty, negative."

### Step 5: Safety Checks

Every new heuristic goes through the **immune system**:
- Is it a prompt injection? ("Ignore all previous instructions") → **REJECTED**
- Does it try to manipulate scores? ("Always score 10") → **REJECTED**
- Does it contain secrets? (API keys, passwords) → **REJECTED**
- Is it safe? ("Check base cases first") → **QUARANTINED** (pending replay test)

After passing replay testing → **PROMOTED** (active in future runs).

### Step 6: Next Run Benefits

When the agent runs again, the **Prompt Compiler** selects the top-K heuristics by:
- **Relevance** to the current task (embedding similarity)
- **Trust** (immune-scanned and verified)
- **Utility** (Q-value — how often it helped before)

These are injected into the prompt. The agent is now better without any model retraining.

---

## 4. Architecture Map

```
┌──────────────────────────────────────────────────────────────────────┐
│                            PURPOSE AGENT                             │
│                                                                      │
│ ┌─── USER LAYER ───────────────────────────────────────────────────┐ │
│ │  pa.purpose("...") → Team → team.run("...")                      │ │
│ │  pa.Agent()   pa.Graph()   pa.parallel()   pa.Conversation()     │ │
│ └──────────────────────────────────────────────────┬───────────────┘ │
│                                                    │                 │
│ ┌─── CORE ENGINE ──────────────────────────────────▼───────────────┐ │
│ │                                                                  │ │
│ │   Actor ──→ Environment ──→ Purpose Function (Φ)                 │ │
│ │     ↑            │                  │                            │ │
│ │     │            ▼                  ▼                            │ │
│ │     │        State s'          Φ(s) → Φ(s')                      │ │
│ │     │            │                  │                            │ │
│ │     │            ▼                  ▼                            │ │
│ │     │   Experience Replay       Optimizer                        │ │
│ │     │            │                  │                            │ │
│ │     └──── heuristics ◄──────────────┘                            │ │
│ │                                                                  │ │
│ └──────────────────────────────────────────────────┬───────────────┘ │
│                                                    │                 │
│ ┌─── V2 SAFETY KERNEL ─────────────────────────────▼───────────────┐ │
│ │                                                                  │ │
│ │   Immune System ──→ Memory CI ──→ Memory Store                   │ │
│ │   (scan threats)    (quarantine)  (7 types × 5 statuses)         │ │
│ │                                                                  │ │
│ │   Prompt Compiler ──→ Token Budget ──→ Credit Assignment         │ │
│ │   Trace System ──→ JSONL logs ──→ Offline analysis               │ │
│ │   RunMode ──→ EVAL_TEST blocks all writes                        │ │
│ │                                                                  │ │
│ └──────────────────────────────────────────────────┬───────────────┘ │
│                                                    │                 │
│ ┌─── INFRASTRUCTURE ───────────────────────────────▼───────────────┐ │
│ │                                                                  │ │
│ │   LLM Backends: OpenRouter │ Groq │ OpenAI │ Ollama │ HF │ ...   │ │
│ │   Robust Parser: TOML → JSON → field extraction → regex          │ │
│ │   Tools: Calculator │ PythonExec │ ReadFile │ WriteFile          │ │
│ │   Streaming │ Observability │ Cost Tracking │ Registry           │ │
│ │                                                                  │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
```

---

## 5. The Core Engine

### Actor (`actor.py`)
The decision-maker. Given the current state and purpose, it decides what action to take.

**Key design:** The Actor doesn't evaluate itself. That's the Purpose Function's job. This separation prevents self-confirmation bias (you wouldn't let a student grade their own exam).

The Actor's prompt is **dynamically composed** from three tiers of memory:
- **Strategic:** High-level rules ("When coding, handle edge cases first")
- **Procedural:** Step-by-step procedures ("1. Read tests. 2. Handle bases. 3. Implement.")
- **Tool tips:** Action-specific advice ("When using submit_code, check boundaries")

### Purpose Function (`purpose_function.py`)
The critic. A separate LLM call that scores every state transition on a 0-10 scale.

**Seven anti-gaming rules:**
1. Evidence required — cite specific state changes
2. No credit for intentions — score actual results, not plans
3. No sycophancy — don't inflate scores to be encouraging
4. Monotonic scale — 0 = nothing done, 10 = task complete
5. Anti-gaming — flag superficial state manipulation
6. Consistency — same state gets same score (enforced by cache)
7. Confidence — uncertain evaluations get reduced weight

### Experience Replay (`experience_replay.py`)
Stores completed trajectories and retrieves relevant ones for future tasks.

**Two-phase retrieval** (from MemRL, arxiv:2601.03192):
1. **Recall:** Find trajectories similar to the current task (embedding similarity)
2. **Re-rank:** Order by Q-value utility (how useful was this memory when retrieved before?)
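
The two phases can be sketched in a few lines. This is an illustration of the recall-then-re-rank idea, not the module's actual code; the memory records and similarity function are stand-ins:

```python
# Phase 1 (recall): keep the top_n memories most similar to the query.
# Phase 2 (re-rank): reorder that shortlist by learned Q-value utility.
def retrieve(memories, query_vec, similarity, top_n=10, top_k=3):
    recalled = sorted(memories,
                      key=lambda m: similarity(m["vec"], query_vec),
                      reverse=True)[:top_n]
    return sorted(recalled, key=lambda m: m["q_value"], reverse=True)[:top_k]
```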

### Optimizer (`optimizer.py`)
Extracts reusable heuristics from successful trajectories.

Uses the **CER distillation pattern** (arxiv:2506.06698): abstract away specific details with `{variable}` placeholders so heuristics generalize across tasks.

### Orchestrator (`orchestrator.py`)
The main loop that ties everything together. For each step:
1. Actor decides → 2. Environment executes → 3. Critic scores → 4. Step recorded → 5. Check termination

After each task: store trajectory → optimize → sync heuristics to Actor memory.
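
The loop above can be condensed into a sketch. The actor, environment, critic, replay, and optimizer objects here are stand-ins for the real classes, with invented method names:

```python
# Condensed sketch of the orchestrator's step loop and post-task phase.
def run_episode(actor, env, critic, replay, optimizer, max_steps=10):
    state, trajectory = env.reset(), []
    for _ in range(max_steps):
        action = actor.decide(state)               # 1. Actor decides
        next_state, done = env.execute(action)     # 2. Environment executes
        score = critic.score(state, next_state)    # 3. Critic scores the delta
        trajectory.append((state, action, score))  # 4. Step recorded
        state = next_state
        if done:                                   # 5. Check termination
            break
    replay.store(trajectory)                       # After the task: store,
    actor.sync(optimizer.distill(trajectory))      # distill, sync heuristics
    return trajectory
```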

---

## 6. The V2 Safety Kernel

V1 let the agent learn freely. V2 adds guardrails.

### Memory System (`memory.py`)
Seven memory types, each with a different trust prior:

| Type | Example | Trust |
|------|---------|-------|
| `purpose_contract` | "Build a web scraper" | High (user-defined) |
| `user_preference` | "Always cite sources" | High (human-taught) |
| `skill_card` | "When coding, test edges first" | Medium (learned) |
| `episodic_case` | "fib(0)=0 was a tricky case" | Medium (observed) |
| `failure_pattern` | "Don't use recursion for large n" | Medium (learned from failure) |
| `critic_calibration` | "Score 7 for 3/4 tests passing" | Low (meta-learned) |
| `tool_policy` | "search: only use at target location" | Medium (learned) |

Five statuses: `candidate` → `quarantined` → `promoted` (or `rejected`) → `archived`.
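
The lifecycle can be modeled as a small state machine. A sketch with a transition table inferred from the statuses above (the real `memory.py` may allow different edges):

```python
# Allowed lifecycle edges: a memory may only move along these transitions.
ALLOWED = {
    "candidate":   {"quarantined", "rejected"},
    "quarantined": {"promoted", "rejected"},
    "promoted":    {"archived"},
    "rejected":    set(),
    "archived":    set(),
}

def transition(status: str, new_status: str) -> str:
    """Move a memory to new_status, refusing any edge not in ALLOWED."""
    if new_status not in ALLOWED[status]:
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status
```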

### Immune System (`immune.py`)
Scans every candidate memory for 5 threat categories:
- **Prompt injection** — "Ignore previous instructions..."
- **Score manipulation** — "Always score 10..."
- **Tool misuse** — "subprocess.call('rm -rf /')..."
- **Privacy leaks** — API keys, emails, file paths
- **Scope overreach** — a memory tries to affect all agents when it should be scoped
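
A toy version of such a scanner, with a handful of illustrative patterns (not the framework's actual rule set):

```python
import re

# Three of the five categories, each with one example pattern.
THREATS = {
    "prompt_injection":   re.compile(r"ignore (all )?previous instructions", re.I),
    "score_manipulation": re.compile(r"always score \d+", re.I),
    "privacy_leak":       re.compile(r"(sk-[A-Za-z0-9]{16,}|password\s*=)", re.I),
}

def scan(memory_text: str) -> list[str]:
    """Return the threat categories the candidate memory triggers."""
    return [name for name, pat in THREATS.items() if pat.search(memory_text)]
```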

### Memory CI (`memory_ci.py`)
The promotion pipeline:
```
candidate → immune_scan() → quarantined → replay_test → promote/reject
```
No memory reaches the agent's prompt without passing every gate.

### Prompt Compiler (`compiler.py`)
Selects which memories to include under a token budget. Ranked by:
`score = 0.4 × relevance + 0.3 × trust + 0.3 × utility`

Returns `included_memory_ids` for credit assignment — only memories that were in the prompt get Q-value updates after the step.
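
The selection step can be sketched as a ranked greedy fill. The memory dicts and token counts here are invented for illustration:

```python
# Rank by the weighted score above, then greedily pack under the budget;
# the returned ids are what credit assignment later updates.
def compile_prompt(memories, budget):
    ranked = sorted(
        memories,
        key=lambda m: 0.4 * m["relevance"] + 0.3 * m["trust"] + 0.3 * m["utility"],
        reverse=True,
    )
    included, used = [], 0
    for m in ranked:
        if used + m["tokens"] <= budget:
            included.append(m["id"])
            used += m["tokens"]
    return included
```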

### Trace System (`trace.py`)
Every run produces a JSONL trace — the raw material for debugging, evaluation, and memory extraction. Traces are append-only and immutable.

### RunMode (`v2_types.py`)
Three modes with strict enforcement:
- `LEARNING_TRAIN` — full read/write
- `LEARNING_VALIDATION` — read + staging writes
- `EVAL_TEST` — **no writes of any kind** (the only mode whose numbers you can report)
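
The write-blocking behavior can be sketched with a mode-gated store. Only the mode names come from the list above; the `MemoryStore` stub is invented:

```python
from enum import Enum

class RunMode(Enum):
    LEARNING_TRAIN = "learning_train"
    LEARNING_VALIDATION = "learning_validation"
    EVAL_TEST = "eval_test"

class MemoryStore:
    def __init__(self, mode: RunMode):
        self.mode, self.items, self.staged = mode, [], []

    def write(self, memory):
        if self.mode is RunMode.EVAL_TEST:
            raise PermissionError("EVAL_TEST blocks all writes")
        if self.mode is RunMode.LEARNING_VALIDATION:
            self.staged.append(memory)   # staging writes only
        else:
            self.items.append(memory)    # full read/write
```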

---

## 7. Research Implementations

Five papers implemented as standalone modules:

### Meta-Rewarding (`meta_rewarding.py`)
*From: arxiv:2407.19594 — Llama-3-8B: 22.9% → 39.4% on AlpacaEval*

A meta-judge evaluates the Purpose Function's own judgments. Good judgments become calibration examples in memory. The critic improves through in-context learning.

### Self-Taught Evaluators (`self_taught.py`)
*From: arxiv:2408.02666*

Generates synthetic contrast pairs (correct vs. wrong evaluation) from traces. Creates an automatic curriculum: as the critic improves, the contrast pairs get harder.

### Prompt Optimizer (`prompt_optimizer.py`)
*From DSPy: arxiv:2310.03714 — +8% on GSM8K, +50% on BBH*

Instead of hand-crafting prompts, define signatures (`state, action → score, reasoning`) and let the optimizer bootstrap effective few-shot demonstrations by trial and error.

### LLM Compiler (`llm_compiler.py`)
*From: arxiv:2312.04511 — up to 3.7× latency speedup*

Instead of making sequential tool calls (ReAct), plan ALL calls upfront as a DAG and execute independent ones in parallel.

### Retroformer (`retroformer.py`)
*From: arxiv:2308.02151*

Structured reflection on completed traces extracts four types of memories (skills, failures, policies, observations). Replaces raw heuristic distillation with typed, safety-scanned memory extraction.

---

## 8. Breakthroughs

Six features that go beyond existing frameworks:

### B1: Self-Improving Critic
The Purpose Function's own quality improves over time. Meta-judging after each task generates calibration examples that make future scoring more accurate.

### B2: Mixture-of-Heuristics (MoH)
Like DeepSeek's Mixture-of-Experts: out of 100+ heuristics, only K=5 are activated per step. **Shared heuristics** (always active, like "check edge cases") + **routed heuristics** (task-specific, selected by Q × similarity). Knowledge grows; compute stays flat.
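
Sparse activation can be sketched as shared-plus-top-K routing. The heuristic records and similarity function are invented for the example:

```python
# Shared heuristics always fire; routed heuristics compete on
# Q-value x task similarity and only the top k are activated.
def activate(shared, routed, sim_to_task, k=5):
    scored = sorted(routed, key=lambda h: h["q"] * sim_to_task(h), reverse=True)
    return shared + scored[:k]
```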

### B3: Hindsight Heuristic Relabeling
From HER (arxiv:1707.01495): when a task fails, instead of discarding the trajectory, ask "what DID this accomplish?" and extract heuristics for what was achieved. Learn from failures, not just successes.

### B4: Heuristic Evolution
Periodically generalize specific heuristics into abstract patterns:
- Before: "When fibonacci fails on 0, return 0"
- After: "When {function} fails on {boundary_value}, add an explicit base case"

This creates an automatic curriculum: specific → general → abstract.

### B5: Cross-Domain Transfer
Heuristics learned on one set of tasks can help on unseen tasks. The `test_cross_domain_transfer()` function measures this: train on [fibonacci, factorial], test on [palindrome, fizzbuzz].

### B6: Adversarial Robustness
The `AdversarialHardener` generates 30 adversarial inputs (prompt injections, score hacks, API key leaks) and 10 benign inputs, then tests the immune system against all of them. Current results: **93% catch rate, 0% false positives.**

---

## 9. User-Facing Layers

### Easy API (`easy.py`)
The `purpose()` function analyzes your description and builds the right team:

| You say | It builds |
|---------|-----------|
| "Write Python code" | architect + coder + tester |
| "Research papers" | researcher + analyst |
| "Write blog posts" | writer + editor |
| "Analyze data" | analyst + reporter |
| "Help me" | general assistant |

### Unified Capabilities (`unified.py`)
Five competing framework philosophies in one composable layer:

| Capability | Inspired By | Usage |
|-----------|-------------|-------|
| `Agent()` | OpenAI Agents SDK | One-liner agent creation |
| `Graph()` | LangGraph | Conditional branching, cycles, fan-out |
| `parallel()` | CrewAI | Concurrent task execution |
| `Conversation()` | AutoGen | Agent-to-agent message passing |
| `KnowledgeStore` | LlamaIndex | RAG as a tool |

### Robust Parser (`robust_parser.py`)
The universal answer to "LLMs can't reliably produce JSON":
- Tries TOML first (fewer tokens than JSON)
- Falls back to JSON
- Falls back to field extraction by regex
- Never crashes. Always returns something usable.
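
The fallback chain can be sketched as a try-in-order function. This is an illustration, not the module's code; the field-extraction regex is deliberately minimal:

```python
import json
import re

def robust_parse(text: str, fields: list[str]) -> dict:
    try:
        import tomllib                      # 1. try TOML (Python 3.11+)
        return tomllib.loads(text)
    except Exception:
        pass
    try:
        return json.loads(text)             # 2. fall back to JSON
    except Exception:
        pass
    out = {}                                # 3. regex field extraction
    for f in fields:
        m = re.search(rf'{f}\s*[:=]\s*"?([^",\n]+)"?', text)
        if m:
            out[f] = m.group(1).strip()
    return out                              # 4. never raises
```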

---

## 10. How Models Are Handled

### resolve_backend()
One function routes to any provider:

```python
resolve_backend("openrouter:meta-llama/llama-3.3-70b-instruct")
resolve_backend("groq:llama-3.3-70b-versatile")
resolve_backend("openai:gpt-4o")
resolve_backend("ollama:qwen3:1.7b")  # Local, free
resolve_backend("hf:Qwen/Qwen3-32B")
resolve_backend("together:meta-llama/Llama-3.3-70B-Instruct-Turbo")
```
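
One subtlety worth noting: a model id can itself contain colons (`ollama:qwen3:1.7b`), so the provider prefix must be split off at the first colon only. A sketch of that parsing step (the default provider is an assumption, not the framework's documented behavior):

```python
# Split "provider:model" at the FIRST colon only, so model ids that
# contain colons ("qwen3:1.7b") survive intact.
def parse_spec(spec: str, default_provider: str = "openrouter"):
    provider, sep, model = spec.partition(":")
    if not sep:                   # bare model name, no provider prefix
        return default_provider, spec
    return provider, model
```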

### SLM-Native Design
The framework was designed for small models (0.6B-3B parameters):
- **Grammar-constrained output** via Ollama (forces valid structure from any model)
- **Prompt compression** for small context windows (8K-32K)
- **Tool RAG** — only load relevant tools into the prompt (saves tokens)
- **TOML format** — typically fewer tokens than JSON

### _strip_thinking()
Handles reasoning models (Qwen3, DeepSeek-R1) that wrap output in `<think>` tags. Automatically strips the thinking and returns only the answer.
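
A minimal version of this stripping step (the real helper may differ in details):

```python
import re

# Remove <think>...</think> blocks (DOTALL so multi-line reasoning
# is covered) plus any trailing whitespace, keeping only the answer.
THINK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    return THINK.sub("", text).strip()
```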

---

## 11. The Research

Every design decision traces to a published paper. The full list with citations, methodology sections, and implementation mapping is in [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md).

The formal framework — **Purpose-MDP** with 5 axioms, 3 theorems, and convergence proofs — is in [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).

**Key theoretical result:** The self-improvement is a form of Potential-Based Reward Shaping (Ng et al., 1999). Our ΔΦ = Φ(s') - Φ(s) preserves the optimal policy while providing dense per-step feedback. The heuristic library converges to a fixed point under bounded capacity.

---

## 12. For Contributors

### File Structure

```
purpose_agent/
├── types.py             # State, Action, Trajectory, Heuristic, PurposeScore
├── llm_backend.py       # LLMBackend ABC + HF, OpenAI, Mock + resolve_backend
├── slm_backends.py      # Ollama, llama-cpp, prompt compression, SLM registry
├── robust_parser.py     # Universal parser: TOML → JSON → regex (never crashes)
├── actor.py             # ReAct agent with 3-tier memory prompts
├── purpose_function.py  # Φ(s) critic with 7 anti-gaming rules
├── experience_replay.py # Two-phase retrieval (similarity → Q-value)
├── optimizer.py         # Trajectory → heuristic distillation
├── orchestrator.py      # Main step loop
├── v2_types.py          # RunMode, MemoryScope, PurposeScoreV2
├── trace.py             # JSONL execution traces
├── memory.py            # 7 MemoryKinds × 5 MemoryStatuses
├── compiler.py          # Token-budgeted prompt compilation
├── immune.py            # 5 threat scanners
├── memory_ci.py         # Quarantine → scan → test → promote/reject
├── evalport.py          # Pluggable evaluation protocol
├── benchmark_v2.py      # Train/val/test splits with ablation
├── meta_rewarding.py    # Self-improving critic (arxiv:2407.19594)
├── self_taught.py       # Synthetic critic training (arxiv:2408.02666)
├── prompt_optimizer.py  # DSPy-style bootstrap (arxiv:2310.03714)
├── llm_compiler.py      # Parallel tool DAG (arxiv:2312.04511)
├── retroformer.py       # Structured reflection (arxiv:2308.02151)
├── breakthroughs.py     # MoH, hindsight relabeling, heuristic evolution, etc.
├── unified.py           # Agent, Graph, parallel, Conversation, KnowledgeStore
├── easy.py              # purpose(), Team, quickstart wizard
├── tools.py             # Secure built-in tools
├── streaming.py         # Async + event streaming
├── observability.py     # Cost tracking, callbacks
├── multi_agent.py       # Agent teams with shared learning
├── hitl.py              # Human-in-the-loop + checkpointing
├── evaluation.py        # V1 benchmark runner
├── registry.py          # Plugin system
├── __init__.py          # 103 exports
└── __main__.py          # CLI entry point
```

### Adding a New LLM Provider

```python
# In your code (no core edits needed):
from purpose_agent import backend_registry, OpenAICompatibleBackend

backend_registry.register(
    "my_provider",
    lambda model, api_key: OpenAICompatibleBackend(
        model=model, base_url="https://api.myprovider.com/v1", api_key=api_key
    ),
)
```

### Adding a New Tool

```python
from purpose_agent import FunctionTool

def my_search(query: str) -> str:
    """Search my database."""
    return db.search(query)

tool = FunctionTool.from_function(my_search)
```

### Running Tests

```bash
python tests/test_core.py              # 21 unit tests
python tests/launch_readiness.py       # 119 comprehensive tests
python benchmarks/validate.py          # Mock benchmark suite
python benchmarks/validate.py --quick  # Fast smoke test
```