README: complete rewrite with visual architecture, animations, feature map, usage guide

f970fc9 verified 20 days ago

17.4 kB

	---
	library_name: purpose-agent
	license: mit
	language:
	- en
	tags:
	- agents
	- self-improving
	- multi-agent
	- memory-system
	- local-first
	- slm
	- safety
	- event-driven
	- rag
	- tools
	pipeline_tag: text-generation
	---

	<div align="center">

	# 🧠 Purpose Agent

	### The framework where AI agents actually learn from experience.

	Local-first · Self-improving · Domain-agnostic · Production-hardened

	[![PyPI](https://img.shields.io/pypi/v/purpose-agent?color=blue&label=PyPI)](https://pypi.org/project/purpose-agent/)
	[![Python](https://img.shields.io/pypi/pyversions/purpose-agent)](https://pypi.org/project/purpose-agent/)
	[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
	[![Tests](https://img.shields.io/badge/tests-250%2B_passing-brightgreen)]()
	[![Papers](https://img.shields.io/badge/papers-13_implemented-purple)]()

	---

	```
	pip install purpose-agent
	```

	</div>

	---

	## 🎯 What Problem Does This Solve?

	Every other agent framework (LangChain, CrewAI, AutoGen) runs the same way every time. Your agent fails at a task? Next time, it fails the exact same way. No learning. No memory. No improvement.

	Purpose Agent is different. After every task:

	```
	┌─────────────────────────────────────────────────────────────┐
	│ │
	│ Task → Execute → Score → Extract Lessons → Remember │
	│ ↑ │ │
	│ └───── Next task uses lessons ──────────────┘ │
	│ │
	│ Run 1: Agent struggles ──────── Φ = 3.0 │
	│ Run 2: Uses learned heuristics ─ Φ = 7.0 │
	│ Run 3: Refined further ──────── Φ = 9.5 │
	│ │
	└─────────────────────────────────────────────────────────────┘
	```

	No fine-tuning. No GPU training. Just memory + experience.

	---

	## ⚡ 3-Line Quickstart

	```python
	import purpose_agent as pa

	team = pa.purpose("Help me write Python code")
	result = team.run("Write a fibonacci function")
	```

	That's it. The framework auto-detects your model, builds the right team, executes the task, scores the result, and stores lessons for next time.

	---

	## 🏗️ Architecture at a Glance

	```
	╔══════════════════════════════════════════════════════════════════╗
	║ PURPOSE AGENT v3.0 ║
	╠══════════════════════════════════════════════════════════════════╣
	║ ║
	║ ┌──────────┐ ┌─────────────┐ ┌──────────────────┐ ║
	║ │ YOU │───▶│ EASY API │───▶│ ORCHESTRATOR │ ║
	║ │ (purpose) │ │ (auto-team) │ │ (step loop) │ ║
	║ └──────────┘ └─────────────┘ └────────┬─────────┘ ║
	║ │ ║
	║ ┌────────────────────────────────┼──────────┐ ║
	║ │ ▼ │ ║
	║ │ ┌──────────┐ ┌──────────────────┐ │ ║
	║ │ │ ACTOR │───▶│ ENVIRONMENT │ │ ║
	║ │ │ (decide) │ │ (execute) │ │ ║
	║ │ └──────────┘ └────────┬─────────┘ │ ║
	║ │ │ │ ║
	║ │ ┌───────────────────▼─────┐ │ ║
	║ │ │ PURPOSE FUNCTION (Φ) │ │ ║
	║ │ │ Score: 0 ──────── 10 │ │ ║
	║ │ │ O(1) state-delta mode │ │ ║
	║ │ └───────────────────┬─────┘ │ ║
	║ │ │ │ ║
	║ │ ┌────────────────────────▼─────────┐ │ ║
	║ │ │ MEMORY (immune-scanned) │ │ ║
	║ │ │ 7 types · 5 statuses · scoped │ │ ║
	║ │ │ quarantine → test → promote │ │ ║
	║ │ └──────────────────────────────────┘ │ ║
	║ │ │ ║
	║ └──── SELF-IMPROVEMENT LOOP ───────────────┘ ║
	║ ║
	╚══════════════════════════════════════════════════════════════════╝
	```

	---

	## 🎨 Three Ways to Use It

	### 🟢 Level 1 — Just Describe What You Want

	```python
	import purpose_agent as pa

	# Auto-detects the right team composition
	team = pa.purpose("Write Python code and test it") # → architect + coder + tester
	team = pa.purpose("Research quantum computing") # → researcher + analyst
	team = pa.purpose("Analyze sales data") # → analyst + reporter
	team = pa.purpose("Write a blog post") # → writer + editor

	result = team.run("Create a sorting algorithm")
	team.teach("Always handle edge cases") # Inject knowledge directly
	print(team.status()) # See what it's learned
	```

	### 🟡 Level 2 — Choose Your Model & Add Knowledge

	```python
	import purpose_agent as pa

	# 10+ providers supported
	team = pa.purpose("Code helper", model="ollama:qwen3:1.7b") # Local, free
	team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
	team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
	team = pa.purpose("Code helper", model="openai:gpt-4o")

	# Add your own documents as knowledge
	team = pa.purpose("Answer questions about our product",
	knowledge="./docs/", # Load entire folder
	model="qwen3:1.7b",
	)
	answer = team.ask("What's our refund policy?")
	```

	### 🔴 Level 3 — Full Control

	```python
	import purpose_agent as pa

	# ── Spark: single intelligent agent ──
	spark = pa.Spark("coder", model="ollama:qwen3:1.7b", tools=[pa.PythonExecTool()])
	result = spark.run("Write fibonacci")

	# ── Flow: workflow with conditional routing ──
	flow = pa.Flow()
	flow.add_node("research", pa.Spark("researcher"))
	flow.add_node("write", pa.Spark("writer"))
	flow.add_edge(pa.BEGIN, "research")
	flow.add_conditional_edge("write", check_fn, {"pass": pa.DONE_SIGNAL, "revise": "research"})
	result = flow.run(state)

	# ── swarm: parallel execution ──
	results = pa.swarm(["task_a", "task_b", "task_c"], agents=[a1, a2, a3])

	# ── Council: multi-agent deliberation ──
	council = pa.Council([pa.Spark("alice"), pa.Spark("bob"), pa.Spark("carol")])
	result = council.run("Should we use microservices?", rounds=3)

	# ── Vault: knowledge RAG ──
	vault = pa.Vault.from_directory("./research_papers/")
	agent = pa.Spark("analyst", tools=[vault.as_tool()])

	# ── Generate entire systems ──
	from purpose_agent.mas_generator import generate
	system = generate("Monitor GitHub repos for CVEs and alert the team")
	# → 4 agents + workflow + tools + eval suite + routing policy
	```

	---

	## 🛡️ Safety & Security

	```
	┌─────────────────────────────────────────────┐
	│ MEMORY IMMUNE SYSTEM │
	│ │
	│ candidate ──→ immune scan ──→ quarantine │
	│ │ │ │
	│ ┌─────▼─────┐ ┌────▼────┐ │
	│ │ REJECTED │ │ TEST │ │
	│ │ (5 scans) │ │ (replay)│ │
	│ └────────────┘ └────┬────┘ │
	│ │ │
	│ ┌─────▼─────┐ │
	│ │ PROMOTED │ │
	│ │ (active) │ │
	│ └───────────┘ │
	└─────────────────────────────────────────────┘
	```

	5 threat scanners: prompt injection, score manipulation, tool misuse, privacy leaks, scope overreach

	PEP 578 kernel sandbox: Unbypassable audit hooks at the C-interpreter level. No Docker needed.

	Falsification critic: Code is scored by CPU-executed assertions, not LLM hallucinations.

	---

	## 🔬 First-Principles Engineering

	\| Problem \| Old Approach \| Purpose Agent \|
	\|---------\|-------------\|---------------\|
	\| Token cost grows O(N²) \| Pass full history to critic \| O(1) state-delta — only pass what changed \|
	\| SLMs hallucinate scores \| "Rate this 0-10" → guess \| Falsification — generate asserts, CPU executes, score = math \|
	\| Sandbox bypassed via dynamic code \| AST analysis (weak) \| PEP 578 audit hooks — kernel-level, unbypassable \|
	\| Heuristics overflow context \| Inject all 200 heuristics \| MoH cap K=10 — only top heuristics by Q-value \|
	\| UNKNOWN action crashes \| Parse failure → crash \| Safe fallback to DONE — never propagates garbage \|

	---

	## 📦 What's Inside (45+ modules)

	<details>
	<summary><b>🔧 Core Engine</b></summary>

	\| Module \| What \|
	\|--------\|------\|
	\| `orchestrator.py` \| Main step loop with 3 critic modes (standard/delta/falsification) \|
	\| `actor.py` \| ReAct agent with 3-tier memory + heuristic cap \|
	\| `purpose_function.py` \| Φ(s) scorer with 7 anti-gaming rules \|
	\| `experience_replay.py` \| Thread-safe trajectory storage with Q-value retrieval \|
	\| `optimizer.py` \| Trajectory → heuristic distillation \|

	</details>

	<details>
	<summary><b>🧬 Self-Improvement</b></summary>

	\| Module \| What \|
	\|--------\|------\|
	\| `memory.py` \| 7 memory kinds × 5 statuses, scoped, versioned \|
	\| `memory_ci.py` \| Quarantine → immune scan → test → promote/reject \|
	\| `memory_homeostasis.py` \| Budget enforcement, consolidation, archive \|
	\| `immune.py` \| 5 threat scanners for memory safety \|
	\| `breakthroughs.py` \| Self-improving critic, MoH, hindsight relabeling, evolution \|

	</details>

	<details>
	<summary><b>⚡ First-Principles</b></summary>

	\| Module \| What \|
	\|--------\|------\|
	\| `state_delta.py` \| O(1) Markovian state-diff for critic \|
	\| `falsification_critic.py` \| Popperian scoring via adversarial assertions \|
	\| `sandbox_hooks.py` \| PEP 578 kernel-level audit hooks \|
	\| `hardening.py` \| Null safety, timeouts, validation, graceful degradation \|
	\| `sre_patches.py` \| 5 auto-applied critical vulnerability fixes \|

	</details>

	<details>
	<summary><b>🌐 Protocols & Interop</b></summary>

	\| Module \| What \|
	\|--------\|------\|
	\| `protocols/mcp_bridge.py` \| MCP tool server integration \|
	\| `protocols/a2a.py` \| Agent-to-Agent delegation with circuit breaker \|
	\| `protocols/agui.py` \| AG-UI frontend streaming \|
	\| `protocols/agents_md.py` \| AGENTS.md repo-local instructions \|
	\| `quorum.py` \| Consensus/disagreement topology switching \|

	</details>

	<details>
	<summary><b>🧠 Intelligence</b></summary>

	\| Module \| What \|
	\|--------\|------\|
	\| `routing.py` \| Smart model selection (local-first, cost-aware) \|
	\| `mas_generator.py` \| Use-case → complete multi-agent system \|
	\| `skills/schema.py` \| Versioned, evolvable, testable skill cards \|
	\| `skills/ci.py` \| Skill testing + rollback + Darwinian selection \|
	\| `llm_compiler.py` \| Parallel tool execution via DAG planning \|

	</details>

	<details>
	<summary><b>📈 Optimization</b></summary>

	\| Module \| What \|
	\|--------\|------\|
	\| `optimization/fingerprint.py` \| Capability profiling from traces \|
	\| `optimization/dataset.py` \| Trace → filtered training dataset \|
	\| `optimization/prompt_pack.py` \| Epigenetic prompt optimization \|
	\| `optimization/shadow_eval.py` \| Candidate vs baseline comparison \|
	\| `optimization/optimizer.py` \| Improving/plateau/degrading policy \|
	\| `optimization/lora_plan.py` \| LoRA/distillation dry-run planning \|

	</details>

	<details>
	<summary><b>🏗️ Runtime</b></summary>

	\| Module \| What \|
	\|--------\|------\|
	\| `runtime/events.py` \| 30 canonical event types \|
	\| `runtime/event_bus.py` \| Async pub/sub with backpressure \|
	\| `runtime/state.py` \| Typed execution state for checkpointing \|
	\| `runtime/checkpoint.py` \| InMemory/JSONL/SQLite durability \|
	\| `streaming_v3.py` \| AG-UI compatible stream adapters \|

	</details>

	---

	## 🔌 Supported Providers

	```python
	from purpose_agent import resolve_backend

	resolve_backend("ollama:qwen3:1.7b") # Local (free)
	resolve_backend("openrouter:meta-llama/llama-3.3-70b-instruct")
	resolve_backend("groq:llama-3.3-70b-versatile")
	resolve_backend("openai:gpt-4o")
	resolve_backend("together:meta-llama/Llama-3.3-70B-Instruct-Turbo")
	resolve_backend("fireworks:accounts/fireworks/models/llama-v3p1-70b")
	resolve_backend("cerebras:llama-3.3-70b")
	resolve_backend("deepseek:deepseek-chat")
	resolve_backend("mistral:mistral-large-latest")
	resolve_backend("hf:Qwen/Qwen3-32B")
	```

	---

	## 📊 Real-World Test Results

	Tested with Llama-3.3-70B and Gemma-4-26B via OpenRouter:

	\| Test \| Llama-70B \| Gemma-26B \|
	\|------\|:---------:\|:---------:\|
	\| fibonacci (4 unit tests) \| ✅ 100% \| ✅ 100% \|
	\| fizzbuzz (4 unit tests) \| ✅ 100% \| ✅ 100% \|
	\| factorial (3 unit tests) \| ✅ 100% \| ✅ 100% \|
	\| Self-improvement (heuristic growth) \| 0→18 \| 0→11 \|
	\| Immune system (adversarial) \| 93% catch \| — \|
	\| Production test (19 checks) \| 19/19 ✅ \| — \|

	250+ automated tests. Zero failures required for release.

	---

	## 📚 Research Foundation

	Built on 13 published papers. Every module traces back to a specific result.

	\| Paper \| Module \| Contribution \|
	\|-------\|--------\|-------------\|
	\| Ng et al. 1999 (PBRS) \| purpose_function \| Φ preserves optimal policy \|
	\| MUSE (2510.08002) \| actor, optimizer \| 3-tier memory hierarchy \|
	\| REMEMBERER (2306.07929) \| experience_replay \| Q-value retrieval \|
	\| Reflexion (2303.11366) \| orchestrator \| Verbal reinforcement \|
	\| SPC (2504.19162) \| immune \| Anti-reward-hacking \|
	\| Meta-Rewarding (2407.19594) \| meta_rewarding \| Self-improving critic \|
	\| DSPy (2310.03714) \| prompt_optimizer \| Automatic few-shot bootstrap \|
	\| LLMCompiler (2312.04511) \| llm_compiler \| Parallel tool DAG \|
	\| Retroformer (2308.02151) \| retroformer \| Structured reflection \|
	\| TinyAgent (2409.00608) \| slm_backends \| SLM-native patterns \|
	\| DeepSeek MoE (2401.06066) \| breakthroughs \| MoH sparse selection \|
	\| HER (1707.01495) \| breakthroughs \| Hindsight relabeling \|
	\| Self-Taught Eval (2408.02666) \| self_taught \| Synthetic critic training \|

	Full proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md) · Research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md)

	---

	## 🚀 Install

	```bash
	pip install purpose-agent # Core (zero dependencies)
	pip install purpose-agent[openai] # + OpenAI/Groq/OpenRouter
	pip install purpose-agent[ollama] # + Local Ollama
	pip install purpose-agent[all] # Everything
	```

	For local models (recommended — free, private):
	```bash
	curl -fsSL https://ollama.ai/install.sh \| sh
	ollama pull qwen3:1.7b
	```

	---

	## 🖥️ CLI

	```bash
	python -m purpose_agent # Interactive wizard
	purpose-agent # Same, via entry point
	```

	---

	## 📄 License

	MIT — use it for anything.

	---

	<div align="center">

	Built on 13 papers. Zero fine-tuning. Agents that actually improve.

	[PyPI](https://pypi.org/project/purpose-agent/) · [Architecture](ARCHITECTURE.md) · [Formal Proofs](PURPOSE_LEARNING.md) · [Changelog](CHANGELOG.md)

	</div>