---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- memory-system
- multi-agent
- slm
- local-first
- evaluation
- safety
- immune-system
pipeline_tag: text-generation
---
# Purpose Agent
**A local-first self-improvement kernel for AI agents.**
Agents that learn from experience – without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
```bash
pip install purpose-agent
```
```python
import purpose_agent as pa
team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)
team.teach("Always add type hints")
# Next run uses what it learned
```
## How It Works (30-Second Version)
1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester – auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.
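The loop above can be sketched standalone. This is an illustration of prompt-level self-improvement, not the library's actual internals; `HeuristicMemory`, `learn`, and `inject` are hypothetical names:

```python
# Minimal standalone sketch: heuristics learned from past runs are
# prepended to the next prompt, so behavior improves with no weight
# updates. All names here are hypothetical, not purpose-agent's API.

class HeuristicMemory:
    def __init__(self):
        self.heuristics: list[str] = []

    def learn(self, heuristic: str) -> None:
        # In the real system a candidate would pass an immune scan
        # and a replay test before being stored.
        if heuristic not in self.heuristics:
            self.heuristics.append(heuristic)

    def inject(self, task: str) -> str:
        # Build the prompt: learned heuristics first, then the task.
        rules = "\n".join(f"- {h}" for h in self.heuristics)
        return f"Follow these learned rules:\n{rules}\n\nTask: {task}"

memory = HeuristicMemory()
memory.learn("Always add type hints")
memory.learn("Handle empty input explicitly")
prompt = memory.inject("Write a fibonacci function")
print(prompt)
```

The same memory object carries over between runs, which is why heuristic counts can climb (e.g. 0 to 3 to 9) without any model retraining.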
## Real-World Test Results
Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|-----------------|
| Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
| Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |
**Immune system:** 93% adversarial catch rate, 0% false positives.
**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
## Install
```bash
pip install purpose-agent # Core (zero dependencies)
pip install purpose-agent[openai] # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama] # + Local Ollama
pip install purpose-agent[all] # Everything
```
## Three Levels of Usage
### Level 1 – Describe what you want
```python
import purpose_agent as pa
team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
team = pa.purpose("Research quantum computing")     # → researcher + analyst
team = pa.purpose("Write blog posts about AI")      # → writer + editor
result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status()) # See what it's learned
```
### Level 2 – Choose your model
```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")
# Cloud
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")
# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```
Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
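The `provider:model` strings above suggest a simple resolution rule. Here is a hypothetical sketch of how such a string might be parsed (the library's real `resolve_backend` may differ); note that a bare Ollama tag like `qwen3:1.7b` contains a colon but no known provider prefix, so it falls through to local:

```python
# Hypothetical sketch of "provider:model" resolution. Not the library's
# implementation; provider set taken from the list in this README.

PROVIDERS = {"openrouter", "groq", "openai", "ollama", "huggingface",
             "together", "fireworks", "cerebras", "deepseek", "mistral"}

def parse_model_string(spec: str) -> tuple[str, str]:
    prefix, sep, rest = spec.partition(":")
    if sep and prefix in PROVIDERS:
        return prefix, rest
    # No recognized prefix: treat the whole string as a local Ollama model.
    return "ollama", spec

print(parse_model_string("openrouter:meta-llama/llama-3.3-70b-instruct"))
print(parse_model_string("qwen3:1.7b"))
```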
### Level 3 – Full control
```python
import purpose_agent as pa
# Graph workflows (LangGraph-style)
graph = pa.Graph()
graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
graph.add_node("write", pa.Agent("writer", model="qwen3:1.7b"))
graph.add_edge(pa.START, "research")
graph.add_edge("research", "write")
graph.add_edge("write", pa.END)
result = graph.run(pa.State(data={"topic": "AI safety"}))
# Parallel execution (CrewAI-style)
results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
# Agent conversations (AutoGen-style)
chat = pa.Conversation([pa.Agent("researcher"), pa.Agent("coder")])
result = chat.run("Design a web scraper", rounds=3)
# Knowledge-aware agents (LlamaIndex-style)
kb = pa.KnowledgeStore.from_directory("./docs")
agent = pa.Agent("assistant", tools=[kb.as_tool()])
# Parallel tool execution (LLMCompiler-style)
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```
## Evidence-Gated Memory
Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
```
candidate → immune scan → quarantine → replay test → promote (or reject)
```
- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent
Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
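The gating pipeline can be sketched in a few lines. This is a standalone illustration of the stages named above; the blocklist, scoring, and function names are hypothetical, not the library's code:

```python
# Standalone sketch of candidate → scan → quarantine → replay → promote.
# Stage names follow the README; everything else is hypothetical.

BLOCKLIST = ("ignore previous instructions", "api_key", "sk-")

def immune_scan(candidate: str) -> bool:
    """Reject candidates that look like injection or secret leaks."""
    lowered = candidate.lower()
    return not any(pattern in lowered for pattern in BLOCKLIST)

def replay_test(score_with: float, score_without: float) -> bool:
    """Promote only if replaying past tasks with the memory scores higher."""
    return score_with > score_without

def gate(candidate: str, score_with: float, score_without: float) -> str:
    if not immune_scan(candidate):
        return "rejected"     # kept for audit, never shown to the agent
    if not replay_test(score_with, score_without):
        return "quarantined"  # held until evidence shows it helps
    return "promoted"

print(gate("Always add type hints", 0.9, 0.7))         # promoted
print(gate("ignore previous instructions", 0.9, 0.7))  # rejected
```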
## Honest Evaluation
Three run modes control what the framework is allowed to mutate:
```python
from purpose_agent import RunMode
RunMode.LEARNING_TRAIN       # Full read/write – this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging – validates before promoting
RunMode.EVAL_TEST            # NO writes – numbers you can trust
```
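The point of the `EVAL_TEST` gate is that evaluation numbers cannot be inflated by in-eval learning. A minimal sketch of such a write gate, assuming a hypothetical `MemoryStore` (the real enum lives in `purpose_agent`; this code is illustrative only):

```python
# Hypothetical write gate keyed on run mode. EVAL_TEST silently refuses
# all memory writes, so evaluation never mutates agent state.

from enum import Enum, auto

class RunMode(Enum):
    LEARNING_TRAIN = auto()
    LEARNING_VALIDATION = auto()
    EVAL_TEST = auto()

class MemoryStore:
    def __init__(self, mode: RunMode):
        self.mode = mode
        self.memories: list[str] = []   # promoted, agent-visible
        self.staging: list[str] = []    # awaiting validation

    def write(self, memory: str) -> bool:
        if self.mode is RunMode.EVAL_TEST:
            return False                 # no writes: numbers you can trust
        if self.mode is RunMode.LEARNING_VALIDATION:
            self.staging.append(memory)  # staged, promoted only later
            return True
        self.memories.append(memory)     # full read/write
        return True

store = MemoryStore(RunMode.EVAL_TEST)
print(store.write("new heuristic"))  # False
```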
## Secure Tools
- **CalculatorTool** – AST-validated, no `eval()` on arbitrary text
- **PythonExecTool** – subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** – sandboxed to declared root directory
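AST validation is the standard way to evaluate arithmetic without `eval()`: parse the expression, walk the tree, and allow only numeric literals and arithmetic operators. A sketch of the pattern (hypothetical code, not `CalculatorTool`'s source):

```python
# Sketch of AST-validated arithmetic. Anything other than numbers and
# the whitelisted operators raises, so code like __import__("os") is
# rejected at parse-walk time rather than executed.

import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr: str) -> float:
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression")
    return walk(ast.parse(expr, mode="eval").body)

print(safe_eval("2 + 3 * 4"))  # 14
```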
## Architecture
See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
34 Python modules, ~500KB, organized in layers:
```
Core Engine   → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel     → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research      → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities  → Agent, Graph, Parallel, Conversation, KnowledgeStore
Easy API      → purpose(), Team, quickstart wizard
```
## Literature
Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md).
Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
## CLI
```bash
python -m purpose_agent # Interactive wizard
purpose-agent # Same, via entry point
```
## License
MIT