---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- memory-system
- multi-agent
- slm
- local-first
- evaluation
- safety
- immune-system
pipeline_tag: text-generation
---
# Purpose Agent
**A local-first self-improvement kernel for AI agents.**
Agents that learn from experience – without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
```bash
pip install purpose-agent
```
```python
import purpose_agent as pa
team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)
team.teach("Always add type hints")
# Next run uses what it learned
```
## How It Works (30-Second Version)
1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester – auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt (see the sketch below). The agent gets smarter without any weight updates.
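A minimal sketch of that loop, using hypothetical names (`heuristics`, `run_task`, `learn`, and the `llm`/`critic` callables) rather than purpose-agent's actual internals:
```python
# Hypothetical sketch of the learn-and-inject loop; names are
# illustrative, not purpose-agent's internals.
heuristics: list[str] = []  # persisted between runs

def run_task(task: str, llm) -> str:
    # Inject what past runs taught the agent directly into the prompt.
    guidelines = "\n".join(f"- {h}" for h in heuristics)
    return llm(f"Guidelines:\n{guidelines}\n\nTask: {task}")

def learn(step_output: str, critic) -> None:
    # A separate critic scores the step; strong patterns become heuristics.
    score, lesson = critic(step_output)
    if score >= 0.8 and lesson not in heuristics:
        heuristics.append(lesson)  # no weight updates, just prompt context
```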
## Real-World Test Results
Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|-----------------|
| Llama-3.3-70B | ✅ 100% | ✅ 100% | ✅ 100% | 0 → 3 → 9 → 18 heuristics |
| Gemma-4-26B | ✅ 100% | ✅ 100% | ✅ 100% | 0 → 3 → 6 → 11 heuristics |
**Day-0 production test:** 19/19 tasks pass on Llama-3.3-70B across all three usage levels.
**Immune system:** 93% adversarial catch rate, 0% false positives.
**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
## Install
```bash
pip install purpose-agent # Core (zero dependencies)
pip install purpose-agent[openai] # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama] # + Local Ollama
pip install purpose-agent[all] # Everything
```
## Three Levels of Usage
### Level 1 – Describe what you want
```python
import purpose_agent as pa
team = pa.purpose("Write Python code and test it") # β architect + coder + tester
team = pa.purpose("Research quantum computing") # β researcher + analyst
team = pa.purpose("Write blog posts about AI") # β writer + editor
result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status()) # See what it's learned
```
### Level 2 – Choose your model
```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")
# Cloud providers
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")
# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```
Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
### Level 3 – Full control
Purpose Agent has its own API vocabulary: original names, not borrowed from other frameworks.
```python
import purpose_agent as pa
# ── Spark: a single intelligent agent ──
spark = pa.Spark("coder", model="openrouter:meta-llama/llama-3.3-70b-instruct")
result = spark.run("Write a fibonacci function")

# ── Flow: workflow engine with conditional routing ──
def review_fn(state):
    # Illustrative router: return a key from the routing dict below.
    return "pass" if state.get("approved") else "retry"

flow = pa.Flow()
flow.add_node("research", pa.Spark("researcher", model="qwen3:1.7b"))
flow.add_node("write", pa.Spark("writer", model="qwen3:1.7b"))
flow.add_edge(pa.BEGIN, "research")
flow.add_edge("research", "write")
flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
result = flow.run(initial_state)  # initial_state: your starting state dict

# ── swarm: run tasks concurrently ──
# spark_a, spark_b, spark_c are Spark instances created as above
results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])

# ── Council: agents deliberate together ──
council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
result = council.run("Design a web scraper", rounds=3)

# ── Vault: knowledge store with RAG-as-a-tool ──
vault = pa.Vault.from_directory("./docs")
spark = pa.Spark("assistant", tools=[vault.as_tool()])
result = spark.run("What does the documentation say about X?")

# ── LLMCompiler: parallel tool execution via DAG planning ──
# backend comes from resolve_backend(); registry is your tool registry
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```
## API Reference (Level 3)
| Name | What | Example |
|------|------|---------|
| `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` |
| `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` |
| `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a","b"], [s1, s2])` |
| `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` |
| `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` |
| `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` |
| `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` |
## Evidence-Gated Memory
Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
```
candidate → immune scan → quarantine → replay test → promote (or reject)
```
- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent
Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
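A minimal sketch of the gate, assuming hypothetical helpers (`immune_scan`, `replay_test`) and a toy `MemoryStore`; the real pipeline lives in the V2 kernel:
```python
# Hypothetical sketch of the evidence gate; names are illustrative,
# not purpose-agent's API.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    promoted: list[str] = field(default_factory=list)  # visible to the agent
    rejected: list[str] = field(default_factory=list)  # kept for audit only

    def submit(self, candidate: str, immune_scan, replay_test) -> bool:
        if not immune_scan(candidate):        # block injection / leaks / misuse
            self.rejected.append(candidate)
            return False
        # Quarantine: the candidate is held and exercised only in a replay test.
        if replay_test(candidate):            # evidence the memory actually helps
            self.promoted.append(candidate)
            return True
        self.rejected.append(candidate)       # preserved for audit, never exposed
        return False
```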
## Honest Evaluation
```python
from purpose_agent import RunMode
RunMode.LEARNING_TRAIN       # full read/write – this is where agents learn
RunMode.LEARNING_VALIDATION  # read + staging – validates before promoting
RunMode.EVAL_TEST            # no writes – numbers you can trust
```
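Illustrative only: one way a mode gate can keep `EVAL_TEST` runs honest, assuming a hypothetical `store` with `stage()` and `write()` methods (not purpose-agent's actual wiring):
```python
from purpose_agent import RunMode

# Hypothetical choke point: every memory write passes through one
# mode-aware function, so eval runs can never mutate state.
def maybe_write(store, memory: str, mode: RunMode) -> None:
    if mode is RunMode.EVAL_TEST:
        return                      # no writes: eval numbers stay trustworthy
    if mode is RunMode.LEARNING_VALIDATION:
        store.stage(memory)         # staged; promoted only after validation
    else:
        store.write(memory)         # LEARNING_TRAIN: full read/write
```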
## Secure Tools
- **CalculatorTool** – AST-validated, no `eval()` on arbitrary text (sketched below)
- **PythonExecTool** – subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** – sandboxed to a declared root directory
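A minimal sketch of the AST-validation idea – not CalculatorTool's actual source: parse the expression, whitelist node types, and refuse everything else:
```python
# Illustrative AST-validated arithmetic: only numeric constants and a
# fixed operator whitelist are ever evaluated.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression node: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))

assert safe_eval("2 * (3 + 4) ** 2") == 98
```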
## Architecture
See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
34 Python modules, ~500KB:
```
Core Engine    → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel      → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research       → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs  → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities   → Spark, Flow, swarm, Council, Vault
Easy API       → purpose(), Team, quickstart wizard
```
## Literature
Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
## CLI
```bash
python -m purpose_agent # Interactive wizard
purpose-agent # Same, via entry point
```
## License
MIT