---
library_name: purpose-agent
license: mit
language:
  - en
tags:
  - reinforcement-learning
  - agents
  - self-improving
  - memory-system
  - multi-agent
  - slm
  - local-first
  - evaluation
  - safety
  - immune-system
pipeline_tag: text-generation
---

# Purpose Agent

**A local-first self-improvement kernel for AI agents.**

Agents that learn from experience — without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.

```bash
pip install purpose-agent
```

```python
import purpose_agent as pa

team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)

team.teach("Always add type hints")  # Next run uses what it learned
```

## How It Works (30-Second Version)

1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester — auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.

## Real-World Test Results

Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:

| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|------------------|
| Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
| Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |

- **0-day production test:** 19/19 pass on Llama-3.3-70B across all 3 usage levels.
- **Immune system:** 93% adversarial catch rate, 0% false positives.
- **Test suite:** 119 unit tests, all passing.

See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).

## Install

```bash
pip install purpose-agent           # Core (zero dependencies)
pip install purpose-agent[openai]   # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama]   # + Local Ollama
pip install purpose-agent[all]      # Everything
```

## Three Levels of Usage

### Level 1 — Describe what you want

```python
import purpose_agent as pa

team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
team = pa.purpose("Research quantum computing")     # → researcher + analyst
team = pa.purpose("Write blog posts about AI")      # → writer + editor

result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status())  # See what it's learned
```

### Level 2 — Choose your model

```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")

# Cloud providers
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")

# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```

Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
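Levels 1 and 2 compose. The sketch below is an illustration rather than an excerpt from the package docs; it uses only the calls shown above, pins a local model, and drives the learn-teach loop from the 30-second overview. The exact return shape of `team.status()` is not specified here, so it is simply printed.

```python
import purpose_agent as pa

# Minimal sketch: the learn-teach loop on a local model.
# Uses only the calls documented above; status() output format may vary.
team = pa.purpose("Help me write Python code", model="qwen3:1.7b")

for task in ("Write a fibonacci function",
             "Write a fizzbuzz function",
             "Write a factorial function"):
    print(team.run(task))            # the critic scores each run behind the scenes

team.teach("Always add type hints")  # stored as a heuristic for future runs
print(team.status())                 # inspect what the team has learned so far
```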
### Level 3 — Full control

Purpose Agent has its own API vocabulary — original names, not borrowed from other frameworks.

```python
import purpose_agent as pa

# ── Spark: a single intelligent agent ──
spark = pa.Spark("coder", model="openrouter:meta-llama/llama-3.3-70b-instruct")
result = spark.run("Write a fibonacci function")

# ── Flow: workflow engine with conditional routing ──
flow = pa.Flow()
flow.add_node("research", pa.Spark("researcher", model="qwen3:1.7b"))
flow.add_node("write", pa.Spark("writer", model="qwen3:1.7b"))
flow.add_edge(pa.BEGIN, "research")
flow.add_edge("research", "write")
# review_fn is your own routing function; it returns "pass" or "retry"
flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
result = flow.run(initial_state)  # initial_state: the flow's starting state

# ── swarm: run tasks concurrently ──
# spark_a / spark_b / spark_c are Spark agents defined elsewhere
results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])

# ── Council: agents deliberate together ──
council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
result = council.run("Design a web scraper", rounds=3)

# ── Vault: knowledge store with RAG-as-a-tool ──
vault = pa.Vault.from_directory("./docs")
spark = pa.Spark("assistant", tools=[vault.as_tool()])
result = spark.run("What does the documentation say about X?")

# ── LLMCompiler: parallel tool execution via DAG planning ──
# backend: an LLM backend (e.g. from resolve_backend); registry: your registered tools
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```

## API Reference (Level 3)

| Name | What | Example |
|------|------|---------|
| `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` |
| `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` |
| `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a", "b"], [s1, s2])` |
| `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` |
| `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` |
| `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` |
| `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` |

## Evidence-Gated Memory

Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:

```
candidate → immune scan → quarantine → replay test → promote (or reject)
```

- **Immune scan** blocks prompt injection, score manipulation, API key leaks, and tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent

Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.

## Honest Evaluation

```python
from purpose_agent import RunMode

RunMode.LEARNING_TRAIN       # Full read/write — this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging — validates before promoting
RunMode.EVAL_TEST            # NO writes — numbers you can trust
```

## Secure Tools

- **CalculatorTool** — AST-validated, no `eval()` on arbitrary text (see the sketch below)
- **PythonExecTool** — subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** — sandboxed to a declared root directory
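To make the CalculatorTool bullet concrete, here is a standalone sketch of the AST-validation idea: parse the expression, allow only a small whitelist of arithmetic node types, and evaluate the tree yourself instead of calling `eval()` on raw text. This is illustrative only, not the library's actual implementation, and the name `safe_calc` is made up for the example.

```python
# Sketch of AST-validated arithmetic, similar in spirit to CalculatorTool.
# Illustrative only; not the purpose-agent implementation.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calc(expression: str) -> float:
    """Evaluate a pure-arithmetic expression; reject everything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Disallowed expression node: {type(node).__name__}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calc("2 ** 10 / (3 + 1)"))   # 256.0
try:
    safe_calc("__import__('os').system('rm -rf /')")
except ValueError as err:
    print(err)                           # Disallowed expression node: Call
```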
## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.

34 Python modules, ~500KB:

```
Core Engine    → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel      → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research       → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs  → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities   → Spark, Flow, swarm, Council, Vault
Easy API       → purpose(), Team, quickstart wizard
```

## Literature

Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).

| Paper | What it contributes |
|-------|---------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |

## CLI

```bash
python -m purpose_agent   # Interactive wizard
purpose-agent             # Same, via entry point
```

## License

MIT