File size: 7,481 Bytes

---
library_name: purpose-agent
license: mit
language:
  - en
tags:
  - reinforcement-learning
  - agents
  - self-improving
  - memory-system
  - multi-agent
  - slm
  - local-first
  - evaluation
  - safety
  - immune-system
pipeline_tag: text-generation
---

# Purpose Agent

**A local-first self-improvement kernel for AI agents.**

Agents that learn from experience — without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.

```bash
pip install purpose-agent
```

```python
import purpose_agent as pa

team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)

team.teach("Always add type hints")
# Next run uses what it learned
```

## How It Works (30-Second Version)

1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester — auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.

## Real-World Test Results

Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:

| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|-----------------|
| Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
| Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |

**Immune system:** 93% adversarial catch rate, 0% false positives.

**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).

## Install

```bash
pip install purpose-agent                    # Core (zero dependencies)
pip install purpose-agent[openai]            # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama]            # + Local Ollama
pip install purpose-agent[all]               # Everything
```

## Three Levels of Usage

### Level 1 — Describe what you want

```python
import purpose_agent as pa

team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
team = pa.purpose("Research quantum computing")       # → researcher + analyst
team = pa.purpose("Write blog posts about AI")        # → writer + editor

result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status())  # See what it's learned
```

### Level 2 — Choose your model

```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")

# Cloud
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")

# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```

Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**

### Level 3 — Full control

```python
import purpose_agent as pa

# Graph workflows (LangGraph-style)
graph = pa.Graph()
graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
graph.add_node("write", pa.Agent("writer", model="qwen3:1.7b"))
graph.add_edge(pa.START, "research")
graph.add_edge("research", "write")
graph.add_edge("write", pa.END)
result = graph.run(pa.State(data={"topic": "AI safety"}))

# Parallel execution (CrewAI-style)
results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])

# Agent conversations (AutoGen-style)
chat = pa.Conversation([pa.Agent("researcher"), pa.Agent("coder")])
result = chat.run("Design a web scraper", rounds=3)

# Knowledge-aware agents (LlamaIndex-style)
kb = pa.KnowledgeStore.from_directory("./docs")
agent = pa.Agent("assistant", tools=[kb.as_tool()])

# Parallel tool execution (LLMCompiler-style)
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```

## Evidence-Gated Memory

Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:

```
candidate → immune scan → quarantine → replay test → promote (or reject)
```

- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent

Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.

## Honest Evaluation

Three run modes enforce what the framework can mutate:

```python
from purpose_agent import RunMode

RunMode.LEARNING_TRAIN       # Full read/write — this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging — validates before promoting
RunMode.EVAL_TEST            # NO writes — numbers you can trust
```

## Secure Tools

- **CalculatorTool** — AST-validated, no `eval()` on arbitrary text
- **PythonExecTool** — subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** — sandboxed to declared root directory

## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.

34 Python modules, ~500KB, organized in layers:

```
Core Engine  → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel    → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research     → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs→ Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities → Agent, Graph, Parallel, Conversation, KnowledgeStore
Easy API     → purpose(), Team, quickstart wizard
```

## Literature

Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md).
Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).

| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |

## CLI

```bash
python -m purpose_agent  # Interactive wizard
purpose-agent            # Same, via entry point
```

## License

MIT