---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- memory-system
- multi-agent
- slm
- local-first
- evaluation
- safety
- immune-system
pipeline_tag: text-generation
---
# Purpose Agent
**A local-first self-improvement kernel for AI agents.**
Agents that learn from experience – without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
```bash
pip install purpose-agent
```
```python
import purpose_agent as pa
team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)
team.teach("Always add type hints")
# Next run uses what it learned
```
## How It Works (30-Second Version)
1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester – auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt (see the sketch below). The agent gets smarter without any weight updates.
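A minimal sketch of that loop, using hypothetical names (`heuristics`, `run_task`, `learn`, and the `llm`/`critic` callables) rather than purpose-agent's actual internals:
```python
# Hypothetical sketch of the learn-and-inject loop; names are
# illustrative, not purpose-agent's internals.
heuristics: list[str] = []  # persisted between runs

def run_task(task: str, llm) -> str:
    # Inject what past runs taught the agent directly into the prompt.
    guidelines = "\n".join(f"- {h}" for h in heuristics)
    return llm(f"Guidelines:\n{guidelines}\n\nTask: {task}")

def learn(step_output: str, critic) -> None:
    # A separate critic scores the step; strong patterns become heuristics.
    score, lesson = critic(step_output)
    if score >= 0.8 and lesson not in heuristics:
        heuristics.append(lesson)  # no weight updates, just prompt context
```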
## Real-World Test Results
Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|-----------------|
| Llama-3.3-70B | ✅ 100% | ✅ 100% | ✅ 100% | 0 → 3 → 9 → 18 heuristics |
| Gemma-4-26B | ✅ 100% | ✅ 100% | ✅ 100% | 0 → 3 → 6 → 11 heuristics |
**Day-0 production test:** 19/19 tasks pass on Llama-3.3-70B across all three usage levels.
**Immune system:** 93% adversarial catch rate, 0% false positives.
**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
## Install
```bash
pip install purpose-agent # Core (zero dependencies)
pip install purpose-agent[openai] # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama] # + Local Ollama
pip install purpose-agent[all] # Everything
```
## Three Levels of Usage
### Level 1 – Describe what you want
```python
import purpose_agent as pa
team = pa.purpose("Write Python code and test it") # β architect + coder + tester
team = pa.purpose("Research quantum computing") # β researcher + analyst
team = pa.purpose("Write blog posts about AI") # β writer + editor
result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status()) # See what it's learned
```
### Level 2 – Choose your model
```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")
# Cloud providers
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")
# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```
Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
### Level 3 – Full control
Purpose Agent has its own API vocabulary: original names, not borrowed from other frameworks.
```python
import purpose_agent as pa
# ── Spark: a single intelligent agent ──
spark = pa.Spark("coder", model="openrouter:meta-llama/llama-3.3-70b-instruct")
result = spark.run("Write a fibonacci function")

# ── Flow: workflow engine with conditional routing ──
def review_fn(state):
    # Illustrative router: return a key from the routing dict below.
    return "pass" if state.get("approved") else "retry"

flow = pa.Flow()
flow.add_node("research", pa.Spark("researcher", model="qwen3:1.7b"))
flow.add_node("write", pa.Spark("writer", model="qwen3:1.7b"))
flow.add_edge(pa.BEGIN, "research")
flow.add_edge("research", "write")
flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
result = flow.run(initial_state)  # initial_state: your starting state dict

# ── swarm: run tasks concurrently ──
# spark_a, spark_b, spark_c are Spark instances created as above
results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])

# ── Council: agents deliberate together ──
council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
result = council.run("Design a web scraper", rounds=3)

# ── Vault: knowledge store with RAG-as-a-tool ──
vault = pa.Vault.from_directory("./docs")
spark = pa.Spark("assistant", tools=[vault.as_tool()])
result = spark.run("What does the documentation say about X?")

# ── LLMCompiler: parallel tool execution via DAG planning ──
# backend comes from resolve_backend(); registry is your tool registry
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```
## API Reference (Level 3)
| Name | What | Example |
|------|------|---------|
| `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` |
| `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` |
| `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a","b"], [s1, s2])` |
| `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` |
| `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` |
| `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` |
| `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` |
## Evidence-Gated Memory
Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
```
candidate → immune scan → quarantine → replay test → promote (or reject)
```
- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent
Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
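A minimal sketch of the gate, assuming hypothetical helpers (`immune_scan`, `replay_test`) and a toy `MemoryStore`; the real pipeline lives in the V2 kernel:
```python
# Hypothetical sketch of the evidence gate; names are illustrative,
# not purpose-agent's API.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    promoted: list[str] = field(default_factory=list)  # visible to the agent
    rejected: list[str] = field(default_factory=list)  # kept for audit only

    def submit(self, candidate: str, immune_scan, replay_test) -> bool:
        if not immune_scan(candidate):        # block injection / leaks / misuse
            self.rejected.append(candidate)
            return False
        # Quarantine: the candidate is held and exercised only in a replay test.
        if replay_test(candidate):            # evidence the memory actually helps
            self.promoted.append(candidate)
            return True
        self.rejected.append(candidate)       # preserved for audit, never exposed
        return False
```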
## Honest Evaluation
```python
from purpose_agent import RunMode
RunMode.LEARNING_TRAIN       # full read/write – this is where agents learn
RunMode.LEARNING_VALIDATION  # read + staging – validates before promoting
RunMode.EVAL_TEST            # no writes – numbers you can trust
```
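Illustrative only: one way a mode gate can keep `EVAL_TEST` runs honest, assuming a hypothetical `store` with `stage()` and `write()` methods (not purpose-agent's actual wiring):
```python
from purpose_agent import RunMode

# Hypothetical choke point: every memory write passes through one
# mode-aware function, so eval runs can never mutate state.
def maybe_write(store, memory: str, mode: RunMode) -> None:
    if mode is RunMode.EVAL_TEST:
        return                      # no writes: eval numbers stay trustworthy
    if mode is RunMode.LEARNING_VALIDATION:
        store.stage(memory)         # staged; promoted only after validation
    else:
        store.write(memory)         # LEARNING_TRAIN: full read/write
```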
## Secure Tools
- **CalculatorTool** – AST-validated, no `eval()` on arbitrary text (sketched below)
- **PythonExecTool** – subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** – sandboxed to a declared root directory
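A minimal sketch of the AST-validation idea – not CalculatorTool's actual source: parse the expression, whitelist node types, and refuse everything else:
```python
# Illustrative AST-validated arithmetic: only numeric constants and a
# fixed operator whitelist are ever evaluated.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression node: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))

assert safe_eval("2 * (3 + 4) ** 2") == 98
```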
## Architecture
See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
34 Python modules, ~500KB:
```
Core Engine    → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel      → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research       → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs  → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities   → Spark, Flow, swarm, Council, Vault
Easy API       → purpose(), Team, quickstart wizard
```
## Literature
Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
## CLI
```bash
python -m purpose_agent # Interactive wizard
purpose-agent # Same, via entry point
```
## License
MIT