File size: 7,481 Bytes
ca2cef5
 
 
 
 
 
 
 
 
276b221
ce80011
adb4257
276b221
adb4257
276b221
 
ca2cef5
 
 
ce80011
a99d027
f28a638
 
 
 
 
 
 
a99d027
 
ce80011
adb4257
f28a638
 
ce80011
a99d027
f28a638
 
a99d027
 
f28a638
adb4257
f28a638
 
 
 
 
adb4257
f28a638
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
276b221
adb4257
276b221
ce80011
f28a638
ce80011
276b221
f28a638
 
 
 
 
 
 
 
 
adb4257
 
f28a638
a99d027
 
f28a638
 
 
 
 
 
 
 
 
 
 
a99d027
 
f28a638
 
276b221
a99d027
adb4257
f28a638
ce80011
f28a638
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
276b221
ce80011
f28a638
ce80011
f28a638
ce80011
f28a638
 
adb4257
a99d027
f28a638
 
 
 
a99d027
f28a638
a99d027
f28a638
ce80011
f28a638
276b221
 
f28a638
276b221
f28a638
 
 
adb4257
a99d027
276b221
a99d027
f28a638
276b221
f28a638
a99d027
f28a638
adb4257
f28a638
adb4257
f28a638
adb4257
f28a638
 
 
 
 
 
 
ce80011
 
f28a638
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
adb4257
 
f28a638
 
adb4257
a99d027
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
---
library_name: purpose-agent
license: mit
language:
  - en
tags:
  - reinforcement-learning
  - agents
  - self-improving
  - memory-system
  - multi-agent
  - slm
  - local-first
  - evaluation
  - safety
  - immune-system
pipeline_tag: text-generation
---

# Purpose Agent

**A local-first self-improvement kernel for AI agents.**

Agents that learn from experience β€” without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.

```bash
pip install purpose-agent
```

```python
import purpose_agent as pa

team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)

team.teach("Always add type hints")
# Next run uses what it learned
```

## How It Works (30-Second Version)

1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester β€” auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.

## Real-World Test Results

Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:

| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|-----------------|
| Llama-3.3-70B | βœ“ 100% | βœ“ 100% | βœ“ 100% | 0β†’3β†’9β†’18 heuristics |
| Gemma-4-26B | βœ“ 100% | βœ“ 100% | βœ“ 100% | 0β†’3β†’6β†’11 heuristics |

**Immune system:** 93% adversarial catch rate, 0% false positives.

**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).

## Install

```bash
pip install purpose-agent                    # Core (zero dependencies)
pip install purpose-agent[openai]            # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama]            # + Local Ollama
pip install purpose-agent[all]               # Everything
```

## Three Levels of Usage

### Level 1 β€” Describe what you want

```python
import purpose_agent as pa

team = pa.purpose("Write Python code and test it")  # β†’ architect + coder + tester
team = pa.purpose("Research quantum computing")       # β†’ researcher + analyst
team = pa.purpose("Write blog posts about AI")        # β†’ writer + editor

result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status())  # See what it's learned
```

### Level 2 β€” Choose your model

```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")

# Cloud
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")

# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```

Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**

### Level 3 β€” Full control

```python
import purpose_agent as pa

# Graph workflows (LangGraph-style)
graph = pa.Graph()
graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
graph.add_node("write", pa.Agent("writer", model="qwen3:1.7b"))
graph.add_edge(pa.START, "research")
graph.add_edge("research", "write")
graph.add_edge("write", pa.END)
result = graph.run(pa.State(data={"topic": "AI safety"}))

# Parallel execution (CrewAI-style)
results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])

# Agent conversations (AutoGen-style)
chat = pa.Conversation([pa.Agent("researcher"), pa.Agent("coder")])
result = chat.run("Design a web scraper", rounds=3)

# Knowledge-aware agents (LlamaIndex-style)
kb = pa.KnowledgeStore.from_directory("./docs")
agent = pa.Agent("assistant", tools=[kb.as_tool()])

# Parallel tool execution (LLMCompiler-style)
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```

## Evidence-Gated Memory

Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:

```
candidate β†’ immune scan β†’ quarantine β†’ replay test β†’ promote (or reject)
```

- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent

Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.

## Honest Evaluation

Three run modes enforce what the framework can mutate:

```python
from purpose_agent import RunMode

RunMode.LEARNING_TRAIN       # Full read/write β€” this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging β€” validates before promoting
RunMode.EVAL_TEST            # NO writes β€” numbers you can trust
```

## Secure Tools

- **CalculatorTool** β€” AST-validated, no `eval()` on arbitrary text
- **PythonExecTool** β€” subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** β€” sandboxed to declared root directory

## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.

34 Python modules, ~500KB, organized in layers:

```
Core Engine  β†’ Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel    β†’ Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research     β†’ Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs→ Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities β†’ Agent, Graph, Parallel, Conversation, KnowledgeStore
Easy API     β†’ purpose(), Team, quickstart wizard
```

## Literature

Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md).
Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).

| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |

## CLI

```bash
python -m purpose_agent  # Interactive wizard
purpose-agent            # Same, via entry point
```

## License

MIT