File size: 17,409 Bytes
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
485ddc5
ca2cef5
f970fc9
a99d027
f970fc9
f28a638
f970fc9
f28a638
f970fc9
 
 
 
 
 
 
 
 
f28a638
 
a99d027
f970fc9
a99d027
f970fc9
adb4257
f970fc9
adb4257
f970fc9
f28a638
f970fc9
f28a638
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
f28a638
f970fc9
f28a638
f970fc9
f28a638
f970fc9
adb4257
276b221
f28a638
 
f970fc9
 
adb4257
 
f970fc9
 
 
485ddc5
f970fc9
485ddc5
 
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
a99d027
 
f970fc9
f28a638
f970fc9
 
 
 
 
f28a638
f970fc9
 
 
a99d027
 
f970fc9
f28a638
f970fc9
 
a99d027
f970fc9
 
 
 
 
 
 
 
 
 
 
 
485ddc5
320bde4
f970fc9
 
adb4257
f28a638
ce80011
f970fc9
 
 
320bde4
f970fc9
320bde4
f970fc9
 
320bde4
f970fc9
 
f28a638
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
276b221
ce80011
485ddc5
adb4257
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
adb4257
f970fc9
 
 
 
 
 
 
adb4257
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f28a638
 
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f28a638
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
adb4257
 
485ddc5
f970fc9
485ddc5
f970fc9
adb4257
a99d027
f970fc9
 
 
 
 
a99d027
f970fc9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
---
library_name: purpose-agent
license: mit
language:
  - en
tags:
  - agents
  - self-improving
  - multi-agent
  - memory-system
  - local-first
  - slm
  - safety
  - event-driven
  - rag
  - tools
pipeline_tag: text-generation
---

<div align="center">

# 🧠 Purpose Agent

### The framework where AI agents actually learn from experience.

**Local-first · Self-improving · Domain-agnostic · Production-hardened**

[![PyPI](https://img.shields.io/pypi/v/purpose-agent?color=blue&label=PyPI)](https://pypi.org/project/purpose-agent/)
[![Python](https://img.shields.io/pypi/pyversions/purpose-agent)](https://pypi.org/project/purpose-agent/)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![Tests](https://img.shields.io/badge/tests-250%2B_passing-brightgreen)]()
[![Papers](https://img.shields.io/badge/papers-13_implemented-purple)]()

---

```
pip install purpose-agent
```

</div>

---

## 🎯 What Problem Does This Solve?

Every other agent framework (LangChain, CrewAI, AutoGen) runs **the same way every time**. Your agent fails at a task? Next time, it fails the exact same way. No learning. No memory. No improvement.

**Purpose Agent is different.** After every task:

```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   Task → Execute → Score → Extract Lessons → Remember      │
│     ↑                                           │           │
│     └───── Next task uses lessons ──────────────┘           │
│                                                             │
│   Run 1: Agent struggles ──────── Φ = 3.0                  │
│   Run 2: Uses learned heuristics ─ Φ = 7.0                 │
│   Run 3: Refined further ──────── Φ = 9.5                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

**No fine-tuning. No GPU training. Just memory + experience.**

---

## ⚡ 3-Line Quickstart

```python
import purpose_agent as pa

team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
```

That's it. The framework auto-detects your model, builds the right team, executes the task, scores the result, and stores lessons for next time.

---

## 🏗️ Architecture at a Glance

```
╔══════════════════════════════════════════════════════════════════╗
║                     PURPOSE AGENT v3.0                          ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  ┌──────────┐    ┌─────────────┐    ┌──────────────────┐       ║
║  │  YOU      │───▶│  EASY API   │───▶│  ORCHESTRATOR    │       ║
║  │ (purpose) │    │ (auto-team) │    │ (step loop)      │       ║
║  └──────────┘    └─────────────┘    └────────┬─────────┘       ║
║                                               │                  ║
║              ┌────────────────────────────────┼──────────┐      ║
║              │                                ▼          │      ║
║              │    ┌──────────┐    ┌──────────────────┐   │      ║
║              │    │  ACTOR   │───▶│  ENVIRONMENT     │   │      ║
║              │    │ (decide) │    │  (execute)       │   │      ║
║              │    └──────────┘    └────────┬─────────┘   │      ║
║              │                             │             │      ║
║              │         ┌───────────────────▼─────┐      │      ║
║              │         │  PURPOSE FUNCTION (Φ)   │      │      ║
║              │         │  Score: 0 ──────── 10   │      │      ║
║              │         │  O(1) state-delta mode  │      │      ║
║              │         └───────────────────┬─────┘      │      ║
║              │                             │             │      ║
║              │    ┌────────────────────────▼─────────┐  │      ║
║              │    │  MEMORY (immune-scanned)          │  │      ║
║              │    │  7 types · 5 statuses · scoped   │  │      ║
║              │    │  quarantine → test → promote      │  │      ║
║              │    └──────────────────────────────────┘  │      ║
║              │                                          │      ║
║              └──── SELF-IMPROVEMENT LOOP ───────────────┘      ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝
```

---

## 🎨 Three Ways to Use It

### 🟢 Level 1 — Just Describe What You Want

```python
import purpose_agent as pa

# Auto-detects the right team composition
team = pa.purpose("Write Python code and test it")   # → architect + coder + tester
team = pa.purpose("Research quantum computing")       # → researcher + analyst
team = pa.purpose("Analyze sales data")              # → analyst + reporter
team = pa.purpose("Write a blog post")               # → writer + editor

result = team.run("Create a sorting algorithm")
team.teach("Always handle edge cases")    # Inject knowledge directly
print(team.status())                       # See what it's learned
```

### 🟡 Level 2 — Choose Your Model & Add Knowledge

```python
import purpose_agent as pa

# 10+ providers supported
team = pa.purpose("Code helper", model="ollama:qwen3:1.7b")          # Local, free
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")

# Add your own documents as knowledge
team = pa.purpose("Answer questions about our product",
    knowledge="./docs/",              # Load entire folder
    model="qwen3:1.7b",
)
answer = team.ask("What's our refund policy?")
```

### 🔴 Level 3 — Full Control

```python
import purpose_agent as pa

# ── Spark: single intelligent agent ──
spark = pa.Spark("coder", model="ollama:qwen3:1.7b", tools=[pa.PythonExecTool()])
result = spark.run("Write fibonacci")

# ── Flow: workflow with conditional routing ──
flow = pa.Flow()
flow.add_node("research", pa.Spark("researcher"))
flow.add_node("write", pa.Spark("writer"))
flow.add_edge(pa.BEGIN, "research")
flow.add_conditional_edge("write", check_fn, {"pass": pa.DONE_SIGNAL, "revise": "research"})
result = flow.run(state)

# ── swarm: parallel execution ──
results = pa.swarm(["task_a", "task_b", "task_c"], agents=[a1, a2, a3])

# ── Council: multi-agent deliberation ──
council = pa.Council([pa.Spark("alice"), pa.Spark("bob"), pa.Spark("carol")])
result = council.run("Should we use microservices?", rounds=3)

# ── Vault: knowledge RAG ──
vault = pa.Vault.from_directory("./research_papers/")
agent = pa.Spark("analyst", tools=[vault.as_tool()])

# ── Generate entire systems ──
from purpose_agent.mas_generator import generate
system = generate("Monitor GitHub repos for CVEs and alert the team")
# → 4 agents + workflow + tools + eval suite + routing policy
```

---

## 🛡️ Safety & Security

```
┌─────────────────────────────────────────────┐
│           MEMORY IMMUNE SYSTEM              │
│                                             │
│  candidate ──→ immune scan ──→ quarantine   │
│                    │                │       │
│              ┌─────▼─────┐    ┌────▼────┐  │
│              │  REJECTED  │    │  TEST   │  │
│              │ (5 scans)  │    │ (replay)│  │
│              └────────────┘    └────┬────┘  │
│                                    │       │
│                              ┌─────▼─────┐ │
│                              │ PROMOTED  │ │
│                              │ (active)  │ │
│                              └───────────┘ │
└─────────────────────────────────────────────┘
```

**5 threat scanners:** prompt injection, score manipulation, tool misuse, privacy leaks, scope overreach

**PEP 578 kernel sandbox:** Unbypassable audit hooks at the C-interpreter level. No Docker needed.

**Falsification critic:** Code is scored by CPU-executed assertions, not LLM hallucinations.

---

## 🔬 First-Principles Engineering

| Problem | Old Approach | Purpose Agent |
|---------|-------------|---------------|
| Token cost grows O(N²) | Pass full history to critic | **O(1) state-delta** — only pass what changed |
| SLMs hallucinate scores | "Rate this 0-10" → guess | **Falsification** — generate asserts, CPU executes, score = math |
| Sandbox bypassed via dynamic code | AST analysis (weak) | **PEP 578 audit hooks** — kernel-level, unbypassable |
| Heuristics overflow context | Inject all 200 heuristics | **MoH cap K=10** — only top heuristics by Q-value |
| UNKNOWN action crashes | Parse failure → crash | **Safe fallback to DONE** — never propagates garbage |

---

## 📦 What's Inside (45+ modules)

<details>
<summary><b>🔧 Core Engine</b></summary>

| Module | What |
|--------|------|
| `orchestrator.py` | Main step loop with 3 critic modes (standard/delta/falsification) |
| `actor.py` | ReAct agent with 3-tier memory + heuristic cap |
| `purpose_function.py` | Φ(s) scorer with 7 anti-gaming rules |
| `experience_replay.py` | Thread-safe trajectory storage with Q-value retrieval |
| `optimizer.py` | Trajectory → heuristic distillation |

</details>

<details>
<summary><b>🧬 Self-Improvement</b></summary>

| Module | What |
|--------|------|
| `memory.py` | 7 memory kinds × 5 statuses, scoped, versioned |
| `memory_ci.py` | Quarantine → immune scan → test → promote/reject |
| `memory_homeostasis.py` | Budget enforcement, consolidation, archive |
| `immune.py` | 5 threat scanners for memory safety |
| `breakthroughs.py` | Self-improving critic, MoH, hindsight relabeling, evolution |

</details>

<details>
<summary><b>⚡ First-Principles</b></summary>

| Module | What |
|--------|------|
| `state_delta.py` | O(1) Markovian state-diff for critic |
| `falsification_critic.py` | Popperian scoring via adversarial assertions |
| `sandbox_hooks.py` | PEP 578 kernel-level audit hooks |
| `hardening.py` | Null safety, timeouts, validation, graceful degradation |
| `sre_patches.py` | 5 auto-applied critical vulnerability fixes |

</details>

<details>
<summary><b>🌐 Protocols & Interop</b></summary>

| Module | What |
|--------|------|
| `protocols/mcp_bridge.py` | MCP tool server integration |
| `protocols/a2a.py` | Agent-to-Agent delegation with circuit breaker |
| `protocols/agui.py` | AG-UI frontend streaming |
| `protocols/agents_md.py` | AGENTS.md repo-local instructions |
| `quorum.py` | Consensus/disagreement topology switching |

</details>

<details>
<summary><b>🧠 Intelligence</b></summary>

| Module | What |
|--------|------|
| `routing.py` | Smart model selection (local-first, cost-aware) |
| `mas_generator.py` | Use-case → complete multi-agent system |
| `skills/schema.py` | Versioned, evolvable, testable skill cards |
| `skills/ci.py` | Skill testing + rollback + Darwinian selection |
| `llm_compiler.py` | Parallel tool execution via DAG planning |

</details>

<details>
<summary><b>📈 Optimization</b></summary>

| Module | What |
|--------|------|
| `optimization/fingerprint.py` | Capability profiling from traces |
| `optimization/dataset.py` | Trace → filtered training dataset |
| `optimization/prompt_pack.py` | Epigenetic prompt optimization |
| `optimization/shadow_eval.py` | Candidate vs baseline comparison |
| `optimization/optimizer.py` | Improving/plateau/degrading policy |
| `optimization/lora_plan.py` | LoRA/distillation dry-run planning |

</details>

<details>
<summary><b>🏗️ Runtime</b></summary>

| Module | What |
|--------|------|
| `runtime/events.py` | 30 canonical event types |
| `runtime/event_bus.py` | Async pub/sub with backpressure |
| `runtime/state.py` | Typed execution state for checkpointing |
| `runtime/checkpoint.py` | InMemory/JSONL/SQLite durability |
| `streaming_v3.py` | AG-UI compatible stream adapters |

</details>

---

## 🔌 Supported Providers

```python
from purpose_agent import resolve_backend

resolve_backend("ollama:qwen3:1.7b")                    # Local (free)
resolve_backend("openrouter:meta-llama/llama-3.3-70b-instruct")
resolve_backend("groq:llama-3.3-70b-versatile")
resolve_backend("openai:gpt-4o")
resolve_backend("together:meta-llama/Llama-3.3-70B-Instruct-Turbo")
resolve_backend("fireworks:accounts/fireworks/models/llama-v3p1-70b")
resolve_backend("cerebras:llama-3.3-70b")
resolve_backend("deepseek:deepseek-chat")
resolve_backend("mistral:mistral-large-latest")
resolve_backend("hf:Qwen/Qwen3-32B")
```

---

## 📊 Real-World Test Results

Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:

| Test | Llama-70B | Gemma-26B |
|------|:---------:|:---------:|
| fibonacci (4 unit tests) | ✅ 100% | ✅ 100% |
| fizzbuzz (4 unit tests) | ✅ 100% | ✅ 100% |
| factorial (3 unit tests) | ✅ 100% | ✅ 100% |
| Self-improvement (heuristic growth) | 0→18 | 0→11 |
| Immune system (adversarial) | 93% catch | — |
| Production test (19 checks) | 19/19 ✅ | — |

**250+ automated tests. Zero failures required for release.**

---

## 📚 Research Foundation

Built on **13 published papers**. Every module traces back to a specific result.

| Paper | Module | Contribution |
|-------|--------|-------------|
| Ng et al. 1999 (PBRS) | purpose_function | Φ preserves optimal policy |
| MUSE (2510.08002) | actor, optimizer | 3-tier memory hierarchy |
| REMEMBERER (2306.07929) | experience_replay | Q-value retrieval |
| Reflexion (2303.11366) | orchestrator | Verbal reinforcement |
| SPC (2504.19162) | immune | Anti-reward-hacking |
| Meta-Rewarding (2407.19594) | meta_rewarding | Self-improving critic |
| DSPy (2310.03714) | prompt_optimizer | Automatic few-shot bootstrap |
| LLMCompiler (2312.04511) | llm_compiler | Parallel tool DAG |
| Retroformer (2308.02151) | retroformer | Structured reflection |
| TinyAgent (2409.00608) | slm_backends | SLM-native patterns |
| DeepSeek MoE (2401.06066) | breakthroughs | MoH sparse selection |
| HER (1707.01495) | breakthroughs | Hindsight relabeling |
| Self-Taught Eval (2408.02666) | self_taught | Synthetic critic training |

Full proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md) · Research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md)

---

## 🚀 Install

```bash
pip install purpose-agent                    # Core (zero dependencies)
pip install purpose-agent[openai]            # + OpenAI/Groq/OpenRouter
pip install purpose-agent[ollama]            # + Local Ollama
pip install purpose-agent[all]              # Everything
```

**For local models (recommended — free, private):**
```bash
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:1.7b
```

---

## 🖥️ CLI

```bash
python -m purpose_agent     # Interactive wizard
purpose-agent               # Same, via entry point
```

---

## 📄 License

MIT — use it for anything.

---

<div align="center">

**Built on 13 papers. Zero fine-tuning. Agents that actually improve.**

[PyPI](https://pypi.org/project/purpose-agent/) · [Architecture](ARCHITECTURE.md) · [Formal Proofs](PURPOSE_LEARNING.md) · [Changelog](CHANGELOG.md)

</div>