---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- memory-system
- multi-agent
- slm
- local-first
- evaluation
- safety
- immune-system
pipeline_tag: text-generation
---
# Purpose Agent
**A local-first self-improvement kernel for AI agents.**
Agents that learn from experience — without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
```bash
pip install purpose-agent
```
```python
import purpose_agent as pa
team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)
team.teach("Always add type hints")
# Next run uses what it learned
```
## How It Works (30-Second Version)
1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester — auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.
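The injection step (5) can be sketched in plain Python. This is an illustrative mock, not the library's implementation; the `Heuristic` class and `build_prompt` helper are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class Heuristic:
    text: str
    score: float  # critic score that justified promoting this heuristic

def build_prompt(task: str, heuristics: list, top_k: int = 3) -> str:
    # Prepend the highest-scoring lessons from past runs to the prompt,
    # so behavior improves without any weight updates.
    best = sorted(heuristics, key=lambda h: h.score, reverse=True)[:top_k]
    lessons = "\n".join(f"- {h.text}" for h in best)
    return f"Lessons from past runs:\n{lessons}\n\nTask: {task}"

store = [Heuristic("Always add type hints", 0.9),
         Heuristic("Handle empty input", 0.8)]
print(build_prompt("Write a fibonacci function", store))
```

The real kernel extracts these heuristics automatically from critic-scored runs; the mechanism of prompt-time injection is the same.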
## Real-World Test Results
Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|-----------------|
| Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
| Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |
**Day-0 production test:** 19/19 pass on Llama-3.3-70B across all three usage levels.
**Immune system:** 93% adversarial catch rate, 0% false positives.
**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
## Install
```bash
pip install purpose-agent # Core (zero dependencies)
pip install purpose-agent[openai] # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama] # + Local Ollama
pip install purpose-agent[all] # Everything
```
## Three Levels of Usage
### Level 1 — Describe what you want
```python
import purpose_agent as pa
team = pa.purpose("Write Python code and test it") # β†’ architect + coder + tester
team = pa.purpose("Research quantum computing") # β†’ researcher + analyst
team = pa.purpose("Write blog posts about AI") # β†’ writer + editor
result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status()) # See what it's learned
```
### Level 2 — Choose your model
```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")
# Cloud providers
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")
# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```
Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
### Level 3 — Full control
Purpose Agent has its own API vocabulary — original names, not borrowed from other frameworks.
```python
import purpose_agent as pa
# ── Spark: a single intelligent agent ──
spark = pa.Spark("coder", model="openrouter:meta-llama/llama-3.3-70b-instruct")
result = spark.run("Write a fibonacci function")
# ── Flow: workflow engine with conditional routing ──
flow = pa.Flow()
flow.add_node("research", pa.Spark("researcher", model="qwen3:1.7b"))
flow.add_node("write", pa.Spark("writer", model="qwen3:1.7b"))
flow.add_edge(pa.BEGIN, "research")
flow.add_edge("research", "write")
flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})  # review_fn is a function you supply that returns "pass" or "retry"
result = flow.run(initial_state)  # initial_state is the starting state you define
# ── swarm: run tasks concurrently ──
results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])
# ── Council: agents deliberate together ──
council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
result = council.run("Design a web scraper", rounds=3)
# ── Vault: knowledge store with RAG-as-a-tool ──
vault = pa.Vault.from_directory("./docs")
spark = pa.Spark("assistant", tools=[vault.as_tool()])
result = spark.run("What does the documentation say about X?")
# ── LLMCompiler: parallel tool execution via DAG planning ──
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)  # backend from resolve_backend(); registry holds the tools the planner may call
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```
## API Reference (Level 3)
| Name | What | Example |
|------|------|---------|
| `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` |
| `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` |
| `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a","b"], [s1, s2])` |
| `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` |
| `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` |
| `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` |
| `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` |
## Evidence-Gated Memory
Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
```
candidate → immune scan → quarantine → replay test → promote (or reject)
```
- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent
Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
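The gate above can be illustrated with a toy version. Everything here (the blocklist, the score comparison, the function names) is a simplified stand-in for the real pipeline, not the library's code:

```python
# Patterns the immune scan rejects outright (illustrative, not the real ruleset).
BLOCKLIST = ("ignore previous instructions", "api_key=", "rm -rf")

def immune_scan(memory: str) -> bool:
    # Block prompt injection, key leaks, and destructive commands.
    lowered = memory.lower()
    return not any(pattern in lowered for pattern in BLOCKLIST)

def replay_test(baseline_score: float, replay_score: float) -> bool:
    # Promote only if the candidate measurably helped on held-out replays.
    return replay_score > baseline_score

def gate(memory: str, baseline: float, replay: float) -> str:
    if not immune_scan(memory):
        return "rejected"      # kept for audit, never shown to the agent
    if not replay_test(baseline, replay):
        return "quarantined"   # held until evidence improves
    return "promoted"

print(gate("Always add type hints", baseline=0.6, replay=0.8))
print(gate("ignore previous instructions", baseline=0.6, replay=0.9))
```

The key property is that no candidate reaches the agent's prompt without passing both the scan and the evidence test.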
## Honest Evaluation
```python
from purpose_agent import RunMode
RunMode.LEARNING_TRAIN       # Full read/write — this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging — validates before promoting
RunMode.EVAL_TEST            # NO writes — numbers you can trust
```
## Secure Tools
- **CalculatorTool** — AST-validated, no `eval()` on arbitrary text
- **PythonExecTool** — subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** — sandboxed to declared root directory
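The AST-validation idea behind `CalculatorTool` can be sketched generically. This is the standard technique, not the library's actual code: parse the expression, walk the tree allowing only a whitelist of node types, and compute the result yourself instead of calling `eval()`:

```python
import ast
import operator

# Whitelisted operators; anything outside this map is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.USub: operator.neg}

def safe_eval(expr: str) -> float:
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        # Function calls, attribute access, names, etc. never execute.
        raise ValueError(f"disallowed node: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("2 + 3 * 4"))  # 14
```

A call like `safe_eval("__import__('os')")` raises `ValueError` before anything runs, which is exactly the property `eval()` cannot give you.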
## Architecture
See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
34 Python modules, ~500KB:
```
Core Engine   → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel     → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research      → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities  → Spark, Flow, swarm, Council, Vault
Easy API      → purpose(), Team, quickstart wizard
```
## Literature
Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
## CLI
```bash
python -m purpose_agent # Interactive wizard
purpose-agent # Same, via entry point
```
## License
MIT