---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- memory-system
- multi-agent
- slm
- local-first
- evaluation
- safety
- immune-system
pipeline_tag: text-generation
---
# Purpose Agent
**A local-first self-improvement kernel for AI agents.**
Agents that learn from experience – without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
```bash
pip install purpose-agent
```
```python
import purpose_agent as pa
team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)
team.teach("Always add type hints")
# Next run uses what it learned
```
## How It Works (30-Second Version)
1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester – auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.
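The loop above can be sketched standalone. This is an illustration of prompt-level self-improvement, not the library's actual internals; `HeuristicMemory`, `learn`, and `inject` are hypothetical names:

```python
# Minimal standalone sketch: heuristics learned from past runs are
# prepended to the next prompt, so behavior improves with no weight
# updates. All names here are hypothetical, not purpose-agent's API.

class HeuristicMemory:
    def __init__(self):
        self.heuristics: list[str] = []

    def learn(self, heuristic: str) -> None:
        # In the real system a candidate would pass an immune scan
        # and a replay test before being stored.
        if heuristic not in self.heuristics:
            self.heuristics.append(heuristic)

    def inject(self, task: str) -> str:
        # Build the prompt: learned heuristics first, then the task.
        rules = "\n".join(f"- {h}" for h in self.heuristics)
        return f"Follow these learned rules:\n{rules}\n\nTask: {task}"

memory = HeuristicMemory()
memory.learn("Always add type hints")
memory.learn("Handle empty input explicitly")
prompt = memory.inject("Write a fibonacci function")
print(prompt)
```

The same memory object carries over between runs, which is why heuristic counts can climb (e.g. 0 to 3 to 9) without any model retraining.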
## Real-World Test Results
Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|-----------------|
| Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
| Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |
**Immune system:** 93% adversarial catch rate, 0% false positives.
**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
## Install
```bash
pip install purpose-agent # Core (zero dependencies)
pip install purpose-agent[openai] # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama] # + Local Ollama
pip install purpose-agent[all] # Everything
```
## Three Levels of Usage
### Level 1 – Describe what you want
```python
import purpose_agent as pa
team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
team = pa.purpose("Research quantum computing")     # → researcher + analyst
team = pa.purpose("Write blog posts about AI")      # → writer + editor
result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status()) # See what it's learned
```
### Level 2 – Choose your model
```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")
# Cloud
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")
# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```
Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
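The `provider:model` strings above suggest a simple resolution rule. Here is a hypothetical sketch of how such a string might be parsed (the library's real `resolve_backend` may differ); note that a bare Ollama tag like `qwen3:1.7b` contains a colon but no known provider prefix, so it falls through to local:

```python
# Hypothetical sketch of "provider:model" resolution. Not the library's
# implementation; provider set taken from the list in this README.

PROVIDERS = {"openrouter", "groq", "openai", "ollama", "huggingface",
             "together", "fireworks", "cerebras", "deepseek", "mistral"}

def parse_model_string(spec: str) -> tuple[str, str]:
    prefix, sep, rest = spec.partition(":")
    if sep and prefix in PROVIDERS:
        return prefix, rest
    # No recognized prefix: treat the whole string as a local Ollama model.
    return "ollama", spec

print(parse_model_string("openrouter:meta-llama/llama-3.3-70b-instruct"))
print(parse_model_string("qwen3:1.7b"))
```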
### Level 3 – Full control
```python
import purpose_agent as pa
# Graph workflows (LangGraph-style)
graph = pa.Graph()
graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
graph.add_node("write", pa.Agent("writer", model="qwen3:1.7b"))
graph.add_edge(pa.START, "research")
graph.add_edge("research", "write")
graph.add_edge("write", pa.END)
result = graph.run(pa.State(data={"topic": "AI safety"}))
# Parallel execution (CrewAI-style)
results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
# Agent conversations (AutoGen-style)
chat = pa.Conversation([pa.Agent("researcher"), pa.Agent("coder")])
result = chat.run("Design a web scraper", rounds=3)
# Knowledge-aware agents (LlamaIndex-style)
kb = pa.KnowledgeStore.from_directory("./docs")
agent = pa.Agent("assistant", tools=[kb.as_tool()])
# Parallel tool execution (LLMCompiler-style)
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```
## Evidence-Gated Memory
Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
```
candidate → immune scan → quarantine → replay test → promote (or reject)
```
- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent
Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
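The gating pipeline can be sketched in a few lines. This is a standalone illustration of the stages named above; the blocklist, scoring, and function names are hypothetical, not the library's code:

```python
# Standalone sketch of candidate → scan → quarantine → replay → promote.
# Stage names follow the README; everything else is hypothetical.

BLOCKLIST = ("ignore previous instructions", "api_key", "sk-")

def immune_scan(candidate: str) -> bool:
    """Reject candidates that look like injection or secret leaks."""
    lowered = candidate.lower()
    return not any(pattern in lowered for pattern in BLOCKLIST)

def replay_test(score_with: float, score_without: float) -> bool:
    """Promote only if replaying past tasks with the memory scores higher."""
    return score_with > score_without

def gate(candidate: str, score_with: float, score_without: float) -> str:
    if not immune_scan(candidate):
        return "rejected"     # kept for audit, never shown to the agent
    if not replay_test(score_with, score_without):
        return "quarantined"  # held until evidence shows it helps
    return "promoted"

print(gate("Always add type hints", 0.9, 0.7))         # promoted
print(gate("ignore previous instructions", 0.9, 0.7))  # rejected
```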
## Honest Evaluation
Three run modes control what the framework is allowed to mutate:
```python
from purpose_agent import RunMode
RunMode.LEARNING_TRAIN       # Full read/write – this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging – validates before promoting
RunMode.EVAL_TEST            # NO writes – numbers you can trust
```
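The point of the `EVAL_TEST` gate is that evaluation numbers cannot be inflated by in-eval learning. A minimal sketch of such a write gate, assuming a hypothetical `MemoryStore` (the real enum lives in `purpose_agent`; this code is illustrative only):

```python
# Hypothetical write gate keyed on run mode. EVAL_TEST silently refuses
# all memory writes, so evaluation never mutates agent state.

from enum import Enum, auto

class RunMode(Enum):
    LEARNING_TRAIN = auto()
    LEARNING_VALIDATION = auto()
    EVAL_TEST = auto()

class MemoryStore:
    def __init__(self, mode: RunMode):
        self.mode = mode
        self.memories: list[str] = []   # promoted, agent-visible
        self.staging: list[str] = []    # awaiting validation

    def write(self, memory: str) -> bool:
        if self.mode is RunMode.EVAL_TEST:
            return False                 # no writes: numbers you can trust
        if self.mode is RunMode.LEARNING_VALIDATION:
            self.staging.append(memory)  # staged, promoted only later
            return True
        self.memories.append(memory)     # full read/write
        return True

store = MemoryStore(RunMode.EVAL_TEST)
print(store.write("new heuristic"))  # False
```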
## Secure Tools
- **CalculatorTool** – AST-validated, no `eval()` on arbitrary text
- **PythonExecTool** – subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** – sandboxed to declared root directory
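AST validation is the standard way to evaluate arithmetic without `eval()`: parse the expression, walk the tree, and allow only numeric literals and arithmetic operators. A sketch of the pattern (hypothetical code, not `CalculatorTool`'s source):

```python
# Sketch of AST-validated arithmetic. Anything other than numbers and
# the whitelisted operators raises, so code like __import__("os") is
# rejected at parse-walk time rather than executed.

import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr: str) -> float:
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression")
    return walk(ast.parse(expr, mode="eval").body)

print(safe_eval("2 + 3 * 4"))  # 14
```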
## Architecture
See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
34 Python modules, ~500KB, organized in layers:
```
Core Engine   → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel     → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research      → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities  → Agent, Graph, Parallel, Conversation, KnowledgeStore
Easy API      → purpose(), Team, quickstart wizard
```
## Literature
Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md).
Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
## CLI
```bash
python -m purpose_agent # Interactive wizard
purpose-agent # Same, via entry point
```
## License
MIT