---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- experience-replay
- llm-as-judge
- state-value-evaluation
- memory-augmented
- react
- orchestration
- modular
- slm
- small-language-models
- multi-agent
- human-in-the-loop
- streaming
- tools
- evaluation
- ollama
- local-models
pipeline_tag: text-generation
---
# Purpose Agent v0.2.0
**An SLM-native, self-improving agent framework.**
Works with both **Small Language Models** (0.6B–3.8B params, run locally at $0 API cost) and **Large Language Models** (cloud APIs) through the same interface. Agents learn from experience via a Purpose Function Φ(s), a critic that scores how close a state is to the stated purpose, so no fine-tuning is needed.
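As a rough illustration of scoring progress with Φ(s), the sketch below assumes a step is rewarded by the change in Φ between the state before and after an action. The names `toy_phi` and `step_reward`, the 0 to 10 scale, and the subgoal-counting critic are hypothetical stand-ins, not the library's implementation (the real critic is an LLM).

```python
def toy_phi(state: dict) -> float:
    """Hypothetical critic: fraction of subgoals completed, scaled to 0-10."""
    goals = state["subgoals"]
    return 10.0 * sum(goals.values()) / len(goals)


def step_reward(before: dict, after: dict, phi=toy_phi) -> float:
    """Assumed reward shape: how far the action moved the state toward the purpose."""
    return phi(after) - phi(before)


before = {"subgoals": {"found_key": True, "opened_door": False}}
after = {"subgoals": {"found_key": True, "opened_door": True}}
print(step_reward(before, after))  # 5.0
```

A positive delta marks a useful step and a negative delta marks a regression, which is the kind of signal an experience store can keep alongside each action.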
## What Makes This Different
| Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| **Self-Improvement** | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ |
| **SLM-Native** | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| **Anti-Reward-Hacking** | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ |
| **3-Tier Memory** | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ |
| **Multi-Agent with Shared Learning** | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ |
| **Human Φ Override** | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| **Streaming** | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
| **Tool Framework** | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Cost Tracking** | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ |
| **Benchmark Harness** | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Lightweight** | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ |
| **Literature-Grounded** | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ |
## Architecture
```
purpose_agent/
├── types.py # Core data types
├── llm_backend.py # Cloud LLM backends (HF, OpenAI, Mock)
├── slm_backends.py # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
├── actor.py # ReAct agent with 3-tier memory
├── purpose_function.py # Non-hackable Φ(s) critic
├── experience_replay.py # Two-phase retrieval (similarity + Q-value)
├── optimizer.py # Trajectory → heuristic distillation
├── orchestrator.py # Main loop
├── streaming.py # 🆕 Async engine + event streaming
├── tools.py # 🆕 Tool framework + built-in tools + Tool RAG
├── observability.py # 🆕 Cost tracking, callbacks, metrics
├── multi_agent.py # 🆕 Agent teams with shared learning
├── hitl.py # 🆕 Human-in-the-loop + checkpointing
└── evaluation.py # 🆕 Benchmark runner + improvement curves
```
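The two-phase retrieval noted for `experience_replay.py` (similarity, then Q-value) can be sketched as below. This is a minimal illustration under stated assumptions, not the module's code: `Experience`, `jaccard`, `retrieve`, the token-overlap similarity, and the `k_candidates` and `top_k` parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    task: str        # description of the past task
    strategy: str    # what the agent did
    q_value: float   # learned estimate of how useful this experience was


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query, memory, k_candidates=3, top_k=1):
    # Phase 1: shortlist the experiences most similar to the query.
    candidates = sorted(memory, key=lambda e: jaccard(query, e.task), reverse=True)
    candidates = candidates[:k_candidates]
    # Phase 2: among similar experiences, prefer the ones that worked best.
    return sorted(candidates, key=lambda e: e.q_value, reverse=True)[:top_k]


memory = [
    Experience("find treasure in maze", "follow the left wall", 0.9),
    Experience("find exit in maze", "random walk", 0.2),
    Experience("sort a list of numbers", "use sorted()", 0.95),
    Experience("find key in maze", "backtrack at dead ends", 0.7),
]
best = retrieve("find treasure in the maze", memory)
print(best[0].strategy)  # follow the left wall
```

The sorting experience has the highest Q-value but never reaches phase 2, because phase 1 filters out experiences that are irrelevant to the query.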
## Quick Start — Local SLM (Zero Cost)
```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull a small model (1.7B params, runs on any laptop)
ollama pull qwen3:1.7b
# 3. Run your agent
python my_agent.py
```
```python
from purpose_agent import (
    Orchestrator, OllamaBackend, State, Environment, Action,
    CalculatorTool, ToolRegistry,
)
# SLM backend — runs locally, zero cost
llm = OllamaBackend(model="qwen3:1.7b") # 1.7B params
# Or use a cloud LLM
# from purpose_agent import HFInferenceBackend
# llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras")
class MyEnv(Environment):
    """Toy environment: every action immediately yields a finished state."""

    def execute(self, action, state):
        return State(data={"result": "done"})

orch = Orchestrator(llm=llm, environment=MyEnv())
result = orch.run_task(purpose="Solve the problem", max_steps=10)
print(result.summary())
```
## SLM Model Registry
Pre-configured models optimized for agent tasks:
```python
from purpose_agent import create_slm_backend
backend = create_slm_backend("phi-4-mini") # 3.8B — best tool-use accuracy
backend = create_slm_backend("qwen3-1.7b") # 1.7B — best balance
backend = create_slm_backend("qwen3-0.6b") # 0.6B — ultra-light
backend = create_slm_backend("llama-3.2-1b") # 1B — 128K context
backend = create_slm_backend("smollm2-1.7b") # 1.7B — HF native
```
## Multi-Agent with Shared Learning
Agents learn from each other — when one agent solves a problem, all benefit:
```python
from purpose_agent import AgentSpec, AgentTeam, OllamaBackend
researcher = AgentSpec(
    name="researcher", role="Find information",
    model=OllamaBackend(model="qwen3:1.7b"),  # Cheap SLM
    expertise_keywords=["search", "find", "research"],
)
coder = AgentSpec(
    name="coder", role="Write and debug code",
    model=OllamaBackend(model="phi4-mini"),  # Better SLM for code
    expertise_keywords=["code", "program", "debug"],
)
team = AgentTeam(
    agents=[researcher, coder],
    default_model=OllamaBackend(model="qwen3:1.7b"),
    environment=my_env,  # any Environment instance, e.g. MyEnv() from the Quick Start
)
# Auto-delegates to the best agent
result = team.run_task(purpose="Search for Python sorting algorithms")
print(team.get_learning_report()) # See shared knowledge
```
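Auto-delegation by `expertise_keywords` can be pictured as a keyword match between the task purpose and each agent's keywords. The sketch below is only an assumption about how such routing might work; `pick_agent` and its fallback behavior are hypothetical and not part of the library.

```python
def pick_agent(purpose: str, agents: dict[str, list[str]], default: str) -> str:
    """Return the agent whose keywords overlap most with the purpose text."""
    words = set(purpose.lower().split())
    scores = {name: len(words & set(keywords)) for name, keywords in agents.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default agent when no keyword matches at all.
    return best if scores[best] > 0 else default


agents = {
    "researcher": ["search", "find", "research"],
    "coder": ["code", "program", "debug"],
}
print(pick_agent("Search for Python sorting algorithms", agents, default="researcher"))
# researcher
```

Exact word matching is the simplest possible router; an embedding-based match would handle paraphrases such as "look up" at the cost of an extra model call.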
## Human-in-the-Loop
Humans can override Φ scores → the agent permanently learns preferences:
```python
from purpose_agent import HITLOrchestrator, CLIInputHandler
hitl = HITLOrchestrator(
    orchestrator=orch,  # the Orchestrator from the Quick Start
    input_handler=CLIInputHandler(),
    approve_actions=True,  # Approve each action
    review_scores=True,  # Override Φ scores
    checkpoint_dir="./checkpoints",
)
result = hitl.run_task(purpose="Important task")
# Inject knowledge directly
hitl.inject_heuristic(
    pattern="When facing {problem_type}",
    strategy="Always try the simplest approach first",
)
```
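To make the idea of a human Φ override concrete, here is a minimal sketch of a critic that remembers human corrections and prefers them over its automatic score. `OverridableCritic`, `teach`, and the string state keys are hypothetical; the library's own mechanism may differ, and real persistence would also need the overrides written to disk.

```python
class OverridableCritic:
    """Wraps an automatic scorer and lets a human pin scores for specific states."""

    def __init__(self, base_phi):
        self.base_phi = base_phi   # automatic scorer (stand-in for the LLM critic)
        self.overrides = {}        # state key -> human-assigned score

    def teach(self, state_key: str, score: float) -> None:
        """Record a human correction for this state."""
        self.overrides[state_key] = score

    def score(self, state_key: str) -> float:
        # Human corrections take priority over the automatic score.
        if state_key in self.overrides:
            return self.overrides[state_key]
        return self.base_phi(state_key)


critic = OverridableCritic(base_phi=lambda key: 5.0)
before = critic.score("deleted_user_files")
critic.teach("deleted_user_files", 0.0)
after = critic.score("deleted_user_files")
print(before, after)  # 5.0 0.0
```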
## Streaming
Real-time event streaming for UIs:
```python
import asyncio
from purpose_agent import AsyncOrchestrator
async def main():
    # Wrap the synchronous Orchestrator (orch) from the Quick Start
    async_orch = AsyncOrchestrator(orch)
    async for event in async_orch.run_task_stream(purpose="..."):
        if event.event_type == "action":
            print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
        elif event.event_type == "score":
            print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")

asyncio.run(main())
```
## Tool Framework
```python
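import requests  # third-party: pip install requests

from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool

# Create a tool from any function
@FunctionTool.from_function
def search(query: str) -> str:
    """Search the web for information."""
    # Placeholder endpoint; passing params lets requests URL-encode the query
    response = requests.get("https://api.search.com", params={"q": query}, timeout=10)
    return response.text
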
# Tool RAG for SLMs (only load relevant tools into prompt)
registry = ToolRegistry()
registry.register(CalculatorTool())
registry.register(PythonExecTool())
registry.register(search)
relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
# → [CalculatorTool, PythonExecTool] (search excluded — saves tokens)
```
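Tool RAG means ranking tools against the query and putting only the top few into the prompt. The sketch below assumes a simple word-overlap score between the query and each tool description; `tokens`, `relevant_tools`, and the tool descriptions are hypothetical, and the library's `get_relevant_tools` may rank differently.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, so '2+2' yields {'2'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevant_tools(query: str, tool_descriptions: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = tokens(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: len(q & tokens(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


tools = {
    "calculator": "compute arithmetic expressions such as 2 + 2",
    "python_exec": "run python code to compute a result",
    "search": "search the web for information",
}
print(relevant_tools("compute 2+2", tools, top_k=2))
# ['calculator', 'python_exec']
```

With a small context window, every tool schema left out of the prompt frees tokens for the task itself, which is why this matters more for SLMs than for large models.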
## Cost Tracking
```python
from purpose_agent import CostTracker
tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005)
tracker.record(prompt_tokens=500, completion_tokens=200)
print(tracker.summary())
# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': ...}  (cost depends on the configured input and output rates)
```
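Per-call cost is usually token counts multiplied by per-million-token rates. The sketch below shows that arithmetic under assumptions: `estimate_cost_usd`, the `cost_per_1m_output` parameter, and the example rates are hypothetical, and `CostTracker` may apply different defaults or rounding.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    cost_per_1m_input: float,
    cost_per_1m_output: float = 0.0,
) -> float:
    """Tokens times the per-million-token rate, summed over input and output."""
    total = prompt_tokens * cost_per_1m_input + completion_tokens * cost_per_1m_output
    return total / 1_000_000


# Hypothetical rates: $0.15 per 1M input tokens, $0.60 per 1M output tokens
cost = estimate_cost_usd(1_000_000, 500_000, cost_per_1m_input=0.15, cost_per_1m_output=0.60)
print(f"${cost:.2f}")  # $0.45
```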
## Benchmark & Prove Self-Improvement
```python
from purpose_agent import BenchmarkRunner, BenchmarkTask
runner = BenchmarkRunner(orchestrator=orch)
tasks = [
BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
]
result = runner.run(tasks, iterations=10, name="MazeTest")
print(result.summary())
# Iteration   Success Rate   Avg Φ   Avg Steps   Avg Reward
# -----------------------------------------------------------------
#         1          40.0%    4.20        12.0         3.20
#         5          70.0%    6.80         8.0         6.50
#        10          90.0%    8.50         6.0         8.90
# Improvement: 40.0% → 90.0% (+50.0%)
result.save("results/benchmark.json")
```
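An improvement curve is the success rate per iteration, compared between the first and last iteration. The sketch below computes one from made-up pass/fail outcomes chosen to mirror the sample table above; `success_rate`, `improvement_curve`, and the data are hypothetical, not output from the library.

```python
def success_rate(outcomes: list[bool]) -> float:
    """Percentage of tasks that succeeded in one iteration."""
    return 100.0 * sum(outcomes) / len(outcomes)


def improvement_curve(results: dict[int, list[bool]]) -> list[tuple[int, float]]:
    """(iteration, success rate) pairs in iteration order."""
    return [(i, success_rate(results[i])) for i in sorted(results)]


# Made-up outcomes for 10 tasks per iteration
results = {
    1: [True] * 4 + [False] * 6,
    5: [True] * 7 + [False] * 3,
    10: [True] * 9 + [False] * 1,
}
curve = improvement_curve(results)
first, last = curve[0][1], curve[-1][1]
print(f"Improvement: {first:.1f}% -> {last:.1f}% ({last - first:+.1f} points)")
# Improvement: 40.0% -> 90.0% (+50.0 points)
```

Reporting the change in percentage points avoids confusing a 50-point gain with a 50% relative gain.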
## Literature Foundation
| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory (strategic/procedural/tool) |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function V(s) |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns |
## Installation
```bash
# Core (no dependencies beyond stdlib)
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent
# For local SLMs
pip install ollama
# For cloud LLMs
pip install huggingface_hub # or: pip install openai
# Run demo (no API keys needed)
python demo.py
```
## License
MIT