v0.2.0: Complete README with SLM-native, multi-agent, HITL, eval, streaming, tools, observability
README.md CHANGED

tags:
- react
- orchestration
- modular
- slm
- small-language-models
- multi-agent
- human-in-the-loop
- streaming
- tools
- evaluation
- ollama
- local-models
pipeline_tag: text-generation
---

# Purpose Agent v0.2.0

**The world's first SLM-native self-improving agentic framework.**

Works with both **Small Language Models** (0.6B–3B params, local, $0 cost) and **Large Language Models** (cloud APIs) with equal efficiency. Agents learn from experience via a Purpose Function Φ(s), so no fine-tuning is needed.

## What Makes This Different

| Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| **Self-Improvement** | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ |
| **SLM-Native** | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| **Anti-Reward-Hacking** | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ |
| **3-Tier Memory** | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ |
| **Multi-Agent with Shared Learning** | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ |
| **Human Φ Override** | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| **Streaming** | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
| **Tool Framework** | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Cost Tracking** | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ |
| **Benchmark Harness** | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Lightweight** | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ |
| **Literature-Grounded** | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ |

## Architecture

```
purpose_agent/
├── types.py              # Core data types
├── llm_backend.py        # Cloud LLM backends (HF, OpenAI, Mock)
├── slm_backends.py       # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
├── actor.py              # ReAct agent with 3-tier memory
├── purpose_function.py   # Non-hackable Φ(s) critic
├── experience_replay.py  # Two-phase retrieval (similarity + Q-value)
├── optimizer.py          # Trajectory → heuristic distillation
├── orchestrator.py       # Main loop
├── streaming.py          # 🆕 Async engine + event streaming
├── tools.py              # 🆕 Tool framework + built-in tools + Tool RAG
├── observability.py      # 🆕 Cost tracking, callbacks, metrics
├── multi_agent.py        # 🆕 Agent teams with shared learning
├── hitl.py               # 🆕 Human-in-the-loop + checkpointing
└── evaluation.py         # 🆕 Benchmark runner + improvement curves
```
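
Conceptually, these modules compose into a single loop. The sketch below is a deliberately simplified illustration of the control flow that `orchestrator.py` drives; every name in it is a stand-in for exposition, not the module's actual API:

```python
# Illustrative sketch only; all names are stand-ins, not the real API.
def run_task_sketch(purpose, actor, critic, buffer, optimizer, env, max_steps=10):
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        # Actor (ReAct + 3-tier memory) proposes an action, informed by
        # experiences recalled from the replay buffer.
        action = actor.decide(purpose, state, buffer.retrieve(state))
        next_state = env.execute(action, state)
        # The strict critic scores the transition: did Φ(s) improve?
        phi_delta = critic.score(purpose, state, next_state)
        trajectory.append((state, action, phi_delta))
        state = next_state
    buffer.store(trajectory)    # experience replay: keep what happened
    optimizer.distill(buffer)   # winning trajectories become reusable heuristics
    return state
```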

## Quick Start: Local SLM (Zero Cost)

```bash
# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# 2. Pull a small model (1.7B params, runs on any laptop)
ollama pull qwen3:1.7b

# 3. Run your agent
python my_agent.py
```

```python
from purpose_agent import (
    Orchestrator, OllamaBackend, State, Environment, Action,
    CalculatorTool, ToolRegistry,
)

# SLM backend: runs locally, zero cost
llm = OllamaBackend(model="qwen3:1.7b")  # 1.7B params

# Or use a cloud LLM
# from purpose_agent import HFInferenceBackend
# llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras")

class MyEnv(Environment):
    def execute(self, action, state):
        return State(data={"result": "done"})

orch = Orchestrator(llm=llm, environment=MyEnv())
result = orch.run_task(purpose="Solve the problem", max_steps=10)
print(result.summary())
```

## SLM Model Registry

Pre-configured models optimized for agent tasks:

```python
from purpose_agent import create_slm_backend

backend = create_slm_backend("phi-4-mini")    # 3.8B – best tool-use accuracy
backend = create_slm_backend("qwen3-1.7b")    # 1.7B – best balance
backend = create_slm_backend("qwen3-0.6b")    # 0.6B – ultra-light
backend = create_slm_backend("llama-3.2-1b")  # 1B – 128K context
backend = create_slm_backend("smollm2-1.7b")  # 1.7B – HF native
```
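
The feature table above lists grammar-constrained JSON as part of being SLM-native: small models drift out of valid JSON far more often than large ones, so decoding is constrained to it. As a rough illustration of the underlying mechanism, here is the raw trick using the `ollama` client's `format` parameter; `slm_backends.py` presumably wraps something more involved, and the prompt below is purely illustrative:

```python
import json
import ollama  # pip install ollama

# Constrain decoding to valid JSON so a small model can't drift mid-structure.
response = ollama.chat(
    model="qwen3:1.7b",
    messages=[{
        "role": "user",
        "content": 'Reply as JSON {"action": "...", "args": {}}. Task: compute 2 + 2.',
    }],
    format="json",
)
call = json.loads(response["message"]["content"])  # guaranteed to parse
print(call)
```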

## Multi-Agent with Shared Learning

Agents learn from each other: when one agent solves a problem, all benefit.

```python
from purpose_agent import AgentSpec, AgentTeam, OllamaBackend

researcher = AgentSpec(
    name="researcher", role="Find information",
    model=OllamaBackend(model="qwen3:1.7b"),  # Cheap SLM
    expertise_keywords=["search", "find", "research"],
)
coder = AgentSpec(
    name="coder", role="Write and debug code",
    model=OllamaBackend(model="phi4-mini"),  # Better SLM for code
    expertise_keywords=["code", "program", "debug"],
)

team = AgentTeam(
    agents=[researcher, coder],
    default_model=OllamaBackend(model="qwen3:1.7b"),
    environment=my_env,
)

# Auto-delegates to the best agent
result = team.run_task(purpose="Search for Python sorting algorithms")
print(team.get_learning_report())  # See shared knowledge
```
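
For intuition on the auto-delegation step, here is a minimal sketch of keyword-overlap routing. This is an assumption suggested by the `expertise_keywords` field, not the actual `multi_agent.py` implementation:

```python
def pick_agent(purpose, specs, default_spec):
    """Hypothetical router: pick the spec whose expertise_keywords overlap the task most."""
    words = set(purpose.lower().split())

    def overlap(spec):
        return len(words & set(spec.expertise_keywords))

    best = max(specs, key=overlap)
    return best if overlap(best) > 0 else default_spec
```

With the specs above, "Search for Python sorting algorithms" overlaps only the researcher's "search" keyword, so the task would route there.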

## Human-in-the-Loop

Humans can override Φ scores, and the agent permanently learns those preferences:

```python
from purpose_agent import HITLOrchestrator, CLIInputHandler

hitl = HITLOrchestrator(
    orchestrator=orch,
    input_handler=CLIInputHandler(),
    approve_actions=True,   # Approve each action
    review_scores=True,     # Override Φ scores
    checkpoint_dir="./checkpoints",
)
result = hitl.run_task(purpose="Important task")

# Inject knowledge directly
hitl.inject_heuristic(
    pattern="When facing {problem_type}",
    strategy="Always try the simplest approach first",
)
```

## Streaming

Real-time event streaming for UIs:

```python
import asyncio
from purpose_agent import AsyncOrchestrator

async def main():
    async_orch = AsyncOrchestrator(orch)
    async for event in async_orch.run_task_stream(purpose="..."):
        if event.event_type == "action":
            print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
        elif event.event_type == "score":
            print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")

asyncio.run(main())
```

## Tool Framework

```python
import requests  # needed by the example search tool below

from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool

# Create a tool from any function
@FunctionTool.from_function
def search(query: str) -> str:
    """Search the web for information."""
    return requests.get(f"https://api.search.com?q={query}").text

# Tool RAG for SLMs (only load relevant tools into the prompt)
registry = ToolRegistry()
registry.register(CalculatorTool())
registry.register(PythonExecTool())
registry.register(search)

relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
# → [CalculatorTool, PythonExecTool] (search excluded – saves tokens)
```

## Cost Tracking

```python
from purpose_agent import CostTracker

tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005)
tracker.record(prompt_tokens=500, completion_tokens=200)
print(tracker.summary())
# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
```

## Benchmark & Prove Self-Improvement

```python
from purpose_agent import BenchmarkRunner, BenchmarkTask

runner = BenchmarkRunner(orchestrator=orch)
tasks = [
    BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
    BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
]

result = runner.run(tasks, iterations=10, name="MazeTest")
print(result.summary())
# Iteration   Success Rate   Avg Φ   Avg Steps   Avg Reward
# ----------------------------------------------------------
# 1           40.0%          4.20    12.0        3.20
# 5           70.0%          6.80    8.0         6.50
# 10          90.0%          8.50    6.0         8.90
# Improvement: 40.0% → 90.0% (+50.0%)

result.save("results/benchmark.json")
```

## Literature Foundation

| Paper | What it contributes |
|-------|---------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory (strategic/procedural/tool) |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function V(s) |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns |
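
Several of these feed directly into `experience_replay.py`'s two-phase retrieval: phase 1 recalls experiences by semantic similarity to the current state, phase 2 re-ranks the recalled set by learned Q-value. Below is a self-contained sketch of that pattern, using a hypothetical experience record rather than the module's actual types:

```python
from dataclasses import dataclass

@dataclass
class Experience:           # hypothetical record, not the real type
    embedding: list[float]  # embedding of the state it was recorded in
    q_value: float          # learned estimate of how well it paid off
    trajectory: object

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def retrieve(experiences, query_embedding, recall_k=20, top_k=4):
    # Phase 1: semantic recall – nearest neighbours of the current state.
    recalled = sorted(experiences,
                      key=lambda e: cosine(e.embedding, query_embedding),
                      reverse=True)[:recall_k]
    # Phase 2: Q-value re-rank – prefer experiences that actually worked.
    return sorted(recalled, key=lambda e: e.q_value, reverse=True)[:top_k]
```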

## Installation

```bash
# Core (no dependencies beyond stdlib)
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent

# For local SLMs
pip install ollama

# For cloud LLMs
pip install huggingface_hub  # or: pip install openai

# Run demo (no API keys needed)
python demo.py
```

## License