metadata
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- experience-replay
- llm-as-judge
- state-value-evaluation
- memory-augmented
- react
- orchestration
- modular
- slm
- small-language-models
- multi-agent
- human-in-the-loop
- streaming
- tools
- evaluation
- ollama
- local-models
pipeline_tag: text-generation
Purpose Agent v0.2.0
The world's first SLM-native self-improving agentic framework.
Works with both Small Language Models (0.6B–3B params, local, $0 cost) and Large Language Models (cloud APIs) with equal efficiency. Agents learn from experience via a Purpose Function Φ(s); no fine-tuning is needed.
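One way to picture learning from Φ(s) is to score each step by how much it changes Φ and keep the steps that made progress. The sketch below is illustrative only: `Transition`, `best_transitions`, and the `min_delta` threshold are hypothetical names, not the library's API. Only `phi_before` and `phi_after` echo fields that appear in the streaming events later in this README.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """One step of a trajectory, scored by a critic before and after."""
    action: str
    phi_before: float  # critic's estimate of progress before the action
    phi_after: float   # critic's estimate of progress after the action

    @property
    def delta(self) -> float:
        # A positive delta means the action moved the state toward the purpose.
        return self.phi_after - self.phi_before


def best_transitions(trajectory, min_delta=0.5):
    """Keep the steps whose gain in Φ meets a (hypothetical) threshold."""
    return [t for t in trajectory if t.delta >= min_delta]


trajectory = [
    Transition("open_door", 2.0, 4.0),
    Transition("walk_in_circle", 4.0, 4.0),
    Transition("pick_up_key", 4.0, 7.5),
]
useful = best_transitions(trajectory)
print([t.action for t in useful])  # ['open_door', 'pick_up_key']
```

Steps that leave Φ unchanged are dropped, so only the useful experience would be carried into later tasks.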
What Makes This Different
| Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
|---|---|---|---|---|---|---|
| Self-Improvement | ✅ Φ(s) + experience replay + heuristic distillation | ? | ? | ? | ? | ? |
| SLM-Native | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ? | ? | ? | ? | ⚠️ |
| Anti-Reward-Hacking | ✅ 7 strict rules + cache consistency + anomaly detection | ? | ? | ? | ? | ? |
| 3-Tier Memory | ✅ Strategic/Procedural/Tool with Q-value retrieval | ? | ⚠️ | ⚠️ | ? | ? |
| Multi-Agent with Shared Learning | ✅ Agents learn from each other | ? | ⚠️ | ? | ? | ⚠️ |
| Human Φ Override | ✅ Humans teach the critic; learning is permanent | ? | ⚠️ | ? | ? | ? |
| Streaming | ✅ Event + token streaming | ? | ? | ⚠️ | ⚠️ | ? |
| Tool Framework | ✅ Schema, validation, retry, Tool RAG | ? | ? | ? | ? | ? |
| Cost Tracking | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ? | ? | ? |
| Benchmark Harness | ✅ Improvement curve tracking | ? | ? | ? | ? | ? |
| Lightweight | ✅ ~150KB, stdlib only | ? | ? | ⚠️ | ⚠️ | ? |
| Literature-Grounded | ✅ 8 papers implemented | ? | ? | ? | ? | ? |
Architecture
purpose_agent/
├── types.py              # Core data types
├── llm_backend.py        # Cloud LLM backends (HF, OpenAI, Mock)
├── slm_backends.py       # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
├── actor.py              # ReAct agent with 3-tier memory
├── purpose_function.py   # Non-hackable Φ(s) critic
├── experience_replay.py  # Two-phase retrieval (similarity + Q-value)
├── optimizer.py          # Trajectory → heuristic distillation
├── orchestrator.py       # Main loop
├── streaming.py          # 🆕 Async engine + event streaming
├── tools.py              # 🆕 Tool framework + built-in tools + Tool RAG
├── observability.py      # 🆕 Cost tracking, callbacks, metrics
├── multi_agent.py        # 🆕 Agent teams with shared learning
├── hitl.py               # 🆕 Human-in-the-loop + checkpointing
└── evaluation.py         # 🆕 Benchmark runner + improvement curves
Quick Start: Local SLM (Zero Cost)
# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# 2. Pull a small model (1.7B params, runs on any laptop)
ollama pull qwen3:1.7b
# 3. Run your agent
python my_agent.py
from purpose_agent import (
Orchestrator, OllamaBackend, State, Environment, Action,
CalculatorTool, ToolRegistry,
)
# SLM backend: runs locally, zero cost
llm = OllamaBackend(model="qwen3:1.7b")  # 1.7B params
# Or use a cloud LLM
# from purpose_agent import HFInferenceBackend
# llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras")
class MyEnv(Environment):
    def execute(self, action, state):
        return State(data={"result": "done"})
orch = Orchestrator(llm=llm, environment=MyEnv())
result = orch.run_task(purpose="Solve the problem", max_steps=10)
print(result.summary())
SLM Model Registry
Pre-configured models optimized for agent tasks:
from purpose_agent import create_slm_backend
backend = create_slm_backend("phi-4-mini")    # 3.8B: best tool-use accuracy
backend = create_slm_backend("qwen3-1.7b")    # 1.7B: best balance
backend = create_slm_backend("qwen3-0.6b")    # 0.6B: ultra-light
backend = create_slm_backend("llama-3.2-1b")  # 1B: 128K context
backend = create_slm_backend("smollm2-1.7b")  # 1.7B: HF native
Multi-Agent with Shared Learning
Agents learn from each other; when one agent solves a problem, all benefit:
from purpose_agent import AgentSpec, AgentTeam, OllamaBackend
researcher = AgentSpec(
    name="researcher", role="Find information",
    model=OllamaBackend(model="qwen3:1.7b"),  # Cheap SLM
    expertise_keywords=["search", "find", "research"],
)
coder = AgentSpec(
    name="coder", role="Write and debug code",
    model=OllamaBackend(model="phi4-mini"),  # Better SLM for code
    expertise_keywords=["code", "program", "debug"],
)
team = AgentTeam(
    agents=[researcher, coder],
    default_model=OllamaBackend(model="qwen3:1.7b"),
    environment=my_env,  # an Environment instance, e.g. MyEnv() from the Quick Start
)
# Auto-delegates to the best agent
result = team.run_task(purpose="Search for Python sorting algorithms")
print(team.get_learning_report()) # See shared knowledge
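How the team chooses "the best agent" is not spelled out here. A plausible reading of `expertise_keywords` is simple keyword matching against the task purpose. The sketch below assumes exactly that; `Spec` and `pick_agent` are hypothetical stand-ins, not the library's `AgentSpec` or its actual routing logic.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical stand-in for an agent spec: a name plus keywords."""
    name: str
    expertise_keywords: list = field(default_factory=list)


def pick_agent(purpose, specs, default=None):
    """Return the spec with the most keyword hits, or `default` if none match."""
    words = set(purpose.lower().split())
    best, best_score = default, 0
    for spec in specs:
        score = sum(keyword in words for keyword in spec.expertise_keywords)
        if score > best_score:  # strict '>' keeps the earlier spec on ties
            best, best_score = spec, score
    return best


specs = [
    Spec("researcher", ["search", "find", "research"]),
    Spec("coder", ["code", "program", "debug"]),
]
print(pick_agent("Search for Python sorting algorithms", specs).name)  # researcher
print(pick_agent("Debug this program", specs).name)                    # coder
```

Exact word matching is deliberately crude. A real router could use stemming or embeddings, at the cost of extra dependencies.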
Human-in-the-Loop
Humans can override Φ scores, and the agent permanently learns their preferences:
from purpose_agent import HITLOrchestrator, CLIInputHandler
hitl = HITLOrchestrator(
    orchestrator=orch,
    input_handler=CLIInputHandler(),
    approve_actions=True,  # Approve each action
    review_scores=True,    # Override Φ scores
    checkpoint_dir="./checkpoints",
)
result = hitl.run_task(purpose="Important task")
# Inject knowledge directly
hitl.inject_heuristic(
    pattern="When facing {problem_type}",
    strategy="Always try the simplest approach first",
)
Streaming
Real-time event streaming for UIs:
import asyncio
from purpose_agent import AsyncOrchestrator
async def main():
    async_orch = AsyncOrchestrator(orch)
    async for event in async_orch.run_task_stream(purpose="..."):
        if event.event_type == "action":
            print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
        elif event.event_type == "score":
            print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")

asyncio.run(main())
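To try the consuming side of this pattern without a model or the library installed, the sketch below replaces the orchestrator with a plain async generator. `Event`, `fake_task_stream`, and `collect` are hypothetical names. Only the `event_type` values and the `phi_before`/`phi_after` keys mirror the example above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a streamed event."""
    event_type: str
    data: dict


async def fake_task_stream(steps):
    """Yield an 'action' event, then a 'score' event, for each (name, gain) step."""
    phi = 0.0
    for name, gain in steps:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield Event("action", {"name": name})
        yield Event("score", {"phi_before": phi, "phi_after": phi + gain})
        phi += gain


async def collect(steps):
    """Consume the stream the same way a UI would, one event at a time."""
    lines = []
    async for event in fake_task_stream(steps):
        if event.event_type == "action":
            lines.append(f"action: {event.data['name']}")
        elif event.event_type == "score":
            before, after = event.data["phi_before"], event.data["phi_after"]
            lines.append(f"phi: {before:.1f} -> {after:.1f}")
    return lines


lines = asyncio.run(collect([("look", 1.5), ("move", 2.0)]))
print("\n".join(lines))
```

Because events arrive one at a time, a UI can render each action and score as soon as it is produced instead of waiting for the whole task to finish.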
Tool Framework
import requests
from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool
# Create a tool from any function
@FunctionTool.from_function
def search(query: str) -> str:
    """Search the web for information."""
    # Pass the query via params so it is URL-encoded correctly
    return requests.get("https://api.search.com", params={"q": query}, timeout=10).text
# Tool RAG for SLMs (only load relevant tools into the prompt)
registry = ToolRegistry()
registry.register(CalculatorTool())
registry.register(PythonExecTool())
registry.register(search)
relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
# → [CalculatorTool, PythonExecTool] (search excluded, which saves tokens)
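The idea behind Tool RAG is to rank tools by relevance to the query and put only the top few into the prompt. The sketch below uses plain word overlap as the relevance score, which is an assumption: the scoring method the library actually uses is not stated here, and `relevant_tools` and the tool descriptions are hypothetical.

```python
def relevant_tools(query, tools, top_k=2):
    """Rank tools by how many query words appear in their description."""
    query_words = set(query.lower().split())

    def score(tool):
        return len(query_words & set(tool["description"].lower().split()))

    # sorted() is stable, so tools with equal scores keep registration order.
    ranked = sorted(tools, key=score, reverse=True)
    return [tool["name"] for tool in ranked[:top_k]]


tools = [
    {"name": "calculator", "description": "compute arithmetic expressions like 2+2"},
    {"name": "python_exec", "description": "run python code to compute results"},
    {"name": "search", "description": "search the web for information"},
]
print(relevant_tools("compute 2+2", tools))  # ['calculator', 'python_exec']
```

With a small context window, every tool schema left out of the prompt frees tokens for the task itself, which is why this matters more for SLMs than for large cloud models.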
Cost Tracking
from purpose_agent import CostTracker
tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005)
tracker.record(prompt_tokens=500, completion_tokens=200)
print(tracker.summary())
# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
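The arithmetic behind per-call cost tracking is token count times price per million tokens. The sketch below is a standalone illustration with made-up prices. `estimate_cost_usd` and the `cost_per_1m_output` parameter are hypothetical, and how `CostTracker` prices completion tokens or rounds its result is not specified here.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      cost_per_1m_input, cost_per_1m_output=0.0):
    """Estimate cost in USD from token counts and per-million-token prices."""
    total = prompt_tokens * cost_per_1m_input + completion_tokens * cost_per_1m_output
    return total / 1_000_000


# 2M prompt tokens at $0.50 per 1M, plus 0.5M completion tokens at $2.00 per 1M
cost = estimate_cost_usd(2_000_000, 500_000,
                         cost_per_1m_input=0.5, cost_per_1m_output=2.0)
print(f"${cost:.2f}")  # $2.00
```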
Benchmark & Prove Self-Improvement
from purpose_agent import BenchmarkRunner, BenchmarkTask
runner = BenchmarkRunner(orchestrator=orch)
tasks = [
    BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
    BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
]
result = runner.run(tasks, iterations=10, name="MazeTest")
print(result.summary())
# Iteration  Success Rate  Avg Φ  Avg Steps  Avg Reward
# -----------------------------------------------------------------
#         1         40.0%   4.20       12.0        3.20
#         5         70.0%   6.80        8.0        6.50
#        10         90.0%   8.50        6.0        8.90
# Improvement: 40.0% → 90.0% (+50.0%)
result.save("results/benchmark.json")
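The "Improvement" line in a summary like this is the last iteration's success rate minus the first. The sketch below shows that calculation on made-up pass/fail outcomes chosen to match the sample table. These are not real benchmark results, and `success_rate` and `outcomes_by_iteration` are hypothetical names.

```python
def success_rate(outcomes):
    """Percentage of runs that succeeded."""
    return 100.0 * sum(outcomes) / len(outcomes)


# Made-up pass/fail outcomes for 10 runs at each recorded iteration.
outcomes_by_iteration = {
    1: [True] * 4 + [False] * 6,
    5: [True] * 7 + [False] * 3,
    10: [True] * 9 + [False] * 1,
}

curve = [success_rate(outcomes_by_iteration[i]) for i in sorted(outcomes_by_iteration)]
delta = curve[-1] - curve[0]
print(f"Improvement: {curve[0]:.1f}% -> {curve[-1]:.1f}% ({delta:+.1f} points)")
```

The difference is in percentage points, so going from 40% to 90% is a gain of 50 points rather than a 50% relative increase.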
Literature Foundation
| Paper | What it contributes |
|---|---|
| MUSE | 3-tier memory (strategic/procedural/tool) |
| LATS | LLM-as-value-function V(s) |
| REMEMBERER | Q-value experience replay |
| Reflexion | Verbal reinforcement |
| SPC | Anti-reward-hacking |
| CER | Contextual experience distillation |
| MemRL | Two-phase retrieval |
| TinyAgent | SLM-native agent patterns |
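Two of the ideas in this table, two-phase retrieval and Q-value experience replay, combine naturally: first shortlist memories that resemble the current task, then prefer those that worked well before. The sketch below assumes Jaccard word overlap for similarity and a stored `q` field for past usefulness. All names and data are hypothetical and not taken from the library or the papers.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def two_phase_retrieve(query, memory, k_similar=3, k_final=1):
    """Phase 1: shortlist by similarity. Phase 2: re-rank the shortlist by Q-value."""
    shortlist = sorted(memory, key=lambda m: jaccard(query, m["task"]),
                       reverse=True)[:k_similar]
    return sorted(shortlist, key=lambda m: m["q"], reverse=True)[:k_final]


memory = [
    {"task": "find the key in the maze", "hint": "check dead ends first", "q": 0.9},
    {"task": "find the exit of the maze", "hint": "follow the left wall", "q": 0.4},
    {"task": "sort a list of numbers", "hint": "use the built-in sort", "q": 0.95},
    {"task": "find treasure in the maze", "hint": "wander randomly", "q": 0.1},
]
top = two_phase_retrieve("find treasure in the maze", memory)
print(top[0]["hint"])  # check dead ends first
```

The unrelated sorting memory has the highest Q-value but never reaches phase 2, and the exact-match memory loses to a similar one with a better track record. That is the point of splitting retrieval into two phases.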
Installation
# Core (no dependencies beyond stdlib)
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent
# For local SLMs
pip install ollama
# For cloud LLMs
pip install huggingface_hub # or: pip install openai
# Run demo (no API keys needed)
python demo.py
License
MIT