---
library_name: purpose-agent
license: mit
language:
  - en
tags:
  - reinforcement-learning
  - agents
  - self-improving
  - experience-replay
  - llm-as-judge
  - state-value-evaluation
  - memory-augmented
  - react
  - orchestration
  - modular
  - slm
  - small-language-models
  - multi-agent
  - human-in-the-loop
  - streaming
  - tools
  - evaluation
  - ollama
  - local-models
pipeline_tag: text-generation
---

# Purpose Agent v0.2.0

**An SLM-native, self-improving agentic framework.**

Works with both **Small Language Models** (0.6B–3.8B params, run locally at $0 API cost) and **Large Language Models** (cloud APIs) through the same `Orchestrator` interface. Agents learn from experience via a Purpose Function Φ(s), a critic that scores each state against the task's purpose; no fine-tuning is needed.
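
As a rough illustration of how a state-value critic can guide an agent without fine-tuning, the sketch below scores each step by the change in Φ between consecutive states. The `toy_phi` scoring rule, the 0-10 scale, the `ToyState` type, and the `step_deltas` helper are all assumptions made for this example and are not Purpose Agent's API; in the framework the score would come from an LLM judge rather than a hand-written rule.

```python
from typing import Callable, Dict, List

ToyState = Dict[str, int]  # hypothetical stand-in for a task state


def toy_phi(state: ToyState) -> float:
    """Toy purpose function: fraction of subgoals done, scaled to 0-10."""
    return 10.0 * state["done"] / state["total"]


def step_deltas(states: List[ToyState], score: Callable[[ToyState], float]) -> List[float]:
    """Return phi(next_state) - phi(state) for each consecutive pair of states."""
    values = [score(s) for s in states]
    return [after - before for before, after in zip(values, values[1:])]


trajectory = [
    {"done": 0, "total": 4},
    {"done": 1, "total": 4},
    {"done": 1, "total": 4},  # a wasted step: no progress
    {"done": 3, "total": 4},
]
deltas = step_deltas(trajectory, toy_phi)
print(deltas)  # → [2.5, 0.0, 5.0]
```

A step with a delta of zero made no progress toward the purpose, which is the kind of signal an experience store can use to prefer some past actions over others.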

## What Makes This Different

| Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| **Self-Improvement** | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ |
| **SLM-Native** | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| **Anti-Reward-Hacking** | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ |
| **3-Tier Memory** | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ |
| **Multi-Agent with Shared Learning** | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ |
| **Human Φ Override** | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| **Streaming** | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
| **Tool Framework** | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Cost Tracking** | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ |
| **Benchmark Harness** | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Lightweight** | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ |
| **Literature-Grounded** | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ |

## Architecture

```
purpose_agent/
├── types.py              # Core data types
├── llm_backend.py        # Cloud LLM backends (HF, OpenAI, Mock)
├── slm_backends.py       # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
├── actor.py              # ReAct agent with 3-tier memory
├── purpose_function.py   # Non-hackable Φ(s) critic
├── experience_replay.py  # Two-phase retrieval (similarity + Q-value)
├── optimizer.py          # Trajectory → heuristic distillation
├── orchestrator.py       # Main loop
├── streaming.py          # 🆕 Async engine + event streaming
├── tools.py              # 🆕 Tool framework + built-in tools + Tool RAG
├── observability.py      # 🆕 Cost tracking, callbacks, metrics
├── multi_agent.py        # 🆕 Agent teams with shared learning
├── hitl.py               # 🆕 Human-in-the-loop + checkpointing
└── evaluation.py         # 🆕 Benchmark runner + improvement curves
```
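
The two-phase retrieval noted for `experience_replay.py` can be pictured as a similarity filter followed by a Q-value re-rank. The sketch below is a minimal stand-in under stated assumptions: the `Experience` record, the Jaccard word-overlap similarity, and the `retrieve` helper are hypothetical and chosen for brevity; the module itself may use a different similarity measure and scoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    task: str       # description of the task that produced this memory
    advice: str     # what was learned
    q_value: float  # learned estimate of how useful the advice was


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def retrieve(query: str, memory: List[Experience],
             shortlist: int = 3, top_k: int = 1) -> List[Experience]:
    # Phase 1: shortlist the experiences most similar to the query.
    similar = sorted(memory, key=lambda e: jaccard(query, e.task), reverse=True)[:shortlist]
    # Phase 2: among those, prefer the ones with the highest Q-value.
    return sorted(similar, key=lambda e: e.q_value, reverse=True)[:top_k]


memory = [
    Experience("find the key in the maze", "check dead ends first", 0.4),
    Experience("find the exit of the maze", "follow the left wall", 0.9),
    Experience("sort a list of numbers", "use the built-in sort", 0.95),
    Experience("bake a cake", "preheat the oven", 0.99),
]
best = retrieve("find treasure in the maze", memory, shortlist=2, top_k=1)
print(best[0].advice)  # → follow the left wall
```

The similarity phase keeps unrelated but high-scoring memories (the cake) out of the prompt, and the Q-value phase picks the advice that worked best among the relevant ones.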

## Quick Start — Local SLM (Zero Cost)

```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a small model (1.7B params, runs on any laptop)
ollama pull qwen3:1.7b

# 3. Run your agent
python my_agent.py
```

```python
from purpose_agent import Orchestrator, OllamaBackend, State, Environment

# SLM backend — runs locally, zero cost
llm = OllamaBackend(model="qwen3:1.7b")   # 1.7B params

# Or use a cloud LLM
# from purpose_agent import HFInferenceBackend
# llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras")

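class MyEnv(Environment):
    """Minimal environment: every action immediately yields a finished state."""

    def execute(self, action, state):
        # Apply `action` to `state` and return the resulting State.
        return State(data={"result": "done"})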

orch = Orchestrator(llm=llm, environment=MyEnv())
result = orch.run_task(purpose="Solve the problem", max_steps=10)
print(result.summary())
```

## SLM Model Registry

Pre-configured models optimized for agent tasks:

```python
from purpose_agent import create_slm_backend

backend = create_slm_backend("phi-4-mini")    # 3.8B — best tool-use accuracy
backend = create_slm_backend("qwen3-1.7b")    # 1.7B — best balance
backend = create_slm_backend("qwen3-0.6b")    # 0.6B — ultra-light
backend = create_slm_backend("llama-3.2-1b")  # 1B — 128K context
backend = create_slm_backend("smollm2-1.7b")  # 1.7B — HF native
```

## Multi-Agent with Shared Learning

Agents learn from each other — when one agent solves a problem, all benefit:

```python
from purpose_agent import AgentSpec, AgentTeam, OllamaBackend

researcher = AgentSpec(
    name="researcher", role="Find information",
    model=OllamaBackend(model="qwen3:1.7b"),     # Cheap SLM
    expertise_keywords=["search", "find", "research"],
)
coder = AgentSpec(
    name="coder", role="Write and debug code",
    model=OllamaBackend(model="phi4-mini"),       # Better SLM for code
    expertise_keywords=["code", "program", "debug"],
)

team = AgentTeam(
    agents=[researcher, coder],
    default_model=OllamaBackend(model="qwen3:1.7b"),
    environment=my_env,  # any Environment instance, e.g. MyEnv() from the Quick Start
)

# Auto-delegates to the best agent
result = team.run_task(purpose="Search for Python sorting algorithms")
print(team.get_learning_report())  # See shared knowledge
```
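
One simple way auto-delegation could work from `expertise_keywords` is to count keyword hits in the task purpose and pick the agent with the most. The `pick_agent` helper, the prefix-matching rule, and the fallback to a default agent are assumptions for illustration only; `AgentTeam` may route tasks differently.

```python
from typing import Dict, List


def pick_agent(purpose: str, expertise: Dict[str, List[str]], default: str) -> str:
    """Return the agent whose keywords match the purpose most often."""
    words = purpose.lower().split()
    scores = {
        name: sum(word.startswith(kw) for word in words for kw in keywords)
        for name, keywords in expertise.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default agent when nothing matches.
    return best if scores[best] > 0 else default


expertise = {
    "researcher": ["search", "find", "research"],
    "coder": ["code", "program", "debug"],
}
print(pick_agent("Search for Python sorting algorithms", expertise, default="researcher"))
print(pick_agent("Debug the failing program", expertise, default="researcher"))
```

With these keyword lists, the first purpose goes to `researcher` and the second to `coder`.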

## Human-in-the-Loop

Humans can override Φ scores, and the agent retains those corrections as permanently learned preferences:

```python
from purpose_agent import HITLOrchestrator, CLIInputHandler

hitl = HITLOrchestrator(
    orchestrator=orch,         # the Orchestrator from the Quick Start
    input_handler=CLIInputHandler(),
    approve_actions=True,      # Approve each action
    review_scores=True,        # Override Φ scores
    checkpoint_dir="./checkpoints",
)
result = hitl.run_task(purpose="Important task")

# Inject knowledge directly
hitl.inject_heuristic(
    pattern="When facing {problem_type}",
    strategy="Always try the simplest approach first",
)
```

## Streaming

Real-time event streaming for UIs:

```python
import asyncio
from purpose_agent import AsyncOrchestrator

async def main():
    async_orch = AsyncOrchestrator(orch)  # wraps the Orchestrator from the Quick Start
    async for event in async_orch.run_task_stream(purpose="..."):
        if event.event_type == "action":
            print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
        elif event.event_type == "score":
            print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")

asyncio.run(main())
```

## Tool Framework

```python
import requests

from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool

# Create a tool from any function
@FunctionTool.from_function
def search(query: str) -> str:
    """Search the web for information."""
    # Example endpoint; `params` URL-encodes the query string.
    return requests.get("https://api.search.com", params={"q": query}, timeout=10).text

# Tool RAG for SLMs (only load relevant tools into prompt)
registry = ToolRegistry()
registry.register(CalculatorTool())
registry.register(PythonExecTool())
registry.register(search)

relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
# → [CalculatorTool, PythonExecTool]  (search excluded — saves tokens)
```
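
The idea behind Tool RAG, loading only the tools that look relevant to the current request, can be sketched with plain word overlap. The `relevant_tools` helper, the tool names, and the descriptions below are hypothetical and only illustrate the ranking step; `ToolRegistry.get_relevant_tools` may score relevance differently.

```python
from typing import Dict, List


def relevant_tools(query: str, descriptions: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank tools by how many query words appear in their description."""
    query_words = set(query.lower().split())

    def overlap(name: str) -> int:
        return len(query_words & set(descriptions[name].lower().split()))

    ranked = sorted(descriptions, key=overlap, reverse=True)
    return ranked[:top_k]


descriptions = {
    "search": "search the web for information",
    "calculator": "compute arithmetic expressions such as 2+2",
    "python_exec": "run python code to compute a result",
}
selected = relevant_tools("compute 2+2", descriptions)
print(selected)  # → ['calculator', 'python_exec']
```

Only the selected tools' schemas would then be placed in the prompt, which matters most for small models with limited context.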

## Cost Tracking

```python
from purpose_agent import CostTracker

tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005)
tracker.record(prompt_tokens=500, completion_tokens=200)
print(tracker.summary())
# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
```
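
The per-call arithmetic behind token cost tracking is simple enough to check by hand. The sketch below is an independent illustration, not `CostTracker` itself; the `estimate_cost_usd` helper and both prices are hypothetical.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      usd_per_1m_input: float, usd_per_1m_output: float) -> float:
    """Cost = tokens / 1,000,000 * price per million, summed over input and output."""
    input_cost = prompt_tokens / 1_000_000 * usd_per_1m_input
    output_cost = completion_tokens / 1_000_000 * usd_per_1m_output
    return input_cost + output_cost


# Hypothetical prices: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
cost = estimate_cost_usd(500, 200, usd_per_1m_input=0.50, usd_per_1m_output=1.50)
print(f"${cost:.6f}")
```

Here 500 input tokens contribute $0.00025 and 200 output tokens contribute $0.00030, for a total of $0.00055.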

## Benchmark & Prove Self-Improvement

```python
from purpose_agent import BenchmarkRunner, BenchmarkTask

runner = BenchmarkRunner(orchestrator=orch)  # the Orchestrator from the Quick Start
tasks = [
    BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
    BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
]

result = runner.run(tasks, iterations=10, name="MazeTest")
print(result.summary())
# Example output (illustrative numbers):
# Iteration    Success Rate      Avg Φ    Avg Steps   Avg Reward
# -----------------------------------------------------------------
#          1          40.0%       4.20          12.0         3.20
#          5          70.0%       6.80           8.0         6.50
#         10          90.0%       8.50           6.0         8.90
# Improvement: 40.0% → 90.0% (+50.0%)

result.save("results/benchmark.json")
```
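
An improvement curve of this kind boils down to a success rate per iteration. The sketch below shows that computation on made-up pass/fail outcomes; the `success_rates` helper and the data are hypothetical and do not reflect how `BenchmarkRunner` stores its results.

```python
from typing import Dict, List


def success_rates(outcomes: Dict[int, List[bool]]) -> Dict[int, float]:
    """Map each iteration number to the percentage of successful runs."""
    return {it: 100.0 * sum(runs) / len(runs) for it, runs in sorted(outcomes.items())}


# Made-up outcomes for illustration: iteration -> one pass/fail flag per task.
outcomes = {
    1: [True, False, False, False],
    2: [True, True, False, False],
    3: [True, True, True, False],
}
curve = success_rates(outcomes)
first, last = curve[1], curve[3]
print(curve)  # → {1: 25.0, 2: 50.0, 3: 75.0}
print(f"Improvement: {first:.1f}% -> {last:.1f}% ({last - first:+.1f} points)")
```

Reporting the change in percentage points keeps the comparison unambiguous when the starting rate is low.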

## Literature Foundation

| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory (strategic/procedural/tool) |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function V(s) |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns |

## Installation

```bash
# Core (no dependencies beyond stdlib)
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent

# For local SLMs
pip install ollama

# For cloud LLMs
pip install huggingface_hub  # or: pip install openai

# Run demo (no API keys needed)
python demo.py
```

## License

MIT