---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- experience-replay
- llm-as-judge
- state-value-evaluation
- memory-augmented
- react
- orchestration
- modular
- slm
- small-language-models
- multi-agent
- human-in-the-loop
- streaming
- tools
- evaluation
- ollama
- local-models
pipeline_tag: text-generation
---

# Purpose Agent v0.2.0

**The world's first SLM-native self-improving agentic framework.**

Works with both **Small Language Models** (0.6B–3.8B params, local, $0 cost) and **Large Language Models** (cloud APIs) with equal efficiency. Agents learn from experience via a Purpose Function Φ(s), with no fine-tuning needed.

## What Makes This Different

| Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| **Self-Improvement** | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ |
| **SLM-Native** | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| **Anti-Reward-Hacking** | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ |
| **3-Tier Memory** | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ |
| **Multi-Agent with Shared Learning** | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ |
| **Human Φ Override** | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| **Streaming** | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
| **Tool Framework** | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Cost Tracking** | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ |
| **Benchmark Harness** | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ |
| **Lightweight** | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ |
| **Literature-Grounded** | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ |

## Architecture

```
purpose_agent/
├── types.py              # Core data types
├── llm_backend.py        # Cloud LLM backends (HF, OpenAI, Mock)
├── slm_backends.py       # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
├── actor.py              # ReAct agent with 3-tier memory
├── purpose_function.py   # Non-hackable Φ(s) critic
├── experience_replay.py  # Two-phase retrieval (similarity + Q-value)
├── optimizer.py          # Trajectory → heuristic distillation
├── orchestrator.py       # Main loop
├── streaming.py          # 🆕 Async engine + event streaming
├── tools.py              # 🆕 Tool framework + built-in tools + Tool RAG
├── observability.py      # 🆕 Cost tracking, callbacks, metrics
├── multi_agent.py        # 🆕 Agent teams with shared learning
├── hitl.py               # 🆕 Human-in-the-loop + checkpointing
└── evaluation.py         # 🆕 Benchmark runner + improvement curves
```

## Quick Start: Local SLM (Zero Cost)

```bash
# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# 2. Pull a small model (1.7B params, runs on any laptop)
ollama pull qwen3:1.7b

# 3. Run your agent (for example, the script below saved as my_agent.py)
python my_agent.py
```

```python
from purpose_agent import (
    Orchestrator, OllamaBackend, State, Environment, Action,
    CalculatorTool, ToolRegistry,
)

# SLM backend: runs locally, zero cost
llm = OllamaBackend(model="qwen3:1.7b")  # 1.7B params

# Or use a cloud LLM
# from purpose_agent import HFInferenceBackend
# llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras")

# Minimal environment: every action leads to the same "done" state
class MyEnv(Environment):
    def execute(self, action, state):
        return State(data={"result": "done"})

orch = Orchestrator(llm=llm, environment=MyEnv())
result = orch.run_task(purpose="Solve the problem", max_steps=10)
print(result.summary())
```

## SLM Model Registry

Pre-configured models optimized for agent tasks:

```python
from purpose_agent import create_slm_backend

backend = create_slm_backend("phi-4-mini")    # 3.8B: best tool-use accuracy
backend = create_slm_backend("qwen3-1.7b")    # 1.7B: best balance
backend = create_slm_backend("qwen3-0.6b")    # 0.6B: ultra-light
backend = create_slm_backend("llama-3.2-1b")  # 1B: 128K context
backend = create_slm_backend("smollm2-1.7b")  # 1.7B: HF native
```

## Multi-Agent with Shared Learning

Agents learn from each other. When one agent solves a problem, all of them benefit:

```python
from purpose_agent import AgentSpec, AgentTeam, OllamaBackend

researcher = AgentSpec(
    name="researcher",
    role="Find information",
    model=OllamaBackend(model="qwen3:1.7b"),  # Cheap SLM
    expertise_keywords=["search", "find", "research"],
)

coder = AgentSpec(
    name="coder",
    role="Write and debug code",
    model=OllamaBackend(model="phi4-mini"),  # Better SLM for code
    expertise_keywords=["code", "program", "debug"],
)

team = AgentTeam(
    agents=[researcher, coder],
    default_model=OllamaBackend(model="qwen3:1.7b"),
    environment=my_env,  # an Environment instance, e.g. MyEnv() from the Quick Start
)

# Auto-delegates to the best agent
result = team.run_task(purpose="Search for Python sorting algorithms")
print(team.get_learning_report())  # See shared knowledge
```

## Human-in-the-Loop

Humans can override Φ scores, and the agent permanently learns those preferences:

```python
from purpose_agent import HITLOrchestrator, CLIInputHandler

hitl = HITLOrchestrator(
    orchestrator=orch,            # Orchestrator from the Quick Start
    input_handler=CLIInputHandler(),
    approve_actions=True,         # Approve each action
    review_scores=True,           # Override Φ scores
    checkpoint_dir="./checkpoints",
)

result = hitl.run_task(purpose="Important task")

# Inject knowledge directly
hitl.inject_heuristic(
    pattern="When facing {problem_type}",
    strategy="Always try the simplest approach first",
)
```

## Streaming

Real-time event streaming for UIs:

```python
import asyncio
from purpose_agent import AsyncOrchestrator

async def main():
    async_orch = AsyncOrchestrator(orch)  # wraps the Orchestrator from the Quick Start
    async for event in async_orch.run_task_stream(purpose="..."):
        if event.event_type == "action":
            print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
        elif event.event_type == "score":
            print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")

asyncio.run(main())
```

## Tool Framework

```python
import requests  # third-party: pip install requests

from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool

# Create a tool from any function
@FunctionTool.from_function
def search(query: str) -> str:
    """Search the web for information."""
    # Placeholder endpoint; passing params lets requests URL-encode the query
    response = requests.get("https://api.search.com", params={"q": query}, timeout=10)
    return response.text

# Tool RAG for SLMs (only load relevant tools into the prompt)
registry = ToolRegistry()
registry.register(CalculatorTool())
registry.register(PythonExecTool())
registry.register(search)

relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
# → [CalculatorTool, PythonExecTool] (search excluded, which saves tokens)
```

## Cost Tracking

```python
from purpose_agent import CostTracker

tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005)
tracker.record(prompt_tokens=500, completion_tokens=200)
print(tracker.summary())
# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
```

## Benchmark & Prove Self-Improvement

```python
from purpose_agent import BenchmarkRunner, BenchmarkTask

runner = BenchmarkRunner(orchestrator=orch)
tasks = [
    BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
    BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
]

result = runner.run(tasks, iterations=10, name="MazeTest")
print(result.summary())
# Iteration  Success Rate  Avg Φ  Avg Steps  Avg Reward
# -----------------------------------------------------
#         1         40.0%   4.20       12.0        3.20
#         5         70.0%   6.80        8.0        6.50
#        10         90.0%   8.50        6.0        8.90
# Improvement: 40.0% → 90.0% (+50.0%)

result.save("results/benchmark.json")
```

## Literature Foundation

| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory (strategic/procedural/tool) |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function V(s) |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns |

## Installation

```bash
# Core (no dependencies beyond stdlib)
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent

# For local SLMs
pip install ollama

# For cloud LLMs
pip install huggingface_hub  # or: pip install openai

# Run demo (no API keys needed)
python demo.py
```

## License

MIT
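
## Appendix: Two-Phase Retrieval Sketch

The architecture tree describes `experience_replay.py` as two-phase retrieval (similarity + Q-value). The sketch below illustrates that idea under stated assumptions; it is not the library's implementation. The `Experience` record, the `retrieve` function, its `shortlist_size` and `top_k` parameters, and the string-matching `similarity` stand-in (a real system would more likely compare embeddings) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Experience:
    """Hypothetical memory record; the real library's fields may differ."""
    situation: str   # what the agent was facing
    strategy: str    # what it did
    q_value: float   # learned estimate of how useful the strategy was


def similarity(a: str, b: str) -> float:
    """Stand-in similarity using stdlib string matching (assumption, not embeddings)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve(query: str, memory: list[Experience],
             shortlist_size: int = 3, top_k: int = 1) -> list[Experience]:
    # Phase 1: keep only the experiences most similar to the current situation.
    shortlist = sorted(
        memory, key=lambda e: similarity(query, e.situation), reverse=True
    )[:shortlist_size]
    # Phase 2: among those, prefer the ones with the highest Q-value.
    return sorted(shortlist, key=lambda e: e.q_value, reverse=True)[:top_k]


if __name__ == "__main__":
    memory = [
        Experience("sort a list of numbers", "call list.sort()", 0.2),
        Experience("sort a list of strings", "use sorted() with a key function", 0.9),
        Experience("bake a chocolate cake", "preheat the oven", 0.99),
    ]
    best = retrieve("sort a list of numbers", memory, shortlist_size=2, top_k=1)
    print(best[0].strategy)  # → use sorted() with a key function
```

Filtering by similarity first keeps a high-Q but irrelevant experience (the cake entry here) out of the prompt, while the Q-value re-rank lets a slightly less similar but more successful strategy win over the closest match.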