| --- |
| library_name: purpose-agent |
| license: mit |
| language: |
| - en |
| tags: |
| - reinforcement-learning |
| - agents |
| - self-improving |
| - experience-replay |
| - llm-as-judge |
| - state-value-evaluation |
| - memory-augmented |
| - react |
| - orchestration |
| - modular |
| - slm |
| - small-language-models |
| - multi-agent |
| - human-in-the-loop |
| - streaming |
| - tools |
| - evaluation |
| - ollama |
| - local-models |
| pipeline_tag: text-generation |
| --- |
| |
| # Purpose Agent v0.2.0 |
|
|
| **The world's first SLM-native self-improving agentic framework.** |
|
|
| Works with both **Small Language Models** (roughly 0.6B–4B params, run locally with no API cost) and **Large Language Models** (cloud APIs) through the same interface. Agents learn from experience via a Purpose Function Φ(s), with no fine-tuning needed. |
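To make the idea of learning from a state-value signal concrete, here is a minimal sketch. It is not the library's implementation: `ToyState`, `phi`, and `step_reward` are hypothetical names, the sub-goal counting heuristic stands in for an LLM-backed critic, and the 0 to 10 scale is an assumption.

```python
from typing import Callable, Dict

# Hypothetical state representation: sub-goal name -> completed?
ToyState = Dict[str, bool]


def phi(state: ToyState) -> float:
    """Toy purpose function: fraction of sub-goals completed, scaled to 0-10."""
    if not state:
        return 0.0
    return 10.0 * sum(state.values()) / len(state)


def step_reward(
    before: ToyState,
    after: ToyState,
    critic: Callable[[ToyState], float] = phi,
) -> float:
    """Reward for one step: the change in state value, phi(after) - phi(before)."""
    return critic(after) - critic(before)


before = {"found_key": True, "opened_door": False}
after = {"found_key": True, "opened_door": True}
print(step_reward(before, after))  # 5.0
```

Because the reward is a difference of state values, an action that leaves the state unchanged earns nothing, and a regression earns a negative reward.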
|
|
| ## What Makes This Different |
|
|
| | Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents | |
| |---|:---:|:---:|:---:|:---:|:---:|:---:| |
| | **Self-Improvement** | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ | |
| | **SLM-Native** | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ | |
| | **Anti-Reward-Hacking** | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ | |
| | **3-Tier Memory** | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ | |
| | **Multi-Agent with Shared Learning** | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ | |
| | **Human Φ Override** | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ | |
| | **Streaming** | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ | |
| | **Tool Framework** | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ | |
| | **Cost Tracking** | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ | |
| | **Benchmark Harness** | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ | |
| | **Lightweight** | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ | |
| | **Literature-Grounded** | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ | |
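One plausible reading of "cache consistency" in the anti-reward-hacking row is that identical states must always receive identical Φ scores, so an agent cannot re-query the critic until it gets a higher number. The sketch below illustrates that reading only; `CachedCritic` and `noisy_scorer` are hypothetical names and not part of the Purpose Agent API.

```python
import hashlib
import itertools
import json
from typing import Callable, Dict


class CachedCritic:
    """Wraps a scoring function so each distinct state is scored only once."""

    def __init__(self, score_fn: Callable[[dict], float]) -> None:
        self._score_fn = score_fn
        self._cache: Dict[str, float] = {}
        self.calls = 0  # number of times the underlying critic was invoked

    def _key(self, state: dict) -> str:
        # Canonical JSON makes the key independent of dict key order.
        canonical = json.dumps(state, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def score(self, state: dict) -> float:
        key = self._key(state)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._score_fn(state)
        return self._cache[key]


_counter = itertools.count(1)


def noisy_scorer(state: dict) -> float:
    """Stand-in critic that returns a different score on every call."""
    return float(next(_counter))


critic = CachedCritic(noisy_scorer)
first = critic.score({"step": 1, "done": False})
second = critic.score({"done": False, "step": 1})  # same state, different key order
print(first == second, critic.calls)  # True 1
```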
|
|
| ## Architecture |
|
|
| ``` |
| purpose_agent/ |
| ├── types.py              # Core data types |
| ├── llm_backend.py        # Cloud LLM backends (HF, OpenAI, Mock) |
| ├── slm_backends.py       # 🆕 SLM backends (Ollama, llama-cpp, prompt compression) |
| ├── actor.py              # ReAct agent with 3-tier memory |
| ├── purpose_function.py   # Non-hackable Φ(s) critic |
| ├── experience_replay.py  # Two-phase retrieval (similarity + Q-value) |
| ├── optimizer.py          # Trajectory → heuristic distillation |
| ├── orchestrator.py       # Main loop |
| ├── streaming.py          # 🆕 Async engine + event streaming |
| ├── tools.py              # 🆕 Tool framework + built-in tools + Tool RAG |
| ├── observability.py      # 🆕 Cost tracking, callbacks, metrics |
| ├── multi_agent.py        # 🆕 Agent teams with shared learning |
| ├── hitl.py               # 🆕 Human-in-the-loop + checkpointing |
| └── evaluation.py         # 🆕 Benchmark runner + improvement curves |
| ``` |
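The "two-phase retrieval (similarity + Q-value)" noted for `experience_replay.py` can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `Experience`, `similarity`, and `retrieve` are hypothetical names, and word-set Jaccard overlap stands in for whatever similarity measure the library actually uses.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Experience:
    task: str
    q_value: float  # learned estimate of how useful this memory has been


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a stand-in for embedding similarity."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(
    query: str, memory: List[Experience], shortlist: int = 3, top_k: int = 1
) -> List[Experience]:
    # Phase 1: keep only the experiences most similar to the query.
    candidates = sorted(
        memory, key=lambda e: similarity(query, e.task), reverse=True
    )[:shortlist]
    # Phase 2: re-rank the shortlist by Q-value.
    return sorted(candidates, key=lambda e: e.q_value, reverse=True)[:top_k]


memory = [
    Experience("sort a list of numbers", q_value=0.2),
    Experience("sort a list of strings", q_value=0.9),
    Experience("bake a chocolate cake", q_value=1.0),
    Experience("parse a json file", q_value=0.5),
]
best = retrieve("sort a list of integers", memory, shortlist=2)
print(best[0].task)  # sort a list of strings
```

The unrelated cake memory has the highest Q-value but never reaches phase 2, which is the point of filtering by relevance before ranking by usefulness.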
|
|
| ## Quick Start — Local SLM (Zero Cost) |
|
|
| ```bash |
| # 1. Install Ollama |
| curl -fsSL https://ollama.com/install.sh | sh |
| |
| # 2. Pull a small model (1.7B params, runs on any laptop) |
| ollama pull qwen3:1.7b |
| |
| # 3. Run your agent |
| python my_agent.py |
| ``` |
|
|
| ```python |
| from purpose_agent import ( |
|     Orchestrator, OllamaBackend, State, Environment, Action, |
|     CalculatorTool, ToolRegistry, |
| ) |
| |
| # SLM backend — runs locally, zero cost |
| llm = OllamaBackend(model="qwen3:1.7b") # 1.7B params |
| |
| # Or use a cloud LLM |
| # from purpose_agent import HFInferenceBackend |
| # llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras") |
| |
| class MyEnv(Environment): |
|     # Minimal environment: every action leads to the same finished state |
|     def execute(self, action, state): |
|         return State(data={"result": "done"}) |
| |
| orch = Orchestrator(llm=llm, environment=MyEnv()) |
| result = orch.run_task(purpose="Solve the problem", max_steps=10) |
| print(result.summary()) |
| ``` |
|
|
| ## SLM Model Registry |
|
|
| Pre-configured models optimized for agent tasks: |
|
|
| ```python |
| from purpose_agent import create_slm_backend |
| |
| backend = create_slm_backend("phi-4-mini") # 3.8B — best tool-use accuracy |
| backend = create_slm_backend("qwen3-1.7b") # 1.7B — best balance |
| backend = create_slm_backend("qwen3-0.6b") # 0.6B — ultra-light |
| backend = create_slm_backend("llama-3.2-1b") # 1B — 128K context |
| backend = create_slm_backend("smollm2-1.7b") # 1.7B — HF native |
| ``` |
|
|
| ## Multi-Agent with Shared Learning |
|
|
| Agents learn from each other — when one agent solves a problem, all benefit: |
|
|
| ```python |
| from purpose_agent import AgentSpec, AgentTeam, OllamaBackend |
| |
| researcher = AgentSpec( |
|     name="researcher", role="Find information", |
|     model=OllamaBackend(model="qwen3:1.7b"), # Cheap SLM |
|     expertise_keywords=["search", "find", "research"], |
| ) |
| coder = AgentSpec( |
|     name="coder", role="Write and debug code", |
|     model=OllamaBackend(model="phi4-mini"), # Better SLM for code |
|     expertise_keywords=["code", "program", "debug"], |
| ) |
| |
| team = AgentTeam( |
|     agents=[researcher, coder], |
|     default_model=OllamaBackend(model="qwen3:1.7b"), |
|     environment=my_env, # Any Environment instance, e.g. MyEnv() from the Quick Start |
| ) |
| |
| # Auto-delegates to the best agent |
| result = team.run_task(purpose="Search for Python sorting algorithms") |
| print(team.get_learning_report()) # See shared knowledge |
| ``` |
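The `expertise_keywords` fields suggest that delegation is driven by matching the task purpose against each agent's keywords. A minimal sketch of such routing, assuming simple word matching; `pick_agent` is a hypothetical helper, and the actual `AgentTeam` logic may differ:

```python
from typing import Dict, List


def pick_agent(purpose: str, agents: Dict[str, List[str]], default: str) -> str:
    """Return the agent whose keywords occur most often in the purpose text."""
    words = purpose.lower().split()
    scores = {
        name: sum(words.count(keyword) for keyword in keywords)
        for name, keywords in agents.items()
    }
    if not scores:
        return default
    best = max(scores, key=scores.get)
    # Fall back to the default agent when no keyword matches at all.
    return best if scores[best] > 0 else default


agents = {
    "researcher": ["search", "find", "research"],
    "coder": ["code", "program", "debug"],
}
print(pick_agent("Search for Python sorting algorithms", agents, default="researcher"))
# researcher
```

A fallback matters here because many purposes will match no keyword, which is presumably the role `default_model` plays in the team configuration.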
|
|
| ## Human-in-the-Loop |
|
|
| Humans can override Φ scores, and the agent permanently learns those preferences: |
|
|
| ```python |
| from purpose_agent import HITLOrchestrator, CLIInputHandler |
| |
| hitl = HITLOrchestrator( |
|     orchestrator=orch, |
|     input_handler=CLIInputHandler(), |
|     approve_actions=True, # Approve each action |
|     review_scores=True, # Override Φ scores |
|     checkpoint_dir="./checkpoints", |
| ) |
| result = hitl.run_task(purpose="Important task") |
| |
| # Inject knowledge directly |
| hitl.inject_heuristic( |
|     pattern="When facing {problem_type}", |
|     strategy="Always try the simplest approach first", |
| ) |
| ``` |
|
|
| ## Streaming |
|
|
| Real-time event streaming for UIs: |
|
|
| ```python |
| import asyncio |
| from purpose_agent import AsyncOrchestrator |
| |
| async def main(): |
|     async_orch = AsyncOrchestrator(orch) |
|     async for event in async_orch.run_task_stream(purpose="..."): |
|         if event.event_type == "action": |
|             print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}") |
|         elif event.event_type == "score": |
|             print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}") |
| |
| asyncio.run(main()) |
| ``` |
|
|
| ## Tool Framework |
|
|
| ```python |
| import requests |
| |
| from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool |
| |
| # Create tool from any function |
| @FunctionTool.from_function |
| def search(query: str) -> str: |
|     """Search the web for information.""" |
|     # Passing the query via `params` lets requests URL-encode it |
|     return requests.get("https://api.search.com", params={"q": query}).text |
| |
| # Tool RAG for SLMs (only load relevant tools into prompt) |
| registry = ToolRegistry() |
| registry.register(CalculatorTool()) |
| registry.register(PythonExecTool()) |
| registry.register(search) |
| |
| relevant = registry.get_relevant_tools("compute 2+2", top_k=2) |
| # → [CalculatorTool, PythonExecTool] (search excluded — saves tokens) |
| ``` |
|
|
| ## Cost Tracking |
|
|
| ```python |
| from purpose_agent import CostTracker |
| |
| tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005) |
| tracker.record(prompt_tokens=500, completion_tokens=200) |
| print(tracker.summary()) |
| # Example output (the exact cost depends on the configured input and output rates): |
| # → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005} |
| ``` |
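The arithmetic behind a per-call cost estimate can be sketched as below. This is an assumption about how such a tracker might compute cost, not the `CostTracker` source: `estimate_cost_usd` and the `cost_per_1m_output` parameter are hypothetical, and the library's own defaults may produce a different figure.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    cost_per_1m_input: float,
    cost_per_1m_output: float = 0.0,  # hypothetical: output rate defaults to free
) -> float:
    """Estimated cost in USD, given per-million-token rates for input and output."""
    total = prompt_tokens * cost_per_1m_input + completion_tokens * cost_per_1m_output
    return total / 1_000_000


# 500 prompt tokens at $0.005 per million input tokens, output tokens free
cost = estimate_cost_usd(500, 200, cost_per_1m_input=0.005)
print(f"{cost:.7f}")
```

Keeping input and output rates separate matters for cloud APIs, where completion tokens are usually priced higher than prompt tokens.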
|
|
| ## Benchmark & Prove Self-Improvement |
|
|
| ```python |
| from purpose_agent import BenchmarkRunner, BenchmarkTask |
| |
| runner = BenchmarkRunner(orchestrator=orch) |
| tasks = [ |
|     BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...), |
|     BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...), |
| ] |
| |
| result = runner.run(tasks, iterations=10, name="MazeTest") |
| print(result.summary()) |
| # Iteration   Success Rate   Avg Φ   Avg Steps   Avg Reward |
| # ---------------------------------------------------------- |
| #         1          40.0%    4.20        12.0         3.20 |
| #         5          70.0%    6.80         8.0         6.50 |
| #        10          90.0%    8.50         6.0         8.90 |
| # Improvement: 40.0% → 90.0% (+50.0%) |
| |
| result.save("results/benchmark.json") |
| ``` |
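The improvement line in a summary like the one above boils down to comparing the first and last points of a success-rate curve. A minimal sketch with made-up outcomes; `success_rate` and `improvement` are hypothetical helpers, not `BenchmarkRunner` methods:

```python
from typing import List


def success_rate(outcomes: List[bool]) -> float:
    """Percentage of successful task runs in one iteration."""
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0


def improvement(curve: List[float]) -> float:
    """Change from the first to the last iteration, in percentage points."""
    return curve[-1] - curve[0] if curve else 0.0


# Made-up per-task outcomes for three iterations of five tasks each
runs = [
    [True, False, False, True, False],  # iteration 1: 2 of 5 succeed
    [True, True, False, True, False],   # iteration 2: 3 of 5 succeed
    [True, True, True, True, False],    # iteration 3: 4 of 5 succeed
]
curve = [success_rate(r) for r in runs]
print(curve)  # [40.0, 60.0, 80.0]
print(f"Improvement: {curve[0]:.1f}% -> {curve[-1]:.1f}% ({improvement(curve):+.1f} points)")
```

Reporting the gain in percentage points avoids ambiguity: going from 40% to 80% is +40 points, but a 100% relative increase.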
|
|
| ## Literature Foundation |
|
|
| | Paper | What it contributes | |
| |-------|-------------------| |
| | [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory (strategic/procedural/tool) | |
| | [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function V(s) | |
| | [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay | |
| | [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement | |
| | [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking | |
| | [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation | |
| | [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval | |
| | [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns | |
|
|
| ## Installation |
|
|
| ```bash |
| # Core (no dependencies beyond stdlib) |
| git clone https://huggingface.co/Rohan03/purpose-agent |
| cd purpose-agent |
| |
| # For local SLMs |
| pip install ollama |
| |
| # For cloud LLMs |
| pip install huggingface_hub # or: pip install openai |
| |
| # Run demo (no API keys needed) |
| python demo.py |
| ``` |
|
|
| ## License |
|
|
| MIT |
|
|