feat: Rewrite README — lead with easy API, 3 usage levels, complete docs
README.md
CHANGED

@@ -11,269 +11,197 @@ tags:
- llm-as-judge
- state-value-evaluation
- memory-augmented
-- orchestration
-- modular
- slm
- small-language-models
-- multi-agent
- human-in-the-loop
- streaming
- tools
- evaluation
- ollama
- local-models
pipeline_tag: text-generation
---

-# Purpose Agent
-
-**The world's first SLM-native self-improving agentic framework.**
-
-Works with both **Small Language Models** (0.6B–3B params, local, $0 cost) and **Large Language Models** (cloud APIs) with equal efficiency. Agents learn from experience via a Purpose Function Φ(s) — no fine-tuning needed.

-| Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
-|---|:---:|:---:|:---:|:---:|:---:|:---:|
-| **Self-Improvement** | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **SLM-Native** | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ |
-| **Anti-Reward-Hacking** | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **3-Tier Memory** | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ |
-| **Multi-Agent with Shared Learning** | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ |
-| **Human Φ Override** | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ |
-| **Streaming** | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
-| **Tool Framework** | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ |
-| **Cost Tracking** | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ |
-| **Benchmark Harness** | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **Lightweight** | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ |
-| **Literature-Grounded** | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ |
-
-## Architecture

-```
-purpose_agent/
-├── types.py             # Core data types
-├── llm_backend.py       # Cloud LLM backends (HF, OpenAI, Mock)
-├── slm_backends.py      # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
-├── actor.py             # ReAct agent with 3-tier memory
-├── purpose_function.py  # Non-hackable Φ(s) critic
-├── experience_replay.py # Two-phase retrieval (similarity + Q-value)
-├── optimizer.py         # Trajectory → heuristic distillation
-├── orchestrator.py      # Main loop
-├── streaming.py         # 🆕 Async engine + event streaming
-├── tools.py             # 🆕 Tool framework + built-in tools + Tool RAG
-├── observability.py     # 🆕 Cost tracking, callbacks, metrics
-├── multi_agent.py       # 🆕 Agent teams with shared learning
-├── hitl.py              # 🆕 Human-in-the-loop + checkpointing
-└── evaluation.py        # 🆕 Benchmark runner + improvement curves
-```
-
-## Quick Start — Local SLM (Zero Cost)
-
-```bash
-# 1. Install Ollama
-curl -fsSL https://ollama.ai/install.sh | sh
-
-# 2. Pull a small model (1.7B params, runs on any laptop)
-ollama pull qwen3:1.7b
-
-# 3. Run your agent
-python my_agent.py
-```

```python
-from purpose_agent import (
-    Orchestrator, OllamaBackend, State, Environment, Action,
-    CalculatorTool, ToolRegistry,
-)

-#

-#

-print(result.summary())
```

-##
-
-```python
-from purpose_agent import create_slm_backend

```

-##
-
-Agents learn from each other — when one agent solves a problem, all benefit:

```python
-coder = AgentSpec(
-    name="coder", role="Write and debug code",
-    model=OllamaBackend(model="phi4-mini"),  # Better SLM for code
-    expertise_keywords=["code", "program", "debug"],
)

)
-
-# Auto-delegates to the best agent
-result = team.run_task(purpose="Search for Python sorting algorithms")
-print(team.get_learning_report())  # See shared knowledge
```

-##
-
-Humans can override Φ scores → the agent permanently learns preferences:

```python
-)
-)
```

-##

-```python
-async for event in async_orch.run_task_stream(purpose="..."):
-    if event.event_type == "action":
-        print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
-    elif event.event_type == "score":
-        print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")
-
-asyncio.run(main())
```

-
-```python
-# Tool RAG for SLMs (only load relevant tools into prompt)
-registry = ToolRegistry()
-registry.register(CalculatorTool())
-registry.register(PythonExecTool())
-registry.register(search)
-
-relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
-# → [CalculatorTool, PythonExecTool] (search excluded — saves tokens)
-```

-##

-```

-tracker.record(prompt_tokens=500, completion_tokens=200)
-print(tracker.summary())
-# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
```

-

```python
-runner = BenchmarkRunner(orchestrator=orch)
-tasks = [
-    BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
-    BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
-]
-
-result = runner.run(tasks, iterations=10, name="MazeTest")
-print(result.summary())
-# Iteration   Success Rate   Avg Φ   Avg Steps   Avg Reward
-# ----------------------------------------------------------
-# 1           40.0%          4.20    12.0        3.20
-# 5           70.0%          6.80     8.0        6.50
-# 10          90.0%          8.50     6.0        8.90
-# Improvement: 40.0% → 90.0% (+50 points)
-
-result.save("results/benchmark.json")
```

-##

-| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
-| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
-| [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation |
-| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
-| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns |

## Installation

```bash
-# Core (no dependencies beyond stdlib)
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent

-# For local
pip install ollama

-# For cloud LLMs
-pip install huggingface_hub  # or: pip install openai
-
# Run demo (no API keys needed)
python demo.py
```

## License

MIT

- llm-as-judge
- state-value-evaluation
- memory-augmented
+- multi-agent
- slm
- small-language-models
- human-in-the-loop
- streaming
- tools
- evaluation
- ollama
- local-models
+- no-code
+- easy-to-use
pipeline_tag: text-generation
---

+# Purpose Agent

+**Build self-improving AI agent teams with just a purpose.**

+No PhD required. No infrastructure costs. Runs on your laptop.

```python
+import purpose_agent as pa

+# One line. That's all you need.
+team = pa.purpose("Help me research and summarize scientific papers")

+# Give it tasks. It gets smarter every time.
+result = team.run("Find recent breakthroughs in quantum computing")
+print(result)

+# Teach it your preferences
+team.teach("Always cite your sources")
+team.teach("Keep summaries under 200 words")

+# Check what it's learned
+print(team.status())
```

+## Three Levels of Usage

+**Pick your level. You can always go deeper later.**

+### Level 1 — Beginner (no technical knowledge needed)

+```python
+import purpose_agent as pa
+
+# Describe what you want. The framework builds the right team.
+team = pa.purpose("Write Python code and test it")
+result = team.run("Create a function that calculates fibonacci numbers")
+print(result)
+
+# It auto-detects the best team:
+#   "Write code"   → architect + coder + tester
+#   "Research X"   → researcher + analyst
+#   "Write blog"   → writer + editor
+#   "Analyze data" → analyst + reporter
+#   "Help me"      → general assistant
```

+### Level 2 — Intermediate (customize your team)

```python
+import purpose_agent as pa

+# Build a custom team
+team = pa.Team.build(
+    purpose="Customer support assistant",
+    agents=["greeter", "resolver", "escalator"],
+    model="qwen3:1.7b",  # Free local model
)
+result = team.run("Customer says: I can't log in to my account")

+# Add knowledge from your docs
+team = pa.purpose(
+    "Answer questions about our product",
+    knowledge="./docs/",  # Load all files from a folder
+    model="qwen3:1.7b",
)
+result = team.ask("What is our refund policy?")
```

+### Level 3 — Advanced (full control)

```python
+import purpose_agent as pa
+
+# Graph workflows (like LangGraph)
+graph = pa.Graph()
+graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
+graph.add_node("write", pa.Agent("writer", model="phi4-mini"))
+graph.add_edge(pa.START, "research")
+graph.add_conditional_edge("write", review_fn, {"pass": pa.END, "fail": "research"})
+result = graph.run(initial_state)
+
+# Parallel execution (like CrewAI)
+results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
+
+# Agent conversations (like AutoGen)
+chat = pa.Conversation([researcher, coder, reviewer])
+result = chat.run("Design a web scraper", rounds=5)
+
+# Knowledge-aware agents (like LlamaIndex)
+kb = pa.KnowledgeStore.from_directory("./docs")
+agent = pa.Agent("assistant", tools=[kb.as_tool()])
+
+# Human-in-the-loop (like LangGraph)
+hitl = pa.HITLOrchestrator(orch, input_handler=pa.CLIInputHandler(),
+                           approve_actions=True, review_scores=True)
```
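
In the snippets above, `review_fn`, `initial_state`, `a1`-`a3`, `researcher`/`coder`/`reviewer`, and `orch` are placeholders you supply yourself. As a rough sketch of how these primitives might compose end to end (it assumes `pa.Agent` accepts `model=` and `tools=` together, that `add_edge` also links two named nodes, and that a plain task string is a valid initial state for `graph.run`):

```python
import purpose_agent as pa

# Knowledge base exposed as a tool, as shown in Level 3
kb = pa.KnowledgeStore.from_directory("./docs")

# Assumption: model= and tools= can be combined on one Agent
researcher = pa.Agent("researcher", model="qwen3:1.7b", tools=[kb.as_tool()])
writer = pa.Agent("writer", model="phi4-mini")

def review_fn(state):
    # Hypothetical reviewer: loop back to research until a draft appears
    return "pass" if "draft" in str(state).lower() else "fail"

graph = pa.Graph()
graph.add_node("research", researcher)
graph.add_node("write", writer)
graph.add_edge(pa.START, "research")
graph.add_edge("research", "write")  # assumption: node-to-node edges work like START edges
graph.add_conditional_edge("write", review_fn, {"pass": pa.END, "fail": "research"})

result = graph.run("Draft a summary of our refund policy")  # assumption: string initial state
print(result)
```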

+## What Makes This Different

+**The only framework where agents actually learn from experience.**

+Every other framework (LangChain, CrewAI, AutoGen) runs the same way every time. Purpose Agent gets smarter with each task via the **Φ self-improvement loop**:
+
+```
+Task 1:  Agent struggles, takes 12 steps → Φ evaluates → learns heuristics
+Task 5:  Agent uses learned patterns, takes 8 steps → learns more
+Task 10: Agent is efficient, takes 5 steps → keeps refining
```
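
In code, that loop is just repeated runs on the same family of tasks. A minimal sketch using only the API shown earlier, assuming `team.status()` summarizes the learned heuristics (its exact fields aren't specified here):

```python
import purpose_agent as pa

team = pa.purpose("Solve maze-navigation puzzles", model="qwen3:1.7b")

# Φ scores every trajectory and distills heuristics between runs,
# so later attempts should finish in fewer steps.
for i in range(10):
    team.run(f"Escape maze layout #{i}")

print(team.status())  # inspect what the team has learned so far
```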

+Plus it absorbs the best of every competing framework:

+| You want... | Others say use... | Purpose Agent gives you... |
+|---|---|---|
+| **Control** (graphs, conditions, loops) | LangGraph | `pa.Graph()` — same power, with self-improvement |
+| **Speed** (parallel execution) | CrewAI | `pa.parallel()` — real threads, not fake async |
+| **Agents talking** | AutoGen | `pa.Conversation()` — with Φ-scored turns |
+| **Plug-and-play** | OpenAI Agents SDK | `pa.purpose()` — even simpler, one function |
+| **Knowledge** (RAG) | LlamaIndex | `pa.KnowledgeStore` — RAG as a tool |
+| **Self-improvement** | Nobody | **Only Purpose Agent** |

+## Runs on Your Laptop (Free, Private)

+```bash
+# Install Ollama (one-time)
+curl -fsSL https://ollama.ai/install.sh | sh
+ollama pull qwen3:1.7b  # 1.7B params, runs on CPU

+# That's it. No API keys. No cloud. No cost.
```

+```python
+team = pa.purpose("Research assistant", model="qwen3:1.7b")
+```

+Also works with cloud models:
```python
+team = pa.purpose("Research assistant", model="gpt-4o")          # OpenAI
+team = pa.purpose("Research assistant", model="Qwen/Qwen3-32B")  # HuggingFace
```

+## Interactive CLI

+```bash
+python -m purpose_agent
+```
+
+Walks you through setup step-by-step. No coding required.

## Installation

```bash
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent

+# For local models (recommended)
pip install ollama

# Run demo (no API keys needed)
python demo.py
```

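To verify the install, a minimal first script (assuming the Ollama model pulled above is available locally):

```python
import purpose_agent as pa

# Uses the free local model; no API keys required
team = pa.purpose("Summarize text files", model="qwen3:1.7b")
print(team.run("Summarize README.md in three bullet points"))
```
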
+## Literature Foundation
+
+Built on 8 published papers — every design decision has empirical backing.
+See [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md) for the full research trace.
+
+| Paper | What it contributes |
+|-------|-------------------|
+| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
+| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
+| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
+| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
+| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
+| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
+| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
+| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
+
## License

MIT