library_name: purpose-agent
license: mit
language:
  - en
tags:
  - reinforcement-learning
  - agents
  - self-improving
  - experience-replay
  - llm-as-judge
  - state-value-evaluation
  - memory-augmented
  - react
  - orchestration
  - modular
  - slm
  - small-language-models
  - multi-agent
  - human-in-the-loop
  - streaming
  - tools
  - evaluation
  - ollama
  - local-models
pipeline_tag: text-generation

Purpose Agent v0.2.0

An SLM-native, self-improving agentic framework.

Works with both Small Language Models (0.6B–3.8B params, run locally at $0 API cost) and Large Language Models (cloud APIs) through the same backend interface. Agents learn from experience via a Purpose Function Φ(s), a critic that scores states, so no fine-tuning is needed.

What Makes This Different

Feature	Purpose Agent	LangChain	LangGraph	CrewAI	AutoGen	smolagents
Self-Improvement	✅ Φ(s) + experience replay + heuristic distillation	❌	❌	❌	❌	❌
SLM-Native	✅ Grammar-constrained JSON, prompt compression, Tool RAG	❌	❌	❌	❌	⚠️
Anti-Reward-Hacking	✅ 7 strict rules + cache consistency + anomaly detection	❌	❌	❌	❌	❌
3-Tier Memory	✅ Strategic/Procedural/Tool with Q-value retrieval	❌	⚠️	⚠️	❌	❌
Multi-Agent with Shared Learning	✅ Agents learn from each other	❌	⚠️	✅	✅	⚠️
Human Φ Override	✅ Humans teach the critic → permanent learning	❌	⚠️	❌	❌	❌
Streaming	✅ Event + token streaming	✅	✅	⚠️	⚠️	✅
Tool Framework	✅ Schema, validation, retry, Tool RAG	✅	✅	✅	✅	✅
Cost Tracking	✅ Per-call token + USD tracking	⚠️	⚠️	❌	❌	❌
Benchmark Harness	✅ Improvement curve tracking	❌	❌	❌	❌	❌
Lightweight	✅ ~150KB, stdlib only	❌	❌	⚠️	⚠️	✅
Literature-Grounded	✅ 8 papers implemented	❌	❌	❌	❌	❌

Architecture

purpose_agent/
├── types.py              # Core data types
├── llm_backend.py        # Cloud LLM backends (HF, OpenAI, Mock)
├── slm_backends.py       # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
├── actor.py              # ReAct agent with 3-tier memory
├── purpose_function.py   # Non-hackable Φ(s) critic
├── experience_replay.py  # Two-phase retrieval (similarity + Q-value)
├── optimizer.py          # Trajectory → heuristic distillation
├── orchestrator.py       # Main loop
├── streaming.py          # 🆕 Async engine + event streaming
├── tools.py              # 🆕 Tool framework + built-in tools + Tool RAG
├── observability.py      # 🆕 Cost tracking, callbacks, metrics
├── multi_agent.py        # 🆕 Agent teams with shared learning
├── hitl.py               # 🆕 Human-in-the-loop + checkpointing
└── evaluation.py         # 🆕 Benchmark runner + improvement curves
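
The two-phase retrieval noted for experience_replay.py (similarity first, then Q-value) can be pictured roughly as follows. This is a minimal sketch and not the library's code: the `Experience` record, the word-overlap `jaccard` similarity, and `two_phase_retrieve` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical stored experience: task text, action taken, learned Q-value."""
    task: str
    action: str
    q_value: float


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for an embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def two_phase_retrieve(query, memory, k_similar=3, k_final=1):
    # Phase 1: keep the k_similar experiences most similar to the query.
    candidates = sorted(memory, key=lambda e: jaccard(query, e.task), reverse=True)[:k_similar]
    # Phase 2: among those candidates, prefer the highest Q-value.
    return sorted(candidates, key=lambda e: e.q_value, reverse=True)[:k_final]


memory = [
    Experience("find treasure in maze", "explore left corridor", 0.9),
    Experience("find treasure in cave", "dig at entrance", 0.2),
    Experience("solve sliding puzzle", "move corner tile", 0.95),
    Experience("find exit in maze", "follow right wall", 0.6),
]

best = two_phase_retrieve("find treasure in the maze", memory)
print(best[0].action)  # prints: explore left corridor
```

The puzzle experience has the highest Q-value overall, but phase 1 filters it out as irrelevant before phase 2 ranks by value.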

Quick Start — Local SLM (Zero Cost)

# 1. Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# 2. Pull a small model (1.7B params, runs on any laptop)
ollama pull qwen3:1.7b

# 3. Run your agent
python my_agent.py

# my_agent.py
from purpose_agent import Orchestrator, OllamaBackend, State, Environment

# SLM backend — runs locally, zero cost
llm = OllamaBackend(model="qwen3:1.7b")   # 1.7B params

# Or use a cloud LLM
# from purpose_agent import HFInferenceBackend
# llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras")

class MyEnv(Environment):
    def execute(self, action, state):
        # Apply the action and return the resulting state (stubbed here)
        return State(data={"result": "done"})

orch = Orchestrator(llm=llm, environment=MyEnv())
result = orch.run_task(purpose="Solve the problem", max_steps=10)
print(result.summary())

SLM Model Registry

Pre-configured models optimized for agent tasks:

from purpose_agent import create_slm_backend

backend = create_slm_backend("phi-4-mini")    # 3.8B — best tool-use accuracy
backend = create_slm_backend("qwen3-1.7b")    # 1.7B — best balance
backend = create_slm_backend("qwen3-0.6b")    # 0.6B — ultra-light
backend = create_slm_backend("llama-3.2-1b")  # 1B — 128K context
backend = create_slm_backend("smollm2-1.7b")  # 1.7B — HF native

Multi-Agent with Shared Learning

Agents learn from each other — when one agent solves a problem, all benefit:

from purpose_agent import AgentSpec, AgentTeam, OllamaBackend

researcher = AgentSpec(
    name="researcher", role="Find information",
    model=OllamaBackend(model="qwen3:1.7b"),     # Cheap SLM
    expertise_keywords=["search", "find", "research"],
)
coder = AgentSpec(
    name="coder", role="Write and debug code",
    model=OllamaBackend(model="phi4-mini"),       # Better SLM for code
    expertise_keywords=["code", "program", "debug"],
)

team = AgentTeam(
    agents=[researcher, coder],
    default_model=OllamaBackend(model="qwen3:1.7b"),
    environment=MyEnv(),  # an Environment instance, e.g. MyEnv from the Quick Start
)

# Auto-delegates to the best agent
result = team.run_task(purpose="Search for Python sorting algorithms")
print(team.get_learning_report())  # See shared knowledge
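
One plausible reading of auto-delegation is keyword matching between the task purpose and each agent's expertise_keywords. The sketch below illustrates that idea only. `pick_agent` is a hypothetical helper, and the routing that AgentTeam actually performs may differ.

```python
def pick_agent(purpose: str, agents: dict[str, list[str]], default: str) -> str:
    """Return the agent whose keywords appear most often in the purpose text."""
    words = purpose.lower().split()
    scores = {name: sum(words.count(kw) for kw in kws) for name, kws in agents.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default agent when no keyword matches at all.
    return best if scores[best] > 0 else default


agents = {
    "researcher": ["search", "find", "research"],
    "coder": ["code", "program", "debug"],
}

print(pick_agent("Search for Python sorting algorithms", agents, default="researcher"))
# prints: researcher
print(pick_agent("Debug the failing code path", agents, default="researcher"))
# prints: coder
```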

Human-in-the-Loop

Humans can override Φ scores → the agent permanently learns preferences:

from purpose_agent import HITLOrchestrator, CLIInputHandler

hitl = HITLOrchestrator(
    orchestrator=orch,
    input_handler=CLIInputHandler(),
    approve_actions=True,      # Approve each action
    review_scores=True,        # Override Φ scores
    checkpoint_dir="./checkpoints",
)
result = hitl.run_task(purpose="Important task")

# Inject knowledge directly
hitl.inject_heuristic(
    pattern="When facing {problem_type}",
    strategy="Always try the simplest approach first",
)

Streaming

Real-time event streaming for UIs:

import asyncio
from purpose_agent import AsyncOrchestrator

async def main():
    async_orch = AsyncOrchestrator(orch)
    async for event in async_orch.run_task_stream(purpose="..."):
        if event.event_type == "action":
            print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
        elif event.event_type == "score":
            print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")

asyncio.run(main())
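
For readers new to the pattern, event streaming like this is typically built on an async generator that yields one event per step. The following is a self-contained sketch using only asyncio. The `Event` class and `fake_task_stream` are hypothetical and do not reproduce AsyncOrchestrator's internals.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event record with a type tag and a payload."""
    event_type: str
    data: dict = field(default_factory=dict)


async def fake_task_stream():
    """Yield events one at a time, as an agent loop might while it runs."""
    yield Event("action", {"name": "search", "thought": "Look for clues"})
    await asyncio.sleep(0)  # hand control back to the event loop between steps
    yield Event("score", {"phi_before": 2.0, "phi_after": 3.5})


async def collect():
    seen = []
    async for event in fake_task_stream():
        seen.append(event.event_type)
    return seen


print(asyncio.run(collect()))  # prints: ['action', 'score']
```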

Tool Framework

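import requests  # third-party; only needed for this example tool

from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool

# Create tool from any function
@FunctionTool.from_function
def search(query: str) -> str:
    """Search the web for information."""
    # Example endpoint; passing params lets requests URL-encode the query
    response = requests.get("https://api.search.com", params={"q": query}, timeout=10)
    return response.text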

# Tool RAG for SLMs (only load relevant tools into prompt)
registry = ToolRegistry()
registry.register(CalculatorTool())
registry.register(PythonExecTool())
registry.register(search)

relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
# → [CalculatorTool, PythonExecTool]  (search excluded — saves tokens)
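
Tool RAG amounts to ranking tool descriptions against the task and placing only the top few in the prompt. Here is a minimal sketch under stated assumptions: the tool descriptions are invented, `rank_tools` is a hypothetical helper, and plain word overlap stands in for whatever relevance scoring `get_relevant_tools` actually uses.

```python
def rank_tools(query: str, tool_descriptions: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank tools by how many query words occur in their description."""
    query_words = set(query.lower().split())

    def overlap(name: str) -> int:
        return len(query_words & set(tool_descriptions[name].lower().split()))

    return sorted(tool_descriptions, key=overlap, reverse=True)[:top_k]


tools = {
    "calculator": "compute arithmetic expressions such as 2+2",
    "python_exec": "run python code to compute a result",
    "search": "search the web for information",
}

print(rank_tools("compute 2+2", tools, top_k=2))
# prints: ['calculator', 'python_exec']
```

Only the selected tool schemas would then be serialized into the prompt, which matters most for small context windows.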

Cost Tracking

from purpose_agent import CostTracker

tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005)
tracker.record(prompt_tokens=500, completion_tokens=200)
print(tracker.summary())
# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
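
The arithmetic behind per-call cost tracking is per-million-token pricing applied to input and output tokens separately. The sketch below shows that calculation only. `estimate_cost_usd` and its `cost_per_1m_output` parameter are assumptions for illustration, and CostTracker may compute its estimate differently.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      cost_per_1m_input, cost_per_1m_output=0.0):
    """Estimated USD cost for one call under per-million-token pricing."""
    input_cost = prompt_tokens / 1_000_000 * cost_per_1m_input
    output_cost = completion_tokens / 1_000_000 * cost_per_1m_output
    return input_cost + output_cost


# 500 prompt tokens at $0.005 per 1M input tokens, output priced at zero.
cost = estimate_cost_usd(500, 200, cost_per_1m_input=0.005)
print(f"{cost:.7f}")  # prints: 0.0000025
```

Under these assumptions the 500 input tokens alone come to $0.0000025, and any non-zero output-token rate adds to that figure.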

Benchmark & Prove Self-Improvement

from purpose_agent import BenchmarkRunner, BenchmarkTask

runner = BenchmarkRunner(orchestrator=orch)
tasks = [
    BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
    BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
]

result = runner.run(tasks, iterations=10, name="MazeTest")
print(result.summary())
# Iteration    Success Rate      Avg Φ    Avg Steps   Avg Reward
# -----------------------------------------------------------------
#          1          40.0%       4.20          12.0         3.20
#          5          70.0%       6.80           8.0         6.50
#         10          90.0%       8.50           6.0         8.90
# Improvement: 40.0% → 90.0% (+50.0%)

result.save("results/benchmark.json")
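
The improvement line in the sample summary is the difference between the last and first success rates, expressed in percentage points. A small sketch of that calculation, using the figures from the sample output, follows. The `improvement` helper is hypothetical and not part of BenchmarkRunner.

```python
def improvement(success_rates):
    """Return (first, last, delta) success rates in percentage points."""
    first, last = success_rates[0], success_rates[-1]
    return first, last, last - first


# Success rates (%) for iterations 1, 5, and 10 from the sample output above.
rates = [40.0, 70.0, 90.0]
first, last, delta = improvement(rates)
print(f"Improvement: {first:.1f}% → {last:.1f}% ({delta:+.1f}%)")
# prints: Improvement: 40.0% → 90.0% (+50.0%)
```

Note that +50.0% here means 50 percentage points. Relative to the 40% starting rate, the same change is a 125% increase.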

Literature Foundation

Paper	What it contributes
MUSE	3-tier memory (strategic/procedural/tool)
LATS	LLM-as-value-function V(s)
REMEMBERER	Q-value experience replay
Reflexion	Verbal reinforcement
SPC	Anti-reward-hacking
CER	Contextual experience distillation
MemRL	Two-phase retrieval
TinyAgent	SLM-native agent patterns

Installation

# Core (no dependencies beyond stdlib)
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent

# For local SLMs
pip install ollama

# For cloud LLMs
pip install huggingface_hub  # or: pip install openai

# Run demo (no API keys needed)
python demo.py

License

MIT