	---
	library_name: purpose-agent
	license: mit
	language:
	- en
	tags:
	- reinforcement-learning
	- agents
	- self-improving
	- experience-replay
	- llm-as-judge
	- state-value-evaluation
	- memory-augmented
	- react
	- orchestration
	- modular
	- slm
	- small-language-models
	- multi-agent
	- human-in-the-loop
	- streaming
	- tools
	- evaluation
	- ollama
	- local-models
	pipeline_tag: text-generation
	---

	# Purpose Agent v0.2.0

	The world's first SLM-native self-improving agentic framework.

	Works with both Small Language Models (0.6B–3.8B params, local, $0 cost) and Large Language Models (cloud APIs) through the same `Orchestrator` interface. Agents learn from experience via a Purpose Function Φ(s), a critic that scores how far each state advances the task's purpose, so no fine-tuning is needed.
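
A minimal sketch of how a state-value signal such as Φ(s) can guide an agent without weight updates, assuming the per-step signal is the change in Φ between consecutive states. The `phi` scorer below is a hypothetical hand-written heuristic standing in for the LLM critic, and `ToyState` and `step_deltas` are illustrative names, not part of the library.

```python
from typing import Callable, Dict, List

ToyState = Dict[str, int]  # hypothetical state: a plain dict, not purpose_agent.State


def phi(state: ToyState) -> float:
    """Toy critic: fraction of sub-goals completed, scaled to 0-10."""
    return 10.0 * state["done"] / state["total"]


def step_deltas(states: List[ToyState], critic: Callable[[ToyState], float]) -> List[float]:
    """Return critic(s_next) - critic(s) for each consecutive pair of states."""
    scores = [critic(s) for s in states]
    return [after - before for before, after in zip(scores, scores[1:])]


trajectory = [
    {"done": 0, "total": 4},
    {"done": 1, "total": 4},
    {"done": 1, "total": 4},  # wasted step: no progress, so the delta is zero
    {"done": 3, "total": 4},
]
deltas = step_deltas(trajectory, phi)
print(deltas)
```

Steps with a zero or negative delta are the ones a self-improving loop would want to avoid repeating; steps with a large positive delta are candidates for storing as reusable experience.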

	## What Makes This Different

	| Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
	|---|:---:|:---:|:---:|:---:|:---:|:---:|
	| Self-Improvement | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ |
	| SLM-Native | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ |
	| Anti-Reward-Hacking | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ |
	| 3-Tier Memory | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ |
	| Multi-Agent with Shared Learning | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ |
	| Human Φ Override | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ |
	| Streaming | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
	| Tool Framework | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ |
	| Cost Tracking | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ |
	| Benchmark Harness | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ |
	| Lightweight | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ |
	| Literature-Grounded | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ |

	## Architecture

	```
	purpose_agent/
	├── types.py # Core data types
	├── llm_backend.py # Cloud LLM backends (HF, OpenAI, Mock)
	├── slm_backends.py # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
	├── actor.py # ReAct agent with 3-tier memory
	├── purpose_function.py # Non-hackable Φ(s) critic
	├── experience_replay.py # Two-phase retrieval (similarity + Q-value)
	├── optimizer.py # Trajectory → heuristic distillation
	├── orchestrator.py # Main loop
	├── streaming.py # 🆕 Async engine + event streaming
	├── tools.py # 🆕 Tool framework + built-in tools + Tool RAG
	├── observability.py # 🆕 Cost tracking, callbacks, metrics
	├── multi_agent.py # 🆕 Agent teams with shared learning
	├── hitl.py # 🆕 Human-in-the-loop + checkpointing
	└── evaluation.py # 🆕 Benchmark runner + improvement curves
	```
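
The `experience_replay.py` entry above mentions two-phase retrieval (similarity, then Q-value). A minimal sketch of that idea follows, assuming phase 1 shortlists memories by text similarity and phase 2 ranks the shortlist by a stored Q-value. The `Experience` class, the Jaccard word-overlap similarity, and the `retrieve` signature are all hypothetical stand-ins; the library's own data types and similarity measure may differ.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Experience:
    task: str       # description of the task that was attempted
    strategy: str   # what the agent did
    q_value: float  # learned estimate of how useful the strategy was


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query: str, memory: List[Experience],
             shortlist: int = 3, top_k: int = 1) -> List[Experience]:
    # Phase 1: keep only the memories most similar to the query.
    candidates = sorted(memory, key=lambda e: jaccard(query, e.task), reverse=True)[:shortlist]
    # Phase 2: among those, prefer the ones with the highest Q-value.
    return sorted(candidates, key=lambda e: e.q_value, reverse=True)[:top_k]


memory = [
    Experience("find key in maze", "follow left wall", 0.4),
    Experience("find exit in maze", "mark visited cells", 0.9),
    Experience("sort a list of numbers", "use built-in sort", 0.95),
    Experience("find treasure in maze", "explore dead ends first", 0.6),
]
best = retrieve("find treasure in maze quickly", memory)
print(best[0].strategy)
```

The sorting memory has the highest Q-value overall but never reaches phase 2, because phase 1 filters it out as irrelevant. That separation is the point of doing retrieval in two phases rather than ranking by Q-value alone.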

	## Quick Start — Local SLM (Zero Cost)

	```bash
	# 1. Install Ollama
	curl -fsSL https://ollama.ai/install.sh | sh

	# 2. Pull a small model (1.7B params, runs on any laptop)
	ollama pull qwen3:1.7b

	# 3. Run your agent
	python my_agent.py
	```

	```python
	from purpose_agent import (
	    Orchestrator, OllamaBackend, State, Environment, Action,
	    CalculatorTool, ToolRegistry,
	)

	# SLM backend — runs locally, zero cost
	llm = OllamaBackend(model="qwen3:1.7b") # 1.7B params

	# Or use a cloud LLM
	# from purpose_agent import HFInferenceBackend
	# llm = HFInferenceBackend(model_id="Qwen/Qwen3-32B", provider="cerebras")

	class MyEnv(Environment):
	    def execute(self, action, state):
	        # Apply the action and return the resulting state
	        return State(data={"result": "done"})

	orch = Orchestrator(llm=llm, environment=MyEnv())
	result = orch.run_task(purpose="Solve the problem", max_steps=10)
	print(result.summary())
	```

	## SLM Model Registry

	Pre-configured models optimized for agent tasks:

	```python
	from purpose_agent import create_slm_backend

	backend = create_slm_backend("phi-4-mini") # 3.8B — best tool-use accuracy
	backend = create_slm_backend("qwen3-1.7b") # 1.7B — best balance
	backend = create_slm_backend("qwen3-0.6b") # 0.6B — ultra-light
	backend = create_slm_backend("llama-3.2-1b") # 1B — 128K context
	backend = create_slm_backend("smollm2-1.7b") # 1.7B — HF native
	```

	## Multi-Agent with Shared Learning

	Agents learn from each other — when one agent solves a problem, all benefit:

	```python
	from purpose_agent import AgentSpec, AgentTeam, OllamaBackend

	researcher = AgentSpec(
	    name="researcher", role="Find information",
	    model=OllamaBackend(model="qwen3:1.7b"),  # Cheap SLM
	    expertise_keywords=["search", "find", "research"],
	)
	coder = AgentSpec(
	    name="coder", role="Write and debug code",
	    model=OllamaBackend(model="phi4-mini"),  # Better SLM for code
	    expertise_keywords=["code", "program", "debug"],
	)

	team = AgentTeam(
	    agents=[researcher, coder],
	    default_model=OllamaBackend(model="qwen3:1.7b"),
	    environment=my_env,  # an Environment instance, e.g. MyEnv() from the Quick Start
	)

	# Auto-delegates to the best agent
	result = team.run_task(purpose="Search for Python sorting algorithms")
	print(team.get_learning_report()) # See shared knowledge
	```
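
A minimal sketch of how auto-delegation by `expertise_keywords` could work, assuming the team simply counts keyword matches in the task purpose and falls back to the first agent when nothing matches. `ToyAgent` and `pick_agent` are hypothetical names for illustration; the actual routing inside `AgentTeam` is not documented here and may be more sophisticated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyAgent:
    name: str
    expertise_keywords: List[str]


def pick_agent(purpose: str, agents: List[ToyAgent]) -> ToyAgent:
    """Pick the agent with the most keyword hits; ties go to the first listed."""
    words = purpose.lower().split()

    def score(agent: ToyAgent) -> int:
        return sum(1 for word in words if word in agent.expertise_keywords)

    # max() returns the first agent among those with the highest score.
    return max(agents, key=score)


agents = [
    ToyAgent("researcher", ["search", "find", "research"]),
    ToyAgent("coder", ["code", "program", "debug"]),
]
print(pick_agent("Search for Python sorting algorithms", agents).name)
print(pick_agent("Debug this program", agents).name)
```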

	## Human-in-the-Loop

	Humans can override Φ scores → the agent permanently learns preferences:

	```python
	from purpose_agent import HITLOrchestrator, CLIInputHandler

	hitl = HITLOrchestrator(
	    orchestrator=orch,  # the Orchestrator from the Quick Start
	    input_handler=CLIInputHandler(),
	    approve_actions=True,  # Approve each action
	    review_scores=True,  # Override Φ scores
	    checkpoint_dir="./checkpoints",
	)
	result = hitl.run_task(purpose="Important task")

	# Inject knowledge directly
	hitl.inject_heuristic(
	    pattern="When facing {problem_type}",
	    strategy="Always try the simplest approach first",
	)
	```

	## Streaming

	Real-time event streaming for UIs:

	```python
	import asyncio
	from purpose_agent import AsyncOrchestrator

	async def main():
	    # Wrap the synchronous Orchestrator from the Quick Start
	    async_orch = AsyncOrchestrator(orch)
	    async for event in async_orch.run_task_stream(purpose="..."):
	        if event.event_type == "action":
	            print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
	        elif event.event_type == "score":
	            print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")

	asyncio.run(main())
	```
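
To try the consumer pattern above without a model, here is a self-contained sketch using only `asyncio`. It assumes events carry an `event_type` string and a `data` dict, as in the example; `ToyEvent` and `fake_run_stream` are hypothetical stand-ins for whatever `run_task_stream` actually yields.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Dict, List


@dataclass
class ToyEvent:
    event_type: str
    data: Dict[str, Any]


async def fake_run_stream() -> AsyncIterator[ToyEvent]:
    """Yield events one at a time; stands in for a real agent loop."""
    yield ToyEvent("action", {"name": "search", "thought": "look for clues"})
    await asyncio.sleep(0)  # hand control back to the event loop between events
    yield ToyEvent("score", {"phi_before": 2.0, "phi_after": 3.5})


async def collect() -> List[str]:
    lines = []
    async for event in fake_run_stream():
        if event.event_type == "action":
            lines.append(f"{event.data['name']}: {event.data['thought']}")
        elif event.event_type == "score":
            lines.append(f"phi {event.data['phi_before']:.1f} -> {event.data['phi_after']:.1f}")
    return lines


lines = asyncio.run(collect())
print(lines)
```

Because the producer is an async generator, a UI can render each event as soon as it is yielded instead of waiting for the whole task to finish.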

	## Tool Framework

	```python
	import requests  # third-party: pip install requests

	from purpose_agent import FunctionTool, ToolRegistry, CalculatorTool, PythonExecTool

	# Create a tool from any function
	@FunctionTool.from_function
	def search(query: str) -> str:
	    """Search the web for information."""
	    # Placeholder endpoint; `params` URL-encodes the query
	    return requests.get("https://api.search.com", params={"q": query}, timeout=10).text

	# Tool RAG for SLMs (only load relevant tools into prompt)
	registry = ToolRegistry()
	registry.register(CalculatorTool())
	registry.register(PythonExecTool())
	registry.register(search)

	relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
	# → [CalculatorTool, PythonExecTool] (search excluded — saves tokens)
	```
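
A minimal sketch of the Tool RAG idea, assuming relevance is scored by word overlap between the query and each tool's description. The `relevant_tools` function and the tool descriptions are hypothetical; the library's `get_relevant_tools` may use a different scoring method, such as embeddings.

```python
import re
from typing import Dict, List, Set


def words(text: str) -> Set[str]:
    """Lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevant_tools(query: str, descriptions: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return the names of the top_k tools whose descriptions share the most words with the query."""
    query_words = words(query)
    ranked = sorted(
        descriptions,
        key=lambda name: len(query_words & words(descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


descriptions = {
    "search": "Search the web for information.",
    "calculator": "Compute arithmetic expressions such as sums and products.",
    "python_exec": "Run Python code to compute a result.",
}
selected = relevant_tools("compute 2+2", descriptions)
print(selected)
```

Only the selected tools' schemas would then be placed in the prompt, which matters for small models with limited context.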

	## Cost Tracking

	```python
	from purpose_agent import CostTracker

	tracker = CostTracker(model_name="qwen3:1.7b", cost_per_1m_input=0.005)
	tracker.record(prompt_tokens=500, completion_tokens=200)
	print(tracker.summary())
	# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
	```
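
The arithmetic behind per-call cost tracking can be sketched as below, assuming prices are quoted per one million tokens. `estimate_cost_usd` and its `cost_per_1m_output` parameter are hypothetical and are not the `CostTracker` API; how the tracker prices completion tokens by default is not shown in this README.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      cost_per_1m_input: float,
                      cost_per_1m_output: float = 0.0) -> float:
    """Token counts times per-million-token rates, summed for input and output."""
    input_cost = prompt_tokens * cost_per_1m_input / 1_000_000
    output_cost = completion_tokens * cost_per_1m_output / 1_000_000
    return input_cost + output_cost


# Input pricing only: 500 prompt tokens at $0.005 per 1M tokens.
cost = estimate_cost_usd(500, 200, cost_per_1m_input=0.005)
print(f"{cost:.7f}")
```

With input pricing alone, the 500 prompt tokens come to $0.0000025; any non-zero output rate adds to that.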

	## Benchmark & Prove Self-Improvement

	```python
	from purpose_agent import BenchmarkRunner, BenchmarkTask

	runner = BenchmarkRunner(orchestrator=orch)
	tasks = [
	    BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
	    BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
	]

	result = runner.run(tasks, iterations=10, name="MazeTest")
	print(result.summary())
	# Iteration   Success Rate   Avg Φ     Avg Steps   Avg Reward
	# -----------------------------------------------------------------
	# 1           40.0%          4.20      12.0        3.20
	# 5           70.0%          6.80      8.0         6.50
	# 10          90.0%          8.50      6.0         8.90
	# Improvement: 40.0% → 90.0% (+50.0%)

	result.save("results/benchmark.json")
	```
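
The "Improvement" line in the sample output compares the first and last iterations. A minimal sketch of that calculation follows, assuming each iteration records a pass/fail outcome per task. The `curve` data is made up for illustration, and `success_rate` and `improvement` are hypothetical helpers rather than `BenchmarkRunner` methods.

```python
from typing import Dict, List


def success_rate(outcomes: List[bool]) -> float:
    """Percentage of tasks that succeeded in one iteration."""
    return 100.0 * sum(outcomes) / len(outcomes)


def improvement(curve: Dict[int, List[bool]]) -> float:
    """Change in success rate, in percentage points, from first to last iteration."""
    first, last = min(curve), max(curve)
    return success_rate(curve[last]) - success_rate(curve[first])


# Made-up outcomes: iteration number -> pass/fail per task.
curve = {
    1: [True, False, False, True, False],
    2: [True, True, False, True, False],
    3: [True, True, True, True, False],
}
gain = improvement(curve)
print(f"{success_rate(curve[1]):.1f}% -> {success_rate(curve[3]):.1f}% ({gain:+.1f} points)")
```

Reporting the gain in percentage points avoids ambiguity with relative change: going from 40% to 80% is +40 points but a 100% relative increase.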

	## Literature Foundation

	| Paper | What it contributes |
	|-------|---------------------|
	| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory (strategic/procedural/tool) |
	| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function V(s) |
	| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
	| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
	| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
	| [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation |
	| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
	| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns |

	## Installation

	```bash
	# Core (no dependencies beyond stdlib)
	git clone https://huggingface.co/Rohan03/purpose-agent
	cd purpose-agent

	# For local SLMs
	pip install ollama

	# For cloud LLMs
	pip install huggingface_hub # or: pip install openai

	# Run demo (no API keys needed)
	python demo.py
	```

	## License

	MIT