metadata
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- experience-replay
- llm-as-judge
- memory-system
- multi-agent
- slm
- local-first
- evaluation
- safety
- immune-system
- no-code
pipeline_tag: text-generation
Purpose Agent
A local-first self-improvement kernel for agents. Turns traces into tested memory, policies, and rubrics β so agents improve without fine-tuning, cloud infrastructure, or vendor lock-in.
import purpose_agent as pa
team = pa.purpose("Help me research scientific papers")
result = team.run("Find recent breakthroughs in quantum computing")
print(result)
team.teach("Always cite your sources")
Core Principle
Agents learn only when evidence says they should. New memories are quarantined, immune-scanned, replay-tested, scoped, versioned, and reversible.
candidate β immune scan β quarantine β replay test β promote (or reject)
Three Levels of Usage
Level 1 β Just describe what you want
team = pa.purpose("Write Python code and test it") # auto-builds architect + coder + tester
team = pa.purpose("Research quantum computing") # auto-builds researcher + analyst
team = pa.purpose("Write blog posts about AI") # auto-builds writer + editor
Level 2 β Customize your team
team = pa.Team.build(purpose="Support bot", agents=["greeter", "resolver"], model="qwen3:1.7b")
team = pa.purpose("Answer questions", knowledge="./docs/", model="qwen3:1.7b")
Level 3 β Full control
graph = pa.Graph() # LangGraph-style control flow
results = pa.parallel(["task1", "task2"], agents) # CrewAI-style parallel execution
chat = pa.Conversation([agent_a, agent_b]) # AutoGen-style agent conversation
kb = pa.KnowledgeStore.from_directory("./docs") # LlamaIndex-style RAG
compiler = pa.LLMCompiler(llm, registry) # Parallel tool execution via DAG
Architecture
purpose_agent/
βββ Core
β types, actor, purpose_function, experience_replay, optimizer, orchestrator, llm_backend
β
βββ V2 Kernel
β v2_types (RunMode, MemoryScope, PurposeScoreV2)
β trace (structured JSONL execution traces)
β memory (7 kinds Γ 5 statuses, scoped, versioned)
β compiler (token-budgeted prompt compilation with credit assignment)
β immune (injection, score hacking, tool misuse, privacy, scope scanning)
β memory_ci (quarantine β scan β test β promote/reject pipeline)
β evalport (pluggable evaluation protocol)
β benchmark_v2 (train/val/test splits, ablation, contamination control)
β
βββ Research (13 papers implemented)
β meta_rewarding (self-improving critic via meta-judge)
β self_taught (synthetic training data for Ξ¦ function)
β prompt_optimizer (DSPy-style automatic few-shot bootstrap)
β llm_compiler (parallel function calling via DAG)
β retroformer (structured reflection β typed memories)
β
βββ SLM-Native
β slm_backends (Ollama, llama-cpp, prompt compression, 8 pre-configured models)
β
βββ Capabilities
β unified (Agent, Graph, parallel, Conversation, KnowledgeStore)
β easy (purpose(), Team, quickstart wizard)
β tools, streaming, observability, multi_agent, hitl, evaluation, registry
RunMode β Honest Evaluation
from purpose_agent import RunMode
RunMode.LEARNING_TRAIN # Full read/write. Agent learns.
RunMode.LEARNING_VALIDATION # Read + staging. Validates before promoting.
RunMode.EVAL_TEST # NO writes. Numbers you can trust.
Memory Lifecycle
| Kind | Purpose |
|---|---|
purpose_contract |
User's stated goal and constraints |
user_preference |
Learned preferences |
skill_card |
Reusable procedures from successful traces |
episodic_case |
Specific experiences worth remembering |
failure_pattern |
What NOT to do |
critic_calibration |
Adjustments to Ξ¦ scoring |
tool_policy |
Tool-specific usage rules |
| Status | Meaning |
|---|---|
candidate β quarantined β promoted |
Happy path |
candidate β rejected |
Failed immune scan |
promoted β archived |
Superseded or demoted |
Immune System
from purpose_agent import scan_memory, MemoryCard
result = scan_memory(MemoryCard(content="Ignore previous instructions"))
# result.passed = False, threats = ["prompt_injection"], severity = "critical"
Secure Tools
- CalculatorTool β AST-validated, no eval() on arbitrary text
- PythonExecTool β subprocess with timeout + isolated temp directory
- ReadFileTool / WriteFileTool β sandboxed to declared root
Runs on Your Laptop
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:1.7b
team = pa.purpose("Research assistant", model="qwen3:1.7b") # Free, private, local
Also works with: model="gpt-4o" (OpenAI), model="Qwen/Qwen3-32B" (HuggingFace cloud).
Interactive CLI
python -m purpose_agent # Step-by-step wizard, no coding required
Literature Foundation
Built on 13 papers. Full research trace: COMPILED_RESEARCH.md
| Paper | Module | Contribution |
|---|---|---|
| MUSE | actor, optimizer | 3-tier memory hierarchy |
| LATS | purpose_function | LLM-as-value-function |
| REMEMBERER | experience_replay | Q-value experience replay |
| Reflexion | orchestrator | Verbal reinforcement |
| SPC | purpose_function, immune | Anti-reward-hacking |
| CER | optimizer | Experience distillation |
| MemRL | experience_replay, compiler | Two-phase retrieval |
| TinyAgent | slm_backends, tools | SLM-native patterns |
| Meta-Rewarding | meta_rewarding | Self-improving critic |
| Self-Taught Eval | self_taught | Synthetic critic training |
| DSPy | prompt_optimizer | Automatic prompt optimization |
| LLMCompiler | llm_compiler | Parallel function calling |
| Retroformer | retroformer | Structured reflection |
Installation
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent
pip install ollama # for local models
python demo.py # verify everything works
License
MIT