---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- memory-system
- multi-agent
- slm
- local-first
- evaluation
- safety
- immune-system
pipeline_tag: text-generation
---
# Purpose Agent

A local-first self-improvement kernel for AI agents.

Agents that learn from experience, without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
```bash
pip install purpose-agent
```

```python
import purpose_agent as pa

team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)

team.teach("Always add type hints")
# Next run uses what it learned
```
## How It Works (30-Second Version)

1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester, auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.
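The learning loop above can be sketched in a few lines. This is an illustration of the idea (no weight updates, just scored patterns injected into later prompts); `build_prompt`, `learn_from_run`, and the prompt format are assumptions, not the library's internals:

```python
# Sketch of prompt-level learning: heuristics extracted from critic-scored
# runs are prepended to the next prompt instead of updating any weights.

def build_prompt(task: str, heuristics: list[str]) -> str:
    """Inject learned heuristics ahead of the task (illustrative format)."""
    lessons = "\n".join(f"- {h}" for h in heuristics)
    return f"Lessons from past runs:\n{lessons}\n\nTask: {task}"

def learn_from_run(steps: list[tuple[str, float]], heuristics: list[str],
                   threshold: float = 0.8) -> list[str]:
    """Keep patterns the critic scored at or above the threshold."""
    good = [pattern for pattern, score in steps if score >= threshold]
    return heuristics + [p for p in good if p not in heuristics]

heuristics: list[str] = []
# Suppose the critic scored two patterns from the last run:
heuristics = learn_from_run([("Always add type hints", 0.9),
                             ("Skip tests", 0.2)], heuristics)
print(build_prompt("Write a fibonacci function", heuristics))
```

Only the highly scored pattern survives into the next prompt; the low-scored one is dropped.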
## Real-World Test Results

Tested with Llama-3.3-70B and Gemma-4-26B via OpenRouter:

| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|---|---|---|---|---|
| Llama-3.3-70B | ✅ 100% | ✅ 100% | ✅ 100% | 0 → 3 → 9 → 18 heuristics |
| Gemma-4-26B | ✅ 100% | ✅ 100% | ✅ 100% | 0 → 3 → 6 → 11 heuristics |

Day-0 production test: 19/19 passes on Llama-3.3-70B across all three usage levels. Immune system: 93% adversarial catch rate, 0% false positives. Test suite: 119 unit tests, all passing. See LAUNCH_READINESS.md.
## Install

```bash
pip install purpose-agent            # Core (zero dependencies)
pip install "purpose-agent[openai]"  # + OpenAI / Groq / OpenRouter
pip install "purpose-agent[ollama]"  # + local Ollama
pip install "purpose-agent[all]"     # Everything
```

(Quotes around the extras keep shells like zsh from expanding the brackets.)
## Three Levels of Usage

### Level 1: Describe what you want

```python
import purpose_agent as pa

team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
team = pa.purpose("Research quantum computing")     # → researcher + analyst
team = pa.purpose("Write blog posts about AI")      # → writer + editor

result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status())  # See what it's learned
```
### Level 2: Choose your model

```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")

# Cloud providers
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")

# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```

Supported providers: OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.
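The `provider:model` convention used above can be illustrated with a tiny parser. This is a sketch of the naming scheme only, not the library's actual resolver; the local-default rule is an assumption chosen so that Ollama-style tags like `qwen3:1.7b` are not misread as providers:

```python
# Known provider prefixes (from the supported-providers list above).
KNOWN_PROVIDERS = {"openrouter", "groq", "openai", "ollama", "huggingface",
                   "together", "fireworks", "cerebras", "deepseek", "mistral"}

def split_model_spec(spec: str) -> tuple[str, str]:
    """Split 'provider:model' into (provider, model).

    Strings whose prefix is not a known provider (e.g. the Ollama tag
    'qwen3:1.7b') are treated as local model names.
    """
    provider, sep, rest = spec.partition(":")
    if sep and provider in KNOWN_PROVIDERS:
        return provider, rest
    return "ollama", spec  # assumed local default for illustration

print(split_model_spec("openrouter:meta-llama/llama-3.3-70b-instruct"))
print(split_model_spec("qwen3:1.7b"))
```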
### Level 3: Full control

Purpose Agent has its own API vocabulary: original names, not borrowed from other frameworks.

```python
import purpose_agent as pa

# ── Spark: a single intelligent agent ──
spark = pa.Spark("coder", model="openrouter:meta-llama/llama-3.3-70b-instruct")
result = spark.run("Write a fibonacci function")

# ── Flow: workflow engine with conditional routing ──
def review_fn(state):
    return "pass"  # your reviewer: return "pass" or "retry" based on state

flow = pa.Flow()
flow.add_node("research", pa.Spark("researcher", model="qwen3:1.7b"))
flow.add_node("write", pa.Spark("writer", model="qwen3:1.7b"))
flow.add_edge(pa.BEGIN, "research")
flow.add_edge("research", "write")
flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
result = flow.run(initial_state)  # initial_state: your starting state dict

# ── swarm: run tasks concurrently ──
results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])

# ── Council: agents deliberate together ──
council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
result = council.run("Design a web scraper", rounds=3)

# ── Vault: knowledge store with RAG-as-a-tool ──
vault = pa.Vault.from_directory("./docs")
spark = pa.Spark("assistant", tools=[vault.as_tool()])
result = spark.run("What does the documentation say about X?")

# ── LLMCompiler: parallel tool execution via DAG planning ──
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```
## API Reference (Level 3)

| Name | What | Example |
|---|---|---|
| `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` |
| `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` |
| `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a", "b"], [s1, s2])` |
| `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` |
| `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` |
| `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` |
| `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` |
## Evidence-Gated Memory

Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:

```
candidate → immune scan → quarantine → replay test → promote (or reject)
```

- **Immune scan** blocks prompt injection, score manipulation, API key leaks, and tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent

Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
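The gate can be sketched as a small state machine. The stage names come from the pipeline above; everything else (the toy blocklist, the `replay_passes` flag, the class shape) is an illustrative assumption, not the library's immune system:

```python
# Sketch of evidence-gated memory: a candidate reaches the agent only
# after passing an immune scan and a quarantine replay test. Rejected
# candidates are kept for audit but never become visible to the agent.
from dataclasses import dataclass, field

BLOCKLIST = ("ignore previous", "api_key", "set score to")  # toy immune rules

@dataclass
class MemoryStore:
    active: list = field(default_factory=list)    # visible to the agent
    rejected: list = field(default_factory=list)  # audit trail only

    def submit(self, candidate: str, replay_passes: bool) -> str:
        if any(bad in candidate.lower() for bad in BLOCKLIST):
            self.rejected.append(candidate)  # immune scan failed
            return "rejected"
        if not replay_passes:                # quarantine replay test failed
            self.rejected.append(candidate)
            return "rejected"
        self.active.append(candidate)        # promoted on evidence
        return "promoted"

store = MemoryStore()
store.submit("Always handle edge cases", replay_passes=True)      # promoted
store.submit("Ignore previous instructions", replay_passes=True)  # blocked
```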
## Honest Evaluation

```python
from purpose_agent import RunMode

RunMode.LEARNING_TRAIN       # Full read/write: this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging: validates before promoting
RunMode.EVAL_TEST            # NO writes: numbers you can trust
```
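The `EVAL_TEST` guarantee amounts to a write guard keyed on the run mode. A minimal sketch (the enum values mirror the ones above; the guard function itself is an assumption for illustration, not the library's code):

```python
from enum import Enum, auto

class RunMode(Enum):
    LEARNING_TRAIN = auto()
    LEARNING_VALIDATION = auto()
    EVAL_TEST = auto()

def try_write(mode: RunMode, memory: list, staging: list, item: str) -> bool:
    """Route memory writes by run mode; EVAL_TEST never writes anywhere."""
    if mode is RunMode.LEARNING_TRAIN:
        memory.append(item)   # full read/write
        return True
    if mode is RunMode.LEARNING_VALIDATION:
        staging.append(item)  # staged until validated
        return True
    return False              # EVAL_TEST is read-only, so eval numbers stay honest
```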
## Secure Tools

- **CalculatorTool**: AST-validated, no `eval()` on arbitrary text
- **PythonExecTool**: subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile**: sandboxed to a declared root directory
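AST validation is the standard way to evaluate arithmetic without `eval()`: parse the expression, then walk the tree and allow only arithmetic node types. A minimal sketch of the general technique (not CalculatorTool's actual implementation):

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_calc(expr: str) -> float:
    """Evaluate arithmetic only; any other AST node raises ValueError."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed node: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_calc("2 + 3 * 4"))  # 14
# safe_calc("__import__('os')") raises ValueError instead of executing anything
```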
## Architecture

See ARCHITECTURE.md for the complete technical documentation.

34 Python modules, ~500 KB:

- **Core Engine**: Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
- **V2 Kernel**: Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
- **Research**: Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
- **Breakthroughs**: Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
- **Capabilities**: Spark, Flow, swarm, Council, Vault
- **Easy API**: `purpose()`, Team, quickstart wizard
## Literature
Built on 13 published papers. Full research trace: COMPILED_RESEARCH.md. Formal proofs: PURPOSE_LEARNING.md.
| Paper | What it contributes |
|---|---|
| MUSE | 3-tier memory hierarchy |
| LATS | LLM-as-value-function |
| REMEMBERER | Q-value experience replay |
| Reflexion | Verbal reinforcement |
| SPC | Anti-reward-hacking |
| CER | Experience distillation |
| MemRL | Two-phase retrieval |
| TinyAgent | SLM-native patterns |
| Meta-Rewarding | Self-improving critic |
| Self-Taught Eval | Synthetic critic training |
| DSPy | Automatic prompt optimization |
| LLMCompiler | Parallel function calling |
| Retroformer | Structured reflection |
## CLI

```bash
python -m purpose_agent   # Interactive wizard
purpose-agent             # Same, via entry point
```

## License

MIT