---
library_name: purpose-agent
license: mit
language:
  - en
tags:
  - reinforcement-learning
  - agents
  - self-improving
  - memory-system
  - multi-agent
  - slm
  - local-first
  - evaluation
  - safety
  - immune-system
pipeline_tag: text-generation
---

# Purpose Agent

**A local-first self-improvement kernel for AI agents.**

Agents that learn from experience — without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.

```bash
pip install purpose-agent
```

```python
import purpose_agent as pa

team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)

team.teach("Always add type hints")  # Next run uses what it learned
```

## How It Works (30-Second Version)

1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester — auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.

## Real-World Test Results

Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:

| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|------------------|
| Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
| Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |

- **0-day production test:** 19/19 pass on Llama-3.3-70B across all 3 usage levels.
- **Immune system:** 93% adversarial catch rate, 0% false positives.
- **Test suite:** 119 unit tests, all passing.

See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).

## Install

```bash
pip install purpose-agent           # Core (zero dependencies)
pip install purpose-agent[openai]   # + OpenAI / Groq / OpenRouter
pip install purpose-agent[ollama]   # + Local Ollama
pip install purpose-agent[all]      # Everything
```

## Three Levels of Usage

### Level 1 — Describe what you want

```python
import purpose_agent as pa

team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
team = pa.purpose("Research quantum computing")     # → researcher + analyst
team = pa.purpose("Write blog posts about AI")      # → writer + editor

result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status())  # See what it's learned
```

### Level 2 — Choose your model

```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")

# Cloud providers
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")

# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```

Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
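Levels 1 and 2 compose. The sketch below is an illustration rather than an excerpt from the package docs; it uses only the calls shown above, pins a local model, and drives the learn-teach loop from the 30-second overview. The exact return shape of `team.status()` is not specified here, so it is simply printed.

```python
import purpose_agent as pa

# Minimal sketch: the learn-teach loop on a local model.
# Uses only the calls documented above; status() output format may vary.
team = pa.purpose("Help me write Python code", model="qwen3:1.7b")

for task in ("Write a fibonacci function",
             "Write a fizzbuzz function",
             "Write a factorial function"):
    print(team.run(task))            # the critic scores each run behind the scenes

team.teach("Always add type hints")  # stored as a heuristic for future runs
print(team.status())                 # inspect what the team has learned so far
```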
### Level 3 — Full control

Purpose Agent has its own API vocabulary — original names, not borrowed from other frameworks.

```python
import purpose_agent as pa

# ── Spark: a single intelligent agent ──
spark = pa.Spark("coder", model="openrouter:meta-llama/llama-3.3-70b-instruct")
result = spark.run("Write a fibonacci function")

# ── Flow: workflow engine with conditional routing ──
flow = pa.Flow()
flow.add_node("research", pa.Spark("researcher", model="qwen3:1.7b"))
flow.add_node("write", pa.Spark("writer", model="qwen3:1.7b"))
flow.add_edge(pa.BEGIN, "research")
flow.add_edge("research", "write")
# review_fn is your own routing function; it returns "pass" or "retry"
flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
result = flow.run(initial_state)  # initial_state: the flow's starting state

# ── swarm: run tasks concurrently ──
# spark_a / spark_b / spark_c are Spark agents defined elsewhere
results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])

# ── Council: agents deliberate together ──
council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
result = council.run("Design a web scraper", rounds=3)

# ── Vault: knowledge store with RAG-as-a-tool ──
vault = pa.Vault.from_directory("./docs")
spark = pa.Spark("assistant", tools=[vault.as_tool()])
result = spark.run("What does the documentation say about X?")

# ── LLMCompiler: parallel tool execution via DAG planning ──
# backend: an LLM backend (e.g. from resolve_backend); registry: your registered tools
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```

## API Reference (Level 3)

| Name | What | Example |
|------|------|---------|
| `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` |
| `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` |
| `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a", "b"], [s1, s2])` |
| `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` |
| `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` |
| `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` |
| `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` |

## Evidence-Gated Memory

Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:

```
candidate → immune scan → quarantine → replay test → promote (or reject)
```

- **Immune scan** blocks prompt injection, score manipulation, API key leaks, and tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps
- **Rejection** preserves the memory for audit but never exposes it to the agent

Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.

## Honest Evaluation

```python
from purpose_agent import RunMode

RunMode.LEARNING_TRAIN       # Full read/write — this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging — validates before promoting
RunMode.EVAL_TEST            # NO writes — numbers you can trust
```

## Secure Tools

- **CalculatorTool** — AST-validated, no `eval()` on arbitrary text (see the sketch below)
- **PythonExecTool** — subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile** — sandboxed to a declared root directory
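To make the CalculatorTool bullet concrete, here is a standalone sketch of the AST-validation idea: parse the expression, allow only a small whitelist of arithmetic node types, and evaluate the tree yourself instead of calling `eval()` on raw text. This is illustrative only, not the library's actual implementation, and the name `safe_calc` is made up for the example.

```python
# Sketch of AST-validated arithmetic, similar in spirit to CalculatorTool.
# Illustrative only; not the purpose-agent implementation.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calc(expression: str) -> float:
    """Evaluate a pure-arithmetic expression; reject everything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Disallowed expression node: {type(node).__name__}")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calc("2 ** 10 / (3 + 1)"))   # 256.0
try:
    safe_calc("__import__('os').system('rm -rf /')")
except ValueError as err:
    print(err)                           # Disallowed expression node: Call
```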
## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.

34 Python modules, ~500KB:

```
Core Engine    → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel      → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research       → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs  → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities   → Spark, Flow, swarm, Council, Vault
Easy API       → purpose(), Team, quickstart wizard
```

## Literature

Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).

| Paper | What it contributes |
|-------|---------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |

## CLI

```bash
python -m purpose_agent   # Interactive wizard
purpose-agent             # Same, via entry point
```

## License

MIT