---
library_name: purpose-agent
license: mit
language:
- en
tags:
- reinforcement-learning
- agents
- self-improving
- memory-system
- multi-agent
- slm
- local-first
- evaluation
- safety
- immune-system
pipeline_tag: text-generation
---
# Purpose Agent

**A local-first self-improvement kernel for AI agents.**

Agents that learn from experience, without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.

```bash
pip install purpose-agent
```

```python
import purpose_agent as pa

team = pa.purpose("Help me write Python code")
result = team.run("Write a fibonacci function")
print(result)

team.teach("Always add type hints")
# Next run uses what it learned
```

## How It Works (30-Second Version)
1. **You give it a purpose.** "Help me write Python code."
2. **It builds a team.** Architect + Coder + Tester, auto-selected from your description.
3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates. A sketch of this loop follows below.
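
In code, the loop is just repeated runs on the same team object; heuristics persist between them. A minimal sketch using only the Level 1 API documented below:

```python
import purpose_agent as pa

team = pa.purpose("Help me write Python code")

# Run 1: the critic scores each step and extracts heuristics.
team.run("Write a fibonacci function")

# Steer it explicitly; taught rules become heuristics too.
team.teach("Always add type hints")

# Run 2: accumulated heuristics are injected into the prompt,
# so this run benefits from run 1 without any weight updates.
team.run("Write a prime-checking function")

print(team.status())  # inspect what it has learned so far
```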
## Real-World Test Results

Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:

| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
|-------|-----------|----------|-----------|-----------------|
| Llama-3.3-70B | ✅ 100% | ✅ 100% | ✅ 100% | 0 → 3 → 9 → 18 heuristics |
| Gemma-4-26B | ✅ 100% | ✅ 100% | ✅ 100% | 0 → 3 → 6 → 11 heuristics |

- **Day-0 production test:** 19/19 pass on Llama-3.3-70B across all 3 usage levels.
- **Immune system:** 93% adversarial catch rate, 0% false positives.
- **Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).

## Install

```bash
pip install purpose-agent            # Core (zero dependencies)
pip install "purpose-agent[openai]"  # + OpenAI / Groq / OpenRouter
pip install "purpose-agent[ollama]"  # + Local Ollama
pip install "purpose-agent[all]"     # Everything
```
## Three Levels of Usage

### Level 1: Describe what you want
```python
import purpose_agent as pa

team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
team = pa.purpose("Research quantum computing")     # → researcher + analyst
team = pa.purpose("Write blog posts about AI")      # → writer + editor

result = team.run("Write a sorting algorithm")
team.teach("Always handle edge cases")
print(team.status())  # See what it's learned
```
### Level 2: Choose your model
```python
# Local (free, private)
team = pa.purpose("Code helper", model="qwen3:1.7b")

# Cloud providers
team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
team = pa.purpose("Code helper", model="openai:gpt-4o")

# Any OpenAI-compatible API
from purpose_agent import resolve_backend
backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
```

Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
### Level 3: Full control

Purpose Agent has its own API vocabulary: original names, not borrowed from other frameworks.
```python
import purpose_agent as pa

# ── Spark: a single intelligent agent ──
spark = pa.Spark("coder", model="openrouter:meta-llama/llama-3.3-70b-instruct")
result = spark.run("Write a fibonacci function")

# ── Flow: workflow engine with conditional routing ──
flow = pa.Flow()
flow.add_node("research", pa.Spark("researcher", model="qwen3:1.7b"))
flow.add_node("write", pa.Spark("writer", model="qwen3:1.7b"))
flow.add_edge(pa.BEGIN, "research")
flow.add_edge("research", "write")
flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
result = flow.run(initial_state)

# ── swarm: run tasks concurrently ──
results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])

# ── Council: agents deliberate together ──
council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
result = council.run("Design a web scraper", rounds=3)

# ── Vault: knowledge store with RAG-as-a-tool ──
vault = pa.Vault.from_directory("./docs")
spark = pa.Spark("assistant", tools=[vault.as_tool()])
result = spark.run("What does the documentation say about X?")

# ── LLMCompiler: parallel tool execution via DAG planning ──
compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
```
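
The Flow example above assumes a couple of user-supplied objects. A minimal sketch of what they might look like; the state shape that `Flow` passes to the routing function is an assumption, so check the library docs:

```python
# Hypothetical helpers for the Flow example; the dict-shaped state is an
# assumption, not confirmed purpose-agent behavior.

def review_fn(state) -> str:
    """Return a key from the conditional-edge map: "pass" or "retry"."""
    draft = state.get("write", "")  # assumed: node outputs keyed by node name
    return "pass" if draft else "retry"

initial_state = {"task": "Write a short post about quantum computing"}
```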
## API Reference (Level 3)
| | Name | What | Example | |
| |------|------|---------| |
| | `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` | |
| | `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` | |
| | `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a","b"], [s1, s2])` | |
| | `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` | |
| | `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` | |
| | `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` | |
| | `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` | |
## Evidence-Gated Memory

Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
```
candidate → immune scan → quarantine → replay test → promote (or reject)
```

- **Immune scan** blocks prompt injection, score manipulation, API key leaks, and tool misuse
- **Quarantine** holds memories until they're tested
- **Promotion** happens only after evidence shows the memory helps (see the sketch below)
- **Rejection** preserves the memory for audit but never exposes it to the agent

Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
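
Conceptually, the gate behaves like the sketch below. Every name here is illustrative, not purpose-agent's internal API; it only mirrors the candidate → quarantine → promote/reject lifecycle described above:

```python
# Illustrative evidence gate; MemoryCandidate and both callbacks are
# hypothetical stand-ins, not purpose-agent internals.
from dataclasses import dataclass, field

@dataclass
class MemoryCandidate:
    text: str
    status: str = "candidate"  # candidate → quarantined → promoted/rejected
    evidence: list = field(default_factory=list)

def gate(cand: MemoryCandidate, immune_scan, replay_test) -> MemoryCandidate:
    if not immune_scan(cand.text):           # block injection, leaks, tool misuse
        cand.status = "rejected"             # kept for audit, never shown to the agent
        return cand
    cand.status = "quarantined"              # held until tested
    passed, score = replay_test(cand.text)   # rerun past tasks with the memory injected
    cand.evidence.append(score)
    cand.status = "promoted" if passed else "rejected"
    return cand
```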
## Honest Evaluation

```python
from purpose_agent import RunMode

RunMode.LEARNING_TRAIN       # Full read/write: this is where agents learn
RunMode.LEARNING_VALIDATION  # Read + staging: validates before promoting
RunMode.EVAL_TEST            # NO writes: numbers you can trust
```
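
A typical split keeps learning and reporting separate. The sketch below assumes `run()` accepts a `mode` keyword, which is an assumption about the API, not something this README confirms:

```python
import purpose_agent as pa
from purpose_agent import RunMode

# Hypothetical usage; the `mode` keyword is an assumption.
team = pa.purpose("Write Python code and test it")
team.run("training task", mode=RunMode.LEARNING_TRAIN)      # learn here
result = team.run("held-out task", mode=RunMode.EVAL_TEST)  # report this number
```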
## Secure Tools

- **CalculatorTool**: AST-validated, no `eval()` on arbitrary text (the general idea is sketched below)
- **PythonExecTool**: subprocess with timeout + isolated temp directory
- **ReadFile/WriteFile**: sandboxed to a declared root directory
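
To show what "AST-validated" means in practice, here is a minimal sketch of the general technique: parse the expression, walk the tree, and allow only whitelisted node types. This illustrates the idea, not purpose-agent's actual CalculatorTool implementation:

```python
# Generic AST-whitelist arithmetic evaluator; illustrative only.
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate arithmetic by walking the AST; any other node is rejected."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("2 ** 8 + 1"))    # 257
# safe_eval("__import__('os')")   # raises ValueError, never executed
```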
## Architecture

See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.

34 Python modules, ~500KB:

```
Core Engine   → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
V2 Kernel     → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
Research      → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
Breakthroughs → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
Capabilities  → Spark, Flow, swarm, Council, Vault
Easy API      → purpose(), Team, quickstart wizard
```
## Literature

Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).

| Paper | What it contributes |
|-------|-------------------|
| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |

## CLI

```bash
python -m purpose_agent   # Interactive wizard
purpose-agent             # Same, via entry point
```

## License

MIT