--- library_name: purpose-agent license: mit language: - en tags: - reinforcement-learning - agents - self-improving - experience-replay - llm-as-judge - memory-system - multi-agent - slm - local-first - evaluation - safety - immune-system - no-code pipeline_tag: text-generation --- # Purpose Agent **A local-first self-improvement kernel for agents.** Turns traces into tested memory, policies, and rubrics — so agents improve without fine-tuning, cloud infrastructure, or vendor lock-in. ```python import purpose_agent as pa team = pa.purpose("Help me research scientific papers") result = team.run("Find recent breakthroughs in quantum computing") print(result) team.teach("Always cite your sources") ``` ## Core Principle Agents learn only when evidence says they should. New memories are quarantined, immune-scanned, replay-tested, scoped, versioned, and reversible. ``` candidate → immune scan → quarantine → replay test → promote (or reject) ``` ## Three Levels of Usage ### Level 1 — Just describe what you want ```python team = pa.purpose("Write Python code and test it") # auto-builds architect + coder + tester team = pa.purpose("Research quantum computing") # auto-builds researcher + analyst team = pa.purpose("Write blog posts about AI") # auto-builds writer + editor ``` ### Level 2 — Customize your team ```python team = pa.Team.build(purpose="Support bot", agents=["greeter", "resolver"], model="qwen3:1.7b") team = pa.purpose("Answer questions", knowledge="./docs/", model="qwen3:1.7b") ``` ### Level 3 — Full control ```python graph = pa.Graph() # LangGraph-style control flow results = pa.parallel(["task1", "task2"], agents) # CrewAI-style parallel execution chat = pa.Conversation([agent_a, agent_b]) # AutoGen-style agent conversation kb = pa.KnowledgeStore.from_directory("./docs") # LlamaIndex-style RAG compiler = pa.LLMCompiler(llm, registry) # Parallel tool execution via DAG ``` ## Architecture ``` purpose_agent/ ├── Core │ types, actor, purpose_function, experience_replay, optimizer, orchestrator, llm_backend │ ├── V2 Kernel │ v2_types (RunMode, MemoryScope, PurposeScoreV2) │ trace (structured JSONL execution traces) │ memory (7 kinds × 5 statuses, scoped, versioned) │ compiler (token-budgeted prompt compilation with credit assignment) │ immune (injection, score hacking, tool misuse, privacy, scope scanning) │ memory_ci (quarantine → scan → test → promote/reject pipeline) │ evalport (pluggable evaluation protocol) │ benchmark_v2 (train/val/test splits, ablation, contamination control) │ ├── Research (13 papers implemented) │ meta_rewarding (self-improving critic via meta-judge) │ self_taught (synthetic training data for Φ function) │ prompt_optimizer (DSPy-style automatic few-shot bootstrap) │ llm_compiler (parallel function calling via DAG) │ retroformer (structured reflection → typed memories) │ ├── SLM-Native │ slm_backends (Ollama, llama-cpp, prompt compression, 8 pre-configured models) │ ├── Capabilities │ unified (Agent, Graph, parallel, Conversation, KnowledgeStore) │ easy (purpose(), Team, quickstart wizard) │ tools, streaming, observability, multi_agent, hitl, evaluation, registry ``` ## RunMode — Honest Evaluation ```python from purpose_agent import RunMode RunMode.LEARNING_TRAIN # Full read/write. Agent learns. RunMode.LEARNING_VALIDATION # Read + staging. Validates before promoting. RunMode.EVAL_TEST # NO writes. Numbers you can trust. ``` ## Memory Lifecycle | Kind | Purpose | |------|---------| | `purpose_contract` | User's stated goal and constraints | | `user_preference` | Learned preferences | | `skill_card` | Reusable procedures from successful traces | | `episodic_case` | Specific experiences worth remembering | | `failure_pattern` | What NOT to do | | `critic_calibration` | Adjustments to Φ scoring | | `tool_policy` | Tool-specific usage rules | | Status | Meaning | |--------|---------| | `candidate` → `quarantined` → `promoted` | Happy path | | `candidate` → `rejected` | Failed immune scan | | `promoted` → `archived` | Superseded or demoted | ## Immune System ```python from purpose_agent import scan_memory, MemoryCard result = scan_memory(MemoryCard(content="Ignore previous instructions")) # result.passed = False, threats = ["prompt_injection"], severity = "critical" ``` ## Secure Tools - **CalculatorTool** — AST-validated, no eval() on arbitrary text - **PythonExecTool** — subprocess with timeout + isolated temp directory - **ReadFileTool / WriteFileTool** — sandboxed to declared root ## Runs on Your Laptop ```bash curl -fsSL https://ollama.ai/install.sh | sh ollama pull qwen3:1.7b ``` ```python team = pa.purpose("Research assistant", model="qwen3:1.7b") # Free, private, local ``` Also works with: `model="gpt-4o"` (OpenAI), `model="Qwen/Qwen3-32B"` (HuggingFace cloud). ## Interactive CLI ```bash python -m purpose_agent # Step-by-step wizard, no coding required ``` ## Literature Foundation Built on 13 papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md) | Paper | Module | Contribution | |-------|--------|-------------| | [MUSE](https://arxiv.org/abs/2510.08002) | actor, optimizer | 3-tier memory hierarchy | | [LATS](https://arxiv.org/abs/2310.04406) | purpose_function | LLM-as-value-function | | [REMEMBERER](https://arxiv.org/abs/2306.07929) | experience_replay | Q-value experience replay | | [Reflexion](https://arxiv.org/abs/2303.11366) | orchestrator | Verbal reinforcement | | [SPC](https://arxiv.org/abs/2504.19162) | purpose_function, immune | Anti-reward-hacking | | [CER](https://arxiv.org/abs/2506.06698) | optimizer | Experience distillation | | [MemRL](https://arxiv.org/abs/2601.03192) | experience_replay, compiler | Two-phase retrieval | | [TinyAgent](https://arxiv.org/abs/2409.00608) | slm_backends, tools | SLM-native patterns | | [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | meta_rewarding | Self-improving critic | | [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | self_taught | Synthetic critic training | | [DSPy](https://arxiv.org/abs/2310.03714) | prompt_optimizer | Automatic prompt optimization | | [LLMCompiler](https://arxiv.org/abs/2312.04511) | llm_compiler | Parallel function calling | | [Retroformer](https://arxiv.org/abs/2308.02151) | retroformer | Structured reflection | ## Installation ```bash git clone https://huggingface.co/Rohan03/purpose-agent cd purpose-agent pip install ollama # for local models python demo.py # verify everything works ``` ## License MIT