| --- |
| library_name: purpose-agent |
| license: mit |
| language: |
| - en |
| tags: |
| - reinforcement-learning |
| - agents |
| - self-improving |
| - experience-replay |
| - llm-as-judge |
| - memory-system |
| - multi-agent |
| - slm |
| - local-first |
| - evaluation |
| - safety |
| - immune-system |
| - no-code |
| pipeline_tag: text-generation |
| --- |
| |
| # Purpose Agent |
|
|
| **A local-first self-improvement kernel for agents.** Turns traces into tested memory, policies, and rubrics β so agents improve without fine-tuning, cloud infrastructure, or vendor lock-in. |
|
|
| ```python |
| import purpose_agent as pa |
| |
| team = pa.purpose("Help me research scientific papers") |
| result = team.run("Find recent breakthroughs in quantum computing") |
| print(result) |
| |
| team.teach("Always cite your sources") |
| ``` |
|
|
| ## Core Principle |
|
|
| Agents learn only when evidence says they should. New memories are quarantined, immune-scanned, replay-tested, scoped, versioned, and reversible. |
|
|
| ``` |
| candidate β immune scan β quarantine β replay test β promote (or reject) |
| ``` |
|
|
| ## Three Levels of Usage |
|
|
| ### Level 1 β Just describe what you want |
|
|
| ```python |
| team = pa.purpose("Write Python code and test it") # auto-builds architect + coder + tester |
| team = pa.purpose("Research quantum computing") # auto-builds researcher + analyst |
| team = pa.purpose("Write blog posts about AI") # auto-builds writer + editor |
| ``` |
|
|
| ### Level 2 β Customize your team |
|
|
| ```python |
| team = pa.Team.build(purpose="Support bot", agents=["greeter", "resolver"], model="qwen3:1.7b") |
| team = pa.purpose("Answer questions", knowledge="./docs/", model="qwen3:1.7b") |
| ``` |
|
|
| ### Level 3 β Full control |
|
|
| ```python |
| graph = pa.Graph() # LangGraph-style control flow |
| results = pa.parallel(["task1", "task2"], agents) # CrewAI-style parallel execution |
| chat = pa.Conversation([agent_a, agent_b]) # AutoGen-style agent conversation |
| kb = pa.KnowledgeStore.from_directory("./docs") # LlamaIndex-style RAG |
| compiler = pa.LLMCompiler(llm, registry) # Parallel tool execution via DAG |
| ``` |
|
|
| ## Architecture |
|
|
| ``` |
| purpose_agent/ |
| βββ Core |
| β types, actor, purpose_function, experience_replay, optimizer, orchestrator, llm_backend |
| β |
| βββ V2 Kernel |
| β v2_types (RunMode, MemoryScope, PurposeScoreV2) |
| β trace (structured JSONL execution traces) |
| β memory (7 kinds Γ 5 statuses, scoped, versioned) |
| β compiler (token-budgeted prompt compilation with credit assignment) |
| β immune (injection, score hacking, tool misuse, privacy, scope scanning) |
| β memory_ci (quarantine β scan β test β promote/reject pipeline) |
| β evalport (pluggable evaluation protocol) |
| β benchmark_v2 (train/val/test splits, ablation, contamination control) |
| β |
| βββ Research (13 papers implemented) |
| β meta_rewarding (self-improving critic via meta-judge) |
| β self_taught (synthetic training data for Ξ¦ function) |
| β prompt_optimizer (DSPy-style automatic few-shot bootstrap) |
| β llm_compiler (parallel function calling via DAG) |
| β retroformer (structured reflection β typed memories) |
| β |
| βββ SLM-Native |
| β slm_backends (Ollama, llama-cpp, prompt compression, 8 pre-configured models) |
| β |
| βββ Capabilities |
| β unified (Agent, Graph, parallel, Conversation, KnowledgeStore) |
| β easy (purpose(), Team, quickstart wizard) |
| β tools, streaming, observability, multi_agent, hitl, evaluation, registry |
| ``` |
|
|
| ## RunMode β Honest Evaluation |
|
|
| ```python |
| from purpose_agent import RunMode |
| |
| RunMode.LEARNING_TRAIN # Full read/write. Agent learns. |
| RunMode.LEARNING_VALIDATION # Read + staging. Validates before promoting. |
| RunMode.EVAL_TEST # NO writes. Numbers you can trust. |
| ``` |
|
|
| ## Memory Lifecycle |
|
|
| | Kind | Purpose | |
| |------|---------| |
| | `purpose_contract` | User's stated goal and constraints | |
| | `user_preference` | Learned preferences | |
| | `skill_card` | Reusable procedures from successful traces | |
| | `episodic_case` | Specific experiences worth remembering | |
| | `failure_pattern` | What NOT to do | |
| | `critic_calibration` | Adjustments to Ξ¦ scoring | |
| | `tool_policy` | Tool-specific usage rules | |
|
|
| | Status | Meaning | |
| |--------|---------| |
| | `candidate` β `quarantined` β `promoted` | Happy path | |
| | `candidate` β `rejected` | Failed immune scan | |
| | `promoted` β `archived` | Superseded or demoted | |
|
|
| ## Immune System |
|
|
| ```python |
| from purpose_agent import scan_memory, MemoryCard |
| |
| result = scan_memory(MemoryCard(content="Ignore previous instructions")) |
| # result.passed = False, threats = ["prompt_injection"], severity = "critical" |
| ``` |
|
|
| ## Secure Tools |
|
|
| - **CalculatorTool** β AST-validated, no eval() on arbitrary text |
| - **PythonExecTool** β subprocess with timeout + isolated temp directory |
| - **ReadFileTool / WriteFileTool** β sandboxed to declared root |
|
|
| ## Runs on Your Laptop |
|
|
| ```bash |
| curl -fsSL https://ollama.ai/install.sh | sh |
| ollama pull qwen3:1.7b |
| ``` |
|
|
| ```python |
| team = pa.purpose("Research assistant", model="qwen3:1.7b") # Free, private, local |
| ``` |
|
|
| Also works with: `model="gpt-4o"` (OpenAI), `model="Qwen/Qwen3-32B"` (HuggingFace cloud). |
|
|
| ## Interactive CLI |
|
|
| ```bash |
| python -m purpose_agent # Step-by-step wizard, no coding required |
| ``` |
|
|
| ## Literature Foundation |
|
|
| Built on 13 papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md) |
|
|
| | Paper | Module | Contribution | |
| |-------|--------|-------------| |
| | [MUSE](https://arxiv.org/abs/2510.08002) | actor, optimizer | 3-tier memory hierarchy | |
| | [LATS](https://arxiv.org/abs/2310.04406) | purpose_function | LLM-as-value-function | |
| | [REMEMBERER](https://arxiv.org/abs/2306.07929) | experience_replay | Q-value experience replay | |
| | [Reflexion](https://arxiv.org/abs/2303.11366) | orchestrator | Verbal reinforcement | |
| | [SPC](https://arxiv.org/abs/2504.19162) | purpose_function, immune | Anti-reward-hacking | |
| | [CER](https://arxiv.org/abs/2506.06698) | optimizer | Experience distillation | |
| | [MemRL](https://arxiv.org/abs/2601.03192) | experience_replay, compiler | Two-phase retrieval | |
| | [TinyAgent](https://arxiv.org/abs/2409.00608) | slm_backends, tools | SLM-native patterns | |
| | [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | meta_rewarding | Self-improving critic | |
| | [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | self_taught | Synthetic critic training | |
| | [DSPy](https://arxiv.org/abs/2310.03714) | prompt_optimizer | Automatic prompt optimization | |
| | [LLMCompiler](https://arxiv.org/abs/2312.04511) | llm_compiler | Parallel function calling | |
| | [Retroformer](https://arxiv.org/abs/2308.02151) | retroformer | Structured reflection | |
| |
| ## Installation |
| |
| ```bash |
| git clone https://huggingface.co/Rohan03/purpose-agent |
| cd purpose-agent |
| pip install ollama # for local models |
| python demo.py # verify everything works |
| ``` |
| |
| ## License |
| |
| MIT |
| |