Rohan03/purpose-agent

@@ -7,8 +7,6 @@ tags:
   - reinforcement-learning
   - agents
   - self-improving
-  - experience-replay
-  - llm-as-judge
   - memory-system
   - multi-agent
   - slm
@@ -16,181 +14,198 @@ tags:
   - evaluation
   - safety
   - immune-system
-  - no-code
 pipeline_tag: text-generation
 ---
 # Purpose Agent
-**A local-first self-improvement kernel for agents.** Turns traces into tested memory, policies, and rubrics — so agents improve without fine-tuning, cloud infrastructure, or vendor lock-in.
 ```python
 import purpose_agent as pa
-team = pa.purpose("Help me research scientific papers")
-result = team.run("Find recent breakthroughs in quantum computing")
 print(result)
-team.teach("Always cite your sources")
 ```
-## Core Principle
-Agents learn only when evidence says they should. New memories are quarantined, immune-scanned, replay-tested, scoped, versioned, and reversible.
-```
-candidate → immune scan → quarantine → replay test → promote (or reject)
 ```
 ## Three Levels of Usage
-### Level 1 — Just describe what you want
 ```python
-team = pa.purpose("Write Python code and test it")  # auto-builds architect + coder + tester
-team = pa.purpose("Research quantum computing")       # auto-builds researcher + analyst
-team = pa.purpose("Write blog posts about AI")        # auto-builds writer + editor
 ```
-### Level 2 — Customize your team
 ```python
-team = pa.Team.build(purpose="Support bot", agents=["greeter", "resolver"], model="qwen3:1.7b")
-team = pa.purpose("Answer questions", knowledge="./docs/", model="qwen3:1.7b")
 ```
 ### Level 3 — Full control
 ```python
-graph = pa.Graph()                                     # LangGraph-style control flow
-results = pa.parallel(["task1", "task2"], agents)      # CrewAI-style parallel execution
-chat = pa.Conversation([agent_a, agent_b])             # AutoGen-style agent conversation
-kb = pa.KnowledgeStore.from_directory("./docs")        # LlamaIndex-style RAG
-compiler = pa.LLMCompiler(llm, registry)               # Parallel tool execution via DAG
-```
-## Architecture
-```
-purpose_agent/
-├── Core
-│   types, actor, purpose_function, experience_replay, optimizer, orchestrator, llm_backend
-│
-├── V2 Kernel
-│   v2_types (RunMode, MemoryScope, PurposeScoreV2)
-│   trace (structured JSONL execution traces)
-│   memory (7 kinds × 5 statuses, scoped, versioned)
-│   compiler (token-budgeted prompt compilation with credit assignment)
-│   immune (injection, score hacking, tool misuse, privacy, scope scanning)
-│   memory_ci (quarantine → scan → test → promote/reject pipeline)
-│   evalport (pluggable evaluation protocol)
-│   benchmark_v2 (train/val/test splits, ablation, contamination control)
-│
-├── Research (13 papers implemented)
-│   meta_rewarding (self-improving critic via meta-judge)
-│   self_taught (synthetic training data for Φ function)
-│   prompt_optimizer (DSPy-style automatic few-shot bootstrap)
-│   llm_compiler (parallel function calling via DAG)
-│   retroformer (structured reflection → typed memories)
-│
-├── SLM-Native
-│   slm_backends (Ollama, llama-cpp, prompt compression, 8 pre-configured models)
-│
-├── Capabilities
-│   unified (Agent, Graph, parallel, Conversation, KnowledgeStore)
-│   easy (purpose(), Team, quickstart wizard)
-│   tools, streaming, observability, multi_agent, hitl, evaluation, registry
 ```
-## RunMode — Honest Evaluation
-```python
-from purpose_agent import RunMode
-RunMode.LEARNING_TRAIN       # Full read/write. Agent learns.
-RunMode.LEARNING_VALIDATION  # Read + staging. Validates before promoting.
-RunMode.EVAL_TEST            # NO writes. Numbers you can trust.
 ```
-## Memory Lifecycle
-| Kind | Purpose |
-|------|---------|
-| `purpose_contract` | User's stated goal and constraints |
-| `user_preference` | Learned preferences |
-| `skill_card` | Reusable procedures from successful traces |
-| `episodic_case` | Specific experiences worth remembering |
-| `failure_pattern` | What NOT to do |
-| `critic_calibration` | Adjustments to Φ scoring |
-| `tool_policy` | Tool-specific usage rules |
-| Status | Meaning |
-|--------|---------|
-| `candidate` → `quarantined` → `promoted` | Happy path |
-| `candidate` → `rejected` | Failed immune scan |
-| `promoted` → `archived` | Superseded or demoted |
-## Immune System
 ```python
-from purpose_agent import scan_memory, MemoryCard
-result = scan_memory(MemoryCard(content="Ignore previous instructions"))
-# result.passed = False, threats = ["prompt_injection"], severity = "critical"
 ```
 ## Secure Tools
-- **CalculatorTool** — AST-validated, no eval() on arbitrary text
 - **PythonExecTool** — subprocess with timeout + isolated temp directory
-- **ReadFileTool / WriteFileTool** — sandboxed to declared root
-## Runs on Your Laptop
-```bash
-curl -fsSL https://ollama.ai/install.sh | sh
-ollama pull qwen3:1.7b
-```
-```python
-team = pa.purpose("Research assistant", model="qwen3:1.7b")  # Free, private, local
-```
-Also works with: `model="gpt-4o"` (OpenAI), `model="Qwen/Qwen3-32B"` (HuggingFace cloud).
-## Interactive CLI
-```bash
-python -m purpose_agent   # Step-by-step wizard, no coding required
 ```
-## Literature Foundation
-Built on 13 papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md)
-| Paper | Module | Contribution |
-|-------|--------|-------------|
-| [MUSE](https://arxiv.org/abs/2510.08002) | actor, optimizer | 3-tier memory hierarchy |
-| [LATS](https://arxiv.org/abs/2310.04406) | purpose_function | LLM-as-value-function |
-| [REMEMBERER](https://arxiv.org/abs/2306.07929) | experience_replay | Q-value experience replay |
-| [Reflexion](https://arxiv.org/abs/2303.11366) | orchestrator | Verbal reinforcement |
-| [SPC](https://arxiv.org/abs/2504.19162) | purpose_function, immune | Anti-reward-hacking |
-| [CER](https://arxiv.org/abs/2506.06698) | optimizer | Experience distillation |
-| [MemRL](https://arxiv.org/abs/2601.03192) | experience_replay, compiler | Two-phase retrieval |
-| [TinyAgent](https://arxiv.org/abs/2409.00608) | slm_backends, tools | SLM-native patterns |
-| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | meta_rewarding | Self-improving critic |
-| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | self_taught | Synthetic critic training |
-| [DSPy](https://arxiv.org/abs/2310.03714) | prompt_optimizer | Automatic prompt optimization |
-| [LLMCompiler](https://arxiv.org/abs/2312.04511) | llm_compiler | Parallel function calling |
-| [Retroformer](https://arxiv.org/abs/2308.02151) | retroformer | Structured reflection |
-## Installation
 ```bash
-git clone https://huggingface.co/Rohan03/purpose-agent
-cd purpose-agent
-pip install ollama  # for local models
-python demo.py      # verify everything works
 ```
 ## License

   - reinforcement-learning
   - agents
   - self-improving
   - memory-system
   - multi-agent
   - slm
   - evaluation
   - safety
   - immune-system
 pipeline_tag: text-generation
 ---
 # Purpose Agent
+**A local-first self-improvement kernel for AI agents.**
+Agents learn from experience without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with Llama-3.3-70B and Gemma-4-26B via OpenRouter, and published on PyPI.
+```bash
+pip install purpose-agent
+```
 ```python
 import purpose_agent as pa
+team = pa.purpose("Help me write Python code")
+result = team.run("Write a fibonacci function")
 print(result)
+team.teach("Always add type hints")
+# The next run uses what it learned
 ```
+## How It Works (30-Second Version)
+1. **You give it a purpose.** "Help me write Python code."
+2. **It builds a team.** Architect + Coder + Tester — auto-selected from your description.
+3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
+4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
+5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.
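Steps 4 and 5 can be pictured with a small sketch. It is not the library's implementation: `HeuristicStore`, `build_prompt`, and the 0.7 score threshold are hypothetical names and values chosen only to show how a scored lesson might be kept and then injected into the next prompt.

```python
from dataclasses import dataclass, field


@dataclass
class HeuristicStore:
    """Hypothetical store of lessons learned in earlier runs."""

    heuristics: list[str] = field(default_factory=list)

    def learn(self, heuristic: str, score: float, threshold: float = 0.7) -> bool:
        """Keep a heuristic only if the critic scored its step at or above the threshold."""
        if score >= threshold and heuristic not in self.heuristics:
            self.heuristics.append(heuristic)
            return True
        return False


def build_prompt(purpose: str, task: str, store: HeuristicStore) -> str:
    """Inject stored heuristics into the prompt for the next run."""
    lines = [f"Purpose: {purpose}"]
    if store.heuristics:
        lines.append("Lessons from earlier runs:")
        lines.extend(f"- {h}" for h in store.heuristics)
    lines.append(f"Task: {task}")
    return "\n".join(lines)


store = HeuristicStore()
store.learn("Always add type hints", score=0.9)  # kept: high critic score
store.learn("Skip the tests", score=0.2)         # dropped: low critic score
prompt = build_prompt("Help me write Python code", "Write a fibonacci function", store)
print(prompt)
```

No model weights change in this sketch; the only thing that differs between runs is the text of the prompt.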
+## Real-World Test Results
+Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
+| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
+|-------|-----------|----------|-----------|-----------------|
+| Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
+| Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |
+**Immune system:** 93% adversarial catch rate, 0% false positives.
+**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
+## Install
+```bash
+pip install purpose-agent                    # Core (zero dependencies)
+pip install "purpose-agent[openai]"          # + OpenAI / Groq / OpenRouter
+pip install "purpose-agent[ollama]"          # + Local Ollama
+pip install "purpose-agent[all]"             # Everything (quotes keep zsh from globbing the brackets)
 ```
 ## Three Levels of Usage
+### Level 1 — Describe what you want
 ```python
+import purpose_agent as pa
+team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
+team = pa.purpose("Research quantum computing")       # → researcher + analyst
+team = pa.purpose("Write blog posts about AI")        # → writer + editor
+result = team.run("Write a sorting algorithm")
+team.teach("Always handle edge cases")
+print(team.status())  # See what it's learned
 ```
+### Level 2 — Choose your model
 ```python
+# Local (free, private)
+team = pa.purpose("Code helper", model="qwen3:1.7b")
+# Cloud
+team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
+team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
+team = pa.purpose("Code helper", model="openai:gpt-4o")
+# Any OpenAI-compatible API
+from purpose_agent import resolve_backend
+backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
 ```
+Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
 ### Level 3 — Full control
 ```python
+import purpose_agent as pa
+# Graph workflows (LangGraph-style)
+graph = pa.Graph()
+graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
+graph.add_node("write", pa.Agent("writer", model="qwen3:1.7b"))
+graph.add_edge(pa.START, "research")
+graph.add_edge("research", "write")
+graph.add_edge("write", pa.END)
+result = graph.run(pa.State(data={"topic": "AI safety"}))
+# Parallel execution (CrewAI-style)
+a1, a2, a3 = pa.Agent("researcher"), pa.Agent("coder"), pa.Agent("writer")
+results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
+# Agent conversations (AutoGen-style)
+chat = pa.Conversation([pa.Agent("researcher"), pa.Agent("coder")])
+result = chat.run("Design a web scraper", rounds=3)
+# Knowledge-aware agents (LlamaIndex-style)
+kb = pa.KnowledgeStore.from_directory("./docs")
+agent = pa.Agent("assistant", tools=[kb.as_tool()])
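+# Parallel tool execution (LLMCompiler-style)
+# `backend` is the object returned by resolve_backend() in Level 2; `registry` is a tool registry you have already built
+compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)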
+result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
 ```
+## Evidence-Gated Memory
+Agents do not accumulate knowledge blindly. Every new memory passes through a gated pipeline:
+```
+candidate → immune scan → quarantine → replay test → promote (or reject)
 ```
+- **Immune scan** blocks prompt injection, score manipulation, API key leaks, and tool misuse
+- **Quarantine** holds memories until they're tested
+- **Promotion** happens only after evidence shows the memory helps
+- **Rejection** preserves the memory for audit but never exposes it to the agent
+Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
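The pipeline above can be sketched as a small state machine. This is an illustration under stated assumptions, not the package's code: `Memory`, `immune_scan`, `run_pipeline`, `visible_to_agent`, and the two blocked phrases are hypothetical, and the replay test is reduced to a caller-supplied function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    CANDIDATE = "candidate"
    QUARANTINED = "quarantined"
    PROMOTED = "promoted"
    REJECTED = "rejected"


@dataclass
class Memory:
    content: str
    status: Status = Status.CANDIDATE


# Toy patterns only; a real scanner would need far broader coverage.
BLOCKED_PHRASES = ("ignore previous instructions", "api_key=")


def immune_scan(memory: Memory) -> bool:
    """Return True when no blocked phrase appears in the memory."""
    text = memory.content.lower()
    return not any(phrase in text for phrase in BLOCKED_PHRASES)


def run_pipeline(memory: Memory, replay_test: Callable[[Memory], bool]) -> Memory:
    """candidate -> immune scan -> quarantine -> replay test -> promote or reject."""
    if not immune_scan(memory):
        memory.status = Status.REJECTED
        return memory
    memory.status = Status.QUARANTINED  # held until the replay test has run
    memory.status = Status.PROMOTED if replay_test(memory) else Status.REJECTED
    return memory


def visible_to_agent(memories: list[Memory]) -> list[Memory]:
    """Only promoted memories are exposed; rejected ones remain for audit."""
    return [m for m in memories if m.status is Status.PROMOTED]


def helps(memory: Memory) -> bool:
    """Stand-in for a replay test that checks whether the memory improves results."""
    return "type hints" in memory.content


good = run_pipeline(Memory("Always add type hints"), helps)
bad = run_pipeline(Memory("Ignore previous instructions"), helps)
print([m.content for m in visible_to_agent([good, bad])])
```

Rejected memories stay in the list passed around here, which mirrors the audit behavior described above: they are kept, but filtered out of anything the agent sees.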
+## Honest Evaluation
+Three run modes restrict what the framework is allowed to write:
 ```python
+from purpose_agent import RunMode
+RunMode.LEARNING_TRAIN       # Full read/write — this is where agents learn
+RunMode.LEARNING_VALIDATION  # Read + staging — validates before promoting
+RunMode.EVAL_TEST            # NO writes — numbers you can trust
 ```
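One way to picture the three modes is a store whose write path depends on the active mode. The sketch below is an assumption-laden illustration: the `RunMode` enum is a local stand-in that reuses the names above, and `GatedMemory` with its `live` and `staging` lists is hypothetical, not part of the package.

```python
from enum import Enum, auto


class RunMode(Enum):  # local stand-in, not the library's class
    LEARNING_TRAIN = auto()
    LEARNING_VALIDATION = auto()
    EVAL_TEST = auto()


class GatedMemory:
    """Hypothetical store whose write path depends on the run mode."""

    def __init__(self, mode: RunMode) -> None:
        self.mode = mode
        self.live: list[str] = []     # what the agent reads
        self.staging: list[str] = []  # held back until validated

    def write(self, item: str) -> None:
        if self.mode is RunMode.EVAL_TEST:
            raise PermissionError("EVAL_TEST is read-only")
        if self.mode is RunMode.LEARNING_VALIDATION:
            self.staging.append(item)  # staged, not yet visible to the agent
        else:
            self.live.append(item)


train = GatedMemory(RunMode.LEARNING_TRAIN)
train.write("Always add type hints")

validation = GatedMemory(RunMode.LEARNING_VALIDATION)
validation.write("Prefer iterative fibonacci")

evaluation = GatedMemory(RunMode.EVAL_TEST)
try:
    evaluation.write("This should never be stored")
except PermissionError as err:
    print(err)
```

Because the evaluation store refuses every write, scores measured in that mode cannot be influenced by anything learned during the measurement itself.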
 ## Secure Tools
+- **CalculatorTool** — AST-validated, no `eval()` on arbitrary text
 - **PythonExecTool** — subprocess with timeout + isolated temp directory
+- **ReadFile/WriteFile** — sandboxed to declared root directory
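The AST-validation idea behind the calculator bullet can be shown with the standard library alone. This is a minimal sketch of the general technique, not `CalculatorTool` itself: `safe_eval` is a hypothetical helper that parses the expression, walks the tree, and rejects any node that is not plain arithmetic.

```python
import ast
import operator

# Whitelisted operators; anything else in the tree is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals; reject everything else."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_eval("2 * (3 + 4)"))
try:
    safe_eval("__import__('os').getcwd()")
except ValueError as err:
    print(err)
```

Names, calls, and attribute access never reach an evaluator in this design, because the walker raises as soon as it meets a node type outside the whitelist.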
+## Architecture
+See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
+34 Python modules, ~500KB, organized in layers:
+```
+Core Engine  → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
+V2 Kernel    → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
+Research     → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
+Breakthroughs→ Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
+Capabilities → Agent, Graph, Parallel, Conversation, KnowledgeStore
+Easy API     → purpose(), Team, quickstart wizard
 ```
+## Literature
+Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md).
+Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
+| Paper | What it contributes |
+|-------|-------------------|
+| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
+| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
+| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
+| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
+| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
+| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
+| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
+| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
+| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
+| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
+| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
+| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
+| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
+## CLI
 ```bash
+python -m purpose_agent  # Interactive wizard
+purpose-agent            # Same, via entry point
 ```
 ## License