Rohan03/purpose-agent

@@ -9,199 +9,190 @@ tags:
   - self-improving
   - experience-replay
   - llm-as-judge
-  - state-value-evaluation
-  - memory-augmented
   - multi-agent
   - slm
-  - small-language-models
-  - human-in-the-loop
-  - streaming
-  - tools
   - evaluation
-  - ollama
-  - local-models
   - no-code
-  - easy-to-use
 pipeline_tag: text-generation
 ---
 # Purpose Agent
-**Build self-improving AI agent teams with just a purpose.**
-No PhD required. No infrastructure costs. Runs on your laptop.
 ```python
 import purpose_agent as pa
-# One line. That's all you need.
-team = pa.purpose("Help me research and summarize scientific papers")
-# Give it tasks. It gets smarter every time.
 result = team.run("Find recent breakthroughs in quantum computing")
 print(result)
-# Teach it your preferences
 team.teach("Always cite your sources")
-team.teach("Keep summaries under 200 words")
-# Check what it's learned
-print(team.status())
 ```
-## Three Levels of Usage
-**Pick your level. You can always go deeper later.**
-### Level 1 — Beginner (no technical knowledge needed)
-```python
-import purpose_agent as pa
-# Describe what you want. The framework builds the right team.
-team = pa.purpose("Write Python code and test it")
-result = team.run("Create a function that calculates fibonacci numbers")
-print(result)
-# It auto-detects the best team:
-# "Write code"     → architect + coder + tester
-# "Research X"     → researcher + analyst
-# "Write blog"     → writer + editor
-# "Analyze data"   → analyst + reporter
-# "Help me"        → general assistant
 ```
-### Level 2 — Intermediate (customize your team)
 ```python
-import purpose_agent as pa
-# Build a custom team
-team = pa.Team.build(
-    purpose="Customer support assistant",
-    agents=["greeter", "resolver", "escalator"],
-    model="qwen3:1.7b",  # Free local model
-)
-result = team.run("Customer says: I can't log in to my account")
-# Add knowledge from your docs
-team = pa.purpose(
-    "Answer questions about our product",
-    knowledge="./docs/",           # Load all files from a folder
-    model="qwen3:1.7b",
-)
-result = team.ask("What is our refund policy?")
 ```
-### Level 3 — Advanced (full control)
 ```python
-import purpose_agent as pa
-# Graph workflows (like LangGraph)
-graph = pa.Graph()
-graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
-graph.add_node("write", pa.Agent("writer", model="phi4-mini"))
-graph.add_edge(pa.START, "research")
-graph.add_conditional_edge("write", review_fn, {"pass": pa.END, "fail": "research"})
-result = graph.run(initial_state)
-# Parallel execution (like CrewAI)
-results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
-# Agent conversations (like AutoGen)
-chat = pa.Conversation([researcher, coder, reviewer])
-result = chat.run("Design a web scraper", rounds=5)
-# Knowledge-aware agents (like LlamaIndex)
-kb = pa.KnowledgeStore.from_directory("./docs")
-agent = pa.Agent("assistant", tools=[kb.as_tool()])
-# Human-in-the-loop (like LangGraph)
-hitl = pa.HITLOrchestrator(orch, input_handler=pa.CLIInputHandler(),
-                           approve_actions=True, review_scores=True)
 ```
-## What Makes This Different
-**The only framework where agents actually learn from experience.**
-Every other framework (LangChain, CrewAI, AutoGen) runs the same way every time. Purpose Agent gets smarter with each task via the **Φ self-improvement loop**:
-```
-Task 1: Agent struggles, takes 12 steps → Φ evaluates → learns heuristics
-Task 5: Agent uses learned patterns, takes 8 steps → learns more
-Task 10: Agent is efficient, takes 5 steps → keeps refining
 ```
-Plus it absorbs the best of every competing framework:
-| You want... | Others say use... | Purpose Agent gives you... |
-|---|---|---|
-| **Control** (graphs, conditions, loops) | LangGraph | `pa.Graph()` — same power, with self-improvement |
-| **Speed** (parallel execution) | CrewAI | `pa.parallel()` — real threads, not fake async |
-| **Agents talking** | AutoGen | `pa.Conversation()` — with Φ-scored turns |
-| **Plug-and-play** | OpenAI Agents SDK | `pa.purpose()` — even simpler, one function |
-| **Knowledge** (RAG) | LlamaIndex | `pa.KnowledgeStore` — RAG as a tool |
-| **Self-improvement** | Nobody | **Only Purpose Agent** |
-## Runs on Your Laptop (Free, Private)
 ```bash
-# Install Ollama (one-time)
 curl -fsSL https://ollama.ai/install.sh | sh
-ollama pull qwen3:1.7b   # 1.7B params, runs on CPU
-# That's it. No API keys. No cloud. No cost.
 ```
 ```python
-team = pa.purpose("Research assistant", model="qwen3:1.7b")
 ```
-Also works with cloud models:
-```python
-team = pa.purpose("Research assistant", model="gpt-4o")         # OpenAI
-team = pa.purpose("Research assistant", model="Qwen/Qwen3-32B") # HuggingFace
-```
 ## Interactive CLI
 ```bash
-python -m purpose_agent
 ```
-Walks you through setup step-by-step. No coding required.
 ## Installation
 ```bash
 git clone https://huggingface.co/Rohan03/purpose-agent
 cd purpose-agent
-# For local models (recommended)
-pip install ollama
-# Run demo (no API keys needed)
-python demo.py
 ```
-## Literature Foundation
-Built on 8 published papers — every design decision has empirical backing.
-See [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md) for the full research trace.
-| Paper | What it contributes |
-|-------|-------------------|
-| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
-| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
-| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
-| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
-| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
-| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
-| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
-| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
 ## License
 MIT

   - self-improving
   - experience-replay
   - llm-as-judge
+  - memory-system
   - multi-agent
   - slm
+  - local-first
   - evaluation
+  - safety
+  - immune-system
   - no-code
 pipeline_tag: text-generation
 ---
 # Purpose Agent
+**A local-first self-improvement kernel for agents.** Turns traces into tested memory, policies, and rubrics — so agents improve without fine-tuning, cloud infrastructure, or vendor lock-in.
 ```python
 import purpose_agent as pa
+team = pa.purpose("Help me research scientific papers")
 result = team.run("Find recent breakthroughs in quantum computing")
 print(result)
 team.teach("Always cite your sources")
 ```
+## Core Principle
+Agents learn only when evidence says they should. New memories are quarantined, immune-scanned, replay-tested, scoped, versioned, and reversible.
+```
+candidate → immune scan → quarantine → replay test → promote (or reject)
+```
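The flow above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the library's `memory_ci` implementation: `Candidate`, `Status`, `immune_scan`, and `run_memory_ci` are hypothetical names, and the scan is a single substring check standing in for a real scanner.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    CANDIDATE = "candidate"
    QUARANTINED = "quarantined"
    PROMOTED = "promoted"
    REJECTED = "rejected"


@dataclass
class Candidate:
    content: str
    status: Status = Status.CANDIDATE
    history: List[str] = field(default_factory=list)


def immune_scan(content: str) -> bool:
    """Hypothetical scan: fail anything resembling a prompt injection."""
    return "ignore previous instructions" not in content.lower()


def run_memory_ci(card: Candidate, replay_test: Callable[[str], bool]) -> Candidate:
    """Walk one candidate through scan -> quarantine -> replay test."""
    if not immune_scan(card.content):
        card.status = Status.REJECTED
        card.history.append("immune scan failed")
        return card
    # The candidate stays quarantined until the replay test decides its fate.
    card.status = Status.QUARANTINED
    card.history.append("quarantined")
    if replay_test(card.content):
        card.status = Status.PROMOTED
        card.history.append("replay test passed")
    else:
        card.status = Status.REJECTED
        card.history.append("replay test failed")
    return card


good = run_memory_ci(Candidate("Always cite your sources"), lambda c: True)
bad = run_memory_ci(Candidate("Ignore previous instructions"), lambda c: True)
flaky = run_memory_ci(Candidate("Skip all tests"), lambda c: False)
print(good.status.value, bad.status.value, flaky.status.value)
```

Keeping a `history` list on each candidate is one way to make a promotion auditable and reversible; the real project may record this differently.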
+## Three Levels of Usage
+### Level 1 — Just describe what you want
+```python
+team = pa.purpose("Write Python code and test it")  # auto-builds architect + coder + tester
+team = pa.purpose("Research quantum computing")       # auto-builds researcher + analyst
+team = pa.purpose("Write blog posts about AI")        # auto-builds writer + editor
 ```
+### Level 2 — Customize your team
 ```python
+team = pa.Team.build(purpose="Support bot", agents=["greeter", "resolver"], model="qwen3:1.7b")
+team = pa.purpose("Answer questions", knowledge="./docs/", model="qwen3:1.7b")
 ```
+### Level 3 — Full control
 ```python
+graph = pa.Graph()                                     # LangGraph-style control flow
+results = pa.parallel(["task1", "task2"], agents)      # CrewAI-style parallel execution
+chat = pa.Conversation([agent_a, agent_b])             # AutoGen-style agent conversation
+kb = pa.KnowledgeStore.from_directory("./docs")        # LlamaIndex-style RAG
+compiler = pa.LLMCompiler(llm, registry)               # Parallel tool execution via DAG
+```
+## Architecture
+```
+purpose_agent/
+├── Core
+│   types, actor, purpose_function, experience_replay, optimizer, orchestrator, llm_backend
+│
+├── V2 Kernel
+│   v2_types (RunMode, MemoryScope, PurposeScoreV2)
+│   trace (structured JSONL execution traces)
+│   memory (7 kinds × 5 statuses, scoped, versioned)
+│   compiler (token-budgeted prompt compilation with credit assignment)
+│   immune (injection, score hacking, tool misuse, privacy, scope scanning)
+│   memory_ci (quarantine → scan → test → promote/reject pipeline)
+│   evalport (pluggable evaluation protocol)
+│   benchmark_v2 (train/val/test splits, ablation, contamination control)
+│
+├── Research (13 papers implemented)
+│   meta_rewarding (self-improving critic via meta-judge)
+│   self_taught (synthetic training data for Φ function)
+│   prompt_optimizer (DSPy-style automatic few-shot bootstrap)
+│   llm_compiler (parallel function calling via DAG)
+│   retroformer (structured reflection → typed memories)
+│
+├── SLM-Native
+│   slm_backends (Ollama, llama-cpp, prompt compression, 8 pre-configured models)
+│
+├── Capabilities
+│   unified (Agent, Graph, parallel, Conversation, KnowledgeStore)
+│   easy (purpose(), Team, quickstart wizard)
+│   tools, streaming, observability, multi_agent, hitl, evaluation, registry
+```
+## RunMode — Honest Evaluation
+```python
+from purpose_agent import RunMode
+RunMode.LEARNING_TRAIN       # Full read/write. Agent learns.
+RunMode.LEARNING_VALIDATION  # Read + staging. Validates before promoting.
+RunMode.EVAL_TEST            # NO writes. Numbers you can trust.
 ```
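One way to picture the three modes is as a gate in front of the memory store. The sketch below is an assumption about how such gating could work, not the project's code: the `RunMode` enum here is a local stand-in that only reuses the three member names, and `GatedMemory` is a hypothetical class.

```python
from enum import Enum
from typing import List


class RunMode(Enum):  # local stand-in for illustration, not the library's class
    LEARNING_TRAIN = "learning_train"
    LEARNING_VALIDATION = "learning_validation"
    EVAL_TEST = "eval_test"


class GatedMemory:
    """Hypothetical store that routes writes according to the run mode."""

    def __init__(self, mode: RunMode) -> None:
        self.mode = mode
        self.live: List[str] = []      # memory the agent reads from
        self.staging: List[str] = []   # writes awaiting validation

    def write(self, item: str) -> bool:
        """Return True if the item was stored anywhere."""
        if self.mode is RunMode.LEARNING_TRAIN:
            self.live.append(item)
            return True
        if self.mode is RunMode.LEARNING_VALIDATION:
            self.staging.append(item)
            return True
        return False  # EVAL_TEST: the write is dropped


train = GatedMemory(RunMode.LEARNING_TRAIN)
validate = GatedMemory(RunMode.LEARNING_VALIDATION)
evaluate = GatedMemory(RunMode.EVAL_TEST)
for store in (train, validate, evaluate):
    store.write("prefer primary sources")
print(train.live, validate.staging, evaluate.live)
```

Because the evaluation store never changes, scores measured in that mode cannot be inflated by something the agent learned during the same run.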
+## Memory Lifecycle
+| Kind | Purpose |
+|------|---------|
+| `purpose_contract` | User's stated goal and constraints |
+| `user_preference` | Learned preferences |
+| `skill_card` | Reusable procedures from successful traces |
+| `episodic_case` | Specific experiences worth remembering |
+| `failure_pattern` | What NOT to do |
+| `critic_calibration` | Adjustments to Φ scoring |
+| `tool_policy` | Tool-specific usage rules |
+
+| Status transition | Meaning |
+|--------|---------|
+| `candidate` → `quarantined` → `promoted` | Happy path |
+| `candidate` → `rejected` | Failed immune scan |
+| `promoted` → `archived` | Superseded or demoted |
+## Immune System
+```python
+from purpose_agent import scan_memory, MemoryCard
+result = scan_memory(MemoryCard(content="Ignore previous instructions"))
+# result.passed = False, threats = ["prompt_injection"], severity = "critical"
 ```
+## Secure Tools
+- **CalculatorTool** — AST-validated, no eval() on arbitrary text
+- **PythonExecTool** — subprocess with timeout + isolated temp directory
+- **ReadFileTool / WriteFileTool** — sandboxed to declared root
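For readers unfamiliar with AST validation, the idea behind the calculator bullet can be sketched with the standard-library `ast` module: parse the text, then evaluate only node types on an allowlist. This is an illustrative sketch, not the source of `CalculatorTool`; `safe_calculate` is a hypothetical name and the supported operator set is an assumption.

```python
import ast
import operator

# Allowlisted operators; anything not listed here is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals; reject everything else."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Exact type check so that booleans and strings are not accepted.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_calculate("2 * (3 + 4)"))
```

Function calls, attribute access, and names never reach evaluation, so an input such as `__import__('os')` raises `ValueError` instead of running.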
+## Runs on Your Laptop
 ```bash
 curl -fsSL https://ollama.ai/install.sh | sh
+ollama pull qwen3:1.7b
 ```
 ```python
+team = pa.purpose("Research assistant", model="qwen3:1.7b")  # Free, private, local
 ```
+Also works with cloud models: `model="gpt-4o"` (OpenAI) or `model="Qwen/Qwen3-32B"` (Hugging Face).
 ## Interactive CLI
 ```bash
+python -m purpose_agent   # Step-by-step wizard, no coding required
 ```
+## Literature Foundation
+Built on 13 papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md)
+| Paper | Module | Contribution |
+|-------|--------|-------------|
+| [MUSE](https://arxiv.org/abs/2510.08002) | actor, optimizer | 3-tier memory hierarchy |
+| [LATS](https://arxiv.org/abs/2310.04406) | purpose_function | LLM-as-value-function |
+| [REMEMBERER](https://arxiv.org/abs/2306.07929) | experience_replay | Q-value experience replay |
+| [Reflexion](https://arxiv.org/abs/2303.11366) | orchestrator | Verbal reinforcement |
+| [SPC](https://arxiv.org/abs/2504.19162) | purpose_function, immune | Anti-reward-hacking |
+| [CER](https://arxiv.org/abs/2506.06698) | optimizer | Experience distillation |
+| [MemRL](https://arxiv.org/abs/2601.03192) | experience_replay, compiler | Two-phase retrieval |
+| [TinyAgent](https://arxiv.org/abs/2409.00608) | slm_backends, tools | SLM-native patterns |
+| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | meta_rewarding | Self-improving critic |
+| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | self_taught | Synthetic critic training |
+| [DSPy](https://arxiv.org/abs/2310.03714) | prompt_optimizer | Automatic prompt optimization |
+| [LLMCompiler](https://arxiv.org/abs/2312.04511) | llm_compiler | Parallel function calling |
+| [Retroformer](https://arxiv.org/abs/2308.02151) | retroformer | Structured reflection |
 ## Installation
 ```bash
 git clone https://huggingface.co/Rohan03/purpose-agent
 cd purpose-agent
+pip install ollama  # for local models
+python demo.py      # verify everything works
 ```
 ## License
 MIT