Rohan03
/

purpose-agent

@@ -1,90 +1,66 @@
----
-library_name: purpose-agent
-license: mit
-language:
-  - en
-tags:
-  - reinforcement-learning
-  - agents
-  - self-improving
-  - memory-system
-  - multi-agent
-  - slm
-  - local-first
-  - evaluation
-  - safety
-  - immune-system
-pipeline_tag: text-generation
----
-# Purpose Agent
 **A local-first self-improvement kernel for AI agents.**
-Agents that learn from experience — without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
 ```bash
 pip install purpose-agent
 ```
-```python
-import purpose_agent as pa
-team = pa.purpose("Help me write Python code")
-result = team.run("Write a fibonacci function")
-print(result)
-team.teach("Always add type hints")
-# Next run uses what it learned
-```
-## How It Works (30-Second Version)
-1. **You give it a purpose.** "Help me write Python code."
-2. **It builds a team.** Architect + Coder + Tester — auto-selected from your description.
-3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
-4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
-5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.
-## Real-World Test Results
-Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
-| Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
-|-------|-----------|----------|-----------|-----------------|
-| Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
-| Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |
-**0-day production test:** 19/19 pass on Llama-3.3-70B across all 3 usage levels.
-**Immune system:** 93% adversarial catch rate, 0% false positives.
-**Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
-## Install
-```bash
-pip install purpose-agent                    # Core (zero dependencies)
-pip install purpose-agent[openai]            # + OpenAI / Groq / OpenRouter
-pip install purpose-agent[ollama]            # + Local Ollama
-pip install purpose-agent[all]               # Everything
 ```
-## Three Levels of Usage
-### Level 1 — Describe what you want
 ```python
 import purpose_agent as pa
-team = pa.purpose("Write Python code and test it")  # → architect + coder + tester
-team = pa.purpose("Research quantum computing")       # → researcher + analyst
-team = pa.purpose("Write blog posts about AI")        # → writer + editor
 result = team.run("Write a sorting algorithm")
-team.teach("Always handle edge cases")
-print(team.status())  # See what it's learned
 ```
-### Level 2 — Choose your model
 ```python
 # Local (free, private)
@@ -95,16 +71,23 @@ team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-inst
 team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
 team = pa.purpose("Code helper", model="openai:gpt-4o")
-# Any OpenAI-compatible API
 from purpose_agent import resolve_backend
 backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
 ```
-Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
-### Level 3 — Full control
-Purpose Agent has its own API vocabulary — original names, not borrowed from other frameworks.
 ```python
 import purpose_agent as pa
@@ -122,108 +105,39 @@ flow.add_edge("research", "write")
 flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
 result = flow.run(initial_state)
-# ── swarm: run tasks concurrently ──
-results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])
 # ── Council: agents deliberate together ──
 council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
 result = council.run("Design a web scraper", rounds=3)
-# ── Vault: knowledge store with RAG-as-a-tool ──
-vault = pa.Vault.from_directory("./docs")
-spark = pa.Spark("assistant", tools=[vault.as_tool()])
-result = spark.run("What does the documentation say about X?")
-# ── LLMCompiler: parallel tool execution via DAG planning ──
-compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
-result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
 ```
-## API Reference (Level 3)
-| Name | What | Example |
-|------|------|---------|
-| `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` |
-| `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` |
-| `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a","b"], [s1, s2])` |
-| `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` |
-| `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` |
-| `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` |
-| `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` |
-## Evidence-Gated Memory
-Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
-```
-candidate → immune scan → quarantine → replay test → promote (or reject)
-```
-- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
-- **Quarantine** holds memories until they're tested
-- **Promotion** happens only after evidence shows the memory helps
-- **Rejection** preserves the memory for audit but never exposes it to the agent
-Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
-## Honest Evaluation
-```python
-from purpose_agent import RunMode
-RunMode.LEARNING_TRAIN       # Full read/write — this is where agents learn
-RunMode.LEARNING_VALIDATION  # Read + staging — validates before promoting
-RunMode.EVAL_TEST            # NO writes — numbers you can trust
-```
-## Secure Tools
-- **CalculatorTool** — AST-validated, no `eval()` on arbitrary text
-- **PythonExecTool** — subprocess with timeout + isolated temp directory
-- **ReadFile/WriteFile** — sandboxed to declared root directory
-## Architecture
-See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
-34 Python modules, ~500KB:
 ```
-Core Engine   → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
-V2 Kernel     → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
-Research      → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
-Breakthroughs → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
-Capabilities  → Spark, Flow, swarm, Council, Vault
-Easy API      → purpose(), Team, quickstart wizard
-```
-## Literature
-Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
-| Paper | What it contributes |
-|-------|-------------------|
-| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
-| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
-| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
-| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
-| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
-| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
-| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
-| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
-| [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
-| [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
-| [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
-| [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
-| [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
-## CLI
 ```bash
-python -m purpose_agent  # Interactive wizard
-purpose-agent            # Same, via entry point
 ```
-## License
 MIT

+<div align="center">
+  <img src="assets/hero_animation.svg" alt="Purpose Agent Hero" width="100%">
+</div>
+# Purpose Agent v3.0
 **A local-first self-improvement kernel for AI agents.**
+Agents that learn from experience — without fine-tuning, cloud infrastructure, or vendor lock‑in.
 ```bash
 pip install purpose-agent
 ```
+## 🚀 What's New in v3.0?
+> [!TIP]
+> Purpose Agent v3.0 hardens the kernel for production use, focusing on security, token efficiency, and absolute execution reliability.
+- **Strict Tool Validation**: Hallucination‑proof tool calling schema. Any hallucinated arguments instantly trigger a corrective loop.
+- **O(1) Markovian Critic**: The new `delta` state‑evaluator saves tokens by evaluating *only what changed* instead of the full environment state.
+- **Popperian Falsification**: Mathematical, zero‑hallucination code scoring where assertions evaluate code correctness automatically.
+- **PEP 578 Sandboxing**: Secure, isolated execution for the Python environment.
+---
+## 🛤️ Three Tracks of Usage
+Purpose Agent scales with your needs, from a simple one‑liner to full multi‑agent orchestration.
+### Track 1: Auto‑Pilot (Describe your goal)
+Provide a purpose, and the framework automatically delegates the necessary agents (Architect, Coder, Tester, etc.) and executes.
+```mermaid
+graph LR
+    A[pa.purpose] --> B(Auto‑Team Assembly)
+    B --> C{Execution Loop}
+    C -->|Success| D[Heuristics Extracted]
+    C -->|Fail| E[Feedback Provided]
+    D --> F[Smarter Next Run]
 ```
 ```python
 import purpose_agent as pa
+team = pa.purpose("Write Python code and test it")
 result = team.run("Write a sorting algorithm")
+team.teach("Always handle edge cases") # Injects into procedural memory
+print(team.status())
 ```
+### Track 2: Bring Your Own Model
+Swap in local models for free, private execution, or connect to OpenRouter, Groq, or OpenAI for state‑of‑the‑art reasoning.
+```mermaid
+graph TD
+    A[Your Application] --> B(purpose_agent)
+    B --> C[Ollama local]
+    B --> D[OpenRouter / Cloud]
+    B --> E[HuggingFace Hub]
+```
 ```python
 # Local (free, private)
 team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
 team = pa.purpose("Code helper", model="openai:gpt-4o")
+# Any OpenAI‑compatible API
 from purpose_agent import resolve_backend
 backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
 ```
+### Track 3: Full Multi‑Agent Control
+Take total control over the orchestration with `Spark` (Agent), `Flow` (Workflow DAG), `Council` (Deliberation), and `Vault` (RAG).
+```mermaid
+graph LR
+    A((BEGIN)) --> B[Research Spark]
+    B --> C[Write Spark]
+    C --> D{Review?}
+    D -- Pass --> E((DONE))
+    D -- Fail --> B
+```
 ```python
 import purpose_agent as pa
 flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
 result = flow.run(initial_state)
 # ── Council: agents deliberate together ──
 council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
 result = council.run("Design a web scraper", rounds=3)
 ```
+---
+## 🛡️ Evidence‑Gated Memory & Immune System
+Agents don't just accumulate knowledge blindly. Every new memory goes through a secure pipeline:
+```mermaid
+flowchart LR
+    A[Candidate Memory] --> B{Immune Scan}
+    B -- Block --> C[Quarantine]
+    B -- Pass --> D[Replay Test]
+    D -- Helps --> E[Promoted to SOP]
+    D -- Hurts --> F[Rejected]
 ```
+- **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse.
+- **Quarantine** holds memories until they're tested.
+- **Promotion** happens only after empirical evidence shows the memory improves reward.
+## 📦 Install
 ```bash
+pip install purpose-agent                    # Core (zero dependencies)
+pip install purpose-agent[openai]            # + OpenAI / Groq / OpenRouter
+pip install purpose-agent[ollama]            # + Local Ollama
+pip install purpose-agent[all]               # Everything
 ```
+## 📖 License
 MIT