Rohan03 committed on
Commit 485ddc5 · 1 Parent(s): 36d2671

Update README with interactive hero, mermaid tracks, and v3 features

Files changed (2)
  1. .gitignore +1 -0
  2. README.md +68 -154
.gitignore ADDED
@@ -0,0 +1 @@
 
 
1
+ dist/
README.md CHANGED
@@ -1,90 +1,66 @@
1
- ---
2
- library_name: purpose-agent
3
- license: mit
4
- language:
5
- - en
6
- tags:
7
- - reinforcement-learning
8
- - agents
9
- - self-improving
10
- - memory-system
11
- - multi-agent
12
- - slm
13
- - local-first
14
- - evaluation
15
- - safety
16
- - immune-system
17
- pipeline_tag: text-generation
18
- ---
19
 
20
- # Purpose Agent
21
 
22
  **A local-first self-improvement kernel for AI agents.**
23
 
24
- Agents that learn from experience — without fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
25
 
26
  ```bash
27
  pip install purpose-agent
28
  ```
29
 
30
- ```python
31
- import purpose_agent as pa
32
-
33
- team = pa.purpose("Help me write Python code")
34
- result = team.run("Write a fibonacci function")
35
- print(result)
36
-
37
- team.teach("Always add type hints")
38
- # Next run uses what it learned
39
- ```
40
 
41
- ## How It Works (30-Second Version)
 
42
 
43
- 1. **You give it a purpose.** "Help me write Python code."
44
- 2. **It builds a team.** Architect + Coder + Tester auto-selected from your description.
45
- 3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
46
- 4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
47
- 5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.
48
 
49
- ## Real-World Test Results
50
 
51
- Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
52
 
53
- | Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
54
- |-------|-----------|----------|-----------|-----------------|
55
- | Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
56
- | Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |
57
 
58
- **0-day production test:** 19/19 pass on Llama-3.3-70B across all 3 usage levels.
59
- **Immune system:** 93% adversarial catch rate, 0% false positives.
60
- **Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
61
 
62
- ## Install
63
 
64
- ```bash
65
- pip install purpose-agent # Core (zero dependencies)
66
- pip install purpose-agent[openai] # + OpenAI / Groq / OpenRouter
67
- pip install purpose-agent[ollama] # + Local Ollama
68
- pip install purpose-agent[all] # Everything
 
 
69
  ```
70
 
71
- ## Three Levels of Usage
72
-
73
- ### Level 1 — Describe what you want
74
-
75
  ```python
76
  import purpose_agent as pa
77
 
78
- team = pa.purpose("Write Python code and test it") # → architect + coder + tester
79
- team = pa.purpose("Research quantum computing") # → researcher + analyst
80
- team = pa.purpose("Write blog posts about AI") # → writer + editor
81
-
82
  result = team.run("Write a sorting algorithm")
83
- team.teach("Always handle edge cases")
84
- print(team.status()) # See what it's learned
85
  ```
86
 
87
- ### Level 2 - Choose your model
88
 
89
  ```python
90
  # Local (free, private)
@@ -95,16 +71,23 @@ team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-inst
95
  team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
96
  team = pa.purpose("Code helper", model="openai:gpt-4o")
97
 
98
- # Any OpenAI-compatible API
99
  from purpose_agent import resolve_backend
100
  backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
101
  ```
102
 
103
- Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
104
 
105
- ### Level 3 - Full control
106
 
107
- Purpose Agent has its own API vocabulary — original names, not borrowed from other frameworks.
108
 
109
  ```python
110
  import purpose_agent as pa
@@ -122,108 +105,39 @@ flow.add_edge("research", "write")
122
  flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
123
  result = flow.run(initial_state)
124
 
125
- # ── swarm: run tasks concurrently ──
126
- results = pa.swarm(["task 1", "task 2", "task 3"], agents=[spark_a, spark_b, spark_c])
127
-
128
  # ── Council: agents deliberate together ──
129
  council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
130
  result = council.run("Design a web scraper", rounds=3)
131
-
132
- # ── Vault: knowledge store with RAG-as-a-tool ──
133
- vault = pa.Vault.from_directory("./docs")
134
- spark = pa.Spark("assistant", tools=[vault.as_tool()])
135
- result = spark.run("What does the documentation say about X?")
136
-
137
- # ── LLMCompiler: parallel tool execution via DAG planning ──
138
- compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
139
- result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
140
  ```
141
 
142
- ## API Reference (Level 3)
143
-
144
- | Name | What | Example |
145
- |------|------|---------|
146
- | `pa.Spark(name, model, tools)` | Create an intelligent agent | `pa.Spark("coder", model="qwen3:1.7b")` |
147
- | `pa.Flow()` | Workflow engine with nodes and edges | `flow.add_node("step", handler)` |
148
- | `pa.swarm(tasks, agents)` | Run tasks concurrently | `pa.swarm(["a","b"], [s1, s2])` |
149
- | `pa.Council(agents)` | Agent deliberation rounds | `council.run("topic", rounds=3)` |
150
- | `pa.Vault.from_texts(list)` | Knowledge store for RAG | `vault.query("search term")` |
151
- | `pa.BEGIN` | Flow start node | `flow.add_edge(pa.BEGIN, "first")` |
152
- | `pa.DONE_SIGNAL` | Flow end node | `flow.add_edge("last", pa.DONE_SIGNAL)` |
153
-
154
- ## Evidence-Gated Memory
155
-
156
- Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
157
-
158
- ```
159
- candidate → immune scan → quarantine → replay test → promote (or reject)
160
- ```
161
-
162
- - **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
163
- - **Quarantine** holds memories until they're tested
164
- - **Promotion** happens only after evidence shows the memory helps
165
- - **Rejection** preserves the memory for audit but never exposes it to the agent
166
-
167
- Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
168
-
169
- ## Honest Evaluation
170
-
171
- ```python
172
- from purpose_agent import RunMode
173
-
174
- RunMode.LEARNING_TRAIN # Full read/write — this is where agents learn
175
- RunMode.LEARNING_VALIDATION # Read + staging — validates before promoting
176
- RunMode.EVAL_TEST # NO writes — numbers you can trust
177
- ```
178
-
179
- ## Secure Tools
180
-
181
- - **CalculatorTool** — AST-validated, no `eval()` on arbitrary text
182
- - **PythonExecTool** — subprocess with timeout + isolated temp directory
183
- - **ReadFile/WriteFile** — sandboxed to declared root directory
184
-
185
- ## Architecture
186
 
187
- See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
188
 
189
- 34 Python modules, ~500KB:
190
 
191
  ```
192
- Core Engine → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
193
- V2 Kernel → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
194
- Research → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
195
- Breakthroughs → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
196
- Capabilities → Spark, Flow, swarm, Council, Vault
197
- Easy API → purpose(), Team, quickstart wizard
198
- ```
199
-
200
- ## Literature
201
 
202
- Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md). Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
 
 
203
 
204
- | Paper | What it contributes |
205
- |-------|-------------------|
206
- | [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
207
- | [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
208
- | [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
209
- | [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
210
- | [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
211
- | [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
212
- | [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
213
- | [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
214
- | [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
215
- | [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
216
- | [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
217
- | [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
218
- | [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
219
-
220
- ## CLI
221
 
222
  ```bash
223
- python -m purpose_agent # Interactive wizard
224
- purpose-agent # Same, via entry point
 
 
225
  ```
226
 
227
- ## License
228
 
229
  MIT
 
1
+ <div align="center">
2
+ <img src="assets/hero_animation.svg" alt="Purpose Agent Hero" width="100%">
3
+ </div>
4
 
5
+ # Purpose Agent v3.0
6
 
7
  **A local-first self-improvement kernel for AI agents.**
8
 
9
+ Agents that learn from experience — without fine-tuning, cloud infrastructure, or vendor lock-in.
10
 
11
  ```bash
12
  pip install purpose-agent
13
  ```
14
 
15
+ ## 🚀 What's New in v3.0?
16
 
17
+ > [!TIP]
18
+ > Purpose Agent v3.0 hardens the kernel for production use, focusing on security, token efficiency, and absolute execution reliability.
19
 
20
+ - **Strict Tool Validation**: Hallucination‑proof tool calling schema. Any hallucinated arguments instantly trigger a corrective loop.
21
+ - **O(1) Markovian Critic**: The new `delta` state‑evaluator saves tokens by evaluating *only what changed* instead of the full environment state.
22
+ - **Popperian Falsification**: Mathematical, zero‑hallucination code scoring where assertions evaluate code correctness automatically.
23
+ - **PEP 578 Sandboxing**: Secure, isolated execution for the Python environment.
 
24
 
25
+ ---
26
 
27
+ ## 🛤️ Three Tracks of Usage
28
 
29
+ Purpose Agent scales with your needs, from a simple one‑liner to full multi‑agent orchestration.
 
 
 
30
 
31
+ ### Track 1: Auto‑Pilot (Describe your goal)
 
 
32
 
33
+ Provide a purpose, and the framework automatically delegates the necessary agents (Architect, Coder, Tester, etc.) and executes.
34
 
35
+ ```mermaid
36
+ graph LR
37
+ A[pa.purpose] --> B(Auto‑Team Assembly)
38
+ B --> C{Execution Loop}
39
+ C -->|Success| D[Heuristics Extracted]
40
+ C -->|Fail| E[Feedback Provided]
41
+ D --> F[Smarter Next Run]
42
  ```
43
 
 
 
 
 
44
  ```python
45
  import purpose_agent as pa
46
 
47
+ team = pa.purpose("Write Python code and test it")
 
 
 
48
  result = team.run("Write a sorting algorithm")
49
+ team.teach("Always handle edge cases") # Injects into procedural memory
50
+ print(team.status())
51
  ```
52
 
53
+ ### Track 2: Bring Your Own Model
54
+
55
+ Swap in local models for free, private execution, or connect to OpenRouter, Groq, or OpenAI for state‑of‑the‑art reasoning.
56
+
57
+ ```mermaid
58
+ graph TD
59
+ A[Your Application] --> B(purpose_agent)
60
+ B --> C[Ollama local]
61
+ B --> D[OpenRouter / Cloud]
62
+ B --> E[HuggingFace Hub]
63
+ ```
64
 
65
  ```python
66
  # Local (free, private)
 
71
  team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
72
  team = pa.purpose("Code helper", model="openai:gpt-4o")
73
 
74
+ # Any OpenAI-compatible API
75
  from purpose_agent import resolve_backend
76
  backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
77
  ```
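The model strings above follow a `provider:model` pattern. As a rough illustration of how such a string could be split, here is a minimal sketch; it is not the logic inside `resolve_backend`. `KNOWN_PROVIDERS`, `split_model_spec`, and the Ollama fallback are assumptions. The fallback exists because bare Ollama tags such as `qwen3:1.7b` also contain a colon.

```python
# Hypothetical provider set, taken from the prefixes used in the examples above.
KNOWN_PROVIDERS = {"openrouter", "groq", "openai", "ollama"}


def split_model_spec(spec: str, default_provider: str = "ollama") -> tuple[str, str]:
    """Split 'provider:model' into (provider, model).

    A colon only counts as a separator when the prefix is a known provider,
    so tags like 'qwen3:1.7b' stay intact and fall back to the default.
    """
    prefix, sep, rest = spec.partition(":")
    if sep and prefix in KNOWN_PROVIDERS:
        return prefix, rest
    return default_provider, spec
```

Checking the prefix against a fixed set, instead of splitting on the first colon unconditionally, keeps model names with their own colons from being misread as providers.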
78
 
79
+ ### Track 3: Full Multi‑Agent Control
80
 
81
+ Take total control over the orchestration with `Spark` (Agent), `Flow` (Workflow DAG), `Council` (Deliberation), and `Vault` (RAG).
82
 
83
+ ```mermaid
84
+ graph LR
85
+ A((BEGIN)) --> B[Research Spark]
86
+ B --> C[Write Spark]
87
+ C --> D{Review?}
88
+ D -- Pass --> E((DONE))
89
+ D -- Fail --> B
90
+ ```
91
 
92
  ```python
93
  import purpose_agent as pa
 
105
  flow.add_conditional_edge("write", review_fn, {"pass": pa.DONE_SIGNAL, "retry": "research"})
106
  result = flow.run(initial_state)
107
 
 
 
 
108
  # ── Council: agents deliberate together ──
109
  council = pa.Council([pa.Spark("researcher"), pa.Spark("coder"), pa.Spark("reviewer")])
110
  result = council.run("Design a web scraper", rounds=3)
111
  ```
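To make the research, write, review loop in the diagram concrete, here is a library-free sketch of a conditional-edge workflow. It does not use `pa.Flow`; the node functions, the `DONE` sentinel (a stand-in for `pa.DONE_SIGNAL`), and `run_flow` are hypothetical and only mirror the shape of the example above.

```python
from typing import Any, Dict

DONE = "__done__"  # stand-in for pa.DONE_SIGNAL in this sketch
State = Dict[str, Any]


def research(state: State) -> State:
    # Produce fresh notes on every pass through the loop.
    state["notes"] = f"notes v{state.get('attempt', 0) + 1}"
    return state


def write(state: State) -> State:
    state["attempt"] = state.get("attempt", 0) + 1
    state["draft"] = f"draft {state['attempt']} from {state['notes']}"
    return state


def review(state: State) -> str:
    # Toy reviewer: accept the second draft, send the first one back.
    return "pass" if state["attempt"] >= 2 else "retry"


def run_flow(state: State, max_steps: int = 10) -> State:
    nodes = {"research": research, "write": write}
    edges = {"research": "write"}
    conditional = {"write": (review, {"pass": DONE, "retry": "research"})}

    current = "research"
    for _ in range(max_steps):
        state = nodes[current](state)
        if current in conditional:
            router, targets = conditional[current]
            current = targets[router(state)]
        else:
            current = edges[current]
        if current == DONE:
            return state
    raise RuntimeError("flow did not finish within max_steps")
```

The `max_steps` bound is a design choice of the sketch: a retry edge makes the graph cyclic, so some cap is needed to guarantee termination.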
112
 
113
+ ---
114
 
115
+ ## 🛡️ Evidence‑Gated Memory & Immune System
116
 
117
+ Agents don't just accumulate knowledge blindly. Every new memory goes through a secure pipeline:
118
 
119
+ ```mermaid
120
+ flowchart LR
121
+ A[Candidate Memory] --> B{Immune Scan}
122
+ B -- Block --> C[Quarantine]
123
+ B -- Pass --> D[Replay Test]
124
+ D -- Helps --> E[Promoted to SOP]
125
+ D -- Hurts --> F[Rejected]
126
  ```
127
 
128
+ - **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse.
129
+ - **Quarantine** holds memories until they're tested.
130
+ - **Promotion** happens only after empirical evidence shows the memory improves reward.
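The gating steps described in the bullets above can be sketched in a few lines. This is an illustrative toy, not the library's memory API: `Memory`, `immune_scan`, `gate`, the `SUSPICIOUS` patterns, and the `replay` callback are all hypothetical, and "improves reward" is assumed to mean a strictly higher replay score than the baseline.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical patterns; a real scanner would be far more thorough.
SUSPICIOUS = ("ignore previous instructions", "always score 10", "api_key=")


@dataclass
class Memory:
    text: str
    status: str = "candidate"


def immune_scan(memory: Memory) -> bool:
    """Return True when no suspicious pattern appears in the memory text."""
    lowered = memory.text.lower()
    return not any(pattern in lowered for pattern in SUSPICIOUS)


def gate(
    memory: Memory,
    replay: Callable[[List[str]], float],
    promoted: List[str],
) -> Memory:
    """Scan, quarantine, replay-test, then promote or reject a candidate."""
    if not immune_scan(memory):
        memory.status = "rejected"
        return memory

    # Quarantined memories are not visible to the agent until tested.
    memory.status = "quarantined"
    baseline = replay(promoted)
    with_memory = replay(promoted + [memory.text])

    if with_memory > baseline:
        memory.status = "promoted"
        promoted.append(memory.text)
    else:
        memory.status = "rejected"
    return memory
```

Here `replay` stands for any function that reruns past tasks with a given set of heuristics and returns a reward; a candidate that fails to raise that reward is rejected even if it passes the immune scan.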
131
 
132
+ ## 📦 Install
133
 
134
  ```bash
135
+ pip install purpose-agent # Core (zero dependencies)
136
+ pip install purpose-agent[openai] # + OpenAI / Groq / OpenRouter
137
+ pip install purpose-agent[ollama] # + Local Ollama
138
+ pip install purpose-agent[all] # Everything
139
  ```
140
 
141
+ ## 📖 License
142
 
143
  MIT