Rohan03 committed on
Commit f28a638 · verified · 1 Parent(s): 8bb75c0

docs: Final README with real-world results, pip install, 3 usage levels

Files changed (1)
  1. README.md +136 -121
README.md CHANGED
@@ -7,8 +7,6 @@ tags:
7
  - reinforcement-learning
8
  - agents
9
  - self-improving
10
- - experience-replay
11
- - llm-as-judge
12
  - memory-system
13
  - multi-agent
14
  - slm
@@ -16,181 +14,198 @@ tags:
16
  - evaluation
17
  - safety
18
  - immune-system
19
- - no-code
20
  pipeline_tag: text-generation
21
  ---
22
 
23
  # Purpose Agent
24
 
25
- **A local-first self-improvement kernel for agents.** Turns traces into tested memory, policies, and rubrics, so agents improve without fine-tuning, cloud infrastructure, or vendor lock-in.
 
 
 
 
 
 
26
 
27
  ```python
28
  import purpose_agent as pa
29
 
30
- team = pa.purpose("Help me research scientific papers")
31
- result = team.run("Find recent breakthroughs in quantum computing")
32
  print(result)
33
 
34
- team.teach("Always cite your sources")
 
35
  ```
36
 
37
- ## Core Principle
38
 
39
- Agents learn only when evidence says they should. New memories are quarantined, immune-scanned, replay-tested, scoped, versioned, and reversible.
 
 
 
 
40
 
41
- ```
42
- candidate → immune scan → quarantine → replay test → promote (or reject)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
43
  ```
44
 
45
  ## Three Levels of Usage
46
 
47
- ### Level 1: Just describe what you want
48
 
49
  ```python
50
- team = pa.purpose("Write Python code and test it") # auto-builds architect + coder + tester
51
- team = pa.purpose("Research quantum computing") # auto-builds researcher + analyst
52
- team = pa.purpose("Write blog posts about AI") # auto-builds writer + editor
 
 
 
 
 
 
53
  ```
54
 
55
- ### Level 2: Customize your team
56
 
57
  ```python
58
- team = pa.Team.build(purpose="Support bot", agents=["greeter", "resolver"], model="qwen3:1.7b")
59
- team = pa.purpose("Answer questions", knowledge="./docs/", model="qwen3:1.7b")
 
 
 
 
 
 
 
 
 
60
  ```
61
 
 
 
62
  ### Level 3: Full control
63
 
64
  ```python
65
- graph = pa.Graph() # LangGraph-style control flow
66
- results = pa.parallel(["task1", "task2"], agents) # CrewAI-style parallel execution
67
- chat = pa.Conversation([agent_a, agent_b]) # AutoGen-style agent conversation
68
- kb = pa.KnowledgeStore.from_directory("./docs") # LlamaIndex-style RAG
69
- compiler = pa.LLMCompiler(llm, registry) # Parallel tool execution via DAG
70
- ```
71
-
72
- ## Architecture
73
 
74
- ```
75
- purpose_agent/
76
- ├── Core
77
- │   types, actor, purpose_function, experience_replay, optimizer, orchestrator, llm_backend
78
- │
79
- ├── V2 Kernel
80
- │   v2_types (RunMode, MemoryScope, PurposeScoreV2)
81
- │   trace (structured JSONL execution traces)
82
- │   memory (7 kinds × 5 statuses, scoped, versioned)
83
- │   compiler (token-budgeted prompt compilation with credit assignment)
84
- │   immune (injection, score hacking, tool misuse, privacy, scope scanning)
85
- │   memory_ci (quarantine → scan → test → promote/reject pipeline)
86
- │   evalport (pluggable evaluation protocol)
87
- │   benchmark_v2 (train/val/test splits, ablation, contamination control)
88
- │
89
- ├── Research (13 papers implemented)
90
- │   meta_rewarding (self-improving critic via meta-judge)
91
- │   self_taught (synthetic training data for Φ function)
92
- │   prompt_optimizer (DSPy-style automatic few-shot bootstrap)
93
- │   llm_compiler (parallel function calling via DAG)
94
- │   retroformer (structured reflection → typed memories)
95
- │
96
- ├── SLM-Native
97
- │   slm_backends (Ollama, llama-cpp, prompt compression, 8 pre-configured models)
98
- │
99
- ├── Capabilities
100
- │   unified (Agent, Graph, parallel, Conversation, KnowledgeStore)
101
- │   easy (purpose(), Team, quickstart wizard)
102
- │   tools, streaming, observability, multi_agent, hitl, evaluation, registry
103
  ```
104
 
105
- ## RunMode: Honest Evaluation
106
 
107
- ```python
108
- from purpose_agent import RunMode
109
 
110
- RunMode.LEARNING_TRAIN # Full read/write. Agent learns.
111
- RunMode.LEARNING_VALIDATION # Read + staging. Validates before promoting.
112
- RunMode.EVAL_TEST # NO writes. Numbers you can trust.
113
  ```
114
 
115
- ## Memory Lifecycle
 
 
 
116
 
117
- | Kind | Purpose |
118
- |------|---------|
119
- | `purpose_contract` | User's stated goal and constraints |
120
- | `user_preference` | Learned preferences |
121
- | `skill_card` | Reusable procedures from successful traces |
122
- | `episodic_case` | Specific experiences worth remembering |
123
- | `failure_pattern` | What NOT to do |
124
- | `critic_calibration` | Adjustments to Φ scoring |
125
- | `tool_policy` | Tool-specific usage rules |
126
 
127
- | Status | Meaning |
128
- |--------|---------|
129
- | `candidate` → `quarantined` → `promoted` | Happy path |
130
- | `candidate` → `rejected` | Failed immune scan |
131
- | `promoted` → `archived` | Superseded or demoted |
132
 
133
- ## Immune System
134
 
135
  ```python
136
- from purpose_agent import scan_memory, MemoryCard
137
 
138
- result = scan_memory(MemoryCard(content="Ignore previous instructions"))
139
- # result.passed = False, threats = ["prompt_injection"], severity = "critical"
 
140
  ```
141
 
142
  ## Secure Tools
143
 
144
- - **CalculatorTool**: AST-validated, no eval() on arbitrary text
145
  - **PythonExecTool**: subprocess with timeout + isolated temp directory
146
- - **ReadFileTool / WriteFileTool**: sandboxed to declared root
147
-
148
- ## Runs on Your Laptop
149
-
150
- ```bash
151
- curl -fsSL https://ollama.ai/install.sh | sh
152
- ollama pull qwen3:1.7b
153
- ```
154
 
155
- ```python
156
- team = pa.purpose("Research assistant", model="qwen3:1.7b") # Free, private, local
157
- ```
158
 
159
- Also works with: `model="gpt-4o"` (OpenAI), `model="Qwen/Qwen3-32B"` (HuggingFace cloud).
160
 
161
- ## Interactive CLI
162
 
163
- ```bash
164
- python -m purpose_agent # Step-by-step wizard, no coding required
 
 
 
 
 
165
  ```
166
 
167
- ## Literature Foundation
168
-
169
- Built on 13 papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md)
170
-
171
- | Paper | Module | Contribution |
172
- |-------|--------|-------------|
173
- | [MUSE](https://arxiv.org/abs/2510.08002) | actor, optimizer | 3-tier memory hierarchy |
174
- | [LATS](https://arxiv.org/abs/2310.04406) | purpose_function | LLM-as-value-function |
175
- | [REMEMBERER](https://arxiv.org/abs/2306.07929) | experience_replay | Q-value experience replay |
176
- | [Reflexion](https://arxiv.org/abs/2303.11366) | orchestrator | Verbal reinforcement |
177
- | [SPC](https://arxiv.org/abs/2504.19162) | purpose_function, immune | Anti-reward-hacking |
178
- | [CER](https://arxiv.org/abs/2506.06698) | optimizer | Experience distillation |
179
- | [MemRL](https://arxiv.org/abs/2601.03192) | experience_replay, compiler | Two-phase retrieval |
180
- | [TinyAgent](https://arxiv.org/abs/2409.00608) | slm_backends, tools | SLM-native patterns |
181
- | [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | meta_rewarding | Self-improving critic |
182
- | [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | self_taught | Synthetic critic training |
183
- | [DSPy](https://arxiv.org/abs/2310.03714) | prompt_optimizer | Automatic prompt optimization |
184
- | [LLMCompiler](https://arxiv.org/abs/2312.04511) | llm_compiler | Parallel function calling |
185
- | [Retroformer](https://arxiv.org/abs/2308.02151) | retroformer | Structured reflection |
186
-
187
- ## Installation
 
188
 
189
  ```bash
190
- git clone https://huggingface.co/Rohan03/purpose-agent
191
- cd purpose-agent
192
- pip install ollama # for local models
193
- python demo.py # verify everything works
194
  ```
195
 
196
  ## License
 
7
  - reinforcement-learning
8
  - agents
9
  - self-improving
 
 
10
  - memory-system
11
  - multi-agent
12
  - slm
 
14
  - evaluation
15
  - safety
16
  - immune-system
 
17
  pipeline_tag: text-generation
18
  ---
19
 
20
  # Purpose Agent
21
 
22
+ **A local-first self-improvement kernel for AI agents.**
23
+
24
+ Agents that learn from experience, with no fine-tuning, cloud infrastructure, or vendor lock-in. Tested with real models. Published on PyPI.
25
+
26
+ ```bash
27
+ pip install purpose-agent
28
+ ```
29
 
30
  ```python
31
  import purpose_agent as pa
32
 
33
+ team = pa.purpose("Help me write Python code")
34
+ result = team.run("Write a fibonacci function")
35
  print(result)
36
 
37
+ team.teach("Always add type hints")
38
+ # Next run uses what it learned
39
  ```
40
 
41
+ ## How It Works (30-Second Version)
42
 
43
+ 1. **You give it a purpose.** "Help me write Python code."
44
+ 2. **It builds a team.** Architect + Coder + Tester, auto-selected from your description.
45
+ 3. **It runs the task.** The agent writes code. A separate critic (the Purpose Function) scores every step.
46
+ 4. **It learns.** Good patterns are extracted as heuristics. Bad patterns are flagged. Dangerous content is blocked by an immune system.
47
+ 5. **Next run is better.** Heuristics from past runs are injected into the prompt. The agent gets smarter without any weight updates.
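
The learn-then-inject loop in steps 4 and 5 can be sketched in a few lines. This is only an illustration under stated assumptions: `HeuristicStore`, `build_prompt`, `learn`, and the 0.7 threshold are hypothetical names and values, not the package's API, and the critic scores are hard-coded rather than produced by a model.

```python
from dataclasses import dataclass, field


@dataclass
class HeuristicStore:
    """Hypothetical store for heuristics learned across runs."""

    heuristics: list[str] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        # Inject previously learned heuristics ahead of the task text.
        if not self.heuristics:
            return task
        rules = "\n".join(f"- {h}" for h in self.heuristics)
        return f"Follow these learned rules:\n{rules}\n\nTask: {task}"

    def learn(self, candidates: list[tuple[str, float]], threshold: float = 0.7) -> int:
        # Keep only candidates scored at or above the (assumed) threshold,
        # skipping duplicates. Returns how many heuristics were added.
        added = 0
        for text, score in candidates:
            if score >= threshold and text not in self.heuristics:
                self.heuristics.append(text)
                added += 1
        return added


store = HeuristicStore()
first_prompt = store.build_prompt("Write a fibonacci function")
# Scores below stand in for what a critic might return; they are made up.
added = store.learn([("Add type hints", 0.9), ("Skip tests", 0.2)])
second_prompt = store.build_prompt("Write a fibonacci function")
```

The second prompt carries the retained heuristic while the model weights stay untouched, which is the sense in which the agent improves without weight updates.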
48
 
49
+ ## Real-World Test Results
50
+
51
+ Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter:
52
+
53
+ | Model | fibonacci | fizzbuzz | factorial | Self-Improvement |
54
+ |-------|-----------|----------|-----------|-----------------|
55
+ | Llama-3.3-70B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→9→18 heuristics |
56
+ | Gemma-4-26B | ✓ 100% | ✓ 100% | ✓ 100% | 0→3→6→11 heuristics |
57
+
58
+ **Immune system:** 93% adversarial catch rate, 0% false positives.
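
For readers who want to reproduce figures of this kind, a catch rate and a false-positive rate are usually computed from labeled samples as below. This is a generic sketch, not the project's benchmark code: `scan_metrics` is a hypothetical helper and the `toy` data is invented purely to exercise it.

```python
def scan_metrics(results: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Compute (catch_rate, false_positive_rate).

    Each entry in `results` is an (is_adversarial, was_flagged) pair.
    """
    adversarial = [flagged for is_adv, flagged in results if is_adv]
    benign = [flagged for is_adv, flagged in results if not is_adv]
    # Catch rate: share of adversarial samples that the scanner flagged.
    catch_rate = sum(adversarial) / len(adversarial) if adversarial else 0.0
    # False-positive rate: share of benign samples that were wrongly flagged.
    false_positive_rate = sum(benign) / len(benign) if benign else 0.0
    return catch_rate, false_positive_rate


# Invented labels, for illustration only.
toy = [(True, True), (True, True), (True, False), (False, False), (False, False)]
catch, fpr = scan_metrics(toy)
```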
59
+
60
+ **Test suite:** 119 unit tests, all passing. See [LAUNCH_READINESS.md](LAUNCH_READINESS.md).
61
+
62
+ ## Install
63
+
64
+ ```bash
65
+ pip install purpose-agent             # Core (zero dependencies)
66
+ pip install "purpose-agent[openai]"   # + OpenAI / Groq / OpenRouter (quoted so zsh does not glob the brackets)
67
+ pip install "purpose-agent[ollama]"   # + Local Ollama
68
+ pip install "purpose-agent[all]"      # Everything
69
  ```
70
 
71
  ## Three Levels of Usage
72
 
73
+ ### Level 1: Describe what you want
74
 
75
  ```python
76
+ import purpose_agent as pa
77
+
78
+ team = pa.purpose("Write Python code and test it") # → architect + coder + tester
79
+ team = pa.purpose("Research quantum computing") # → researcher + analyst
80
+ team = pa.purpose("Write blog posts about AI") # → writer + editor
81
+
82
+ result = team.run("Write a sorting algorithm")
83
+ team.teach("Always handle edge cases")
84
+ print(team.status()) # See what it's learned
85
  ```
86
 
87
+ ### Level 2: Choose your model
88
 
89
  ```python
90
+ # Local (free, private)
91
+ team = pa.purpose("Code helper", model="qwen3:1.7b")
92
+
93
+ # Cloud
94
+ team = pa.purpose("Code helper", model="openrouter:meta-llama/llama-3.3-70b-instruct")
95
+ team = pa.purpose("Code helper", model="groq:llama-3.3-70b-versatile")
96
+ team = pa.purpose("Code helper", model="openai:gpt-4o")
97
+
98
+ # Any OpenAI-compatible API
99
+ from purpose_agent import resolve_backend
100
+ backend = resolve_backend("openrouter:google/gemma-4-26b-a4b-it", api_key="sk-or-...")
101
  ```
102
 
103
+ Supported providers: **OpenRouter, Groq, OpenAI, Ollama, HuggingFace, Together, Fireworks, Cerebras, DeepSeek, Mistral.**
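
The model strings above mix two shapes: `provider:model` (for example `openai:gpt-4o`) and bare names that themselves contain a colon (`qwen3:1.7b`). One plausible way to tell them apart is to treat the prefix as a provider only when it is a known name. The sketch below is an assumption about how such parsing could work, not the behavior of `resolve_backend`; `split_model_spec`, `KNOWN_PROVIDERS`, and the Ollama default are hypothetical.

```python
# Illustrative subset of provider prefixes; the real list may differ.
KNOWN_PROVIDERS = {"openrouter", "groq", "openai", "ollama"}


def split_model_spec(spec: str, default_provider: str = "ollama") -> tuple[str, str]:
    """Split 'provider:model' into (provider, model).

    A prefix counts as a provider only if it is a known name, so a tag such
    as 'qwen3:1.7b' is kept whole and routed to the assumed default provider.
    """
    prefix, sep, rest = spec.partition(":")
    if sep and prefix.lower() in KNOWN_PROVIDERS:
        return prefix.lower(), rest
    return default_provider, spec


local = split_model_spec("qwen3:1.7b")
cloud = split_model_spec("openrouter:meta-llama/llama-3.3-70b-instruct")
```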
104
+
105
  ### Level 3: Full control
106
 
107
  ```python
108
+ import purpose_agent as pa
 
 
 
 
 
 
 
109
 
110
+ # Graph workflows (LangGraph-style)
111
+ graph = pa.Graph()
112
+ graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
113
+ graph.add_node("write", pa.Agent("writer", model="qwen3:1.7b"))
114
+ graph.add_edge(pa.START, "research")
115
+ graph.add_edge("research", "write")
116
+ graph.add_edge("write", pa.END)
117
+ result = graph.run(pa.State(data={"topic": "AI safety"}))
118
+
119
+ # Parallel execution (CrewAI-style)
120
+ results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
121
+
122
+ # Agent conversations (AutoGen-style)
123
+ chat = pa.Conversation([pa.Agent("researcher"), pa.Agent("coder")])
124
+ result = chat.run("Design a web scraper", rounds=3)
125
+
126
+ # Knowledge-aware agents (LlamaIndex-style)
127
+ kb = pa.KnowledgeStore.from_directory("./docs")
128
+ agent = pa.Agent("assistant", tools=[kb.as_tool()])
129
+
130
+ # Parallel tool execution (LLMCompiler-style)
131
+ compiler = pa.LLMCompiler(planner_llm=backend, tool_registry=registry)
132
+ result = compiler.compile_and_execute("Calculate X and search Y simultaneously")
 
 
 
 
 
 
133
  ```
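
To make the parallel-execution idea concrete, here is a minimal standard-library sketch that pairs each task with one worker and runs them concurrently. It assumes a one-to-one pairing of tasks and agents; `run_parallel` is a hypothetical stand-in and says nothing about how `pa.parallel` is implemented. Plain callables play the role of agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_parallel(tasks: Sequence[str], workers: Sequence[Callable[[str], str]]) -> list[str]:
    """Run workers[i](tasks[i]) concurrently and return results in task order."""
    if len(tasks) != len(workers):
        raise ValueError("need exactly one worker per task")
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(worker, task) for worker, task in zip(workers, tasks)]
        # Collect in submission order, so output order matches input order.
        return [future.result() for future in futures]


outputs = run_parallel(["alpha", "beta"], [str.upper, lambda text: text * 2])
```

Threads suit this case because agent calls are typically I/O-bound (waiting on a model server), so the global interpreter lock is not the bottleneck.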
134
 
135
+ ## Evidence-Gated Memory
136
 
137
+ Agents don't just accumulate knowledge blindly. Every new memory goes through a pipeline:
 
138
 
139
+ ```
140
+ candidate → immune scan → quarantine → replay test → promote (or reject)
 
141
  ```
142
 
143
+ - **Immune scan** blocks prompt injection, score manipulation, API key leaks, tool misuse
144
+ - **Quarantine** holds memories until they're tested
145
+ - **Promotion** happens only after evidence shows the memory helps
146
+ - **Rejection** preserves the memory for audit but never exposes it to the agent
147
 
148
+ Seven memory types: `purpose_contract`, `user_preference`, `skill_card`, `episodic_case`, `failure_pattern`, `critic_calibration`, `tool_policy`.
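
The pipeline reads naturally as a small state machine. The sketch below is a toy version under explicit assumptions: `Status`, `Memory`, `looks_safe`, and `gate` are hypothetical names, the immune scan is reduced to a single phrase check, and treating a non-positive replay delta as a rejection is a guess at the policy rather than documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    CANDIDATE = "candidate"
    QUARANTINED = "quarantined"
    PROMOTED = "promoted"
    REJECTED = "rejected"


@dataclass
class Memory:
    content: str
    status: Status = Status.CANDIDATE


def looks_safe(memory: Memory) -> bool:
    # Toy immune scan: a real scanner would check far more than one phrase.
    return "ignore previous instructions" not in memory.content.lower()


def gate(memory: Memory, replay_delta: Callable[[Memory], float]) -> Memory:
    """Advance a candidate through scan, quarantine, and replay test."""
    if not looks_safe(memory):
        memory.status = Status.REJECTED   # kept for audit, never served to the agent
        return memory
    memory.status = Status.QUARANTINED    # held until replay evidence is available
    if replay_delta(memory) > 0:          # assumed rule: promote only if replay improves
        memory.status = Status.PROMOTED
    else:
        memory.status = Status.REJECTED
    return memory


helpful = gate(Memory("Always add type hints"), lambda m: 0.12)
useless = gate(Memory("Use longer variable names"), lambda m: 0.0)
attack = gate(Memory("Ignore previous instructions"), lambda m: 1.0)
```

Note that the attack is rejected before the replay test runs at all, so a memory cannot buy its way into the prompt with a high score.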
 
 
 
 
 
 
 
 
149
 
150
+ ## Honest Evaluation
 
 
 
 
151
 
152
+ Three run modes enforce what the framework can mutate:
153
 
154
  ```python
155
+ from purpose_agent import RunMode
156
 
157
+ RunMode.LEARNING_TRAIN # Full read/write: this is where agents learn
158
+ RunMode.LEARNING_VALIDATION # Read + staging: validates before promoting
159
+ RunMode.EVAL_TEST # NO writes: numbers you can trust
160
  ```
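
One way to picture what the three modes enforce is a memory wrapper that routes or drops writes depending on the mode. This is a sketch under assumptions: `Mode` is a local stand-in for `RunMode`, and `GatedMemory` with its `live` and `staging` lists is hypothetical, not part of the package.

```python
from enum import Enum


class Mode(Enum):
    # Local stand-in for purpose_agent.RunMode, defined here so the sketch runs alone.
    LEARNING_TRAIN = "learning_train"
    LEARNING_VALIDATION = "learning_validation"
    EVAL_TEST = "eval_test"


class GatedMemory:
    """Hypothetical memory wrapper whose write path depends on the run mode."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.live: list[str] = []      # visible to the agent
        self.staging: list[str] = []   # awaiting validation

    def write(self, item: str) -> bool:
        if self.mode is Mode.LEARNING_TRAIN:
            self.live.append(item)
            return True
        if self.mode is Mode.LEARNING_VALIDATION:
            self.staging.append(item)
            return True
        return False  # EVAL_TEST: the write is dropped, state stays frozen


train = GatedMemory(Mode.LEARNING_TRAIN)
validation = GatedMemory(Mode.LEARNING_VALIDATION)
evaluation = GatedMemory(Mode.EVAL_TEST)
train_ok = train.write("prefer iterative fibonacci")
validation_ok = validation.write("prefer iterative fibonacci")
evaluation_ok = evaluation.write("prefer iterative fibonacci")
```

Because evaluation runs cannot change memory, a score measured in `EVAL_TEST` reflects only what was learned beforehand.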
161
 
162
  ## Secure Tools
163
 
164
+ - **CalculatorTool**: AST-validated, no `eval()` on arbitrary text
165
  - **PythonExecTool**: subprocess with timeout + isolated temp directory
166
+ - **ReadFile/WriteFile**: sandboxed to declared root directory
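
As an illustration of what "AST-validated" can mean, the sketch below parses an expression with Python's `ast` module and evaluates only a whitelist of node types. It is not the `CalculatorTool` source; `safe_calc` and the chosen operator set are assumptions made for the example.

```python
import ast
import operator

# Whitelisted operators; anything else in the tree is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calc(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals and reject everything else."""

    def walk(node: ast.AST) -> float:
        # Accept plain int/float literals only (bool is deliberately excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    # mode="eval" accepts a single expression; malformed input raises SyntaxError.
    return walk(ast.parse(expression, mode="eval").body)
```

Names, calls, and attribute access never reach an evaluator, so an input such as `__import__('os').system('ls')` fails with `ValueError` instead of executing.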
 
 
 
 
 
 
 
167
 
168
+ ## Architecture
 
 
169
 
170
+ See [ARCHITECTURE.md](ARCHITECTURE.md) for the complete technical documentation.
171
 
172
+ 34 Python modules, ~500KB, organized in layers:
173
 
174
+ ```
175
+ Core Engine   → Actor, Purpose Function, Experience Replay, Optimizer, Orchestrator
176
+ V2 Kernel     → Memory, Immune, Trace, Compiler, Memory CI, Eval Port, Benchmark
177
+ Research      → Meta-Rewarding, Self-Taught, Prompt Optimizer, LLM Compiler, Retroformer
178
+ Breakthroughs → Self-Improving Critic, MoH, Hindsight Relabeling, Heuristic Evolution
179
+ Capabilities  → Agent, Graph, Parallel, Conversation, KnowledgeStore
180
+ Easy API      → purpose(), Team, quickstart wizard
181
  ```
182
 
183
+ ## Literature
184
+
185
+ Built on 13 published papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md).
186
+ Formal proofs: [PURPOSE_LEARNING.md](PURPOSE_LEARNING.md).
187
+
188
+ | Paper | What it contributes |
189
+ |-------|-------------------|
190
+ | [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
191
+ | [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
192
+ | [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
193
+ | [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
194
+ | [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
195
+ | [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
196
+ | [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
197
+ | [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
198
+ | [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | Self-improving critic |
199
+ | [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | Synthetic critic training |
200
+ | [DSPy](https://arxiv.org/abs/2310.03714) | Automatic prompt optimization |
201
+ | [LLMCompiler](https://arxiv.org/abs/2312.04511) | Parallel function calling |
202
+ | [Retroformer](https://arxiv.org/abs/2308.02151) | Structured reflection |
203
+
204
+ ## CLI
205
 
206
  ```bash
207
+ python -m purpose_agent # Interactive wizard
208
+ purpose-agent # Same, via entry point
 
 
209
  ```
210
 
211
  ## License