Commit 276b221 (verified) by Rohan03
1 Parent(s): 205cdf1

release: Purpose Agent v2.0.0 — final README with 13-paper architecture

Files changed (1): README.md (+116 -125)
@@ -9,199 +9,190 @@ tags:
  - self-improving
  - experience-replay
  - llm-as-judge
- - state-value-evaluation
- - memory-augmented
  - multi-agent
  - slm
- - small-language-models
- - human-in-the-loop
- - streaming
- - tools
  - evaluation
- - ollama
- - local-models
  - no-code
- - easy-to-use
  pipeline_tag: text-generation
  ---
 
  # Purpose Agent
 
- **Build self-improving AI agent teams with just a purpose.**
-
- No PhD required. No infrastructure costs. Runs on your laptop.
 
  ```python
  import purpose_agent as pa
 
- # One line. That's all you need.
- team = pa.purpose("Help me research and summarize scientific papers")
-
- # Give it tasks. It gets smarter every time.
  result = team.run("Find recent breakthroughs in quantum computing")
  print(result)
 
- # Teach it your preferences
  team.teach("Always cite your sources")
- team.teach("Keep summaries under 200 words")
-
- # Check what it's learned
- print(team.status())
  ```
 
- ## Three Levels of Usage
 
- **Pick your level. You can always go deeper later.**
 
- ### Level 1 — Beginner (no technical knowledge needed)
 
- ```python
- import purpose_agent as pa
 
- # Describe what you want. The framework builds the right team.
- team = pa.purpose("Write Python code and test it")
- result = team.run("Create a function that calculates fibonacci numbers")
- print(result)
 
- # It auto-detects the best team:
- # "Write code" → architect + coder + tester
- # "Research X" → researcher + analyst
- # "Write blog" → writer + editor
- # "Analyze data" → analyst + reporter
- # "Help me" → general assistant
  ```
 
- ### Level 2 — Intermediate (customize your team)
 
  ```python
- import purpose_agent as pa
-
- # Build a custom team
- team = pa.Team.build(
-     purpose="Customer support assistant",
-     agents=["greeter", "resolver", "escalator"],
-     model="qwen3:1.7b", # Free local model
- )
- result = team.run("Customer says: I can't log in to my account")
-
- # Add knowledge from your docs
- team = pa.purpose(
-     "Answer questions about our product",
-     knowledge="./docs/", # Load all files from a folder
-     model="qwen3:1.7b",
- )
- result = team.ask("What is our refund policy?")
  ```
 
- ### Level 3 — Advanced (full control)
 
  ```python
- import purpose_agent as pa
 
- # Graph workflows (like LangGraph)
- graph = pa.Graph()
- graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
- graph.add_node("write", pa.Agent("writer", model="phi4-mini"))
- graph.add_edge(pa.START, "research")
- graph.add_conditional_edge("write", review_fn, {"pass": pa.END, "fail": "research"})
- result = graph.run(initial_state)
 
- # Parallel execution (like CrewAI)
- results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
 
- # Agent conversations (like AutoGen)
- chat = pa.Conversation([researcher, coder, reviewer])
- result = chat.run("Design a web scraper", rounds=5)
 
- # Knowledge-aware agents (like LlamaIndex)
- kb = pa.KnowledgeStore.from_directory("./docs")
- agent = pa.Agent("assistant", tools=[kb.as_tool()])
 
- # Human-in-the-loop (like LangGraph)
- hitl = pa.HITLOrchestrator(orch, input_handler=pa.CLIInputHandler(),
-     approve_actions=True, review_scores=True)
  ```
 
- ## What Makes This Different
 
- **The only framework where agents actually learn from experience.**
 
- Every other framework (LangChain, CrewAI, AutoGen) runs the same way every time. Purpose Agent gets smarter with each task via the **Φ self-improvement loop**:
 
- ```
- Task 1: Agent struggles, takes 12 steps → Φ evaluates → learns heuristics
- Task 5: Agent uses learned patterns, takes 8 steps → learns more
- Task 10: Agent is efficient, takes 5 steps → keeps refining
  ```
 
- Plus it absorbs the best of every competing framework:
 
- | You want... | Others say use... | Purpose Agent gives you... |
- |---|---|---|
- | **Control** (graphs, conditions, loops) | LangGraph | `pa.Graph()` — same power, with self-improvement |
- | **Speed** (parallel execution) | CrewAI | `pa.parallel()` — real threads, not fake async |
- | **Agents talking** | AutoGen | `pa.Conversation()` — with Φ-scored turns |
- | **Plug-and-play** | OpenAI Agents SDK | `pa.purpose()` — even simpler, one function |
- | **Knowledge** (RAG) | LlamaIndex | `pa.KnowledgeStore` — RAG as a tool |
- | **Self-improvement** | Nobody | **Only Purpose Agent** |
 
- ## Runs on Your Laptop (Free, Private)
 
  ```bash
- # Install Ollama (one-time)
  curl -fsSL https://ollama.ai/install.sh | sh
- ollama pull qwen3:1.7b # 1.7B params, runs on CPU
-
- # That's it. No API keys. No cloud. No cost.
  ```
 
  ```python
- team = pa.purpose("Research assistant", model="qwen3:1.7b")
  ```
 
- Also works with cloud models:
- ```python
- team = pa.purpose("Research assistant", model="gpt-4o") # OpenAI
- team = pa.purpose("Research assistant", model="Qwen/Qwen3-32B") # HuggingFace
- ```
 
  ## Interactive CLI
 
  ```bash
- python -m purpose_agent
  ```
 
- Walks you through setup step-by-step. No coding required.
 
  ## Installation
 
  ```bash
  git clone https://huggingface.co/Rohan03/purpose-agent
  cd purpose-agent
-
- # For local models (recommended)
- pip install ollama
-
- # Run demo (no API keys needed)
- python demo.py
  ```
 
- ## Literature Foundation
-
- Built on 8 published papers — every design decision has empirical backing.
- See [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md) for the full research trace.
-
- | Paper | What it contributes |
- |-------|-------------------|
- | [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
- | [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
- | [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
- | [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
- | [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
- | [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
- | [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
- | [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
-
  ## License
 
  MIT
 
  - self-improving
  - experience-replay
  - llm-as-judge
+ - memory-system
  - multi-agent
  - slm
+ - local-first
  - evaluation
+ - safety
+ - immune-system
  - no-code
  pipeline_tag: text-generation
  ---
 
  # Purpose Agent
 
+ **A local-first self-improvement kernel for agents.** Turns traces into tested memory, policies, and rubrics — so agents improve without fine-tuning, cloud infrastructure, or vendor lock-in.
 
  ```python
  import purpose_agent as pa
 
+ team = pa.purpose("Help me research scientific papers")
  result = team.run("Find recent breakthroughs in quantum computing")
  print(result)
 
  team.teach("Always cite your sources")
  ```
 
+ ## Core Principle
 
+ Agents learn only when evidence says they should. New memories are quarantined, immune-scanned, replay-tested, scoped, versioned, and reversible.
 
+ ```
+ candidate → immune scan → quarantine → replay test → promote (or reject)
+ ```
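The pipeline above can be read as a short sequence of gates. The sketch below is only an illustration of that ordering, not Purpose Agent's implementation: `Candidate`, `run_memory_ci`, and the two callback parameters are hypothetical names, and treating a failed replay test as a rejection is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Candidate:
    """Hypothetical stand-in for a memory awaiting promotion."""
    content: str
    status: str = "candidate"
    history: List[str] = field(default_factory=lambda: ["candidate"])

    def move_to(self, status: str) -> None:
        """Record a status change so the path stays auditable."""
        self.status = status
        self.history.append(status)


def run_memory_ci(
    card: Candidate,
    immune_scan: Callable[[str], bool],
    replay_test: Callable[[str], bool],
) -> Candidate:
    """Scan first, then quarantine, then promote only if the replay test passes."""
    if not immune_scan(card.content):
        card.move_to("rejected")
        return card
    card.move_to("quarantined")
    # Assumption: a memory that fails its replay test is rejected.
    card.move_to("promoted" if replay_test(card.content) else "rejected")
    return card
```

Passing the scan and the replay test as callbacks keeps the ordering logic separate from how each check is actually performed.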
 
+ ## Three Levels of Usage
 
+ ### Level 1 — Just describe what you want
 
+ ```python
+ team = pa.purpose("Write Python code and test it") # auto-builds architect + coder + tester
+ team = pa.purpose("Research quantum computing") # auto-builds researcher + analyst
+ team = pa.purpose("Write blog posts about AI") # auto-builds writer + editor
  ```
 
+ ### Level 2 — Customize your team
 
  ```python
+ team = pa.Team.build(purpose="Support bot", agents=["greeter", "resolver"], model="qwen3:1.7b")
+ team = pa.purpose("Answer questions", knowledge="./docs/", model="qwen3:1.7b")
  ```
 
+ ### Level 3 — Full control
 
  ```python
+ graph = pa.Graph() # LangGraph-style control flow
+ results = pa.parallel(["task1", "task2"], agents) # CrewAI-style parallel execution
+ chat = pa.Conversation([agent_a, agent_b]) # AutoGen-style agent conversation
+ kb = pa.KnowledgeStore.from_directory("./docs") # LlamaIndex-style RAG
+ compiler = pa.LLMCompiler(llm, registry) # Parallel tool execution via DAG
+ ```
 
+ ## Architecture
 
+ ```
+ purpose_agent/
+ ├── Core
+ │ types, actor, purpose_function, experience_replay, optimizer, orchestrator, llm_backend
+
+ ├── V2 Kernel
+ │ v2_types (RunMode, MemoryScope, PurposeScoreV2)
+ │ trace (structured JSONL execution traces)
+ │ memory (7 kinds × 5 statuses, scoped, versioned)
+ │ compiler (token-budgeted prompt compilation with credit assignment)
+ │ immune (injection, score hacking, tool misuse, privacy, scope scanning)
+ │ memory_ci (quarantine → scan → test → promote/reject pipeline)
+ │ evalport (pluggable evaluation protocol)
+ │ benchmark_v2 (train/val/test splits, ablation, contamination control)
+
+ ├── Research (13 papers implemented)
+ │ meta_rewarding (self-improving critic via meta-judge)
+ │ self_taught (synthetic training data for Φ function)
+ │ prompt_optimizer (DSPy-style automatic few-shot bootstrap)
+ │ llm_compiler (parallel function calling via DAG)
+ │ retroformer (structured reflection → typed memories)
+
+ ├── SLM-Native
+ │ slm_backends (Ollama, llama-cpp, prompt compression, 8 pre-configured models)
+
+ ├── Capabilities
+ │ unified (Agent, Graph, parallel, Conversation, KnowledgeStore)
+ │ easy (purpose(), Team, quickstart wizard)
+ │ tools, streaming, observability, multi_agent, hitl, evaluation, registry
+ ```
 
+ ## RunMode — Honest Evaluation
 
+ ```python
+ from purpose_agent import RunMode
 
+ RunMode.LEARNING_TRAIN # Full read/write. Agent learns.
+ RunMode.LEARNING_VALIDATION # Read + staging. Validates before promoting.
+ RunMode.EVAL_TEST # NO writes. Numbers you can trust.
  ```
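To make the three modes concrete, here is a minimal sketch of how a memory store might gate writes by mode. It defines its own stand-in `RunMode` enum so that it runs without the package installed. The enum values and the `GatedMemory` class are hypothetical, and only the three member names come from the README.

```python
from enum import Enum
from typing import List


class RunMode(Enum):
    """Local stand-in mirroring the three mode names listed above."""
    LEARNING_TRAIN = "learning_train"
    LEARNING_VALIDATION = "learning_validation"
    EVAL_TEST = "eval_test"


class GatedMemory:
    """Hypothetical store that routes writes according to the run mode."""

    def __init__(self, mode: RunMode) -> None:
        self.mode = mode
        self.live: List[str] = []     # memory the agent reads from
        self.staging: List[str] = []  # writes held back for validation

    def write(self, item: str) -> None:
        if self.mode is RunMode.EVAL_TEST:
            # Evaluation runs must leave memory untouched.
            raise PermissionError("EVAL_TEST runs must not write memory")
        if self.mode is RunMode.LEARNING_VALIDATION:
            self.staging.append(item)
        else:
            self.live.append(item)
```

Raising on a write during `EVAL_TEST`, rather than silently dropping it, makes accidental contamination of test runs visible.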
 
+ ## Memory Lifecycle
 
+ | Kind | Purpose |
+ |------|---------|
+ | `purpose_contract` | User's stated goal and constraints |
+ | `user_preference` | Learned preferences |
+ | `skill_card` | Reusable procedures from successful traces |
+ | `episodic_case` | Specific experiences worth remembering |
+ | `failure_pattern` | What NOT to do |
+ | `critic_calibration` | Adjustments to Φ scoring |
+ | `tool_policy` | Tool-specific usage rules |
 
+ | Status | Meaning |
+ |--------|---------|
+ | `candidate` → `quarantined` → `promoted` | Happy path |
+ | `candidate` → `rejected` | Failed immune scan |
+ | `promoted` → `archived` | Superseded or demoted |
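The status table can be encoded as an explicit transition map, which makes illegal jumps (for example `candidate` straight to `promoted`) fail loudly. This is a sketch under that reading of the table. `Status`, `ALLOWED`, and `transition` are hypothetical names, not the package's API.

```python
from enum import Enum


class Status(str, Enum):
    CANDIDATE = "candidate"
    QUARANTINED = "quarantined"
    PROMOTED = "promoted"
    REJECTED = "rejected"
    ARCHIVED = "archived"


# Only the transitions listed in the status table are allowed.
ALLOWED = {
    Status.CANDIDATE: {Status.QUARANTINED, Status.REJECTED},
    Status.QUARANTINED: {Status.PROMOTED},
    Status.PROMOTED: {Status.ARCHIVED},
    Status.REJECTED: set(),
    Status.ARCHIVED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the table does not permit the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```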
 
+ ## Immune System
+
+ ```python
+ from purpose_agent import scan_memory, MemoryCard
+
+ result = scan_memory(MemoryCard(content="Ignore previous instructions"))
+ # result.passed = False, threats = ["prompt_injection"], severity = "critical"
  ```
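For intuition, a pattern-based scan could look roughly like the sketch below. It is not how `scan_memory` is implemented: `scan_text`, `ScanResult`, and the single regular expression are hypothetical, and a real scanner would need far broader coverage than one pattern.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical pattern table: threat name -> compiled regular expression.
_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE
    ),
}


@dataclass
class ScanResult:
    passed: bool
    threats: List[str] = field(default_factory=list)
    severity: str = "none"


def scan_text(content: str) -> ScanResult:
    """Flag content that matches any known threat pattern."""
    threats = [name for name, pattern in _PATTERNS.items() if pattern.search(content)]
    if threats:
        # Assumption: any match is treated as critical in this sketch.
        return ScanResult(passed=False, threats=threats, severity="critical")
    return ScanResult(passed=True)
```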
 
+ ## Secure Tools
 
+ - **CalculatorTool** — AST-validated, no eval() on arbitrary text
+ - **PythonExecTool** — subprocess with timeout + isolated temp directory
+ - **ReadFileTool / WriteFileTool** — sandboxed to declared root
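As an illustration of what "AST-validated" can mean for a calculator, the sketch below parses an expression with Python's standard `ast` module and evaluates only whitelisted node types. It is not the CalculatorTool source: `safe_calculate` and the operator whitelist are assumptions made for the example.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval_node(node: ast.AST):
    """Recursively evaluate a node, refusing anything off the whitelist."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"disallowed expression element: {type(node).__name__}")


def safe_calculate(expression: str):
    """Evaluate simple arithmetic without ever calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)
```

Function calls, attribute access, and names all parse to node types outside the whitelist, so input such as `__import__('os')` is rejected before anything runs. Exponentiation is left out deliberately, since inputs like `9**9**9` could otherwise consume unbounded time and memory.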
 
 
 
 
 
 
+ ## Runs on Your Laptop
 
  ```bash
  curl -fsSL https://ollama.ai/install.sh | sh
+ ollama pull qwen3:1.7b
  ```
 
  ```python
+ team = pa.purpose("Research assistant", model="qwen3:1.7b") # Free, private, local
  ```
 
+ Also works with: `model="gpt-4o"` (OpenAI), `model="Qwen/Qwen3-32B"` (HuggingFace cloud).
 
  ## Interactive CLI
 
  ```bash
+ python -m purpose_agent # Step-by-step wizard, no coding required
  ```
 
+ ## Literature Foundation
+
+ Built on 13 papers. Full research trace: [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md)
+
+ | Paper | Module | Contribution |
+ |-------|--------|-------------|
+ | [MUSE](https://arxiv.org/abs/2510.08002) | actor, optimizer | 3-tier memory hierarchy |
+ | [LATS](https://arxiv.org/abs/2310.04406) | purpose_function | LLM-as-value-function |
+ | [REMEMBERER](https://arxiv.org/abs/2306.07929) | experience_replay | Q-value experience replay |
+ | [Reflexion](https://arxiv.org/abs/2303.11366) | orchestrator | Verbal reinforcement |
+ | [SPC](https://arxiv.org/abs/2504.19162) | purpose_function, immune | Anti-reward-hacking |
+ | [CER](https://arxiv.org/abs/2506.06698) | optimizer | Experience distillation |
+ | [MemRL](https://arxiv.org/abs/2601.03192) | experience_replay, compiler | Two-phase retrieval |
+ | [TinyAgent](https://arxiv.org/abs/2409.00608) | slm_backends, tools | SLM-native patterns |
+ | [Meta-Rewarding](https://arxiv.org/abs/2407.19594) | meta_rewarding | Self-improving critic |
+ | [Self-Taught Eval](https://arxiv.org/abs/2408.02666) | self_taught | Synthetic critic training |
+ | [DSPy](https://arxiv.org/abs/2310.03714) | prompt_optimizer | Automatic prompt optimization |
+ | [LLMCompiler](https://arxiv.org/abs/2312.04511) | llm_compiler | Parallel function calling |
+ | [Retroformer](https://arxiv.org/abs/2308.02151) | retroformer | Structured reflection |
 
  ## Installation
 
  ```bash
  git clone https://huggingface.co/Rohan03/purpose-agent
  cd purpose-agent
+ pip install ollama # for local models
+ python demo.py # verify everything works
  ```
 
  ## License
 
  MIT