feat: Rewrite README — lead with easy API, 3 usage levels, complete docs
README.md
CHANGED

@@ -11,269 +11,197 @@ tags:
- llm-as-judge
- state-value-evaluation
- memory-augmented
-- orchestration
-- modular
- slm
- small-language-models
-- multi-agent
- human-in-the-loop
- streaming
- tools
- evaluation
- ollama
- local-models
pipeline_tag: text-generation
---

-# Purpose Agent
-
-**The world's first SLM-native self-improving agentic framework.**
-
-Works with both **Small Language Models** (0.6B–3B params, local, $0 cost) and **Large Language Models** (cloud APIs) with equal efficiency. Agents learn from experience via a Purpose Function Φ(s) — no fine-tuning needed.

-| Feature | Purpose Agent | LangChain | LangGraph | CrewAI | AutoGen | smolagents |
-|---|:---:|:---:|:---:|:---:|:---:|:---:|
-| **Self-Improvement** | ✅ Φ(s) + experience replay + heuristic distillation | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **SLM-Native** | ✅ Grammar-constrained JSON, prompt compression, Tool RAG | ❌ | ❌ | ❌ | ❌ | ⚠️ |
-| **Anti-Reward-Hacking** | ✅ 7 strict rules + cache consistency + anomaly detection | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **3-Tier Memory** | ✅ Strategic/Procedural/Tool with Q-value retrieval | ❌ | ⚠️ | ⚠️ | ❌ | ❌ |
-| **Multi-Agent with Shared Learning** | ✅ Agents learn from each other | ❌ | ⚠️ | ✅ | ✅ | ⚠️ |
-| **Human Φ Override** | ✅ Humans teach the critic → permanent learning | ❌ | ⚠️ | ❌ | ❌ | ❌ |
-| **Streaming** | ✅ Event + token streaming | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
-| **Tool Framework** | ✅ Schema, validation, retry, Tool RAG | ✅ | ✅ | ✅ | ✅ | ✅ |
-| **Cost Tracking** | ✅ Per-call token + USD tracking | ⚠️ | ⚠️ | ❌ | ❌ | ❌ |
-| **Benchmark Harness** | ✅ Improvement curve tracking | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **Lightweight** | ✅ ~150KB, stdlib only | ❌ | ❌ | ⚠️ | ⚠️ | ✅ |
-| **Literature-Grounded** | ✅ 8 papers implemented | ❌ | ❌ | ❌ | ❌ | ❌ |
-
-## Architecture

-```
-purpose_agent/
-├── types.py             # Core data types
-├── llm_backend.py       # Cloud LLM backends (HF, OpenAI, Mock)
-├── slm_backends.py      # 🆕 SLM backends (Ollama, llama-cpp, prompt compression)
-├── actor.py             # ReAct agent with 3-tier memory
-├── purpose_function.py  # Non-hackable Φ(s) critic
-├── experience_replay.py # Two-phase retrieval (similarity + Q-value)
-├── optimizer.py         # Trajectory → heuristic distillation
-├── orchestrator.py      # Main loop
-├── streaming.py         # 🆕 Async engine + event streaming
-├── tools.py             # 🆕 Tool framework + built-in tools + Tool RAG
-├── observability.py     # 🆕 Cost tracking, callbacks, metrics
-├── multi_agent.py       # 🆕 Agent teams with shared learning
-├── hitl.py              # 🆕 Human-in-the-loop + checkpointing
-└── evaluation.py        # 🆕 Benchmark runner + improvement curves
-```
-
-## Quick Start — Local SLM (Zero Cost)
-
-```bash
-# 1. Install Ollama
-curl -fsSL https://ollama.ai/install.sh | sh
-
-# 2. Pull a small model (1.7B params, runs on any laptop)
-ollama pull qwen3:1.7b
-
-# 3. Run your agent
-python my_agent.py
-```

```python
-from purpose_agent import (
-    Orchestrator, OllamaBackend, State, Environment, Action,
-    CalculatorTool, ToolRegistry,
-)

-#

-#

-print(result.summary())
```

-##
-
-```python
-from purpose_agent import create_slm_backend

```

-##
-
-Agents learn from each other — when one agent solves a problem, all benefit:

```python
-coder = AgentSpec(
-    name="coder", role="Write and debug code",
-    model=OllamaBackend(model="phi4-mini"),  # Better SLM for code
-    expertise_keywords=["code", "program", "debug"],
)

)
-
-# Auto-delegates to the best agent
-result = team.run_task(purpose="Search for Python sorting algorithms")
-print(team.get_learning_report())  # See shared knowledge
```

-##
-
-Humans can override Φ scores → the agent permanently learns preferences:

```python
-)
-)
```

-##

-```python
-async for event in async_orch.run_task_stream(purpose="..."):
-    if event.event_type == "action":
-        print(f"🤖 {event.data['name']}: {event.data['thought'][:100]}")
-    elif event.event_type == "score":
-        print(f"📊 Φ: {event.data['phi_before']:.1f} → {event.data['phi_after']:.1f}")
-
-asyncio.run(main())
```

-
-```python
-# Tool RAG for SLMs (only load relevant tools into prompt)
-registry = ToolRegistry()
-registry.register(CalculatorTool())
-registry.register(PythonExecTool())
-registry.register(search)
-
-relevant = registry.get_relevant_tools("compute 2+2", top_k=2)
-# → [CalculatorTool, PythonExecTool] (search excluded — saves tokens)
-```

-##

-```

-tracker.record(prompt_tokens=500, completion_tokens=200)
-print(tracker.summary())
-# → {'model': 'qwen3:1.7b', 'total_tokens': 700, 'estimated_cost_usd': 0.000005}
```

-

```python
-runner = BenchmarkRunner(orchestrator=orch)
-tasks = [
-    BenchmarkTask(id="t1", purpose="Find treasure", initial_state=...),
-    BenchmarkTask(id="t2", purpose="Solve puzzle", initial_state=...),
-]
-
-result = runner.run(tasks, iterations=10, name="MazeTest")
-print(result.summary())
-# Iteration   Success Rate   Avg Φ   Avg Steps   Avg Reward
-# ----------------------------------------------------------
-# 1           40.0%          4.20    12.0        3.20
-# 5           70.0%          6.80     8.0        6.50
-# 10          90.0%          8.50     6.0        8.90
-# Improvement: 40.0% → 90.0% (+50 points)
-
-result.save("results/benchmark.json")
```

-##

-| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
-| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
-| [CER](https://arxiv.org/abs/2506.06698) | Contextual experience distillation |
-| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
-| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native agent patterns |

## Installation

```bash
-# Core (no dependencies beyond stdlib)
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent

-# For local
pip install ollama

-# For cloud LLMs
-pip install huggingface_hub  # or: pip install openai
-
# Run demo (no API keys needed)
python demo.py
```

## License

MIT

- llm-as-judge
- state-value-evaluation
- memory-augmented
+- multi-agent
- slm
- small-language-models
- human-in-the-loop
- streaming
- tools
- evaluation
- ollama
- local-models
+- no-code
+- easy-to-use
pipeline_tag: text-generation
---

+# Purpose Agent

+**Build self-improving AI agent teams with just a purpose.**

+No PhD required. No infrastructure costs. Runs on your laptop.

```python
+import purpose_agent as pa

+# One line. That's all you need.
+team = pa.purpose("Help me research and summarize scientific papers")

+# Give it tasks. It gets smarter every time.
+result = team.run("Find recent breakthroughs in quantum computing")
+print(result)

+# Teach it your preferences
+team.teach("Always cite your sources")
+team.teach("Keep summaries under 200 words")

+# Check what it's learned
+print(team.status())
```

+## Three Levels of Usage

+**Pick your level. You can always go deeper later.**

+### Level 1 — Beginner (no technical knowledge needed)

+```python
+import purpose_agent as pa
+
+# Describe what you want. The framework builds the right team.
+team = pa.purpose("Write Python code and test it")
+result = team.run("Create a function that calculates fibonacci numbers")
+print(result)
+
+# It auto-detects the best team:
+#   "Write code"   → architect + coder + tester
+#   "Research X"   → researcher + analyst
+#   "Write blog"   → writer + editor
+#   "Analyze data" → analyst + reporter
+#   "Help me"      → general assistant
```

+### Level 2 — Intermediate (customize your team)

```python
+import purpose_agent as pa

+# Build a custom team
+team = pa.Team.build(
+    purpose="Customer support assistant",
+    agents=["greeter", "resolver", "escalator"],
+    model="qwen3:1.7b",  # Free local model
)
+result = team.run("Customer says: I can't log in to my account")

+# Add knowledge from your docs
+team = pa.purpose(
+    "Answer questions about our product",
+    knowledge="./docs/",  # Load all files from a folder
+    model="qwen3:1.7b",
)
+result = team.ask("What is our refund policy?")
```

+### Level 3 — Advanced (full control)

```python
+import purpose_agent as pa
+
+# Graph workflows (like LangGraph)
+graph = pa.Graph()
+graph.add_node("research", pa.Agent("researcher", model="qwen3:1.7b"))
+graph.add_node("write", pa.Agent("writer", model="phi4-mini"))
+graph.add_edge(pa.START, "research")
+graph.add_conditional_edge("write", review_fn, {"pass": pa.END, "fail": "research"})
+result = graph.run(initial_state)
+
+# Parallel execution (like CrewAI)
+results = pa.parallel(["task 1", "task 2", "task 3"], agents=[a1, a2, a3])
+
+# Agent conversations (like AutoGen)
+chat = pa.Conversation([researcher, coder, reviewer])
+result = chat.run("Design a web scraper", rounds=5)
+
+# Knowledge-aware agents (like LlamaIndex)
+kb = pa.KnowledgeStore.from_directory("./docs")
+agent = pa.Agent("assistant", tools=[kb.as_tool()])
+
+# Human-in-the-loop (like LangGraph)
+hitl = pa.HITLOrchestrator(orch, input_handler=pa.CLIInputHandler(),
+                           approve_actions=True, review_scores=True)
```
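
In the snippets above, `review_fn`, `initial_state`, `a1`-`a3`, `researcher`/`coder`/`reviewer`, and `orch` are placeholders you supply yourself. As a rough sketch of how these primitives might compose end to end (it assumes `pa.Agent` accepts `model=` and `tools=` together, that `add_edge` also links two named nodes, and that a plain task string is a valid initial state for `graph.run`):

```python
import purpose_agent as pa

# Knowledge base exposed as a tool, as shown in Level 3
kb = pa.KnowledgeStore.from_directory("./docs")

# Assumption: model= and tools= can be combined on one Agent
researcher = pa.Agent("researcher", model="qwen3:1.7b", tools=[kb.as_tool()])
writer = pa.Agent("writer", model="phi4-mini")

def review_fn(state):
    # Hypothetical reviewer: loop back to research until a draft appears
    return "pass" if "draft" in str(state).lower() else "fail"

graph = pa.Graph()
graph.add_node("research", researcher)
graph.add_node("write", writer)
graph.add_edge(pa.START, "research")
graph.add_edge("research", "write")  # assumption: node-to-node edges work like START edges
graph.add_conditional_edge("write", review_fn, {"pass": pa.END, "fail": "research"})

result = graph.run("Draft a summary of our refund policy")  # assumption: string initial state
print(result)
```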

+## What Makes This Different

+**The only framework where agents actually learn from experience.**

+Every other framework (LangChain, CrewAI, AutoGen) runs the same way every time. Purpose Agent gets smarter with each task via the **Φ self-improvement loop**:
+
+```
+Task 1:  Agent struggles, takes 12 steps → Φ evaluates → learns heuristics
+Task 5:  Agent uses learned patterns, takes 8 steps → learns more
+Task 10: Agent is efficient, takes 5 steps → keeps refining
```
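
In code, that loop is just repeated runs on the same family of tasks. A minimal sketch using only the API shown earlier, assuming `team.status()` summarizes the learned heuristics (its exact fields aren't specified here):

```python
import purpose_agent as pa

team = pa.purpose("Solve maze-navigation puzzles", model="qwen3:1.7b")

# Φ scores every trajectory and distills heuristics between runs,
# so later attempts should finish in fewer steps.
for i in range(10):
    team.run(f"Escape maze layout #{i}")

print(team.status())  # inspect what the team has learned so far
```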

+Plus it absorbs the best of every competing framework:

+| You want... | Others say use... | Purpose Agent gives you... |
+|---|---|---|
+| **Control** (graphs, conditions, loops) | LangGraph | `pa.Graph()` — same power, with self-improvement |
+| **Speed** (parallel execution) | CrewAI | `pa.parallel()` — real threads, not fake async |
+| **Agents talking** | AutoGen | `pa.Conversation()` — with Φ-scored turns |
+| **Plug-and-play** | OpenAI Agents SDK | `pa.purpose()` — even simpler, one function |
+| **Knowledge** (RAG) | LlamaIndex | `pa.KnowledgeStore` — RAG as a tool |
+| **Self-improvement** | Nobody | **Only Purpose Agent** |

+## Runs on Your Laptop (Free, Private)

+```bash
+# Install Ollama (one-time)
+curl -fsSL https://ollama.ai/install.sh | sh
+ollama pull qwen3:1.7b  # 1.7B params, runs on CPU

+# That's it. No API keys. No cloud. No cost.
```

+```python
+team = pa.purpose("Research assistant", model="qwen3:1.7b")
+```

+Also works with cloud models:
```python
+team = pa.purpose("Research assistant", model="gpt-4o")          # OpenAI
+team = pa.purpose("Research assistant", model="Qwen/Qwen3-32B")  # HuggingFace
```

+## Interactive CLI

+```bash
+python -m purpose_agent
+```
+
+Walks you through setup step-by-step. No coding required.

## Installation

```bash
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent

+# For local models (recommended)
pip install ollama

# Run demo (no API keys needed)
python demo.py
```

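To verify the install, a minimal first script (assuming the Ollama model pulled above is available locally):

```python
import purpose_agent as pa

# Uses the free local model; no API keys required
team = pa.purpose("Summarize text files", model="qwen3:1.7b")
print(team.run("Summarize README.md in three bullet points"))
```
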
+## Literature Foundation
+
+Built on 8 published papers — every design decision has empirical backing.
+See [COMPILED_RESEARCH.md](COMPILED_RESEARCH.md) for the full research trace.
+
+| Paper | What it contributes |
+|-------|-------------------|
+| [MUSE](https://arxiv.org/abs/2510.08002) | 3-tier memory hierarchy |
+| [LATS](https://arxiv.org/abs/2310.04406) | LLM-as-value-function |
+| [REMEMBERER](https://arxiv.org/abs/2306.07929) | Q-value experience replay |
+| [Reflexion](https://arxiv.org/abs/2303.11366) | Verbal reinforcement |
+| [SPC](https://arxiv.org/abs/2504.19162) | Anti-reward-hacking |
+| [CER](https://arxiv.org/abs/2506.06698) | Experience distillation |
+| [MemRL](https://arxiv.org/abs/2601.03192) | Two-phase retrieval |
+| [TinyAgent](https://arxiv.org/abs/2409.00608) | SLM-native patterns |
+
## License

MIT