---
library_name: purpose-agent
license: mit
language:
  - en
tags:
  - reinforcement-learning
  - agents
  - self-improving
  - experience-replay
  - llm-as-judge
  - memory-system
  - multi-agent
  - slm
  - local-first
  - evaluation
  - safety
  - immune-system
  - no-code
pipeline_tag: text-generation
---

# Purpose Agent

A local-first self-improvement kernel for agents. It turns traces into tested memory, policies, and rubrics, so agents improve without fine-tuning, cloud infrastructure, or vendor lock-in.

```python
import purpose_agent as pa

team = pa.purpose("Help me research scientific papers")
result = team.run("Find recent breakthroughs in quantum computing")
print(result)

team.teach("Always cite your sources")
```

## Core Principle

Agents learn only when evidence says they should. New memories are quarantined, immune-scanned, replay-tested, scoped, versioned, and reversible.

```
candidate → immune scan → quarantine → replay test → promote (or reject)
```
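The gate above can be sketched as a tiny decision function. This is a hypothetical illustration of the flow, not the library's actual `memory_ci` implementation; `immune_scan` and `replay_test` stand in for the real checks:

```python
def gate(candidate, immune_scan, replay_test):
    """Run a candidate memory through the promotion pipeline.

    immune_scan / replay_test are callables returning True on pass.
    """
    # Stage 1: immune scan (injection, score hacking, tool misuse, ...)
    if not immune_scan(candidate):
        return "rejected"
    # Stage 2: the candidate sits in quarantine until it is replay-tested
    # against held-out traces; only evidence promotes it.
    if replay_test(candidate):
        return "promoted"
    return "rejected"
```

A memory is promoted only when both checks pass; every other path ends in rejection, which keeps the store reversible and evidence-driven.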

## Three Levels of Usage

### Level 1 – Just describe what you want

```python
team = pa.purpose("Write Python code and test it")  # auto-builds architect + coder + tester
team = pa.purpose("Research quantum computing")     # auto-builds researcher + analyst
team = pa.purpose("Write blog posts about AI")      # auto-builds writer + editor
```

### Level 2 – Customize your team

```python
team = pa.Team.build(purpose="Support bot", agents=["greeter", "resolver"], model="qwen3:1.7b")
team = pa.purpose("Answer questions", knowledge="./docs/", model="qwen3:1.7b")
```

### Level 3 – Full control

```python
graph = pa.Graph()                                 # LangGraph-style control flow
results = pa.parallel(["task1", "task2"], agents)  # CrewAI-style parallel execution
chat = pa.Conversation([agent_a, agent_b])         # AutoGen-style agent conversation
kb = pa.KnowledgeStore.from_directory("./docs")    # LlamaIndex-style RAG
compiler = pa.LLMCompiler(llm, registry)           # parallel tool execution via DAG
```

## Architecture

```
purpose_agent/
├── Core
│   types, actor, purpose_function, experience_replay, optimizer, orchestrator, llm_backend
│
├── V2 Kernel
│   v2_types (RunMode, MemoryScope, PurposeScoreV2)
│   trace (structured JSONL execution traces)
│   memory (7 kinds × 5 statuses, scoped, versioned)
│   compiler (token-budgeted prompt compilation with credit assignment)
│   immune (injection, score hacking, tool misuse, privacy, scope scanning)
│   memory_ci (quarantine → scan → test → promote/reject pipeline)
│   evalport (pluggable evaluation protocol)
│   benchmark_v2 (train/val/test splits, ablation, contamination control)
│
├── Research (13 papers implemented)
│   meta_rewarding (self-improving critic via meta-judge)
│   self_taught (synthetic training data for Φ function)
│   prompt_optimizer (DSPy-style automatic few-shot bootstrap)
│   llm_compiler (parallel function calling via DAG)
│   retroformer (structured reflection → typed memories)
│
├── SLM-Native
│   slm_backends (Ollama, llama-cpp, prompt compression, 8 pre-configured models)
│
└── Capabilities
    unified (Agent, Graph, parallel, Conversation, KnowledgeStore)
    easy (purpose(), Team, quickstart wizard)
    tools, streaming, observability, multi_agent, hitl, evaluation, registry
```

## RunMode – Honest Evaluation

```python
from purpose_agent import RunMode

RunMode.LEARNING_TRAIN       # full read/write: the agent learns
RunMode.LEARNING_VALIDATION  # read + staging writes: validates before promoting
RunMode.EVAL_TEST            # NO writes: numbers you can trust
```
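The write-gating these modes imply can be sketched in a few lines. The enum values and the `write_target` helper below are hypothetical illustrations of the contract, not the library's internals:

```python
from enum import Enum

class RunMode(Enum):
    LEARNING_TRAIN = "learning_train"            # full read/write
    LEARNING_VALIDATION = "learning_validation"  # writes land in staging only
    EVAL_TEST = "eval_test"                      # evaluation never mutates memory

def write_target(mode: RunMode):
    """Return where a new memory may be written under each mode, or None."""
    if mode is RunMode.LEARNING_TRAIN:
        return "store"
    if mode is RunMode.LEARNING_VALIDATION:
        return "staging"
    return None  # EVAL_TEST: read-only, so test numbers stay uncontaminated
```

Keeping `EVAL_TEST` strictly read-only is what makes the reported numbers trustworthy: nothing learned during a test run can leak back into the memory being measured.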

## Memory Lifecycle

| Kind | Purpose |
|------|---------|
| `purpose_contract` | User's stated goal and constraints |
| `user_preference` | Learned preferences |
| `skill_card` | Reusable procedures from successful traces |
| `episodic_case` | Specific experiences worth remembering |
| `failure_pattern` | What NOT to do |
| `critic_calibration` | Adjustments to Φ scoring |
| `tool_policy` | Tool-specific usage rules |

| Status | Meaning |
|--------|---------|
| candidate → quarantined → promoted | Happy path |
| candidate → rejected | Failed immune scan |
| promoted → archived | Superseded or demoted |
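The legal status transitions can be encoded as a small lookup table. This is an illustrative sketch of the lifecycle described above (it assumes quarantined memories can also be rejected at the replay-test stage, per the core-principle pipeline), not the library's code:

```python
# Allowed status transitions, following the lifecycle table above.
TRANSITIONS = {
    "candidate":   {"quarantined", "rejected"},  # immune scan gates entry
    "quarantined": {"promoted", "rejected"},     # replay test decides
    "promoted":    {"archived"},                 # superseded or demoted
}

def can_transition(src: str, dst: str) -> bool:
    """True if a memory may move from status src to status dst."""
    return dst in TRANSITIONS.get(src, set())
```

Note there is no path back from `rejected` or `archived`: demotion is one-way, which is what makes every promotion reversible without ever silently resurrecting a bad memory.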

## Immune System

```python
from purpose_agent import scan_memory, MemoryCard

result = scan_memory(MemoryCard(content="Ignore previous instructions"))
# result.passed = False, threats = ["prompt_injection"], severity = "critical"
```
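A minimal pattern-based scanner in the same spirit looks like this. It is illustrative only; the name `scan_text` and the two patterns are inventions for the sketch, and the shipped immune module covers far more threat classes (score hacking, tool misuse, privacy, scope):

```python
import re

# A couple of canonical prompt-injection markers; real scanners use richer signals.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard (the )?system prompt",
]

def scan_text(content: str) -> dict:
    """Flag content matching any known injection pattern (case-insensitive)."""
    threats = [p for p in INJECTION_PATTERNS
               if re.search(p, content, re.IGNORECASE)]
    return {"passed": not threats, "threats": threats}
```

Because every candidate memory passes through this gate before quarantine, an injected "instruction" never reaches the prompt compiler in the first place.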

## Secure Tools

- `CalculatorTool` – AST-validated; no `eval()` on arbitrary text
- `PythonExecTool` – subprocess with timeout and an isolated temp directory
- `ReadFileTool` / `WriteFileTool` – sandboxed to a declared root
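The AST-validation idea behind `CalculatorTool` can be sketched as follows. This is a simplified illustration of the technique, not the tool's actual code: the expression is parsed, and only a whitelist of arithmetic nodes is evaluated, so function calls, attribute access, and anything else never execute:

```python
import ast
import operator

# Whitelisted binary operators; any other node type is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str):
    """Evaluate a pure-arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"disallowed expression: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))
```

An input like `__import__('os')` parses to a `Call` node, which hits the `raise` branch instead of ever executing.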

## Runs on Your Laptop

```shell
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:1.7b
```

```python
team = pa.purpose("Research assistant", model="qwen3:1.7b")  # free, private, local
```

Also works with `model="gpt-4o"` (OpenAI) and `model="Qwen/Qwen3-32B"` (Hugging Face cloud).

## Interactive CLI

```shell
python -m purpose_agent   # step-by-step wizard, no coding required
```

## Literature Foundation

Built on 13 papers; the full research trace is in `COMPILED_RESEARCH.md`.

| Paper | Module | Contribution |
|-------|--------|--------------|
| MUSE | actor, optimizer | 3-tier memory hierarchy |
| LATS | purpose_function | LLM-as-value-function |
| REMEMBERER | experience_replay | Q-value experience replay |
| Reflexion | orchestrator | Verbal reinforcement |
| SPC | purpose_function, immune | Anti-reward-hacking |
| CER | optimizer | Experience distillation |
| MemRL | experience_replay, compiler | Two-phase retrieval |
| TinyAgent | slm_backends, tools | SLM-native patterns |
| Meta-Rewarding | meta_rewarding | Self-improving critic |
| Self-Taught Eval | self_taught | Synthetic critic training |
| DSPy | prompt_optimizer | Automatic prompt optimization |
| LLMCompiler | llm_compiler | Parallel function calling |
| Retroformer | retroformer | Structured reflection |

## Installation

```shell
git clone https://huggingface.co/Rohan03/purpose-agent
cd purpose-agent
pip install ollama  # for local models
python demo.py      # verify everything works
```

## License

MIT