"""
purpose_agent — A Self-Improving Agentic Framework via State-Value Evaluation

Architecture based on:
  - MUSE (arxiv:2510.08002): 3-tier hierarchical memory (strategic/procedural/tool)
  - LATS (arxiv:2310.04406): LLM-as-value-function V(s) = λ·LM_score + (1-λ)·SC_score
  - REMEMBERER (arxiv:2306.07929): Q-value experience replay with Bellman updates
  - Reflexion (arxiv:2303.11366): Verbal reinforcement via episodic self-reflection
  - SPC (arxiv:2504.19162): Anti-reward-hacking via adversarial critic patterns

Core philosophy: The agent improves via a "Purpose Function" Φ(s) that evaluates
intermediate state improvements (distance to goal) rather than binary outcome success.
No real-time backprop — improvement comes from expanding external memory with
learned heuristics extracted from high-reward trajectories.
"""

__version__ = "0.1.0"

from purpose_agent.types import (
    State,
    Action,
    Trajectory,
    TrajectoryStep,
    Heuristic,
    PurposeScore,
    MemoryRecord,
)
from purpose_agent.llm_backend import LLMBackend, MockLLMBackend
from purpose_agent.actor import Actor
from purpose_agent.purpose_function import PurposeFunction
from purpose_agent.experience_replay import ExperienceReplay
from purpose_agent.optimizer import HeuristicOptimizer
from purpose_agent.orchestrator import Orchestrator

__all__ = [
    "State",
    "Action",
    "Trajectory",
    "TrajectoryStep",
    "Heuristic",
    "PurposeScore",
    "MemoryRecord",
    "LLMBackend",
    "MockLLMBackend",
    "Actor",
    "PurposeFunction",
    "ExperienceReplay",
    "HeuristicOptimizer",
    "Orchestrator",
]