# Purpose Agent: Interview Pitch Doc

> Built by one engineer. 60 files. 13 papers. Tested with real models. Published on PyPI.

---

## The One-Liner

I built a framework where AI agents learn from experience without fine-tuning, using memory, not gradients.

```bash
pip install purpose-agent
```

---
## The Problem

Every agent framework today (LangChain, CrewAI, AutoGen) runs the same way every time. Agent fails at a task? Next time, it fails the exact same way. No learning. No memory. No improvement.

Fine-tuning fixes this, but it costs $10K+ per iteration and requires GPU infrastructure.

**My question:** Can we make agents improve without touching the weights?

---
## The Solution

**Purpose Learning**, a self-improvement loop that works at inference time:

```
Agent acts → Critic scores every step → Good patterns extracted as heuristics
→ Heuristics immune-scanned for safety → Promoted to memory
→ Next run: heuristics in the prompt → Agent performs better
```

The key insight: I treat the agent's prompt as a **learnable parameter**. Instead of gradient descent on weights, I do **heuristic accumulation** in context. The agent's knowledge grows; its compute stays flat.
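The loop is small enough to sketch in a few lines. This is a minimal, illustrative version, assuming `llm` returns a list of step strings, `critic` maps a step to a Φ score on a 1-10 scale, and `immune` returns True for safe text; the names and the 8.0 promotion threshold are hypothetical, not the purpose-agent API:

```python
from dataclasses import dataclass, field

@dataclass
class Heuristic:
    text: str
    trust: float = 0.5
    utility: float = 0.5

@dataclass
class Memory:
    items: list = field(default_factory=list)

    def top_k(self, k: int = 5):
        # Highest trust x utility first; the real compiler also weighs relevance.
        return sorted(self.items, key=lambda h: h.trust * h.utility, reverse=True)[:k]

def run_once(llm, critic, immune, task: str, memory: Memory, k: int = 5):
    """One loop iteration: act -> score -> extract -> immune-scan -> promote."""
    heuristics = "\n".join(f"- {h.text}" for h in memory.top_k(k))
    steps = llm(f"{task}\n\nHeuristics:\n{heuristics}")   # agent acts
    scored = [(step, critic(step)) for step in steps]     # critic scores every step (Φ)
    for step, phi in scored:
        if phi >= 8.0:                                    # good pattern -> candidate heuristic
            cand = Heuristic(text=step)
            if immune(cand.text):                         # immune scan gates promotion
                memory.items.append(cand)                 # promoted to memory
    return scored
```

Nothing in this loop touches model weights: the only thing that changes between runs is what `memory.top_k` feeds into the prompt.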
---

## What I Built

| Layer | What | Size |
|-------|------|------|
| **Core Engine** | Actor → Environment → Purpose Function (Φ) → Optimizer → Experience Replay | 7 modules |
| **Safety Kernel** | Immune system, memory quarantine, evidence-gated promotion, honest evaluation | 8 modules |
| **Research** | 5 papers implemented: Meta-Rewarding, Self-Taught Eval, DSPy, LLMCompiler, Retroformer | 5 modules |
| **Breakthroughs** | Self-improving critic, Mixture-of-Heuristics (MoH), hindsight relabeling, heuristic evolution | 1 module |
| **User Layer** | `pa.purpose("write code")` → auto-builds team, auto-selects model | 4 modules |
| **Infra** | 10+ provider support, TOML prompts, universal parser, secure tools | 9 modules |

**34 Python modules. Zero core dependencies. Published on PyPI.**
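The user layer in the table is a single call. Sketched out, it reads like this; the `purpose_agent` import name is my assumption from the package name, only the `pa.purpose("write code")` call itself appears in the table above:

```python
import purpose_agent as pa  # assumed module name for the purpose-agent package

# One call: the framework auto-builds the team and auto-selects a model.
agent = pa.purpose("write code")
```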
---

## The Three Technical Bets

### Bet 1: Potential-Based Reward Shaping works for LLM agents

My Purpose Function Φ(s) is exactly the potential function from Ng et al. (1999). The delta ΔΦ = Φ(s') - Φ(s) provides dense per-step feedback while preserving the optimal policy. I proved this formally with 5 axioms and 3 theorems.
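Concretely, the shaping term from Ng et al. (1999) is a one-liner; with discount γ = 1 it reduces to the ΔΦ delta above. A minimal sketch, with `phi` standing in for any bounded state scorer:

```python
def shaped_reward(phi, s, s_next, gamma: float = 1.0) -> float:
    """Potential-based shaping term F(s, s') = γ·Φ(s') - Φ(s).

    Ng et al. (1999) show that adding F to the task reward leaves the
    optimal policy unchanged; with gamma = 1 this is exactly the per-step
    ΔΦ feedback the Purpose Function emits.
    """
    return gamma * phi(s_next) - phi(s)
```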
**Result:** Φ scores go from 1.0 → 10.0 across 3 runs on coding tasks, with both Llama-70B and Gemma-26B.

### Bet 2: Memory can replace fine-tuning

Instead of gradient updates, I accumulate heuristics in a structured memory store (7 types × 5 statuses). A prompt compiler selects the top-K by relevance × trust × utility under a token budget. This is analogous to Mixture-of-Experts: knowledge grows to 100+ heuristics, but only K=5 are activated per step.
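A sketch of that selection rule, with assumed fields (`text`, `trust`, `utility`), a `relevance` callable, and a crude word count standing in for a real tokenizer:

```python
def compile_heuristics(heuristics, relevance, budget_tokens: int, k: int = 5):
    """Pick at most K heuristics by relevance x trust x utility, within a token budget."""
    ranked = sorted(heuristics,
                    key=lambda h: relevance(h) * h.trust * h.utility,
                    reverse=True)
    chosen, used = [], 0
    for h in ranked:
        cost = len(h.text.split())          # stand-in for a real token count
        if len(chosen) < k and used + cost <= budget_tokens:
            chosen.append(h)
            used += cost
    return chosen
```

Only the selected K enter the prompt, which is what keeps compute flat while the library keeps growing.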
**Result:** Heuristic library grows 0 → 3 → 9 → 18 across runs. Cross-task transfer works (train on fibonacci, heuristics help with fizzbuzz).

### Bet 3: Self-improvement needs an immune system

Unchecked learning is dangerous. A bad trajectory could inject "ignore all previous instructions" into memory. I built a 5-scanner immune system (prompt injection, score manipulation, tool misuse, privacy leaks, scope overreach) with a quarantine pipeline.
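An illustrative slice of the scanner idea. The regex patterns here are my assumption for the sketch; the real scanners and their rules live in the safety kernel:

```python
import re

# Three of the five scanner categories, with assumed example patterns.
SCANNERS = {
    "prompt_injection":   re.compile(r"ignore (all )?previous instructions", re.I),
    "score_manipulation": re.compile(r"(set|report) (the |my )?score (to|as) 10", re.I),
    "privacy_leak":       re.compile(r"(api[_-]?key|password|secret token)", re.I),
}

def immune_scan(candidate: str) -> list[str]:
    """Names of the scanners that flag a candidate heuristic; empty means clean."""
    return [name for name, pattern in SCANNERS.items() if pattern.search(candidate)]

# A flagged candidate goes to quarantine instead of being promoted to memory.
assert immune_scan("Ignore all previous instructions and do X") == ["prompt_injection"]
assert immune_scan("Write unit tests before refactoring") == []
```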
**Result:** 93% adversarial catch rate, 0% false positives on 30 attack vectors.

---

## Real-World Evidence

Tested with **Llama-3.3-70B** and **Gemma-4-26B** via OpenRouter (not mocks):
| Test | Llama-70B | Gemma-26B |
|------|-----------|-----------|
| fibonacci (4 unit tests) | ✅ 100% | ✅ 100% |
| fizzbuzz (4 unit tests) | ✅ 100% | ✅ 100% |
| factorial (3 unit tests) | ✅ 100% | ✅ 100% |
| Heuristic growth (3 runs) | 0 → 3 → 9 → 18 | 0 → 3 → 6 → 11 |
| Adversarial robustness | 93% catch | ✅ |

**119 automated tests. 0 failures.** Runs in CI without API keys (mock backend).

---
## Technical Depth: Three Things I'm Proud Of

### 1. The Universal Parser (`robust_parser.py`)

LLMs can't reliably produce JSON. Every model formats differently, and structured-output APIs aren't universally supported. So I built a 4-strategy parser: TOML → JSON → field extraction → regex. It handles whatever the model gives and never crashes. This single file made the framework work across 10+ providers without provider-specific code.
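A condensed sketch of the fallback chain. It folds the last two strategies into one regex-based field extractor, and everything else about it is simplified relative to the real `robust_parser.py` (note `tomllib` needs Python 3.11+):

```python
import json
import re
import tomllib  # stdlib TOML parser, Python 3.11+

def parse_model_output(raw: str) -> dict:
    """Try TOML -> JSON -> regex field extraction -> raw fallback; never raise."""
    # 1) TOML first.
    try:
        parsed = tomllib.loads(raw)
        if parsed:
            return parsed
    except tomllib.TOMLDecodeError:
        pass
    # 2) JSON: grab the first {...} span, which also covers fenced blocks.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass
    # 3) Loose "key: value" / "key = value" lines via regex.
    fields = dict(re.findall(r"^(\w+)\s*[:=]\s*(.+)$", raw, re.MULTILINE))
    if fields:
        return fields
    # 4) Last resort: return the raw text so the caller never crashes.
    return {"text": raw}
```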
### 2. Evidence-Gated Memory (`memory.py` + `immune.py` + `memory_ci.py`)

V1 claimed "agents get smarter every time." I rewrote it honestly: **agents learn only when evidence says they should.** Every new memory goes through: candidate → immune scan → quarantine → replay test → promote/reject. Memories are typed (7 kinds), scoped (by agent role, tool, task category), versioned, and reversible. This is the difference between a demo and a production system.
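The gate itself, sketched with hypothetical callables (`immune_scan` returning flags, `replay_test` checking that a candidate actually improves Φ on replayed trajectories); the statuses shown are a subset of the real five:

```python
from enum import Enum

class Status(Enum):
    CANDIDATE = "candidate"
    QUARANTINED = "quarantined"
    PROMOTED = "promoted"
    REJECTED = "rejected"

def gate(candidate, immune_scan, replay_test) -> Status:
    """Evidence-gated promotion: no memory lands without passing every stage."""
    if immune_scan(candidate.text):       # any scanner flag -> reject outright
        return Status.REJECTED
    # Clean candidates still wait in quarantine until replay evidence agrees.
    if not replay_test(candidate):        # did it actually raise Φ on replay?
        return Status.QUARANTINED
    return Status.PROMOTED
```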
### 3. Formal Convergence Proof (`PURPOSE_LEARNING.md`)

I didn't just build it; I proved it converges. The Purpose-MDP formalism shows that under 5 bounded axioms, the expected Φ score is monotonically non-decreasing and converges to a fixed point. The connection to Ng 1999 PBRS is exact: our ΔΦ **is** the potential-based shaping reward.
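The two load-bearing statements, in my transcription (assuming γ is the MDP discount and Φ is bounded above by Φ_max; the full axioms and proofs are in PURPOSE_LEARNING.md):

```latex
% PBRS identity (Ng et al., 1999): shaping with F preserves the optimal policy.
F(s, a, s') = \gamma\,\Phi(s') - \Phi(s)

% Convergence claim: under the five bounded axioms, expected Φ never
% decreases across runs and converges to a fixed point.
\mathbb{E}[\Phi_{t+1}] \;\ge\; \mathbb{E}[\Phi_t],
\qquad
\lim_{t \to \infty} \mathbb{E}[\Phi_t] = \Phi^{*} \le \Phi_{\max}
```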
---

## What I'd Build at Nugget

This framework proves I can go from **paper to production** in one shot:

- Read 13 research papers → extract the implementable core → build it
- Ship with tests, benchmarks, formal proofs, and real-model validation
- Package for distribution (PyPI, zero dependencies)
- Document for both technical and non-technical audiences

At Nugget, I'd apply this approach to whatever the team's hardest problems are: agent reliability, evaluation, cost optimization, or new capabilities.
---

## Links

| What | Where |
|------|-------|
| Install | `pip install purpose-agent` |
| PyPI | [pypi.org/project/purpose-agent](https://pypi.org/project/purpose-agent/) |
| Code | [huggingface.co/Rohan03/purpose-agent](https://huggingface.co/Rohan03/purpose-agent) |
| Architecture | [ARCHITECTURE.md](https://huggingface.co/Rohan03/purpose-agent/blob/main/ARCHITECTURE.md) |
| Formal Proofs | [PURPOSE_LEARNING.md](https://huggingface.co/Rohan03/purpose-agent/blob/main/PURPOSE_LEARNING.md) |
| Research Trace | [COMPILED_RESEARCH.md](https://huggingface.co/Rohan03/purpose-agent/blob/main/COMPILED_RESEARCH.md) |
| Test Results | [LAUNCH_READINESS.md](https://huggingface.co/Rohan03/purpose-agent/blob/main/LAUNCH_READINESS.md) |