narcolepticchicken/agent-cost-optimizer


Commit 81a993a (verified) by narcolepticchicken, committed about 22 hours ago. Parent: 17a2ae0.

Commit message: Upload README.md with huggingface_hub


Files changed (1): README.md, +48 -0

For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.

## Trained Router (NEW)

The heuristic router has been replaced with a **trained XGBoost router** based on the CARROT architecture.

### Architecture: Difficulty-First + ML Confirmation + Safety Floors

1. Map `task_type` to a difficulty level (1-5).
2. Compute `base_tier = min(difficulty + 1, 5)`.
3. Apply the safety floor for the `task_type`, raising the tier to at least that floor (e.g., legal → tier 4).
4. Use per-tier XGBoost classifiers that estimate P(success | query) to confirm the tier or escalate it.
5. If P(success @ base_tier) < threshold, escalate one tier.

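
The five steps above can be sketched in plain Python as follows. This is a hypothetical illustration, not the `aco.learned_router` implementation: the difficulty map, every safety floor except legal → tier 4, the `route` function, and the stand-in classifiers are assumptions, and the 0.65 default threshold simply mirrors the t=0.65 benchmark row. The sketch also assumes the classifier check is applied to the tier *after* the safety floor, whereas step 5 phrases it in terms of `base_tier`.

```python
from typing import Callable, Dict

# Hypothetical task_type -> difficulty map; the README does not publish the real one.
# "legal" is deliberately rated low here so that its safety floor has a visible effect.
TASK_DIFFICULTY: Dict[str, int] = {"chat": 1, "legal": 2, "coding": 3, "math": 4}

# Hypothetical safety floors; only "legal -> tier 4" comes from the README.
SAFETY_FLOOR: Dict[str, int] = {"legal": 4}

MAX_TIER = 5

# Stand-in for a per-tier classifier: (query, task_type) -> estimated P(success).
SuccessModel = Callable[[str, str], float]


def route(
    query: str,
    task_type: str,
    classifiers: Dict[int, SuccessModel],
    threshold: float = 0.65,
    default_difficulty: int = 3,
) -> int:
    """Return the tier (1-5) to use for this query."""
    # Step 1: map task_type to a difficulty (unknown types get a default).
    difficulty = TASK_DIFFICULTY.get(task_type, default_difficulty)
    # Step 2: start one tier above the difficulty, capped at the top tier.
    base_tier = min(difficulty + 1, MAX_TIER)
    # Step 3: never go below the safety floor for this task type.
    tier = max(base_tier, SAFETY_FLOOR.get(task_type, 1))
    # Steps 4-5: ask this tier's classifier; escalate one tier if it is not confident.
    if classifiers[tier](query, task_type) < threshold and tier < MAX_TIER:
        tier += 1
    return tier


if __name__ == "__main__":
    always_confident = {t: (lambda q, k: 0.9) for t in range(1, MAX_TIER + 1)}
    print(route("Write a Python function to reverse a linked list", "coding", always_confident))  # prints 4
```

Escalating by at most one tier keeps the cost of a low-confidence prediction bounded; the safety floor is applied before the classifier so that a confident classifier can never route a floored task type below its floor.
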
### Usage

```python
from aco.learned_router import TrainedRouter

# Load the trained router from the Hugging Face Hub
router = TrainedRouter.from_pretrained("narcolepticchicken/agent-cost-optimizer")

# Predict a tier for a query, given its task type and difficulty
tier, confidence = router.predict(
    "Write a Python function to reverse a linked list",
    "coding",
    difficulty=3,
)
print(f"Recommended: tier {tier} (confidence: {confidence:.2f})")
```

### Benchmark Results (2,000 evaluation traces)

| Router | Success rate | Avg. cost | Unsafe miss rate |
|--------|--------------|-----------|------------------|
| trained (t=0.55) | 85.5% | 1.107 | 4.1% |
| trained (t=0.65) | 91.9% | 1.365 | 1.5% |
| always_frontier | 88.8% | 1.000 | 2.5% |
| heuristic_diff+1 | 83.4% | 0.940 | 4.9% |
| oracle | 99.8% | 0.486 | 0.0% |

At t=0.65, the trained router **outperforms always-frontier on success rate** (91.9% vs 88.8%) and has a lower unsafe miss rate (1.5% vs 2.5%), although its average cost is about 37% higher (1.365 vs 1.000).
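
The README does not say how the benchmark columns are computed. One plausible reading, sketched below under stated assumptions, scores each router against traces that carry a ground-truth outcome for every tier: success rate is the fraction of traces whose chosen tier succeeded, average cost is the mean cost of the chosen tier (normalized here so the frontier tier costs 1.0, consistent with always_frontier's 1.000), and the oracle picks the cheapest tier that succeeds. The `TIER_COST` values, the `safety_critical` flag, and the definition of an unsafe miss (a safety-critical trace that fails at the chosen tier) are hypothetical. This sketch charges only the selected tier, so it does not model any extra cost incurred by escalation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical relative cost per tier, normalized so the frontier tier (5) costs 1.0.
TIER_COST: Dict[int, float] = {1: 0.05, 2: 0.1, 3: 0.25, 4: 0.5, 5: 1.0}


@dataclass
class Trace:
    success: Dict[int, bool]       # ground-truth outcome at each tier 1-5
    safety_critical: bool = False  # hypothetical flag behind the "unsafe" metric


# A router receives the whole trace here only for convenience; the oracle needs the
# labels, while a real router would look at the query alone.
Router = Callable[[Trace], int]


def evaluate(router: Router, traces: List[Trace]) -> Dict[str, float]:
    """Score a router: success rate, mean cost of the chosen tier, unsafe miss rate."""
    successes, unsafe_misses, total_cost = 0, 0, 0.0
    for trace in traces:
        tier = router(trace)
        ok = trace.success[tier]
        successes += int(ok)
        total_cost += TIER_COST[tier]
        # Hypothetical definition: a safety-critical trace that fails at the chosen tier.
        if trace.safety_critical and not ok:
            unsafe_misses += 1
    n = len(traces)
    return {"success": successes / n, "avg_cost": total_cost / n, "unsafe": unsafe_misses / n}


def oracle(trace: Trace) -> int:
    """Cheapest tier that succeeds; fall back to the frontier tier if none does."""
    for tier in sorted(trace.success):
        if trace.success[tier]:
            return tier
    return max(trace.success)


def always_frontier(trace: Trace) -> int:
    """Baseline that always sends the query to the top tier."""
    return 5
```

Because the oracle reads the ground-truth labels, it is an upper bound rather than a deployable router; its gap to the other rows shows how much cost a perfect difficulty estimate could save.
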

### Training Data

50,000 synthetic traces with ground-truth per-tier success labels. Each trace includes the outcomes at all 5 tiers, enabling the per-tier classifiers to learn from balanced success/failure examples.

See `docs/trained_router_report.md` for full details.
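
Because every trace carries all five tier outcomes, each trace can contribute one labeled example to every per-tier classifier. The sketch below shows that expansion; the trace schema (`query`, `task_type`, `success`), the `featurize` function, and its features (word count plus a one-hot `task_type`) are hypothetical, since the README does not describe the feature set. Each resulting `(X, y)` pair could then be passed to a binary classifier such as `xgboost.XGBClassifier().fit(X, y)`, whose positive-class column from `predict_proba` gives an estimate of P(success) at that tier.

```python
from typing import Dict, List, Sequence, Tuple

NUM_TIERS = 5


def featurize(query: str, task_type: str, task_types: Sequence[str]) -> List[float]:
    """Hypothetical features: word count plus a one-hot encoding of task_type."""
    one_hot = [1.0 if task_type == t else 0.0 for t in task_types]
    return [float(len(query.split()))] + one_hot


def build_per_tier_datasets(
    traces: List[dict], task_types: Sequence[str]
) -> Dict[int, Tuple[List[List[float]], List[int]]]:
    """Expand each trace into one (features, label) example per tier."""
    datasets: Dict[int, Tuple[List[List[float]], List[int]]] = {
        tier: ([], []) for tier in range(1, NUM_TIERS + 1)
    }
    for trace in traces:
        x = featurize(trace["query"], trace["task_type"], task_types)
        for tier in range(1, NUM_TIERS + 1):
            X, y = datasets[tier]
            X.append(x)
            # Label 1 if the model at this tier solved the trace, else 0.
            y.append(int(trace["success"][tier]))
    return datasets
```

Every classifier sees the same feature rows and differs only in its labels, so each one learns how success probability at its own tier varies with the query.
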