# ACO: Agent Cost Optimizer

A universal control layer that reduces autonomous agent cost while preserving task quality.

## Quick Results (SWE-bench, 500 coding tasks, 8 real models)
| Policy | Success | Cost/Task | CostRed |
|--------|---------|-----------|---------|
| Oracle | 87.0% | $0.062 | 80.3% |
| **v10+feedback** | **84.8%** | **$0.201** | **36.4%** |
| v10 direct | 76.6% | $0.188 | 40.7% |
| Always frontier | 78.2% | $0.317 | baseline |
| Always cheap | 63.2% | $0.014 | 95.5% |
**Key finding: v10+feedback strictly dominates always-frontier** – lower cost AND higher quality. This is not a cost-quality tradeoff.
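The CostRed column can be read as relative savings against the always-frontier baseline. A minimal sketch, assuming CostRed = 1 − (cost per task) / (baseline cost per task); small discrepancies with the table likely come from the per-task costs being rounded to the cent:

```python
def cost_reduction(cost_per_task, baseline_cost):
    """Relative savings vs. the baseline policy (assumed definition)."""
    return 1.0 - cost_per_task / baseline_cost

# Rounded per-task costs from the results table (baseline: always frontier, $0.317)
for policy, cost in [("Oracle", 0.062), ("v10+feedback", 0.201), ("Always cheap", 0.014)]:
    print(f"{policy}: {cost_reduction(cost, 0.317):.1%}")
```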
## BERT Router Results

DistilBERT was fine-tuned on SPROUT for binary success classification. The binary classifier fails for tier routing: it ignores tier prefixes and predicts P(success) ≈ 89.5% for every tier, routing everything to the cheapest model.
A 5-class retraining is in progress (job `69fd8cccaff1cd33e8f30714`).
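To see why a tier-blind P(success) breaks cascade routing, consider a threshold router that picks the cheapest tier clearing a success threshold. This is an illustrative sketch, not the repo's API; the tier names and costs are taken loosely from the results table:

```python
def route(tier_probs, tier_costs, threshold=0.70):
    """Pick the cheapest tier whose predicted success probability clears the threshold."""
    for tier in sorted(tier_costs, key=tier_costs.get):
        if tier_probs[tier] >= threshold:
            return tier
    return max(tier_costs, key=tier_costs.get)  # no tier qualifies: use the strongest

costs = {"tiny": 0.014, "mid": 0.188, "frontier": 0.317}

# A tier-blind classifier predicts the same P(success) for every tier prefix...
flat = {tier: 0.895 for tier in costs}
print(route(flat, costs))  # everything collapses onto the cheapest model

# ...while a tier-aware 5-class model can separate the tiers.
aware = {"tiny": 0.55, "mid": 0.78, "frontier": 0.92}
print(route(aware, costs))  # escalates past the tiny tier
```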
## 11 Modules

1. Cost Telemetry Collector – `aco/telemetry.py`
2. Task Cost Classifier – `aco/classifier.py`
3. Model Cascade Router (XGBoost + isotonic) – `aco/router_v10.py`
4. Execution-Feedback Router (entropy cascade) – `aco/execution_feedback.py`
5. Context Budgeter – `aco/context_budgeter.py`
6. Cache-Aware Prompt Layout – `aco/cache_layout.py`
7. Tool-Use Cost Gate – `aco/tool_gate.py`
8. Verifier Budgeter – `aco/verifier_budgeter.py`
9. Retry/Recovery Optimizer – `aco/retry_optimizer.py`
10. Meta-Tool Miner – `aco/meta_tool_miner.py`
11. Doom Detector – `aco/doom_detector.py`
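The entropy cascade behind the execution-feedback router (module 4) can be sketched as follows. The threshold, function names, and toy distributions are illustrative assumptions, not the `aco/execution_feedback.py` API:

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one token's predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_escalate(token_dists, entropy_threshold=1.0):
    """Escalate to a stronger model when the cheap model's mean per-token
    entropy suggests low confidence in its own output."""
    mean_h = sum(token_entropy(d) for d in token_dists) / len(token_dists)
    return mean_h > entropy_threshold

confident = [[0.97, 0.02, 0.01]] * 4    # peaked distributions: keep the cheap model
uncertain = [[0.4, 0.3, 0.2, 0.1]] * 4  # flat distributions: hand off upward
print(should_escalate(confident), should_escalate(uncertain))
```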
## New Modules (this session)

- **Conformal Calibration** – `aco/conformal.py` – RouteNLP-style distribution-free escalation guarantees
- **Pareto Frontier** – `aco/pareto.py` – RouterBench NDCH + RouteLLM CPT/APGR metrics
- **Integration Test** – `tests/test_integration.py` – full pipeline test
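Split conformal calibration gives the kind of distribution-free guarantee the conformal module targets: pick the escalation threshold as a quantile of held-out nonconformity scores, so that with probability at least 1 − α an exchangeable new task scores below it. A minimal sketch; the function below is illustrative and not the `aco/conformal.py` API:

```python
import math

def conformal_threshold(calibration_scores, alpha=0.1):
    """Split-conformal quantile: with probability >= 1 - alpha, a fresh
    exchangeable score falls at or below this threshold."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # conservative rank
    return sorted(calibration_scores)[min(k, n) - 1]

# Toy nonconformity scores, e.g. 1 - predicted P(success) on a calibration set
scores = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05 .. 0.95
tau = conformal_threshold(scores, alpha=0.1)
print(tau)  # escalate whenever a new task's nonconformity exceeds tau
```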
## Key Takeaway

Training on real execution data matters more than architecture. v8, trained on synthetic data, *increased* cost by 11.6%. v10, trained on 500 real SWE-Router outcomes, *saved* 36.4%. Same XGBoost, same features.

## Documentation

- [Final Report](docs/final_report_v2.md)
- [Pareto Frontier Report](docs/pareto_frontier_report.md)
- [Conformal Calibration Report](docs/conformal_report.md)
- [BERT Eval Report](docs/bert_eval_report.md)
- [Literature Review](docs/literature_review.md)
- [Deployment Guide](docs/deployment_guide.md)
- [Technical Blog](docs/technical_blog.md)
- [Roadmap](docs/ROADMAP.md)
## Links

- **Model**: [narcolepticchicken/agent-cost-optimizer](https://huggingface.co/narcolepticchicken/agent-cost-optimizer)
- **Dataset**: [narcolepticchicken/agent-cost-traces](https://huggingface.co/datasets/narcolepticchicken/agent-cost-traces)
- **Dashboard**: [narcolepticchicken/aco-dashboard](https://huggingface.co/spaces/narcolepticchicken/aco-dashboard)