
ACO: Agent Cost Optimizer

A universal control layer that reduces autonomous agent cost while preserving task quality.

Quick Results (SWE-bench, 500 coding tasks, 8 real models)

Policy            Success   Cost/Task   Cost Reduction
Oracle            87.0%     $0.062      80.3%
v10+feedback      84.8%     $0.201      36.4%
v10 direct        76.6%     $0.188      40.7%
Always frontier   78.2%     $0.317      baseline
Always cheap      63.2%     $0.014      95.5%

Key finding: v10+feedback strictly dominates always-frontier: lower cost AND higher quality. This is not a cost-quality tradeoff.
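The "Cost Reduction" column is relative to the always-frontier baseline ($0.317/task). A quick sketch, using only the per-task costs from the table above (small differences from the table are rounding):

```python
# Cost reduction relative to the always-frontier baseline, computed
# from the per-task costs in the results table.
BASELINE = 0.317  # always-frontier cost per task, in dollars

costs = {
    "Oracle": 0.062,
    "v10+feedback": 0.201,
    "v10 direct": 0.188,
    "Always cheap": 0.014,
}

reduction = {name: 1 - cost / BASELINE for name, cost in costs.items()}
for name, r in reduction.items():
    print(f"{name}: {r:.1%} cheaper than always-frontier")
```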

BERT Router Results

DistilBERT was fine-tuned on SPROUT for binary classification. The binary classifier fails for tier routing — it ignores tier prefixes and predicts P(success) ≈ 89.5% for all tiers, routing everything to the cheapest model.

A 5-class retraining is in progress (job 69fd8cccaff1cd33e8f30714).
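The intended routing rule — pick the cheapest tier whose predicted success probability clears a threshold — can be sketched independently of the classifier. The tier names, costs, and threshold below are illustrative, not from the repo; the sketch shows why a collapsed classifier that scores every tier the same always lands on the cheapest model:

```python
# Hypothetical tier table (name, cost per task), sorted by ascending cost.
# Real tiers and costs would come from the cascade config, not this sketch.
TIERS = [("cheap", 0.014), ("mid", 0.080), ("frontier", 0.317)]

def route(p_success: dict, threshold: float = 0.75) -> str:
    """Return the cheapest tier whose predicted P(success) clears threshold."""
    for name, _cost in TIERS:
        if p_success[name] >= threshold:
            return name
    return TIERS[-1][0]  # nothing clears the bar: escalate to frontier

# The collapsed binary classifier: same ~89.5% score for every tier,
# so routing degenerates to "always cheapest".
collapsed = {name: 0.895 for name, _ in TIERS}
print(route(collapsed))

# A tier-aware model differentiates, so harder tasks escalate.
aware = {"cheap": 0.55, "mid": 0.78, "frontier": 0.93}
print(route(aware))
```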

11 Modules

  1. Cost Telemetry Collector – aco/telemetry.py
  2. Task Cost Classifier – aco/classifier.py
  3. Model Cascade Router (XGBoost + isotonic) – aco/router_v10.py
  4. Execution-Feedback Router (entropy cascade) – aco/execution_feedback.py
  5. Context Budgeter – aco/context_budgeter.py
  6. Cache-Aware Prompt Layout – aco/cache_layout.py
  7. Tool-Use Cost Gate – aco/tool_gate.py
  8. Verifier Budgeter – aco/verifier_budgeter.py
  9. Retry/Recovery Optimizer – aco/retry_optimizer.py
  10. Meta-Tool Miner – aco/meta_tool_miner.py
  11. Doom Detector – aco/doom_detector.py
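Module 4's entropy cascade amounts to: run the cheap model first, measure how uncertain it is about its own output, and escalate when uncertainty is high. A minimal sketch of that decision; the threshold and token distributions below are illustrative, not values from aco/execution_feedback.py:

```python
import math

def token_entropy(probs: list) -> float:
    """Shannon entropy (nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_escalate(token_dists: list, threshold: float = 0.8) -> bool:
    """Escalate to a stronger model when mean per-token entropy is high.

    High entropy means the cheap model is uncertain about its own output,
    a signal that the task likely needs the frontier model.
    """
    mean_h = sum(token_entropy(d) for d in token_dists) / len(token_dists)
    return mean_h > threshold

confident = [[0.97, 0.02, 0.01]] * 4   # peaked distributions: low entropy
uncertain = [[0.4, 0.35, 0.25]] * 4    # flat distributions: high entropy
print(should_escalate(confident), should_escalate(uncertain))
```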

New Modules (this session)

  • Conformal Calibration – aco/conformal.py – RouteNLP-style distribution-free escalation guarantees
  • Pareto Frontier – aco/pareto.py – RouterBench NDCH + RouteLLM CPT/APGR metrics
  • Integration Test – tests/test_integration.py – full pipeline test
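Distribution-free escalation guarantees of the kind the conformal module targets typically follow the split-conformal recipe: calibrate a threshold on held-out nonconformity scores so that, with probability at least 1 − α (under exchangeability), a task the cheap model can handle is not escalated. A minimal sketch under those assumptions, not the repo's implementation; the calibration scores are made up:

```python
import math

def conformal_threshold(cal_scores: list, alpha: float = 0.1) -> float:
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration nonconformity score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    return sorted(cal_scores)[min(k, n) - 1]

def escalate(score: float, threshold: float) -> bool:
    """Escalate when a task's nonconformity score exceeds the threshold.

    Under exchangeability, at most ~alpha of tasks the cheap model could
    handle are escalated unnecessarily.
    """
    return score > threshold

# Illustrative calibration scores (e.g. 1 - predicted P(success)):
cal = [0.05, 0.12, 0.20, 0.31, 0.44, 0.52, 0.63, 0.70, 0.81, 0.90]
t = conformal_threshold(cal, alpha=0.2)
print(t, escalate(0.95, t), escalate(0.10, t))
```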

Key Takeaway

Training on real execution data matters more than architecture: v8, trained on synthetic data, increased cost by 11.6%, while v10, trained on 500 real SWE-Router outcomes, saved 36.4%. Same XGBoost, same features.

Documentation

Links