
ACO: Agent Cost Optimizer

A universal control layer that reduces autonomous agent cost while preserving task quality.

Quick Results (SWE-bench, 500 coding tasks, 8 real models)

Policy            Success   Cost/Task   Cost Reduction
Oracle            87.0%     $0.062      80.3%
v10+feedback      84.8%     $0.201      36.4%
v10 direct        76.6%     $0.188      40.7%
Always frontier   78.2%     $0.317      baseline
Always cheap      63.2%     $0.014      95.5%

Key finding: v10+feedback strictly dominates always-frontier: lower cost AND higher quality. This is not a cost-quality tradeoff.
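The "Cost Reduction" column is relative to the always-frontier baseline ($0.317/task). A quick sketch, using only the per-task costs from the table above (small differences from the table are rounding):

```python
# Cost reduction relative to the always-frontier baseline, computed
# from the per-task costs in the results table.
BASELINE = 0.317  # always-frontier cost per task, in dollars

costs = {
    "Oracle": 0.062,
    "v10+feedback": 0.201,
    "v10 direct": 0.188,
    "Always cheap": 0.014,
}

reduction = {name: 1 - cost / BASELINE for name, cost in costs.items()}
for name, r in reduction.items():
    print(f"{name}: {r:.1%} cheaper than always-frontier")
```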

BERT Router Results

DistilBERT was fine-tuned on SPROUT for binary classification. The binary classifier fails for tier routing — it ignores tier prefixes and predicts P(success) ≈ 89.5% for all tiers, routing everything to the cheapest model.

A 5-class retraining is in progress (job 69fd8cccaff1cd33e8f30714).
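The intended routing rule — pick the cheapest tier whose predicted success probability clears a threshold — can be sketched independently of the classifier. The tier names, costs, and threshold below are illustrative, not from the repo; the sketch shows why a collapsed classifier that scores every tier the same always lands on the cheapest model:

```python
# Hypothetical tier table (name, cost per task), sorted by ascending cost.
# Real tiers and costs would come from the cascade config, not this sketch.
TIERS = [("cheap", 0.014), ("mid", 0.080), ("frontier", 0.317)]

def route(p_success: dict, threshold: float = 0.75) -> str:
    """Return the cheapest tier whose predicted P(success) clears threshold."""
    for name, _cost in TIERS:
        if p_success[name] >= threshold:
            return name
    return TIERS[-1][0]  # nothing clears the bar: escalate to frontier

# The collapsed binary classifier: same ~89.5% score for every tier,
# so routing degenerates to "always cheapest".
collapsed = {name: 0.895 for name, _ in TIERS}
print(route(collapsed))

# A tier-aware model differentiates, so harder tasks escalate.
aware = {"cheap": 0.55, "mid": 0.78, "frontier": 0.93}
print(route(aware))
```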

11 Modules

  1. Cost Telemetry Collector – aco/telemetry.py
  2. Task Cost Classifier – aco/classifier.py
  3. Model Cascade Router (XGBoost + isotonic) – aco/router_v10.py
  4. Execution-Feedback Router (entropy cascade) – aco/execution_feedback.py
  5. Context Budgeter – aco/context_budgeter.py
  6. Cache-Aware Prompt Layout – aco/cache_layout.py
  7. Tool-Use Cost Gate – aco/tool_gate.py
  8. Verifier Budgeter – aco/verifier_budgeter.py
  9. Retry/Recovery Optimizer – aco/retry_optimizer.py
  10. Meta-Tool Miner – aco/meta_tool_miner.py
  11. Doom Detector – aco/doom_detector.py
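Module 4's entropy cascade amounts to: run the cheap model first, measure how uncertain it is about its own output, and escalate when uncertainty is high. A minimal sketch of that decision; the threshold and token distributions below are illustrative, not values from aco/execution_feedback.py:

```python
import math

def token_entropy(probs: list) -> float:
    """Shannon entropy (nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_escalate(token_dists: list, threshold: float = 0.8) -> bool:
    """Escalate to a stronger model when mean per-token entropy is high.

    High entropy means the cheap model is uncertain about its own output,
    a signal that the task likely needs the frontier model.
    """
    mean_h = sum(token_entropy(d) for d in token_dists) / len(token_dists)
    return mean_h > threshold

confident = [[0.97, 0.02, 0.01]] * 4   # peaked distributions: low entropy
uncertain = [[0.4, 0.35, 0.25]] * 4    # flat distributions: high entropy
print(should_escalate(confident), should_escalate(uncertain))
```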

New Modules (this session)

  • Conformal Calibration – aco/conformal.py – RouteNLP-style distribution-free escalation guarantees
  • Pareto Frontier – aco/pareto.py – RouterBench NDCH + RouteLLM CPT/APGR metrics
  • Integration Test – tests/test_integration.py – full pipeline test
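Distribution-free escalation guarantees of the kind the conformal module targets typically follow the split-conformal recipe: calibrate a threshold on held-out nonconformity scores so that, with probability at least 1 − α (under exchangeability), a task the cheap model can handle is not escalated. A minimal sketch under those assumptions, not the repo's implementation; the calibration scores are made up:

```python
import math

def conformal_threshold(cal_scores: list, alpha: float = 0.1) -> float:
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration nonconformity score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    return sorted(cal_scores)[min(k, n) - 1]

def escalate(score: float, threshold: float) -> bool:
    """Escalate when a task's nonconformity score exceeds the threshold.

    Under exchangeability, at most ~alpha of tasks the cheap model could
    handle are escalated unnecessarily.
    """
    return score > threshold

# Illustrative calibration scores (e.g. 1 - predicted P(success)):
cal = [0.05, 0.12, 0.20, 0.31, 0.44, 0.52, 0.63, 0.70, 0.81, 0.90]
t = conformal_threshold(cal, alpha=0.2)
print(t, escalate(0.95, t), escalate(0.10, t))
```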

Key Takeaway

Training on real execution data matters more than architecture: v8, trained on synthetic data, increased cost by 11.6%, while v10, trained on 500 real SWE-Router outcomes, saved 36.4%. Same XGBoost, same features.

Documentation

Links