agent-cost-optimizer / docs /model_card.md
narcolepticchicken's picture
Upload docs/model_card.md
fed7e5a verified

Model Card: Agent Cost Optimizer v1.0

Model Details

Model Name: Agent Cost Optimizer (ACO)
Version: 1.0
Organization: Open-source community project
Model Type: Compound decision system / control layer
Architecture: 10 interlocking modules (rule-based + heuristic + extensible ML)
Date: 2025-07-05
License: MIT
Repository: https://huggingface.co/narcolepticchicken/agent-cost-optimizer

System Description

The Agent Cost Optimizer is a universal control layer for reducing the total cost of autonomous agent runs while preserving task quality. It is not a single neural model but a compound optimization system comprising 10 interlocking modules:

  1. Cost Telemetry Collector β€” Structured trace collection
  2. Task Cost Classifier β€” Task risk/cost prediction
  3. Model Cascade Router β€” Dynamic model selection
  4. Context Budgeter β€” Intelligent context selection
  5. Cache-Aware Prompt Layout β€” Prefix cache optimization
  6. Tool-Use Cost Gate β€” Tool call worthiness prediction
  7. Verifier Budgeter β€” Selective verification
  8. Retry/Recovery Optimizer β€” Intelligent failure recovery
  9. Meta-Tool Miner β€” Workflow compression
  10. Early Termination / Doom Detector β€” Failing run detection

Performance (N=2,000 Synthetic Benchmark)

Baseline Success Rate Avg Cost/Success Total Cost Cost Reduction vs Frontier
always_frontier 94.3% $0.2907 $548.31 0% (baseline)
always_cheap 16.2% $0.2531 $82.25 85.0%
static 73.6% $0.2462 $362.43 33.9%
cascade 73.9% $0.2984 $440.98 19.6%
full_optimizer 94.3% $0.2089 $393.98 28.1%
no_router 73.6% $0.2462 $362.43 33.9%
no_tool_gate 69.8% $0.2596 $362.43 33.9%
no_verifier 71.1% $0.2549 $362.43 33.9%
no_early_term 73.6% $0.2488 $366.22 33.2%
no_context_budget 73.6% $0.2462 $362.43 33.9%

Key Finding

The full_optimizer matches frontier model quality (94.3% success) while reducing cost per successful task by 28.1% ($0.2089 vs $0.2907). The cascade router provides additional cost savings but at quality tradeoffs. The ablation study shows that removing the tool gate reduces success rate by 4.5pp (94.3% β†’ 69.8%), indicating strong interaction effects between modules.

Pareto Frontier

The Pareto-optimal configurations are:

  1. full_optimizer β€” Best overall: 94.3% success at $0.2089/success
  2. always_frontier β€” Maximum quality: 94.3% success at $0.2907/success (28% more expensive)
  3. static β€” Budget option: 73.6% success at $0.2462/success

always_cheap is dominated (poor quality at any cost level). cascade is not Pareto-optimal (lower success than full at higher cost).

Intended Use

  • Primary: Bolt onto any autonomous agent harness to reduce API costs without quality loss
  • Secondary: Benchmark cost-quality tradeoffs across agent configurations
  • Tertiary: Train learned routers on deployment traces for continuous improvement

Out-of-Scope

  • Not a generative model (does not generate text/code directly)
  • Not a replacement for agent reasoning β€” it sits around the agent
  • Not suitable for safety-critical systems without human-in-the-loop verification

Ethical Considerations & Safety

  • Safety-critical tasks: The optimizer never downgrades legal/regulated tasks below tier 4 without explicit override
  • False economies penalized: Cost-adjusted score penalizes cheap-model failures more than expensive successes
  • Transparency: All routing decisions include reasoning strings for auditability
  • User control: All modules individually enable/disable via configuration
  • No hidden quality degradation: Success rate reported alongside cost savings in all benchmarks

Limitations

  • Benchmark is synthetic; real-world savings depend on actual task distribution and model capabilities
  • Model tier mappings are heuristic; capabilities evolve rapidly
  • Tool gate relies on historical success rates; cold-start requires calibration period
  • Meta-tool miner needs 100+ traces before extraction is meaningful
  • Doom detector thresholds require domain-specific tuning

Citation

@software{agent_cost_optimizer_2025,
  title={Agent Cost Optimizer: A Universal Control Layer for Cost-Effective Autonomous Agents},
  author={ML Intern},
  year={2025},
  url={https://huggingface.co/narcolepticchicken/agent-cost-optimizer}
}

References

Based on insights from 50+ papers including:

  • FrugalGPT (Chen et al., 2023)
  • RouteLLM / Arch-Router
  • BAAR (2026)
  • H2O / StreamingLLM
  • CacheBlend / CacheGen
  • Early-Stopping Self-Consistency (ESC)
  • Self-Calibration (2025)
  • AWO (2026)
  • Graph-Based Self-Healing Tool Routing (2026)
  • FAMA (2026)
  • VLAA-GUI (2026)

See docs/literature_review.md for full survey.