# Agent Cost Optimizer - Deployment Guide

## Overview

The Agent Cost Optimizer (ACO) is a control layer that sits **in front of, around, or inside** any agent harness. It does not replace your agent; it optimizes how your agent runs.

## Installation

```bash
# Clone the repository
git clone https://huggingface.co/narcolepticchicken/agent-cost-optimizer
cd agent-cost-optimizer

# Install dependencies
pip install -e .

# Optional: Gradio dashboard
pip install gradio

# Optional: Trackio monitoring
pip install trackio
```

## Quick Start

```python
from aco import AgentCostOptimizer
from aco.config import ACOConfig, ModelConfig, RoutingPolicy
# ModelCall, ToolCall and Outcome (used in steps 5 and 6) must also be imported;
# their module path is not shown in this guide.

# 1. Define your available models with real pricing (USD per 1K tokens)
config = ACOConfig(
    models={
        "gpt-4o-mini": ModelConfig(
            model_id="gpt-4o-mini",
            provider="openai",
            cost_per_1k_input=0.00015,
            cost_per_1k_output=0.0006,
            strength_tier=2,
            max_context=128000,
        ),
        "gpt-4o": ModelConfig(
            model_id="gpt-4o",
            provider="openai",
            cost_per_1k_input=0.0025,
            cost_per_1k_output=0.01,
            strength_tier=4,
            max_context=128000,
        ),
        "deepseek-chat": ModelConfig(
            model_id="deepseek-chat",
            provider="deepseek",
            cost_per_1k_input=0.00014,
            cost_per_1k_output=0.00028,
            strength_tier=3,
            max_context=64000,
            cache_discount_rate=0.5,
        ),
    },
    routing_policy=RoutingPolicy("cascade"),
)

# 2. Initialize the optimizer
optimizer = AgentCostOptimizer(config)

# 3. Before each agent step, call optimize()
request = "Write a Python function to reverse a linked list"
run_state = {
    "trace_id": "run-001",
    "planned_tools": [("file_read", {"path": "linked_list.py"})],
    "previous_tool_calls": [],
    "current_cost": 0.0,
    "step_number": 1,
    "total_steps": 3,
    "is_irreversible": False,
    "routing_mode": "cascade",
}
result = optimizer.optimize(request, run_state)

# 4. Use the decisions
print(f"Use model: {result.routing_decision.model_id}")
print(f"Max tokens: {result.routing_decision.max_tokens}")
print(f"Temperature: {result.routing_decision.temperature}")
print(f"Estimated cost: ${result.estimated_cost:.4f}")

# 5. After execution, record actual costs
optimizer.record_step(
    trace_id=result.trace_id,
    model_call=ModelCall(
        model_id=result.routing_decision.model_id,
        provider=result.routing_decision.provider,
        input_tokens=2000,
        output_tokens=800,
        latency_ms=1200,
    ),
    tool_calls=[
        ToolCall(
            tool_name="file_read",
            tool_input={"path": "linked_list.py"},
            tool_cost=0.001,
            tool_latency_ms=300,
        )
    ],
    context_size_tokens=2500,
    step_outcome=Outcome.SUCCESS,
)

# 6. Finalize the trace
optimizer.finalize_trace(result.trace_id, outcome=Outcome.SUCCESS)
```

## Configuration

### Model Tiers

| Tier | Typical Models | Cost | Strength | When to Use |
|------|----------------|------|----------|-------------|
| 1 | Local Qwen-0.5B, Phi-1 | Near-zero | 35% | Factual QA, simple extraction |
| 2 | GPT-4o-mini, Claude-3.5-Haiku, DeepSeek | $0.15/M tok | 55% | Drafting, classification, parsing |
| 3 | Claude-3.5-Sonnet, DeepSeek-V2 | $1.5-3/M tok | 80% | Coding, reasoning, research |
| 4 | GPT-4o, Claude-3-Opus | $2.5-5/M tok | 93% | Complex analysis, legal, creative |
| 5 | o1, o3-mini, specialist | $3-15/M tok | 97% | Math, safety-critical, adversarial |

### Routing Modes

- **`cheapest`**: Always use the lowest-cost model (risky; use only for internal tools)
- **`strongest`**: Always use the frontier model (expensive, maximum quality)
- **`cascade`**: Try a cheap model first and escalate on low confidence
- **`risk_based`**: Route by predicted task risk
- **`adaptive`**: Learn routing from trace history

## Integration Patterns

### Pattern A: Front Proxy (Pre-Step)

```
User Request → ACO.optimize() → [Decisions] → Agent Harness → LLM API
```

### Pattern B: Around Wrapper (Pre + Post)

```
User Request → ACO.optimize() → Agent Step → ACO.record_step() → Next Step
```

### Pattern C: Inside Agent (Per-Step)

```
Agent loop:
    if step == 0:
        ACO.optimize()
    else:
        ACO.reassess()  # mid-run adjustment
```

## Benchmarking Your Own Traces

```bash
# Generate benchmark
python -m aco.benchmark --tasks 1000 --output ./results

# Compare baselines
python -m aco.benchmark --compare always_frontier always_cheap cascade full_optimizer

# Run ablation study
python -m aco.benchmark --ablate all
```

## Dashboard

```bash
# Launch Gradio dashboard
python dashboard.py --results ./eval_results_v2/baseline_results.json
```

## Trackio Integration

```python
from aco.trackio_integration import ACOTrackioLogger

logger = ACOTrackioLogger(project="aco-production", space_id="your-space")

# Inside your agent loop
logger.log_decision(run_id, decision, cost, success)
logger.alert(run_id, "Cost spike", f"Step {step} cost ${cost:.3f}", "WARN")
```

## Multi-Provider Setup

```python
from aco.config import ACOConfig, ModelConfig

# `...` stands for the remaining ModelConfig fields (model_id, pricing, strength_tier, max_context)
config = ACOConfig(
    models={
        "gpt-4o": ModelConfig(..., provider="openai", api_key_env="OPENAI_API_KEY"),
        "claude-3.5-sonnet": ModelConfig(..., provider="anthropic", api_key_env="ANTHROPIC_API_KEY"),
        "deepseek-chat": ModelConfig(..., provider="deepseek", api_key_env="DEEPSEEK_API_KEY"),
        "local-qwen": ModelConfig(..., provider="local", base_url="http://localhost:8000/v1"),
    }
)
```

## Safety Rules

1. **Legal/regulated tasks never go below tier 4** without an explicit override
2. **Tool calls marked `requires_verification` always get a verifier**
3. **Irreversible actions trigger automatic frontier escalation**
4. **All routing decisions include reasoning strings for audit**
5. **The doom detector stops runs whose cost exceeds 3x the estimate** (`doom_max_cost_ratio`)

## Performance Tuning

| Parameter | Default | Tune When... |
|-----------|---------|--------------|
| `doom_max_cost_ratio` | 3.0 | Runs often terminate too early |
| `doom_no_progress_steps` | 5 | Long-horizon tasks get killed |
| `verifier_confidence_threshold` | 0.7 | Too many or too few verifiers |
| `max_context_fraction` | 0.8 | Context truncation issues |
| `cache_prefix_max_tokens` | 8000 | Cache hit rate is low |

## Monitoring

Track these metrics in production:

- Cost per successful task (primary)
- Cost per artifact (secondary)
- Task success rate by tier
- Cache hit rate
- Tool call efficiency (used vs. called)
- Verifier pass rate
- Retry rate
- False-DONE rate
- Escalation rate
- Doom detector precision/recall
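As a minimal sketch of how a few of the Monitoring metrics might be computed offline, the snippet below aggregates per-trace summaries. The `TraceSummary` record and every field on it are hypothetical, not ACO's trace schema; map them from whatever your recorded traces actually contain.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TraceSummary:
    """Hypothetical per-trace record; these fields are illustrative, not ACO's schema."""
    cost: float          # total USD spent on the trace, including retries
    success: bool        # final task outcome
    tier: int            # strength tier of the model that produced the final answer
    escalated: bool      # True if routing moved to a stronger tier mid-run
    retries: int         # number of retried steps
    cache_hits: int      # prompt-cache hits
    cache_lookups: int   # prompt-cache lookups


def summarize(traces: List[TraceSummary]) -> Dict[str, object]:
    if not traces:
        raise ValueError("need at least one trace")
    n = len(traces)
    successes = sum(t.success for t in traces)
    lookups = sum(t.cache_lookups for t in traces)

    # tier -> [successful runs, total runs]
    by_tier: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for t in traces:
        by_tier[t.tier][0] += int(t.success)
        by_tier[t.tier][1] += 1

    return {
        # Primary metric: all spend, including failed runs, divided by successful tasks.
        "cost_per_success": sum(t.cost for t in traces) / successes if successes else float("inf"),
        "success_rate_by_tier": {tier: s / runs for tier, (s, runs) in sorted(by_tier.items())},
        "escalation_rate": sum(t.escalated for t in traces) / n,
        "mean_retries_per_trace": sum(t.retries for t in traces) / n,
        "cache_hit_rate": sum(t.cache_hits for t in traces) / lookups if lookups else 0.0,
    }
```

Dividing total spend by the number of successes, rather than by all runs, is what makes failed runs count against the primary metric.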
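The two doom-detector thresholds in the Performance Tuning table (`doom_max_cost_ratio` and `doom_no_progress_steps`) can be illustrated with a small stop rule. This is a sketch under stated assumptions, not ACO's detector: the `DoomDetector` class is hypothetical, and what counts as "progress" (a new artifact, a passing check) is left for the caller to decide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoomDetector:
    """Hypothetical stop rule built from the two doom_* defaults (not ACO's code)."""
    estimated_cost: float
    max_cost_ratio: float = 3.0    # mirrors doom_max_cost_ratio
    no_progress_steps: int = 5     # mirrors doom_no_progress_steps
    spent: float = 0.0
    stalled_steps: int = 0

    def update(self, step_cost: float, made_progress: bool) -> Optional[str]:
        """Record one finished step; return a stop reason, or None to keep going."""
        self.spent += step_cost
        self.stalled_steps = 0 if made_progress else self.stalled_steps + 1
        if self.spent > self.max_cost_ratio * self.estimated_cost:
            return f"cost ${self.spent:.3f} exceeds {self.max_cost_ratio}x estimate"
        if self.stalled_steps >= self.no_progress_steps:
            return f"no progress for {self.stalled_steps} steps"
        return None


detector = DoomDetector(estimated_cost=0.01)
for cost, progress in [(0.01, True), (0.01, False), (0.02, False)]:
    reason = detector.update(cost, progress)
    if reason:
        print("stop:", reason)  # stop: cost $0.040 exceeds 3.0x estimate
        break
```

Raising `max_cost_ratio` or `no_progress_steps` makes the rule more permissive, which matches the "Tune When..." guidance for runs that are killed too early.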
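The `cost_per_1k_input` and `cost_per_1k_output` fields in the Quick Start configuration imply simple per-call arithmetic. The helper below is hypothetical, and it assumes that `cache_discount_rate` means cached input tokens are billed at `(1 - rate)` times the normal input price; check how your ACO version actually applies that field.

```python
def call_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_1k_input: float,
    cost_per_1k_output: float,
    cached_input_tokens: int = 0,
    cache_discount_rate: float = 0.0,
) -> float:
    """USD cost of one model call under per-1K-token pricing.

    Assumption: cached input tokens (a subset of input_tokens) are billed at
    (1 - cache_discount_rate) times the normal input price.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    billable_input = uncached + cached_input_tokens * (1 - cache_discount_rate)
    input_cost = billable_input * cost_per_1k_input / 1000
    output_cost = output_tokens * cost_per_1k_output / 1000
    return input_cost + output_cost


# Step 5 of the Quick Start, priced as gpt-4o-mini: 2,000 input and 800 output tokens
print(f"${call_cost(2000, 800, 0.00015, 0.0006):.5f}")  # $0.00078
```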
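For the `cascade` routing mode ("try a cheap model first and escalate on low confidence"), the control flow can be sketched as follows. It assumes a hypothetical `call_model(model_id, prompt)` callable that returns an answer together with a confidence score in [0, 1]; this guide does not describe how ACO scores confidence, and the 0.7 default threshold here is an arbitrary illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical caller signature: (model_id, prompt) -> (answer, confidence in [0, 1])
CallFn = Callable[[str, str], Tuple[str, float]]


def cascade(
    prompt: str,
    models: Sequence[Tuple[str, int]],
    call_model: CallFn,
    min_confidence: float = 0.7,
) -> Tuple[str, str]:
    """Try models from weakest to strongest tier; stop at the first confident answer.

    `models` holds (model_id, strength_tier) pairs. Returns (model_id, answer).
    The strongest model's answer is accepted regardless of its confidence.
    """
    if not models:
        raise ValueError("need at least one model")
    ordered = sorted(models, key=lambda m: m[1])
    for model_id, _tier in ordered[:-1]:
        answer, confidence = call_model(model_id, prompt)
        if confidence >= min_confidence:
            return model_id, answer
    strongest_id = ordered[-1][0]
    answer, _ = call_model(strongest_id, prompt)
    return strongest_id, answer


# Usage with a stub caller; the confidence values are made up for illustration
fake_confidence = {"gpt-4o-mini": 0.4, "deepseek-chat": 0.9, "gpt-4o": 0.95}
chosen, _ = cascade(
    "Summarize this diff",
    [("gpt-4o", 4), ("gpt-4o-mini", 2), ("deepseek-chat", 3)],
    lambda model_id, prompt: (f"{model_id} answer", fake_confidence[model_id]),
)
print(chosen)  # deepseek-chat
```

Ordering by `strength_tier` rather than by price keeps the sketch aligned with the tier table; a real router could equally sort by `cost_per_1k_input`.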
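Safety Rules 1 to 4 can be read as a post-processing step on a proposed routing tier. The function below is a hypothetical sketch, not ACO's API: `task_category` and `legal_override` are illustrative parameter names, and it assumes "frontier" means the highest tier you have configured (`max_tier`).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SafetyDecision:
    tier: int
    needs_verifier: bool
    reasons: List[str] = field(default_factory=list)  # audit trail (rule 4)


def apply_safety_rules(
    proposed_tier: int,
    max_tier: int,
    task_category: str,
    is_irreversible: bool,
    requires_verification: bool,
    legal_override: bool = False,
) -> SafetyDecision:
    tier = proposed_tier
    reasons: List[str] = []
    # Rule 1: legal/regulated tasks never go below tier 4 without an explicit override
    if task_category in {"legal", "regulated"} and not legal_override and tier < 4:
        tier = 4
        reasons.append("legal/regulated task: raised to tier 4")
    # Rule 3: irreversible actions escalate to the frontier (highest configured) tier
    if is_irreversible and tier < max_tier:
        tier = max_tier
        reasons.append(f"irreversible action: escalated to tier {max_tier}")
    # Rule 2: tool calls marked requires_verification always get a verifier
    if requires_verification:
        reasons.append("requires_verification: verifier attached")
    return SafetyDecision(tier, requires_verification, reasons)


decision = apply_safety_rules(proposed_tier=2, max_tier=5, task_category="legal",
                              is_irreversible=True, requires_verification=True)
print(decision.tier, decision.needs_verifier)  # 5 True
for reason in decision.reasons:
    print("-", reason)
```

Applying the rules after the router, rather than inside it, keeps every routing mode (including `cheapest`) subject to the same floors.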