Agent Cost Optimizer - Deployment Guide

Overview

The Agent Cost Optimizer (ACO) is a control layer that sits in front of, around, or inside any agent harness. It does not replace your agent; it optimizes how your agent runs.

Installation

# Clone the repository
git clone https://huggingface.co/narcolepticchicken/agent-cost-optimizer
cd agent-cost-optimizer

# Install dependencies
pip install -e .

# Optional: Gradio dashboard
pip install gradio

# Optional: Trackio monitoring
pip install trackio

Quick Start

from aco import AgentCostOptimizer
from aco.config import ACOConfig, ModelConfig, RoutingPolicy
from aco import ModelCall, ToolCall, Outcome  # used in steps 5-6; adjust the import path to match your install

# 1. Define your available models with real pricing
config = ACOConfig(
    models={
        "gpt-4o-mini": ModelConfig(
            model_id="gpt-4o-mini", provider="openai",
            cost_per_1k_input=0.00015, cost_per_1k_output=0.0006,
            strength_tier=2, max_context=128000,
        ),
        "gpt-4o": ModelConfig(
            model_id="gpt-4o", provider="openai",
            cost_per_1k_input=0.0025, cost_per_1k_output=0.01,
            strength_tier=4, max_context=128000,
        ),
        "deepseek-chat": ModelConfig(
            model_id="deepseek-chat", provider="deepseek",
            cost_per_1k_input=0.00014, cost_per_1k_output=0.00028,
            strength_tier=3, max_context=64000,
            cache_discount_rate=0.5,
        ),
    },
    routing_policy=RoutingPolicy("cascade"),
)

# 2. Initialize optimizer
optimizer = AgentCostOptimizer(config)

# 3. Before each agent step, call optimize()
request = "Write a Python function to reverse a linked list"
run_state = {
    "trace_id": "run-001",
    "planned_tools": [("file_read", {"path": "linked_list.py"})],
    "previous_tool_calls": [],
    "current_cost": 0.0,
    "step_number": 1,
    "total_steps": 3,
    "is_irreversible": False,
    "routing_mode": "cascade",
}

result = optimizer.optimize(request, run_state)

# 4. Use the decisions
print(f"Use model: {result.routing_decision.model_id}")
print(f"Max tokens: {result.routing_decision.max_tokens}")
print(f"Temperature: {result.routing_decision.temperature}")
print(f"Estimated cost: ${result.estimated_cost:.4f}")

# 5. After execution, record actual costs
optimizer.record_step(
    trace_id=result.trace_id,
    model_call=ModelCall(
        model_id=result.routing_decision.model_id,
        provider=result.routing_decision.provider,
        input_tokens=2000,
        output_tokens=800,
        latency_ms=1200,
    ),
    tool_calls=[ToolCall(tool_name="file_read", tool_input={"path": "linked_list.py"},
                         tool_cost=0.001, tool_latency_ms=300)],
    context_size_tokens=2500,
    step_outcome=Outcome.SUCCESS,
)

# 6. Finalize trace
optimizer.finalize_trace(result.trace_id, outcome=Outcome.SUCCESS)
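The estimated cost reported in step 4 follows directly from the per-1k pricing fields on ModelConfig. As a worked check (a standalone sketch, not the ACO internal estimator), here is the arithmetic for the 2000-input / 800-output step above at gpt-4o-mini prices:

```python
def estimate_cost(input_tokens, output_tokens, cost_per_1k_input, cost_per_1k_output):
    """Token cost from per-1k prices, matching the ModelConfig fields above."""
    return (input_tokens / 1000) * cost_per_1k_input + (output_tokens / 1000) * cost_per_1k_output

# gpt-4o-mini at the prices configured above, for the 2000/800-token step:
cost = estimate_cost(2000, 800, 0.00015, 0.0006)
print(f"${cost:.5f}")  # -> $0.00078
```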

Configuration

Model Tiers

| Tier | Typical Models | Cost | Strength | When to Use |
|------|----------------|------|----------|-------------|
| 1 | Local Qwen-0.5B, Phi-1 | Near-zero | 35% | Factual QA, simple extraction |
| 2 | GPT-4o-mini, Claude-3.5-Haiku, DeepSeek | $0.15/M tok | 55% | Drafting, classification, parsing |
| 3 | Claude-3.5-Sonnet, DeepSeek-V2 | $1.5-3/M tok | 80% | Coding, reasoning, research |
| 4 | GPT-4o, Claude-3-Opus | $2.5-5/M tok | 93% | Complex analysis, legal, creative |
| 5 | o1, o3-mini, specialist | $3-15/M tok | 97% | Math, safety-critical, adversarial |

Routing Modes

  • cheapest: Always use lowest-cost model (dangerous, only for internal tools)
  • strongest: Always use frontier (expensive, maximum quality)
  • cascade: Try cheap first, escalate on low confidence
  • risk_based: Route by predicted task risk
  • adaptive: Learn from trace history
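The cascade mode can be sketched as a confidence-gated loop over models ordered cheapest-first. The `call_model` and `confidence` hooks below are hypothetical caller-supplied functions, not part of the ACO API:

```python
def cascade(models, task, call_model, confidence, threshold=0.7):
    """Try models cheapest-first; escalate while confidence is below threshold.

    `models` is a list of model IDs in ascending cost order. `call_model`
    and `confidence` are caller-supplied hooks (illustrative, not ACO API).
    """
    answer = None
    for model_id in models:
        answer = call_model(model_id, task)
        if confidence(answer) >= threshold:
            return model_id, answer  # confident enough: stop escalating
    return models[-1], answer  # ladder exhausted: keep the strongest answer

# Toy demo: the cheap model is unsure, so routing escalates once.
scores = {"gpt-4o-mini": 0.4, "gpt-4o": 0.9}
chosen, _ = cascade(
    ["gpt-4o-mini", "gpt-4o"], "hard task",
    call_model=lambda m, t: m,        # echo the model id as the "answer"
    confidence=lambda a: scores[a],
)
print(chosen)  # -> gpt-4o
```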

Integration Patterns

Pattern A: Front Proxy (Pre-Step)

User Request β†’ ACO.optimize() β†’ [Decisions] β†’ Agent Harness β†’ LLM API

Pattern B: Around Wrapper (Pre + Post)

User Request β†’ ACO.optimize() β†’ Agent Step β†’ ACO.record_step() β†’ Next Step

Pattern C: Inside Agent (Per-Step)

Agent Loop:
  if step == 0: ACO.optimize()
  else: ACO.reassess()  # mid-run adjustment
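Pattern B can be sketched as a thin wrapper around one agent step, using the `optimize()`/`record_step()` calls from the Quick Start. The `execute` hook is a hypothetical caller-supplied function that runs the step and reports what happened:

```python
def around_step(optimizer, request, run_state, execute):
    """Pattern B: decide before the step, record after it.

    `execute` is a caller-supplied hook (hypothetical) that runs one agent
    step with the chosen routing decision and returns
    (model_call, tool_calls, context_tokens, outcome).
    Method names follow the Quick Start section.
    """
    result = optimizer.optimize(request, run_state)
    model_call, tool_calls, ctx_tokens, outcome = execute(result.routing_decision)
    optimizer.record_step(
        trace_id=result.trace_id,
        model_call=model_call,
        tool_calls=tool_calls,
        context_size_tokens=ctx_tokens,
        step_outcome=outcome,
    )
    return result, outcome
```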

Benchmarking Your Own Traces

# Generate benchmark
python -m aco.benchmark --tasks 1000 --output ./results

# Compare baselines
python -m aco.benchmark --compare always_frontier always_cheap cascade full_optimizer

# Run ablation study
python -m aco.benchmark --ablate all

Dashboard

# Launch Gradio dashboard
python dashboard.py --results ./eval_results_v2/baseline_results.json

Trackio Integration

from aco.trackio_integration import ACOTrackioLogger

logger = ACOTrackioLogger(project="aco-production", space_id="your-space")

# Inside your agent loop
logger.log_decision(run_id, decision, cost, success)
logger.alert(run_id, "Cost spike", f"Step {step} cost ${cost:.3f}", "WARN")

Multi-Provider Setup

config = ACOConfig(
    models={
        "gpt-4o": ModelConfig(..., provider="openai", api_key_env="OPENAI_API_KEY"),
        "claude-3.5-sonnet": ModelConfig(..., provider="anthropic", api_key_env="ANTHROPIC_API_KEY"),
        "deepseek-chat": ModelConfig(..., provider="deepseek", api_key_env="DEEPSEEK_API_KEY"),
        "local-qwen": ModelConfig(..., provider="local", base_url="http://localhost:8000/v1"),
    }
)
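Each ModelConfig names its credential via `api_key_env` rather than embedding secrets. A minimal resolver sketch (the function name is illustrative, not ACO API) that fails fast on misconfiguration before the first model call:

```python
import os

def resolve_api_key(api_key_env):
    """Look up the environment variable named by ModelConfig.api_key_env.

    Raising early surfaces a missing credential at startup rather than on
    the first provider call mid-run. Illustrative helper, not ACO API.
    """
    key = os.environ.get(api_key_env)
    if not key:
        raise RuntimeError(f"Missing environment variable: {api_key_env}")
    return key
```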

Safety Rules

  1. Legal/regulated tasks never go below tier 4 without explicit override
  2. Tool calls marked requires_verification always get a verifier
  3. Irreversible actions trigger automatic frontier escalation
  4. All routing decisions include reasoning strings for audit
  5. Doom detector stops runs where cost exceeds 3x estimate
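Rule 5 reduces to a simple ratio check against the original estimate. A minimal sketch (the function name and signature are illustrative, not ACO internals):

```python
def doom_check(actual_cost, estimated_cost, max_cost_ratio=3.0):
    """Halt signal for rule 5: actual spend has exceeded max_cost_ratio
    times the original estimate. Illustrative, not the ACO internal API."""
    if estimated_cost <= 0:
        return False  # no estimate to compare against; never halt on zero
    return actual_cost > max_cost_ratio * estimated_cost

print(doom_check(0.20, 0.05))  # -> True  (4x over a $0.05 estimate)
print(doom_check(0.10, 0.05))  # -> False (2x is still inside the 3x budget)
```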

Performance Tuning

| Parameter | Default | Tune when... |
|-----------|---------|--------------|
| doom_max_cost_ratio | 3.0 | Runs often terminate too early |
| doom_no_progress_steps | 5 | Long-horizon tasks get killed |
| verifier_confidence_threshold | 0.7 | Too many/few verifiers fire |
| max_context_fraction | 0.8 | Context truncation issues appear |
| cache_prefix_max_tokens | 8000 | Cache hit rate is low |
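Assuming these knobs are accepted as top-level ACOConfig keyword fields (the source does not show their exact location), a long-horizon profile might look like:

```python
config = ACOConfig(
    models={...},  # as in the Quick Start section
    doom_max_cost_ratio=5.0,            # looser budget so long runs aren't killed early
    doom_no_progress_steps=10,          # tolerate slower visible progress
    verifier_confidence_threshold=0.8,  # verify more aggressively
)
```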

Monitoring

Track these metrics in production:

  • Cost per successful task (primary)
  • Cost per artifact (secondary)
  • Task success rate by tier
  • Cache hit rate
  • Tool call efficiency (used vs called)
  • Verifier pass rate
  • Retry rate
  • False-DONE rate
  • Escalation rate
  • Doom detector precision/recall
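The primary metric divides total spend by successful tasks, so failed runs inflate it even though they produce nothing. A minimal sketch, with `traces` as a simplified stand-in for whatever trace records your harness keeps:

```python
def cost_per_successful_task(traces):
    """Primary metric: total spend divided by the number of successful tasks.

    `traces` is a list of (cost, succeeded) pairs; a failed run's cost still
    counts in the numerator, which is what makes wasted spend visible.
    """
    total_cost = sum(cost for cost, _ in traces)
    successes = sum(1 for _, ok in traces if ok)
    return total_cost / successes if successes else float("inf")

# Two successes costing $0.05 total plus one $0.05 failure -> $0.05/success.
print(round(cost_per_successful_task([(0.02, True), (0.05, False), (0.03, True)]), 4))
```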