ACO Deployment Guide

Quick Install

pip install -e .

Or use directly:

from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig

CLI

# Route a request to the optimal model
aco route "Fix the auth bug in production"
# → tier=5, model=specialist-expert, cost=$1.50

aco route "What is 2+2?"
# → tier=2, model=cheap-cloud-8b, cost=$0.15

# Get context budget
aco budget "Research transformer advances"

# Check if a tool call is worth it
aco gate web_search --task-type research

# Check if verification is needed
aco verify --risk high --confidence 0.7

# Show optimizer stats
aco stats

# Version
aco version

Python API

Basic Routing

from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig

opt = ACOOptimizer(ACOConfig(
    router_model_path="router_models/router_bundle_v11.pkl"
))

result = opt.start_run("Debug this critical production bug")
print(result["routing"])  # tier, model_id, confidence, cost_estimate
print(result["context_budget"])  # total_tokens, keep_exact, omit

With Execution Feedback

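request = "Fix the typo in README"

# Step 1: Route the request (a simple edit like this should go to a cheap model)
result = opt.start_run(request)
model_id = result["routing"]["model_id"]

# Step 2: Call the routed model and collect its response and token logprobs.
# call_model and get_model_logprobs are your own provider wrappers, not part of ACO.
cheap_response = call_model(model_id, request)
cheap_logprobs = get_model_logprobs(model_id, request)

# Step 3: Decide whether to escalate
cascade = opt.cascade_step(
    request=request,
    initial_tier=result["routing"]["tier"],
    cheap_logprobs=cheap_logprobs,
    cheap_response=cheap_response
)

if cascade.escalated:
    # Re-run the request on the stronger model the cascade selected
    final_response = call_model(cascade.model_id, request)
else:
    final_response = cheap_response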
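How `cascade_step` turns logprobs into an escalation decision is not described in this guide. As a rough illustration of the idea, the sketch below uses a common heuristic: convert the cheap model's token logprobs into a geometric-mean token probability and escalate one tier when it falls below a threshold. `should_escalate`, `CascadeDecision`, and the 0.7 default threshold are hypothetical names and values, not ACO APIs.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class CascadeDecision:
    escalated: bool
    confidence: float
    final_tier: int


def should_escalate(token_logprobs: List[float], initial_tier: int,
                    threshold: float = 0.7, max_tier: int = 5) -> CascadeDecision:
    """Hypothetical escalation rule, not ACO's cascade_step implementation."""
    if token_logprobs:
        # exp(mean logprob) is the geometric mean of the per-token probabilities
        confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    else:
        # No logprobs available: treat the answer as zero-confidence
        confidence = 0.0
    escalate = confidence < threshold and initial_tier < max_tier
    final_tier = initial_tier + 1 if escalate else initial_tier
    return CascadeDecision(escalate, confidence, final_tier)


# A confident answer stays on tier 2; a shaky one moves up to tier 3
print(should_escalate([-0.01, -0.02], initial_tier=2))
print(should_escalate([-1.0, -2.0], initial_tier=2))
```

Stepping up one tier at a time keeps the extra spend small; a harness that prefers fewer round trips could instead jump straight to a frontier tier on low confidence.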

Per-Step Routing

from aco.per_step_router import PerStepRouter

ps = PerStepRouter(max_budget=2.0)

for step in agent_steps:  # agent_steps: your agent's planned steps (not an ACO type)
    d = ps.route_step(
        action=step.description,
        step_num=step.number,
        has_prior_failures=step.had_errors,
        task_risk="medium"
    )
    step.model_tier = d.adjusted_tier
    step.model_id = d.model_id
    step.estimated_cost = d.cost_estimate
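The guide does not spell out how `PerStepRouter` trades step risk against `max_budget`. One plausible scheme, sketched here purely as an assumption, bumps a step up one tier after prior failures and then steps down until the estimated cost fits the remaining budget, pricing tokens with the per-1K rates from the Model Tier Reference table. `StepBudget`, `pick_tier`, and `est_tokens` are hypothetical.

```python
from dataclasses import dataclass

# Per-1K-token rates from the Model Tier Reference table (USD)
TIER_COST_PER_1K = {1: 0.00, 2: 0.05, 3: 0.30, 4: 1.00, 5: 1.50}


@dataclass
class StepBudget:
    """Hypothetical per-step budget tracker, not ACO's PerStepRouter."""
    max_budget: float
    spent: float = 0.0

    @property
    def remaining(self) -> float:
        return self.max_budget - self.spent

    def step_cost(self, tier: int, tokens: int) -> float:
        return TIER_COST_PER_1K[tier] * tokens / 1000

    def pick_tier(self, base_tier: int, est_tokens: int,
                  has_prior_failures: bool = False) -> int:
        # Escalate one tier after earlier failures, capped at tier 5
        tier = min(base_tier + 1, 5) if has_prior_failures else base_tier
        # Step down until the estimated cost fits what is left of the budget
        while tier > 1 and self.step_cost(tier, est_tokens) > self.remaining:
            tier -= 1
        return tier

    def record(self, tier: int, tokens: int) -> None:
        self.spent += self.step_cost(tier, tokens)


budget = StepBudget(max_budget=2.0)
tier = budget.pick_tier(base_tier=3, est_tokens=1000, has_prior_failures=True)
budget.record(tier, 1000)
print(tier, budget.remaining)
```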

Integration Examples

LangChain Integration

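from aco.optimizer import ACOOptimizer

opt = ACOOptimizer()

class ACORouter:
    """Selects a LangChain model per prompt from ACO's routing decision."""

    def __init__(self, llms_by_model_id):
        # Maps ACO model IDs (e.g. "medium-70b") to LangChain models you have configured
        self.llms = llms_by_model_id

    def route(self, prompt: str):
        result = opt.start_run(prompt)
        return self.llms[result["routing"]["model_id"]]

# Use with LangChain: route first, then invoke the selected model.
# my_llms is your own {model_id: LangChain model} mapping.
router = ACORouter(llms_by_model_id=my_llms)
prompt = "Summarize the failing test output"
response = router.route(prompt).invoke(prompt)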

Custom Agent Harness

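from aco.optimizer import ACOOptimizer
from aco.per_step_router import PerStepRouter


class CostAwareAgent:
    # plan_next_step() and call_model() are your own harness methods, not part of ACO
    def __init__(self, max_budget=5.0):
        self.opt = ACOOptimizer()
        self.ps = PerStepRouter(max_budget=max_budget)
        self.has_errors = False

    def run(self, request):
        # Start the run with initial routing; end_run() below closes it
        self.opt.start_run(request)

        done = False
        # Per-step execution, bounded by the remaining budget
        while not done and self.ps.budget_remaining > 0:
            step = self.plan_next_step()
            routing = self.ps.route_step(
                step.action, step.num,
                has_prior_failures=self.has_errors
            )
            response = self.call_model(routing.model_id, step)

            # Escalate only when the step failed and a stronger tier exists
            if not response.success and routing.adjusted_tier < 5:
                cascade = self.opt.cascade_step(
                    request, routing.adjusted_tier,
                    response.logprobs, response.text
                )
                if cascade.escalated:
                    response = self.call_model(cascade.model_id, step)

            # Remember failures for the next routing decision and detect completion
            self.has_errors = self.has_errors or not response.success
            done = response.success and step.is_final  # is_final: hypothetical planner flag

            # Stop early if the run looks unrecoverable
            doom = self.opt.check_doom(self.ps.total_spent)
            if doom.doomed:
                break

        trace = self.opt.end_run(success=done)
        return trace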
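Beyond taking the amount spent so far, the behavior of `check_doom` is not documented here. A minimal stand-in, handy for exercising a harness without ACO, could stop a run when the budget is gone, when failures repeat, or when most of the budget is spent while steps are still failing. The defaults borrow `max_cost_per_task: 5.0` and `max_retries: 3` from the sample config purely as an assumption; `check_doom_stub`, `DoomCheck`, and `spend_fraction` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DoomCheck:
    doomed: bool
    reason: str = ""


def check_doom_stub(total_spent: float, consecutive_failures: int = 0,
                    max_cost: float = 5.0, max_failures: int = 3,
                    spend_fraction: float = 0.8) -> DoomCheck:
    """Hypothetical stop rule for local testing; not ACO's check_doom."""
    if total_spent >= max_cost:
        return DoomCheck(True, "budget exhausted")
    if consecutive_failures >= max_failures:
        return DoomCheck(True, "too many consecutive failures")
    # Most of the budget is gone and the latest step still failed
    if total_spent >= spend_fraction * max_cost and consecutive_failures > 0:
        return DoomCheck(True, "budget nearly spent while still failing")
    return DoomCheck(False)


print(check_doom_stub(1.2))
print(check_doom_stub(4.5, consecutive_failures=1))
```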

Model Tier Reference

| Tier | Model ID | Provider | Input cost per 1K tokens | Use for |
|---|---|---|---|---|
| 1 | tiny-local-3b | local | $0.00 | Simple queries, search, read |
| 2 | cheap-cloud-8b | cloud | $0.05 | Quick answers, simple edits |
| 3 | medium-70b | cloud | $0.30 | Standard tasks, most coding |
| 4 | frontier-latest | cloud | $1.00 | Complex tasks, critical paths |
| 5 | specialist-expert | cloud | $1.50 | Legal, multi-step orchestration |

Configuration

# config.yaml
routing:
  safety_threshold: 0.30
  downgrade_threshold: 0.90
  max_retries: 3
  max_cost_per_task: 5.0

models:
  tier1:
    model_id: tiny-local-3b
    provider: local
    cost_per_1k_input: 0.00
    cost_per_1k_output: 0.00
  tier4:
    model_id: frontier-latest
    provider: cloud
    cost_per_1k_input: 1.00
    cost_per_1k_output: 3.00

task_floors:
  legal_regulated: 4
  long_horizon: 3
  coding: 3
  quick_answer: 1
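Reading the sample config, `task_floors` appears to set a minimum tier per task type, and each model entry carries separate input and output rates. Under that reading, the sketch below raises a predicted tier to its floor and prices a single call from the per-1K rates. The dict mirrors the YAML above so the example needs no YAML parser; `apply_floor` and `call_cost` are hypothetical helpers, not part of ACO.

```python
# Mirrors the relevant parts of config.yaml (values copied from the sample above)
CONFIG = {
    "models": {
        "tier1": {"model_id": "tiny-local-3b",
                  "cost_per_1k_input": 0.00, "cost_per_1k_output": 0.00},
        "tier4": {"model_id": "frontier-latest",
                  "cost_per_1k_input": 1.00, "cost_per_1k_output": 3.00},
    },
    "task_floors": {"legal_regulated": 4, "long_horizon": 3,
                    "coding": 3, "quick_answer": 1},
}


def apply_floor(predicted_tier: int, task_type: str, floors: dict) -> int:
    """Never route below the configured floor; unknown task types have no floor."""
    return max(predicted_tier, floors.get(task_type, 1))


def call_cost(model_cfg: dict, input_tokens: int, output_tokens: int) -> float:
    """Price one call from separate per-1K input and output rates."""
    return (input_tokens / 1000 * model_cfg["cost_per_1k_input"]
            + output_tokens / 1000 * model_cfg["cost_per_1k_output"])


floors = CONFIG["task_floors"]
print(apply_floor(2, "coding", floors))  # 3: a coding task is lifted to its floor
print(call_cost(CONFIG["models"]["tier4"], 2000, 500))  # 2.0 input + 1.5 output = 3.5
```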

Trace Format

{
  "trace_id": "abc123",
  "request": "Fix the auth bug",
  "task_type": "coding",
  "difficulty": 4,
  "predicted_tier": 5,
  "steps": [
    {
      "step_num": 1,
      "model_call": {
        "model_id": "specialist-expert",
        "tier": 5,
        "input_tokens": 2000,
        "output_tokens": 500,
        "cost": 3.50
      },
      "tool_calls": [
        {"tool_name": "code_search", "success": true, "cost": 0.01}
      ],
      "verifier_called": false
    }
  ],
  "final_outcome": "completed",
  "task_success": true,
  "total_cost": 3.51
}
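In the sample trace, `total_cost` equals the model-call cost plus the tool-call costs across all steps (3.50 + 0.01 = 3.51). Assuming that is how `total_cost` is meant to be computed, a small consistency check can flag traces where the two disagree. The trimmed `SAMPLE` keeps only the fields the check reads; `recompute_total` and `check_trace` are hypothetical helpers.

```python
import json
import math

SAMPLE = """{
  "trace_id": "abc123",
  "steps": [
    {"step_num": 1,
     "model_call": {"model_id": "specialist-expert", "tier": 5, "cost": 3.50},
     "tool_calls": [{"tool_name": "code_search", "success": true, "cost": 0.01}],
     "verifier_called": false}
  ],
  "task_success": true,
  "total_cost": 3.51
}"""


def recompute_total(trace: dict) -> float:
    """Sum model-call and tool-call costs over every step."""
    total = 0.0
    for step in trace["steps"]:
        total += step["model_call"]["cost"]
        total += sum(call["cost"] for call in step.get("tool_calls", []))
    return total


def check_trace(trace: dict, tol: float = 1e-6) -> bool:
    # Compare with a tolerance because costs are floats
    return math.isclose(recompute_total(trace), trace["total_cost"], abs_tol=tol)


trace = json.loads(SAMPLE)
print(recompute_total(trace), check_trace(trace))
```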

Monitoring

What to watch:

  • Cost per successful task (primary metric)
  • Success rate by tier (quality)
  • Escalation rate (routing accuracy)
  • Cache hit rate (prompt layout)
  • Verifier call rate (selectivity)
  • False-DONE rate (termination accuracy)

Alerts:

  • Success rate < 70% → check routing thresholds
  • Cost per successful task > 2x the frontier-tier cost per task → check escalation logic
  • Verifier call rate > 50% → tighten verifier budgeter
  • Escalation rate > 30% → check task classifier
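The metrics and alert thresholds above can be computed from a batch of traces in the Trace Format. This sketch makes two assumptions the guide does not state: cost per successful task is total spend (including failed runs) divided by the number of successes, and a run counts as escalated when any step used a higher tier than `predicted_tier`. The frontier baseline `frontier_cost_per_task` must be supplied by you, and `summarize` is a hypothetical helper. Cache hit rate and False-DONE rate are left out because the sample trace has no fields for them.

```python
from typing import Dict, List


def summarize(traces: List[dict], frontier_cost_per_task: float) -> Dict[str, object]:
    """Compute headline monitoring metrics and the alerts they trigger."""
    if not traces:
        raise ValueError("need at least one trace")
    n = len(traces)
    successes = sum(1 for t in traces if t["task_success"])
    success_rate = successes / n
    total_spend = sum(t["total_cost"] for t in traces)
    # Failed runs still cost money, so they count toward cost per success
    cost_per_success = total_spend / successes if successes else float("inf")

    steps = [s for t in traces for s in t["steps"]]
    verifier_rate = sum(s["verifier_called"] for s in steps) / len(steps) if steps else 0.0
    # A run is "escalated" if any step ran above the predicted tier (assumption)
    escalated = sum(
        any(s["model_call"]["tier"] > t["predicted_tier"] for s in t["steps"])
        for t in traces
    )
    escalation_rate = escalated / n

    alerts = []
    if success_rate < 0.70:
        alerts.append("check routing thresholds")
    if cost_per_success > 2 * frontier_cost_per_task:
        alerts.append("check escalation logic")
    if verifier_rate > 0.50:
        alerts.append("tighten verifier budgeter")
    if escalation_rate > 0.30:
        alerts.append("check task classifier")

    return {"success_rate": success_rate, "cost_per_success": cost_per_success,
            "verifier_rate": verifier_rate, "escalation_rate": escalation_rate,
            "alerts": alerts}


traces = [
    {"task_success": True, "total_cost": 1.0, "predicted_tier": 3,
     "steps": [{"model_call": {"tier": 3}, "verifier_called": False}]},
    {"task_success": False, "total_cost": 1.0, "predicted_tier": 2,
     "steps": [{"model_call": {"tier": 4}, "verifier_called": True}]},
]
report = summarize(traces, frontier_cost_per_task=0.5)
print(report["alerts"])
# ['check routing thresholds', 'check escalation logic', 'check task classifier']
```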