ACO Deployment Guide
Quick Install
pip install -e .
Or use directly:
from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig
CLI
# Route a request to the optimal model
aco route "Fix the auth bug in production"
# → tier=5, model=specialist-expert, cost=$1.50
aco route "What is 2+2?"
# → tier=2, model=cheap-cloud-8b, cost=$0.15
# Get context budget
aco budget "Research transformer advances"
# Check if a tool call is worth it
aco gate web_search --task-type research
# Check if verification is needed
aco verify --risk high --confidence 0.7
# Show optimizer stats
aco stats
# Version
aco version
Python API
Basic Routing
from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig
opt = ACOOptimizer(ACOConfig(
router_model_path="router_models/router_bundle_v11.pkl"
))
result = opt.start_run("Debug this critical production bug")
print(result["routing"]) # tier, model_id, confidence, cost_estimate
print(result["context_budget"]) # total_tokens, keep_exact, omit
With Execution Feedback
# Step 1: Route to cheap model
result = opt.start_run("Fix the typo in README")
# Step 2: Get cheap model's logprobs
cheap_logprobs = get_model_logprobs(result["routing"]["model_id"], request)
# Step 3: Decide whether to escalate
cascade = opt.cascade_step(
request=request,
initial_tier=result["routing"]["tier"],
cheap_logprobs=cheap_logprobs,
cheap_response=cheap_response
)
if cascade.escalated:
# Run stronger model
final_response = call_model(cascade.final_tier, request)
else:
final_response = cheap_response
Per-Step Routing
from aco.per_step_router import PerStepRouter
ps = PerStepRouter(max_budget=2.0)
for step in agent_steps:
d = ps.route_step(
action=step.description,
step_num=step.number,
has_prior_failures=step.had_errors,
task_risk="medium"
)
step.model_tier = d.adjusted_tier
step.model_id = d.model_id
step.estimated_cost = d.cost_estimate
Integration Examples
LangChain Integration
from aco.optimizer import ACOOptimizer
opt = ACOOptimizer()
class ACORouter:
def route(self, prompt: str) -> str:
result = opt.start_run(prompt)
return result["routing"]["model_id"]
# Use with LangChain
llm = ACORouter()
chain = LLMChain(llm=llm, ...)
Custom Agent Harness
class CostAwareAgent:
def __init__(self, max_budget=5.0):
self.opt = ACOOptimizer()
self.ps = PerStepRouter(max_budget=max_budget)
def run(self, request):
# Initial routing
result = self.opt.start_run(request)
tier = result["routing"]["tier"]
model = result["routing"]["model_id"]
# Per-step execution
while not done and self.ps.budget_remaining > 0:
step = self.plan_next_step()
routing = self.ps.route_step(
step.action, step.num,
has_prior_failures=self.has_errors
)
response = self.call_model(routing.model_id, step)
# Check if we need to escalate
if not response.success and routing.adjusted_tier < 5:
cascade = self.opt.cascade_step(
request, routing.adjusted_tier,
response.logprobs, response.text
)
if cascade.escalated:
response = self.call_model(cascade.model_id, step)
# Check doom
doom = self.opt.check_doom(self.ps.total_spent)
if doom.doomed:
break
trace = self.opt.end_run(success=done)
return trace
Model Tier Reference
| Tier | Model ID | Provider | Cost/1K tokens | Use For |
|---|---|---|---|---|
| 1 | tiny-local-3b | local | $0.00 | Simple queries, search, read |
| 2 | cheap-cloud-8b | cloud | $0.05 | Quick answers, simple edits |
| 3 | medium-70b | cloud | $0.30 | Standard tasks, most coding |
| 4 | frontier-latest | cloud | $1.00 | Complex tasks, critical paths |
| 5 | specialist-expert | cloud | $1.50 | Legal, multi-step orchestration |
Configuration
# config.yaml
routing:
safety_threshold: 0.30
downgrade_threshold: 0.90
max_retries: 3
max_cost_per_task: 5.0
models:
tier1:
model_id: tiny-local-3b
provider: local
cost_per_1k_input: 0.00
cost_per_1k_output: 0.00
tier4:
model_id: frontier-latest
provider: cloud
cost_per_1k_input: 1.00
cost_per_1k_output: 3.00
task_floors:
legal_regulated: 4
long_horizon: 3
coding: 3
quick_answer: 1
Trace Format
{
"trace_id": "abc123",
"request": "Fix the auth bug",
"task_type": "coding",
"difficulty": 4,
"predicted_tier": 5,
"steps": [
{
"step_num": 1,
"model_call": {
"model_id": "specialist-expert",
"tier": 5,
"input_tokens": 2000,
"output_tokens": 500,
"cost": 3.50
},
"tool_calls": [
{"tool_name": "code_search", "success": true, "cost": 0.01}
],
"verifier_called": false
}
],
"final_outcome": "completed",
"task_success": true,
"total_cost": 3.51
}
Monitoring
What to watch:
- Cost per successful task (primary)
- Success rate by tier (quality)
- Escalation rate (routing accuracy)
- Cache hit rate (prompt layout)
- Verifier call rate (selectivity)
- False-DONE rate (termination accuracy)
Alerts:
- Success rate < 70% → check routing thresholds
- Cost per successful task > 2x frontier → check escalation logic
- Verifier call rate > 50% → tighten verifier budgeter
- Escalation rate > 30% → check task classifier