# ACO Deployment Guide

## Quick Install

```bash
pip install -e .
```

Or use directly:

```python
from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig
```

## CLI

```bash
# Route a request to the optimal model
aco route "Fix the auth bug in production"
# → tier=5, model=specialist-expert, cost=$1.50

aco route "What is 2+2?"
# → tier=2, model=cheap-cloud-8b, cost=$0.15

# Get context budget
aco budget "Research transformer advances"

# Check if a tool call is worth it
aco gate web_search --task-type research

# Check if verification is needed
aco verify --risk high --confidence 0.7

# Show optimizer stats
aco stats

# Version
aco version
```

## Python API

### Basic Routing

```python
from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig

opt = ACOOptimizer(ACOConfig(
    router_model_path="router_models/router_bundle_v11.pkl"
))

result = opt.start_run("Debug this critical production bug")
print(result["routing"])         # tier, model_id, confidence, cost_estimate
print(result["context_budget"])  # total_tokens, keep_exact, omit
```

### With Execution Feedback

```python
# call_model and get_model_logprobs are placeholders for your own inference code.
request = "Fix the typo in README"

# Step 1: Route to cheap model
result = opt.start_run(request)

# Step 2: Get cheap model's response and logprobs
cheap_response = call_model(result["routing"]["tier"], request)
cheap_logprobs = get_model_logprobs(result["routing"]["model_id"], request)

# Step 3: Decide whether to escalate
cascade = opt.cascade_step(
    request=request,
    initial_tier=result["routing"]["tier"],
    cheap_logprobs=cheap_logprobs,
    cheap_response=cheap_response
)

if cascade.escalated:
    # Run stronger model
    final_response = call_model(cascade.final_tier, request)
else:
    final_response = cheap_response
```

### Per-Step Routing

```python
from aco.per_step_router import PerStepRouter

ps = PerStepRouter(max_budget=2.0)

# agent_steps is your agent's own list of planned steps
for step in agent_steps:
    d = ps.route_step(
        action=step.description,
        step_num=step.number,
        has_prior_failures=step.had_errors,
        task_risk="medium"
    )
    step.model_tier = d.adjusted_tier
    step.model_id = d.model_id
    step.estimated_cost = d.cost_estimate
```

## Integration Examples

### LangChain Integration

```python
from langchain.chains import LLMChain

from aco.optimizer import ACOOptimizer

opt = ACOOptimizer()

class ACORouter:
    def route(self, prompt: str) -> str:
        result = opt.start_run(prompt)
        return result["routing"]["model_id"]

# Use with LangChain.
# ACORouter returns a model ID string, not an LLM object, so it cannot be
# passed to LLMChain directly. make_llm is a placeholder for your own factory
# that maps a model ID to a LangChain LLM.
router = ACORouter()
llm = make_llm(router.route("Fix the auth bug in production"))
chain = LLMChain(llm=llm, ...)
```

### Custom Agent Harness

```python
from aco.optimizer import ACOOptimizer
from aco.per_step_router import PerStepRouter


class CostAwareAgent:
    # plan_next_step, call_model, and is_done are placeholders for your own agent logic.

    def __init__(self, max_budget=5.0):
        self.opt = ACOOptimizer()
        self.ps = PerStepRouter(max_budget=max_budget)
        self.has_errors = False

    def run(self, request):
        # Initial routing
        result = self.opt.start_run(request)
        tier = result["routing"]["tier"]
        model = result["routing"]["model_id"]

        # Per-step execution
        done = False
        while not done and self.ps.budget_remaining > 0:
            step = self.plan_next_step()
            routing = self.ps.route_step(
                step.action, step.num,
                has_prior_failures=self.has_errors
            )
            response = self.call_model(routing.model_id, step)

            # Check if we need to escalate
            if not response.success and routing.adjusted_tier < 5:
                cascade = self.opt.cascade_step(
                    request, routing.adjusted_tier,
                    response.logprobs, response.text
                )
                if cascade.escalated:
                    response = self.call_model(cascade.model_id, step)

            # Record failures so later steps receive has_prior_failures=True
            self.has_errors = self.has_errors or not response.success
            done = self.is_done(response)

            # Check doom
            doom = self.opt.check_doom(self.ps.total_spent)
            if doom.doomed:
                break

        trace = self.opt.end_run(success=done)
        return trace
```

## Model Tier Reference

| Tier | Model ID | Provider | Cost/1K tokens | Use For |
|------|----------|----------|---------------|---------|
| 1 | tiny-local-3b | local | $0.00 | Simple queries, search, read |
| 2 | cheap-cloud-8b | cloud | $0.05 | Quick answers, simple edits |
| 3 | medium-70b | cloud | $0.30 | Standard tasks, most coding |
| 4 | frontier-latest | cloud | $1.00 | Complex tasks, critical paths |
| 5 | specialist-expert | cloud | $1.50 | Legal, multi-step orchestration |

## Configuration

```yaml
# config.yaml
routing:
  safety_threshold: 0.30
  downgrade_threshold: 0.90
  max_retries: 3
  max_cost_per_task: 5.0

models:
  tier1:
    model_id: tiny-local-3b
    provider: local
    cost_per_1k_input: 0.00
    cost_per_1k_output: 0.00
  tier4:
    model_id: frontier-latest
    provider: cloud
    cost_per_1k_input: 1.00
    cost_per_1k_output: 3.00

task_floors:
  legal_regulated: 4
  long_horizon: 3
  coding: 3
  quick_answer: 1
```

## Trace Format

```json
{
  "trace_id": "abc123",
  "request": "Fix the auth bug",
  "task_type": "coding",
  "difficulty": 4,
  "predicted_tier": 5,
  "steps": [
    {
      "step_num": 1,
      "model_call": {
        "model_id": "specialist-expert",
        "tier": 5,
        "input_tokens": 2000,
        "output_tokens": 500,
        "cost": 3.50
      },
      "tool_calls": [
        {"tool_name": "code_search", "success": true, "cost": 0.01}
      ],
      "verifier_called": false
    }
  ],
  "final_outcome": "completed",
  "task_success": true,
  "total_cost": 3.51
}
```

## Monitoring

### What to watch:

- Cost per successful task (primary)
- Success rate by tier (quality)
- Escalation rate (routing accuracy)
- Cache hit rate (prompt layout)
- Verifier call rate (selectivity)
- False-DONE rate (termination accuracy)

### Alerts:

- Success rate < 70% → check routing thresholds
- Cost per successful task > 2x frontier → check escalation logic
- Verifier call rate > 50% → tighten verifier budgeter
- Escalation rate > 30% → check task classifier
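
### Example: checking alerts from traces

The alert thresholds above can be evaluated directly against stored traces. The following is a minimal sketch, not part of the ACO package: `summarize_traces` and `check_alerts` are hypothetical helper names, the `frontier_cost_per_success` baseline is something you would have to measure yourself, and treating a run as "escalated" when any step ran above `predicted_tier` is an assumption about how escalation shows up in the trace format.

```python
from typing import Dict, List, Optional


def summarize_traces(traces: List[dict]) -> Dict[str, Optional[float]]:
    """Aggregate a few monitored metrics from trace dicts (see Trace Format)."""
    n = len(traces)
    successes = sum(1 for t in traces if t.get("task_success"))
    total_cost = sum(t.get("total_cost", 0.0) for t in traces)
    steps = [s for t in traces for s in t.get("steps", [])]
    verified = sum(1 for s in steps if s.get("verifier_called"))
    # Assumption: a run is "escalated" if any step ran above its predicted tier.
    escalated = sum(
        1
        for t in traces
        if any(
            s.get("model_call", {}).get("tier", 0) > t.get("predicted_tier", 0)
            for s in t.get("steps", [])
        )
    )
    return {
        "success_rate": successes / n if n else None,
        "cost_per_success": total_cost / successes if successes else None,
        "verifier_call_rate": verified / len(steps) if steps else None,
        "escalation_rate": escalated / n if n else None,
    }


def check_alerts(
    metrics: Dict[str, Optional[float]], frontier_cost_per_success: float
) -> List[str]:
    """Return one message per alert threshold that is exceeded."""
    alerts = []
    sr = metrics["success_rate"]
    if sr is not None and sr < 0.70:
        alerts.append("Success rate < 70%: check routing thresholds")
    cps = metrics["cost_per_success"]
    if cps is not None and cps > 2 * frontier_cost_per_success:
        alerts.append("Cost per successful task > 2x frontier: check escalation logic")
    vr = metrics["verifier_call_rate"]
    if vr is not None and vr > 0.50:
        alerts.append("Verifier call rate > 50%: tighten verifier budgeter")
    er = metrics["escalation_rate"]
    if er is not None and er > 0.30:
        alerts.append("Escalation rate > 30%: check task classifier")
    return alerts


if __name__ == "__main__":
    # A single trace with the same shape as the Trace Format example.
    example = {
        "predicted_tier": 5,
        "steps": [{"model_call": {"tier": 5}, "verifier_called": False}],
        "task_success": True,
        "total_cost": 3.51,
    }
    metrics = summarize_traces([example])
    for message in check_alerts(metrics, frontier_cost_per_success=1.0):
        print(message)
```

Metrics that cannot be computed (for example, cost per success when no run succeeded) are returned as `None` and skipped by the alert check instead of raising a division error. Cache hit rate and false-DONE rate are left out because the trace format shown here does not carry the fields needed to compute them.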