# ACO Deployment Guide

## Quick Install

```bash
pip install -e .
```

Or use it directly:

```python
from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig
```

## CLI

```bash
# Route a request to the optimal model
aco route "Fix the auth bug in production"
# → tier=5, model=specialist-expert, cost=$1.50

aco route "What is 2+2?"
# → tier=2, model=cheap-cloud-8b, cost=$0.15

# Get context budget
aco budget "Research transformer advances"

# Check if a tool call is worth it
aco gate web_search --task-type research

# Check if verification is needed
aco verify --risk high --confidence 0.7

# Show optimizer stats
aco stats

# Version
aco version
```

## Python API

### Basic Routing

```python
from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig

opt = ACOOptimizer(ACOConfig(
    router_model_path="router_models/router_bundle_v11.pkl"
))

result = opt.start_run("Debug this critical production bug")
print(result["routing"])         # tier, model_id, confidence, cost_estimate
print(result["context_budget"])  # total_tokens, keep_exact, omit
```

### With Execution Feedback

```python
# Step 1: Route to the cheap model
request = "Fix the typo in README"
result = opt.start_run(request)

# Step 2: Run the cheap model and collect its response and logprobs.
# call_model and get_model_logprobs are placeholders for your own model client.
cheap_response = call_model(result["routing"]["tier"], request)
cheap_logprobs = get_model_logprobs(result["routing"]["model_id"], request)

# Step 3: Decide whether to escalate
cascade = opt.cascade_step(
    request=request,
    initial_tier=result["routing"]["tier"],
    cheap_logprobs=cheap_logprobs,
    cheap_response=cheap_response,
)

if cascade.escalated:
    # Run the stronger model
    final_response = call_model(cascade.final_tier, request)
else:
    final_response = cheap_response
```
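
The escalation decision in step 3 happens inside `cascade_step`, and its actual rule is not documented in this guide. As a rough illustration only, one common heuristic is to escalate when the cheap model's average token probability is low. The function name `should_escalate` and the `0.6` threshold are assumptions made for this sketch, not part of the ACO API.

```python
import math
from typing import Sequence


def should_escalate(token_logprobs: Sequence[float], min_mean_prob: float = 0.6) -> bool:
    """Return True when the cheap model looks unsure of its own answer.

    Confidence is the geometric mean of the token probabilities, i.e.
    exp(mean(logprobs)). Both the heuristic and the 0.6 default are
    illustrative assumptions, not ACO's documented behavior.
    """
    if not token_logprobs:
        # No evidence of confidence at all: escalate to be safe.
        return True
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob) < min_mean_prob


confident = [-0.05, -0.10, -0.02]  # token probabilities of roughly 0.90 to 0.98
uncertain = [-1.20, -0.90, -2.30]  # token probabilities of roughly 0.10 to 0.41

print(should_escalate(confident))
print(should_escalate(uncertain))
```

Using the geometric mean keeps the score comparable across responses of different lengths, which a raw sum of logprobs would not.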

### Per-Step Routing

```python
from aco.per_step_router import PerStepRouter

ps = PerStepRouter(max_budget=2.0)

# agent_steps is your agent's own sequence of planned steps
for step in agent_steps:
    d = ps.route_step(
        action=step.description,
        step_num=step.number,
        has_prior_failures=step.had_errors,
        task_risk="medium"
    )
    step.model_tier = d.adjusted_tier
    step.model_id = d.model_id
    step.estimated_cost = d.cost_estimate
```
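
To make the idea of a budget-limited, failure-aware tier adjustment concrete, here is a toy stand-in. It is not the `PerStepRouter` implementation: the class `BudgetedRouter`, the "bump one tier after a failure" rule, and the use of the per-1K-token prices from the Model Tier Reference as a flat per-step cost are all assumptions for this sketch.

```python
from dataclasses import dataclass

# Flat per-step cost by tier, borrowed from the per-1K-token prices in the
# Model Tier Reference. Treating them as a per-step cost is a simplification.
TIER_COST = {1: 0.00, 2: 0.05, 3: 0.30, 4: 1.00, 5: 1.50}


@dataclass
class BudgetedRouter:
    """Toy per-step router that never spends more than max_budget."""
    max_budget: float
    total_spent: float = 0.0

    @property
    def budget_remaining(self) -> float:
        return self.max_budget - self.total_spent

    def route_step(self, base_tier: int, has_prior_failures: bool = False) -> int:
        # Bump one tier after a failure, then step down until the tier is affordable.
        tier = min(base_tier + 1, 5) if has_prior_failures else base_tier
        while tier > 1 and TIER_COST[tier] > self.budget_remaining:
            tier -= 1
        self.total_spent += TIER_COST[tier]
        return tier


router = BudgetedRouter(max_budget=2.0)
tiers = [
    router.route_step(4),
    router.route_step(4, has_prior_failures=True),  # tier 5 is unaffordable here
    router.route_step(3),                           # budget exhausted, falls to tier 1
]
print(tiers, round(router.budget_remaining, 2))
```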
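
## Integration Examples

### LangChain Integration

```python
from langchain.chains import LLMChain  # legacy import path; adjust for your LangChain version

from aco.optimizer import ACOOptimizer

opt = ACOOptimizer()

class ACORouter:
    """Picks a model ID for a prompt; it is not itself a LangChain LLM."""
    def route(self, prompt: str) -> str:
        result = opt.start_run(prompt)
        return result["routing"]["model_id"]

# Use with LangChain: resolve the model ID first, then build the LLM from it.
router = ACORouter()
model_id = router.route("Summarize this document")
llm = build_llm(model_id)  # build_llm is a placeholder for your own model factory
chain = LLMChain(llm=llm, ...)
```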

### Custom Agent Harness

```python
from aco.optimizer import ACOOptimizer
from aco.per_step_router import PerStepRouter


class CostAwareAgent:
    # plan_next_step, call_model, and is_done are harness-specific
    # and left for you to implement.
    def __init__(self, max_budget=5.0):
        self.opt = ACOOptimizer()
        self.ps = PerStepRouter(max_budget=max_budget)
        self.has_errors = False

    def run(self, request):
        # Initial routing
        result = self.opt.start_run(request)
        tier = result["routing"]["tier"]
        model = result["routing"]["model_id"]

        # Per-step execution
        done = False
        while not done and self.ps.budget_remaining > 0:
            step = self.plan_next_step()
            routing = self.ps.route_step(
                step.action, step.num,
                has_prior_failures=self.has_errors
            )
            response = self.call_model(routing.model_id, step)

            # Check if we need to escalate
            if not response.success and routing.adjusted_tier < 5:
                cascade = self.opt.cascade_step(
                    request, routing.adjusted_tier,
                    response.logprobs, response.text
                )
                if cascade.escalated:
                    response = self.call_model(cascade.model_id, step)

            # Remember failures for later routing and check for completion
            self.has_errors = self.has_errors or not response.success
            done = self.is_done(response)

            # Check doom
            doom = self.opt.check_doom(self.ps.total_spent)
            if doom.doomed:
                break

        trace = self.opt.end_run(success=done)
        return trace
```

## Model Tier Reference

| Tier | Model ID | Provider | Cost/1K tokens | Use For |
|------|----------|----------|---------------|---------|
| 1 | tiny-local-3b | local | $0.00 | Simple queries, search, read |
| 2 | cheap-cloud-8b | cloud | $0.05 | Quick answers, simple edits |
| 3 | medium-70b | cloud | $0.30 | Standard tasks, most coding |
| 4 | frontier-latest | cloud | $1.00 | Complex tasks, critical paths |
| 5 | specialist-expert | cloud | $1.50 | Legal, multi-step orchestration |
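
The per-1K-token prices in the table translate into a request cost with simple arithmetic. The sketch below assumes a single blended price per token, as the table shows; the helper name `estimate_cost` is hypothetical and not part of the ACO API.

```python
# Prices in USD per 1K tokens, copied from the tier table above.
PRICE_PER_1K = {
    "tiny-local-3b": 0.00,
    "cheap-cloud-8b": 0.05,
    "medium-70b": 0.30,
    "frontier-latest": 1.00,
    "specialist-expert": 1.50,
}


def estimate_cost(model_id: str, total_tokens: int) -> float:
    """Estimate the cost in USD of sending total_tokens through model_id."""
    return PRICE_PER_1K[model_id] * total_tokens / 1000


# Compare what the same 3,000-token request would cost on each tier.
for model_id in PRICE_PER_1K:
    print(f"{model_id:18s} ${estimate_cost(model_id, 3000):.2f}")
```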

## Configuration

```yaml
# config.yaml
routing:
  safety_threshold: 0.30
  downgrade_threshold: 0.90
  max_retries: 3
  max_cost_per_task: 5.0

models:
  tier1:
    model_id: tiny-local-3b
    provider: local
    cost_per_1k_input: 0.00
    cost_per_1k_output: 0.00
  tier4:
    model_id: frontier-latest
    provider: cloud
    cost_per_1k_input: 1.00
    cost_per_1k_output: 3.00

task_floors:
  legal_regulated: 4
  long_horizon: 3
  coding: 3
  quick_answer: 1
```
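
A minimal sketch of how these settings might be applied, assuming that `task_floors` gives the minimum tier allowed for a task type and that input and output tokens are billed at their separate rates. The helper names `apply_floor` and `call_cost` are hypothetical, and the values are hard-coded here rather than loaded from `config.yaml` to keep the example self-contained.

```python
# Values mirrored from the config.yaml example; in practice they would be
# loaded from the file with a YAML parser.
CONFIG = {
    "routing": {"max_cost_per_task": 5.0},
    "models": {
        "tier1": {"cost_per_1k_input": 0.00, "cost_per_1k_output": 0.00},
        "tier4": {"cost_per_1k_input": 1.00, "cost_per_1k_output": 3.00},
    },
    "task_floors": {"legal_regulated": 4, "long_horizon": 3, "coding": 3, "quick_answer": 1},
}


def apply_floor(predicted_tier: int, task_type: str) -> int:
    """Never route below the configured floor for this task type."""
    floor = CONFIG["task_floors"].get(task_type, 1)
    return max(predicted_tier, floor)


def call_cost(tier: int, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call, pricing input and output tokens separately."""
    model = CONFIG["models"][f"tier{tier}"]
    return (input_tokens * model["cost_per_1k_input"]
            + output_tokens * model["cost_per_1k_output"]) / 1000


tier = apply_floor(2, "coding")  # the coding floor of 3 overrides a tier-2 prediction
cost = call_cost(4, input_tokens=2000, output_tokens=500)
within_budget = cost <= CONFIG["routing"]["max_cost_per_task"]
print(tier, cost, within_budget)
```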

## Trace Format

```json
{
  "trace_id": "abc123",
  "request": "Fix the auth bug",
  "task_type": "coding",
  "difficulty": 4,
  "predicted_tier": 5,
  "steps": [
    {
      "step_num": 1,
      "model_call": {
        "model_id": "specialist-expert",
        "tier": 5,
        "input_tokens": 2000,
        "output_tokens": 500,
        "cost": 3.50
      },
      "tool_calls": [
        {"tool_name": "code_search", "success": true, "cost": 0.01}
      ],
      "verifier_called": false
    }
  ],
  "final_outcome": "completed",
  "task_success": true,
  "total_cost": 3.51
}
```
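
Traces in this shape are easy to sanity-check after a run. The sketch below assumes that `total_cost` is the sum of every step's model-call cost and tool-call costs, which is consistent with the example (3.50 + 0.01 = 3.51) but is not stated elsewhere in this guide. The helper `recompute_total` is hypothetical, and the embedded trace is a trimmed copy of the example.

```python
import json
import math

TRACE_JSON = """
{
  "trace_id": "abc123",
  "steps": [
    {
      "step_num": 1,
      "model_call": {"model_id": "specialist-expert", "tier": 5, "cost": 3.50},
      "tool_calls": [{"tool_name": "code_search", "success": true, "cost": 0.01}]
    }
  ],
  "total_cost": 3.51
}
"""


def recompute_total(trace: dict) -> float:
    """Sum model-call and tool-call costs over every step of a trace."""
    total = 0.0
    for step in trace["steps"]:
        total += step["model_call"]["cost"]
        total += sum(call["cost"] for call in step.get("tool_calls", []))
    return total


trace = json.loads(TRACE_JSON)
recomputed = recompute_total(trace)
# Compare with a tolerance: the costs are floats, so exact equality is fragile.
consistent = math.isclose(recomputed, trace["total_cost"], abs_tol=1e-9)
print(f"recomputed={recomputed:.2f} consistent={consistent}")
```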

## Monitoring

### What to watch:
- Cost per successful task (primary)
- Success rate by tier (quality)
- Escalation rate (routing accuracy)
- Cache hit rate (prompt layout)
- Verifier call rate (selectivity)
- False-DONE rate (termination accuracy)
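
Several of these metrics can be derived from stored traces. The sketch below is one possible aggregation, not ACO's own reporting: it reads `task_success` and `total_cost` from the trace format, while the `escalated` flag is a hypothetical field added for illustration. It also assumes that cost per successful task means total spend, including failed runs, divided by the number of successes.

```python
from typing import Iterable


def summarize(traces: Iterable[dict]) -> dict:
    """Compute success rate, cost per successful task, and escalation rate."""
    traces = list(traces)
    successes = [t for t in traces if t["task_success"]]
    total_cost = sum(t["total_cost"] for t in traces)
    escalations = sum(1 for t in traces if t.get("escalated", False))
    return {
        "success_rate": len(successes) / len(traces) if traces else 0.0,
        # Failed runs still cost money, so divide total spend by successes.
        "cost_per_successful_task": (
            total_cost / len(successes) if successes else float("inf")
        ),
        "escalation_rate": escalations / len(traces) if traces else 0.0,
    }


sample = [
    {"task_success": True, "total_cost": 3.0, "escalated": False},
    {"task_success": False, "total_cost": 1.0, "escalated": True},
    {"task_success": True, "total_cost": 2.0, "escalated": False},
    {"task_success": True, "total_cost": 2.0, "escalated": True},
]
metrics = summarize(sample)
print(metrics)
```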

### Alerts:
- Success rate < 70% → check routing thresholds
- Cost per successful task > 2x the frontier-model baseline → check escalation logic
- Verifier call rate > 50% → tighten verifier budgeter
- Escalation rate > 30% → check task classifier
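
These thresholds are straightforward to encode as a periodic check. The sketch below is a minimal example under stated assumptions: the function `check_alerts`, the metric key names, and the `frontier_cost_per_task` baseline argument are all hypothetical and would need to match whatever your monitoring pipeline actually produces.

```python
from typing import List


def check_alerts(metrics: dict, frontier_cost_per_task: float) -> List[str]:
    """Return the follow-up action for every threshold that is breached."""
    alerts = []
    if metrics["success_rate"] < 0.70:
        alerts.append("check routing thresholds")
    if metrics["cost_per_successful_task"] > 2 * frontier_cost_per_task:
        alerts.append("check escalation logic")
    if metrics["verifier_call_rate"] > 0.50:
        alerts.append("tighten verifier budgeter")
    if metrics["escalation_rate"] > 0.30:
        alerts.append("check task classifier")
    return alerts


current = {
    "success_rate": 0.65,              # below the 70% threshold
    "cost_per_successful_task": 1.80,  # under 2x the 1.00 baseline
    "verifier_call_rate": 0.20,
    "escalation_rate": 0.35,           # above the 30% threshold
}
print(check_alerts(current, frontier_cost_per_task=1.00))
```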