	# ACO Deployment Guide

	## Quick Install

	```bash
	pip install -e .
	```

	Or use directly:

	```python
	from aco.optimizer import ACOOptimizer
	from aco.config import ACOConfig
	```

	## CLI

	```bash
	# Route a request to the optimal model
	aco route "Fix the auth bug in production"
	# → tier=5, model=specialist-expert, cost=$1.50

	aco route "What is 2+2?"
	# → tier=2, model=cheap-cloud-8b, cost=$0.15

	# Get context budget
	aco budget "Research transformer advances"

	# Check if a tool call is worth it
	aco gate web_search --task-type research

	# Check if verification is needed
	aco verify --risk high --confidence 0.7

	# Show optimizer stats
	aco stats

	# Version
	aco version
	```

	## Python API

	### Basic Routing

	```python
	from aco.optimizer import ACOOptimizer
	from aco.config import ACOConfig

	opt = ACOOptimizer(ACOConfig(
	    router_model_path="router_models/router_bundle_v11.pkl"
	))

	result = opt.start_run("Debug this critical production bug")
	print(result["routing"]) # tier, model_id, confidence, cost_estimate
	print(result["context_budget"]) # total_tokens, keep_exact, omit
	```

	### With Execution Feedback

	```python
	# `get_model_logprobs` and `call_model` are placeholders for your own
	# inference helpers; they are not part of ACO.
	request = "Fix the typo in README"

	# Step 1: Route to a cheap model
	result = opt.start_run(request)

	# Step 2: Get the cheap model's response and logprobs
	cheap_response = call_model(result["routing"]["tier"], request)
	cheap_logprobs = get_model_logprobs(result["routing"]["model_id"], request)

	# Step 3: Decide whether to escalate
	cascade = opt.cascade_step(
	    request=request,
	    initial_tier=result["routing"]["tier"],
	    cheap_logprobs=cheap_logprobs,
	    cheap_response=cheap_response,
	)

	if cascade.escalated:
	    # Run the stronger model
	    final_response = call_model(cascade.final_tier, request)
	else:
	    final_response = cheap_response
	```
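The guide does not say how `cascade_step` turns logprobs into an escalation decision. As a rough illustration of one common approach, the sketch below averages token log-probabilities and escalates when the resulting confidence falls under a threshold. The function names and the 0.70 threshold are assumptions made for this example, and ACO's own logic may differ.

```python
import math


def mean_token_confidence(logprobs: list[float]) -> float:
    """Geometric-mean token probability, i.e. exp of the mean log-probability."""
    if not logprobs:
        return 0.0  # no evidence: treat as zero confidence
    return math.exp(sum(logprobs) / len(logprobs))


def should_escalate(logprobs: list[float], threshold: float = 0.70) -> bool:
    """Escalate when the cheap model's average confidence is below `threshold`.

    The default threshold is a hypothetical value, not an ACO setting.
    """
    return mean_token_confidence(logprobs) < threshold


if __name__ == "__main__":
    confident = [-0.05, -0.10, -0.02]  # high-probability tokens
    uncertain = [-1.2, -0.9, -2.0]     # low-probability tokens
    print(should_escalate(confident))
    print(should_escalate(uncertain))
```

A mean over tokens is cheap to compute but can hide a single very uncertain token, so a minimum or percentile over the same logprobs is a reasonable alternative.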

	### Per-Step Routing

	```python
	from aco.per_step_router import PerStepRouter

	ps = PerStepRouter(max_budget=2.0)

	# `agent_steps` is your agent's own sequence of planned steps.
	for step in agent_steps:
	    d = ps.route_step(
	        action=step.description,
	        step_num=step.number,
	        has_prior_failures=step.had_errors,
	        task_risk="medium"
	    )
	    step.model_tier = d.adjusted_tier
	    step.model_id = d.model_id
	    step.estimated_cost = d.cost_estimate
	```

	## Integration Examples

	### LangChain Integration

	```python
	from aco.optimizer import ACOOptimizer

	opt = ACOOptimizer()

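	class ACORouter:
	    """Picks a model ID per prompt. It is not itself a LangChain LLM."""

	    def route(self, prompt: str) -> str:
	        result = opt.start_run(prompt)
	        return result["routing"]["model_id"]

	# Use with LangChain: resolve the model ID first, then build the LLM for it.
	router = ACORouter()
	model_id = router.route("Summarize this document")
	# llm = <your LangChain LLM configured for model_id>
	# chain = LLMChain(llm=llm, ...)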
	```

	### Custom Agent Harness

	```python
	from aco.optimizer import ACOOptimizer
	from aco.per_step_router import PerStepRouter

	class CostAwareAgent:
	    # `plan_next_step` and `call_model` are left for you to implement.
	    def __init__(self, max_budget=5.0):
	        self.opt = ACOOptimizer()
	        self.ps = PerStepRouter(max_budget=max_budget)
	        self.has_errors = False

	    def run(self, request):
	        # Initial routing
	        result = self.opt.start_run(request)
	        tier = result["routing"]["tier"]
	        model = result["routing"]["model_id"]

	        # Per-step execution
	        done = False
	        while not done and self.ps.budget_remaining > 0:
	            step = self.plan_next_step()
	            routing = self.ps.route_step(
	                step.action, step.num,
	                has_prior_failures=self.has_errors
	            )
	            response = self.call_model(routing.model_id, step)

	            # Check if we need to escalate
	            if not response.success and routing.adjusted_tier < 5:
	                cascade = self.opt.cascade_step(
	                    request, routing.adjusted_tier,
	                    response.logprobs, response.text
	                )
	                if cascade.escalated:
	                    response = self.call_model(cascade.model_id, step)

	            # Remember failures so later steps see has_prior_failures=True
	            self.has_errors = self.has_errors or not response.success
	            # Set `done = True` here once your own completion check passes

	            # Check doom
	            doom = self.opt.check_doom(self.ps.total_spent)
	            if doom.doomed:
	                break

	        trace = self.opt.end_run(success=done)
	        return trace
	```

	## Model Tier Reference

	| Tier | Model ID | Provider | Cost/1K tokens | Use For |
	|------|----------|----------|----------------|---------|
	| 1 | tiny-local-3b | local | $0.00 | Simple queries, search, read |
	| 2 | cheap-cloud-8b | cloud | $0.05 | Quick answers, simple edits |
	| 3 | medium-70b | cloud | $0.30 | Standard tasks, most coding |
	| 4 | frontier-latest | cloud | $1.00 | Complex tasks, critical paths |
	| 5 | specialist-expert | cloud | $1.50 | Legal, multi-step orchestration |
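To make the per-1K rates concrete, here is a minimal sketch that estimates the cost of a call from the table above. `TIER_RATES` and `estimate_cost` are illustrative names, not part of the ACO API, and the single blended rate ignores any split between input and output pricing.

```python
# Illustrative only: rates copied from the tier table; names are hypothetical.
TIER_RATES = {
    1: ("tiny-local-3b", 0.00),
    2: ("cheap-cloud-8b", 0.05),
    3: ("medium-70b", 0.30),
    4: ("frontier-latest", 1.00),
    5: ("specialist-expert", 1.50),
}


def estimate_cost(tier: int, total_tokens: int) -> float:
    """Return the estimated USD cost of `total_tokens` tokens at `tier`."""
    if tier not in TIER_RATES:
        raise ValueError(f"unknown tier: {tier}")
    _, rate_per_1k = TIER_RATES[tier]
    return rate_per_1k * total_tokens / 1000


if __name__ == "__main__":
    for tier, (model_id, _) in TIER_RATES.items():
        cost = estimate_cost(tier, 2500)
        print(f"tier {tier} ({model_id}): ${cost:.2f} per 2,500 tokens")
```

Comparing the same token count across tiers this way shows how quickly escalation multiplies the cost of a request.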

	## Configuration

	```yaml
	# config.yaml
	routing:
	  safety_threshold: 0.30
	  downgrade_threshold: 0.90
	  max_retries: 3
	  max_cost_per_task: 5.0

	models:
	  tier1:
	    model_id: tiny-local-3b
	    provider: local
	    cost_per_1k_input: 0.00
	    cost_per_1k_output: 0.00
	  tier4:
	    model_id: frontier-latest
	    provider: cloud
	    cost_per_1k_input: 1.00
	    cost_per_1k_output: 3.00

	task_floors:
	  legal_regulated: 4
	  long_horizon: 3
	  coding: 3
	  quick_answer: 1
	```
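The `task_floors` section reads as a minimum tier per task type. Assuming that interpretation, the sketch below shows how a floor might be applied to a predicted tier. `apply_task_floor` is a hypothetical helper for illustration and is not taken from the ACO codebase.

```python
# Hypothetical helper: assumes `task_floors` sets a minimum tier per task type.
TASK_FLOORS = {
    "legal_regulated": 4,
    "long_horizon": 3,
    "coding": 3,
    "quick_answer": 1,
}


def apply_task_floor(predicted_tier: int, task_type: str) -> int:
    """Raise the predicted tier to the floor for `task_type`, if one is set."""
    floor = TASK_FLOORS.get(task_type, 1)  # unknown task types get no floor
    return max(predicted_tier, floor)


if __name__ == "__main__":
    print(apply_task_floor(2, "coding"))           # floor lifts tier 2 to 3
    print(apply_task_floor(5, "coding"))           # already above the floor
    print(apply_task_floor(2, "legal_regulated"))  # floor lifts tier 2 to 4
```

Under this reading a floor only ever raises the tier, so the router stays free to choose a stronger model when it predicts one is needed.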

	## Trace Format

	```json
	{
	  "trace_id": "abc123",
	  "request": "Fix the auth bug",
	  "task_type": "coding",
	  "difficulty": 4,
	  "predicted_tier": 5,
	  "steps": [
	    {
	      "step_num": 1,
	      "model_call": {
	        "model_id": "specialist-expert",
	        "tier": 5,
	        "input_tokens": 2000,
	        "output_tokens": 500,
	        "cost": 3.50
	      },
	      "tool_calls": [
	        {"tool_name": "code_search", "success": true, "cost": 0.01}
	      ],
	      "verifier_called": false
	    }
	  ],
	  "final_outcome": "completed",
	  "task_success": true,
	  "total_cost": 3.51
	}
	```
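In the example trace, `total_cost` equals the model-call cost plus the tool-call cost (3.50 + 0.01). Assuming that relationship holds in general, a trace can be sanity-checked by recomputing the total from its steps. The helper names below are illustrative and not part of ACO.

```python
import json

# A trimmed copy of the example trace; only the fields used below are kept.
TRACE_JSON = """
{
  "trace_id": "abc123",
  "steps": [
    {
      "step_num": 1,
      "model_call": {"model_id": "specialist-expert", "tier": 5, "cost": 3.50},
      "tool_calls": [{"tool_name": "code_search", "success": true, "cost": 0.01}]
    }
  ],
  "total_cost": 3.51
}
"""


def step_cost(step: dict) -> float:
    """Sum the model-call cost and all tool-call costs for one step."""
    model_cost = step.get("model_call", {}).get("cost", 0.0)
    tool_cost = sum(call.get("cost", 0.0) for call in step.get("tool_calls", []))
    return model_cost + tool_cost


def recompute_total(trace: dict) -> float:
    """Recompute a trace's total cost from its steps, rounded to cents."""
    return round(sum(step_cost(s) for s in trace["steps"]), 2)


if __name__ == "__main__":
    trace = json.loads(TRACE_JSON)
    total = recompute_total(trace)
    print(total, abs(total - trace["total_cost"]) < 0.005)
```

Rounding to cents before comparing avoids false mismatches from floating-point addition.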

	## Monitoring

	### What to watch:
	- Cost per successful task (primary)
	- Success rate by tier (quality)
	- Escalation rate (routing accuracy)
	- Cache hit rate (prompt layout)
	- Verifier call rate (selectivity)
	- False-DONE rate (termination accuracy)
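
Two of these metrics can be computed directly from the fields shown in the trace format. The sketch below assumes "cost per successful task" means total spend across all traces divided by the number of successful ones, which is a common definition but not one the guide spells out. The function names are illustrative.

```python
from collections import defaultdict


def cost_per_successful_task(traces: list[dict]) -> float:
    """Total spend across all traces divided by the number of successes."""
    successes = sum(1 for t in traces if t["task_success"])
    if successes == 0:
        return float("inf")  # nothing succeeded, so the cost is unbounded
    return sum(t["total_cost"] for t in traces) / successes


def success_rate_by_tier(traces: list[dict]) -> dict[int, float]:
    """Fraction of successful tasks for each predicted tier."""
    counts = defaultdict(lambda: [0, 0])  # tier -> [successes, total]
    for t in traces:
        counts[t["predicted_tier"]][0] += int(t["task_success"])
        counts[t["predicted_tier"]][1] += 1
    return {tier: ok / total for tier, (ok, total) in sorted(counts.items())}


if __name__ == "__main__":
    sample = [
        {"predicted_tier": 5, "task_success": True, "total_cost": 3.51},
        {"predicted_tier": 2, "task_success": True, "total_cost": 0.15},
        {"predicted_tier": 2, "task_success": False, "total_cost": 0.20},
    ]
    print(cost_per_successful_task(sample))
    print(success_rate_by_tier(sample))
```

Charging failed runs to the successful ones keeps the metric honest: a cheap tier that fails often does not look cheaper than it is.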

	### Alerts:
	- Success rate < 70% → check routing thresholds
	- Cost per successful task > 2x frontier → check escalation logic
	- Verifier call rate > 50% → tighten verifier budgeter
	- Escalation rate > 30% → check task classifier
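
The alert rules above can be expressed as simple threshold checks. This is a minimal sketch: the metric key names are invented for the example, and "2x frontier" is interpreted as twice the cost per successful task of a frontier-only baseline, which is an assumption since the guide does not define it.

```python
def check_alerts(metrics: dict[str, float], frontier_baseline: float) -> list[str]:
    """Return one message per alert rule that fires for `metrics`.

    `frontier_baseline` is the assumed cost per successful task when every
    request goes to the frontier model.
    """
    alerts = []
    if metrics["success_rate"] < 0.70:
        alerts.append("success rate below 70%: check routing thresholds")
    if metrics["cost_per_successful_task"] > 2 * frontier_baseline:
        alerts.append("cost per successful task above 2x frontier: check escalation logic")
    if metrics["verifier_call_rate"] > 0.50:
        alerts.append("verifier call rate above 50%: tighten verifier budgeter")
    if metrics["escalation_rate"] > 0.30:
        alerts.append("escalation rate above 30%: check task classifier")
    return alerts


if __name__ == "__main__":
    metrics = {
        "success_rate": 0.65,
        "cost_per_successful_task": 1.20,
        "verifier_call_rate": 0.55,
        "escalation_rate": 0.10,
    }
    for message in check_alerts(metrics, frontier_baseline=1.00):
        print(message)
```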