# ACO Deployment Guide

## Quick Install

```bash
pip install -e .
```

Or import the library directly:

```python
from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig
```

## CLI

```bash
# Route a request to the optimal model
aco route "Fix the auth bug in production"
# → tier=5, model=specialist-expert, cost=$1.50

aco route "What is 2+2?"
# → tier=2, model=cheap-cloud-8b, cost=$0.15

# Get context budget
aco budget "Research transformer advances"

# Check if a tool call is worth it
aco gate web_search --task-type research

# Check if verification is needed
aco verify --risk high --confidence 0.7

# Show optimizer stats
aco stats

# Version
aco version
```

## Python API

### Basic Routing

```python
from aco.optimizer import ACOOptimizer
from aco.config import ACOConfig

opt = ACOOptimizer(ACOConfig(
    router_model_path="router_models/router_bundle_v11.pkl"
))

result = opt.start_run("Debug this critical production bug")
print(result["routing"])  # tier, model_id, confidence, cost_estimate
print(result["context_budget"])  # total_tokens, keep_exact, omit
```

### With Execution Feedback

```python
# `call_model` and `get_model_logprobs` are placeholders for your own model client.
request = "Fix the typo in README"

# Step 1: Route to a cheap model
result = opt.start_run(request)
initial_tier = result["routing"]["tier"]

# Step 2: Get the cheap model's response and logprobs
cheap_response = call_model(initial_tier, request)
cheap_logprobs = get_model_logprobs(result["routing"]["model_id"], request)

# Step 3: Decide whether to escalate
cascade = opt.cascade_step(
    request=request,
    initial_tier=initial_tier,
    cheap_logprobs=cheap_logprobs,
    cheap_response=cheap_response,
)

if cascade.escalated:
    # Run the stronger model
    final_response = call_model(cascade.final_tier, request)
else:
    final_response = cheap_response
```
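The guide does not say how `cascade_step` turns logprobs into an escalation decision. One common heuristic, shown here only as an illustrative sketch and not as ACO's implementation, is to escalate when the cheap model's mean token log-probability falls below a threshold. The function name `should_escalate` and the threshold of `-1.0` are assumptions.

```python
import math
from typing import Sequence


def should_escalate(logprobs: Sequence[float], threshold: float = -1.0) -> bool:
    """Return True when the mean token log-probability is below `threshold`.

    Hypothetical heuristic for illustration; not ACO's actual cascade logic.
    """
    if not logprobs:
        # No confidence signal at all: escalate to be safe.
        return True
    mean_logprob = sum(logprobs) / len(logprobs)
    return mean_logprob < threshold


# Token probabilities near 1 give log-probabilities near 0 (confident).
confident = [math.log(0.9), math.log(0.8), math.log(0.95)]
uncertain = [math.log(0.2), math.log(0.1), math.log(0.3)]

print(should_escalate(confident))  # False
print(should_escalate(uncertain))  # True
```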

### Per-Step Routing

```python
from aco.per_step_router import PerStepRouter

ps = PerStepRouter(max_budget=2.0)

for step in agent_steps:
    d = ps.route_step(
        action=step.description,
        step_num=step.number,
        has_prior_failures=step.had_errors,
        task_risk="medium"
    )
    step.model_tier = d.adjusted_tier
    step.model_id = d.model_id
    step.estimated_cost = d.cost_estimate
```
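One way to consume the per-step decisions is to keep a running total of `cost_estimate` against the budget. The sketch below is self-contained and does not call ACO: `StepDecision` is a hypothetical stand-in that mirrors the three fields read in the loop above, and `plan_within_budget` is an assumed helper name.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class StepDecision:
    """Hypothetical stand-in for the object returned by route_step."""
    adjusted_tier: int
    model_id: str
    cost_estimate: float


def plan_within_budget(
    decisions: Iterable[StepDecision], max_budget: float
) -> Tuple[List[StepDecision], float]:
    """Accept decisions in order until the next one would exceed the budget."""
    accepted: List[StepDecision] = []
    spent = 0.0
    for decision in decisions:
        if spent + decision.cost_estimate > max_budget:
            break
        accepted.append(decision)
        spent += decision.cost_estimate
    return accepted, spent


decisions = [
    StepDecision(2, "cheap-cloud-8b", 0.15),
    StepDecision(3, "medium-70b", 0.60),
    StepDecision(5, "specialist-expert", 1.50),
]
accepted, spent = plan_within_budget(decisions, max_budget=2.0)
print(len(accepted), round(spent, 2))  # 2 0.75
```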

## Integration Examples

### LangChain Integration

```python
from aco.optimizer import ACOOptimizer

opt = ACOOptimizer()

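class ACORouter:
    """Selects a model ID for a prompt; it is not itself a LangChain LLM."""

    def route(self, prompt: str) -> str:
        result = opt.start_run(prompt)
        return result["routing"]["model_id"]

# Use with LangChain: pick the model ID first, then build the LLM for it.
model_id = ACORouter().route("Summarize the release notes")
# llm = <a LangChain LLM configured for model_id>
# chain = LLMChain(llm=llm, ...)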
```

### Custom Agent Harness

```python
from aco.optimizer import ACOOptimizer
from aco.per_step_router import PerStepRouter


class CostAwareAgent:
    # plan_next_step, call_model, and is_complete are hooks your harness
    # supplies; is_complete is a hypothetical name for your completion check.
    def __init__(self, max_budget=5.0):
        self.opt = ACOOptimizer()
        self.ps = PerStepRouter(max_budget=max_budget)
        self.has_errors = False

    def run(self, request):
        # Initial routing
        result = self.opt.start_run(request)
        tier = result["routing"]["tier"]
        model = result["routing"]["model_id"]
        done = False

        # Per-step execution
        while not done and self.ps.budget_remaining > 0:
            step = self.plan_next_step()
            routing = self.ps.route_step(
                step.action, step.num,
                has_prior_failures=self.has_errors
            )
            response = self.call_model(routing.model_id, step)

            # Check if we need to escalate
            if not response.success and routing.adjusted_tier < 5:
                cascade = self.opt.cascade_step(
                    request, routing.adjusted_tier,
                    response.logprobs, response.text
                )
                if cascade.escalated:
                    response = self.call_model(cascade.model_id, step)

            # Remember failures so later steps route more conservatively
            self.has_errors = self.has_errors or not response.success
            done = self.is_complete(response)

            # Check doom
            doom = self.opt.check_doom(self.ps.total_spent)
            if doom.doomed:
                break

        trace = self.opt.end_run(success=done)
        return trace
```

## Model Tier Reference

| Tier | Model ID | Provider | Cost/1K tokens | Use For |
|------|----------|----------|---------------|---------|
| 1 | tiny-local-3b | local | $0.00 | Simple queries, search, read |
| 2 | cheap-cloud-8b | cloud | $0.05 | Quick answers, simple edits |
| 3 | medium-70b | cloud | $0.30 | Standard tasks, most coding |
| 4 | frontier-latest | cloud | $1.00 | Complex tasks, critical paths |
| 5 | specialist-expert | cloud | $1.50 | Legal, multi-step orchestration |
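
For a rough sense of scale, the table's prices can be turned into a per-call estimate. This is a minimal sketch that assumes a single flat rate per 1K tokens with no input/output split; `estimate_cost` is a hypothetical helper, not part of ACO.

```python
# Per-1K-token prices copied from the tier table above.
TIER_COST_PER_1K = {1: 0.00, 2: 0.05, 3: 0.30, 4: 1.00, 5: 1.50}


def estimate_cost(tier: int, total_tokens: int) -> float:
    """Estimate the dollar cost of one call, assuming a flat per-token rate."""
    if tier not in TIER_COST_PER_1K:
        raise ValueError(f"unknown tier: {tier}")
    return TIER_COST_PER_1K[tier] * total_tokens / 1000


print(round(estimate_cost(3, 4000), 2))   # 1.2
print(round(estimate_cost(1, 10000), 2))  # 0.0
```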

## Configuration

```yaml
# config.yaml
routing:
  safety_threshold: 0.30
  downgrade_threshold: 0.90
  max_retries: 3
  max_cost_per_task: 5.0

models:
  tier1:
    model_id: tiny-local-3b
    provider: local
    cost_per_1k_input: 0.00
    cost_per_1k_output: 0.00
  tier4:
    model_id: frontier-latest
    provider: cloud
    cost_per_1k_input: 1.00
    cost_per_1k_output: 3.00

task_floors:
  legal_regulated: 4
  long_horizon: 3
  coding: 3
  quick_answer: 1
```
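
The sketch below shows one plausible reading of two config fields. It assumes `task_floors` gives the minimum tier allowed for a task type and that a call's cost is the sum of input and output tokens at their own per-1K rates. `apply_floor` and `call_cost` are hypothetical helpers, and the values are copied into plain dicts so the example runs without a YAML parser.

```python
# Values mirrored from config.yaml above.
TASK_FLOORS = {"legal_regulated": 4, "long_horizon": 3, "coding": 3, "quick_answer": 1}
TIER4 = {"cost_per_1k_input": 1.00, "cost_per_1k_output": 3.00}
MAX_TIER = 5


def apply_floor(predicted_tier: int, task_type: str) -> int:
    """Raise the predicted tier to the task's floor (assumed semantics)."""
    floor = TASK_FLOORS.get(task_type, 1)  # unknown task types get no floor
    return min(MAX_TIER, max(predicted_tier, floor))


def call_cost(model_cfg: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call with separate input and output rates."""
    return (
        model_cfg["cost_per_1k_input"] * input_tokens
        + model_cfg["cost_per_1k_output"] * output_tokens
    ) / 1000


print(apply_floor(2, "legal_regulated"))  # 4
print(apply_floor(5, "coding"))           # 5
print(call_cost(TIER4, 2000, 500))        # 3.5
```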

## Trace Format

```json
{
  "trace_id": "abc123",
  "request": "Fix the auth bug",
  "task_type": "coding",
  "difficulty": 4,
  "predicted_tier": 5,
  "steps": [
    {
      "step_num": 1,
      "model_call": {
        "model_id": "specialist-expert",
        "tier": 5,
        "input_tokens": 2000,
        "output_tokens": 500,
        "cost": 3.50
      },
      "tool_calls": [
        {"tool_name": "code_search", "success": true, "cost": 0.01}
      ],
      "verifier_called": false
    }
  ],
  "final_outcome": "completed",
  "task_success": true,
  "total_cost": 3.51
}
```
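
In the example above, `total_cost` (3.51) equals the model call cost (3.50) plus the tool call cost (0.01). Assuming that relationship holds for every trace, a small consistency check could look like this; `summed_cost` is a hypothetical helper and the trace is a trimmed copy of the example.

```python
import json

# Trimmed copy of the example trace above.
TRACE_JSON = """
{
  "trace_id": "abc123",
  "steps": [
    {
      "step_num": 1,
      "model_call": {"model_id": "specialist-expert", "tier": 5, "cost": 3.50},
      "tool_calls": [{"tool_name": "code_search", "success": true, "cost": 0.01}],
      "verifier_called": false
    }
  ],
  "total_cost": 3.51
}
"""


def summed_cost(trace: dict) -> float:
    """Add up model call and tool call costs across all steps."""
    total = 0.0
    for step in trace["steps"]:
        total += step["model_call"]["cost"]
        total += sum(call["cost"] for call in step.get("tool_calls", []))
    return total


trace = json.loads(TRACE_JSON)
# Compare with a tolerance because the costs are floats.
consistent = abs(summed_cost(trace) - trace["total_cost"]) < 1e-9
print(consistent)  # True
```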

## Monitoring

### What to watch:
- Cost per successful task (primary)
- Success rate by tier (quality)
- Escalation rate (routing accuracy)
- Cache hit rate (prompt layout)
- Verifier call rate (selectivity)
- False-DONE rate (termination accuracy)
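
Several of these metrics can be derived from the trace fields shown in the Trace Format section (`task_success`, `total_cost`, `predicted_tier`, `verifier_called`). The aggregation below is a sketch: the formulas are assumptions (for instance, total spend divided by the number of successful tasks), and `summarize` is a hypothetical helper.

```python
from collections import defaultdict


def summarize(traces: list) -> dict:
    """Aggregate a list of trace dicts into a few monitoring metrics."""
    successes = sum(1 for t in traces if t["task_success"])
    total_cost = sum(t["total_cost"] for t in traces)
    by_tier = defaultdict(lambda: [0, 0])  # tier -> [successes, runs]
    steps = verified = 0
    for t in traces:
        by_tier[t["predicted_tier"]][0] += int(t["task_success"])
        by_tier[t["predicted_tier"]][1] += 1
        for s in t["steps"]:
            steps += 1
            verified += int(s["verifier_called"])
    return {
        "cost_per_successful_task": total_cost / successes if successes else float("inf"),
        "success_rate_by_tier": {tier: ok / n for tier, (ok, n) in by_tier.items()},
        "verifier_call_rate": verified / steps if steps else 0.0,
    }


# Made-up traces for illustration only.
traces = [
    {"predicted_tier": 2, "task_success": True, "total_cost": 0.20,
     "steps": [{"verifier_called": False}]},
    {"predicted_tier": 2, "task_success": False, "total_cost": 0.30,
     "steps": [{"verifier_called": False}, {"verifier_called": True}]},
    {"predicted_tier": 5, "task_success": True, "total_cost": 3.50,
     "steps": [{"verifier_called": True}]},
]
metrics = summarize(traces)
print(metrics["success_rate_by_tier"])  # {2: 0.5, 5: 1.0}
print(metrics["verifier_call_rate"])    # 0.5
```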

### Alerts:
- Success rate < 70% → check routing thresholds
- Cost per successful task > 2x the frontier-only baseline → check escalation logic
- Verifier call rate > 50% → tighten verifier budgeter
- Escalation rate > 30% → check task classifier
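
The alert rules above can be encoded as a simple threshold check. This sketch assumes the metrics arrive as a dict of rates between 0 and 1 and that "2x frontier" means twice the cost per task of a frontier-only baseline. The key names and `check_alerts` are hypothetical.

```python
def check_alerts(metrics: dict, frontier_cost_per_task: float) -> list:
    """Return the suggested action for every alert rule that fires."""
    alerts = []
    if metrics["success_rate"] < 0.70:
        alerts.append("check routing thresholds")
    if metrics["cost_per_successful_task"] > 2 * frontier_cost_per_task:
        alerts.append("check escalation logic")
    if metrics["verifier_call_rate"] > 0.50:
        alerts.append("tighten verifier budgeter")
    if metrics["escalation_rate"] > 0.30:
        alerts.append("check task classifier")
    return alerts


# Made-up metric values for illustration only.
sample = {
    "success_rate": 0.65,
    "cost_per_successful_task": 1.80,
    "verifier_call_rate": 0.20,
    "escalation_rate": 0.35,
}
print(check_alerts(sample, frontier_cost_per_task=1.00))
# ['check routing thresholds', 'check task classifier']
```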