---
tags:
- ml-intern
---
# ACO: Agent Cost Optimizer

A universal control layer that bolts onto any agent harness to reduce total cost while preserving task quality.

## Quick Start

```bash
pip install -e .
aco route "Debug this critical production bug"
aco budget "Research transformer advances"
aco gate web_search --task-type research
aco verify --risk high --confidence 0.7
aco stats
aco version
```

## Results

On 2,000 synthetic traces across 9 task types:

| Router | Success rate | Avg. cost | Cost reduction |
|--------|--------------|-----------|----------------|
| always_frontier | 91.0% | $1.04 | baseline |
| heuristic | 84.5% | $0.92 | 11.6% |
| **ACO v8** | **79.6%** | **$0.78** | **25.3%** |
| always_cheap | 29.8% | $0.07 | 93.1% |

Key result: ACO v8 cuts average cost by 25.3% relative to `always_frontier` ($0.78 vs. $1.04), while the success rate drops from 91.0% to 79.6%. The verifier budgeter alone skips about 88% of verification calls, running 238 of 2,000 instead of verifying every trace.
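
As a quick sanity check, the headline figures can be recomputed from the rounded values in the table. The sketch below is illustrative only and the helper name `cost_reduction` is hypothetical. Because it starts from the rounded average costs, it lands within a few tenths of a percentage point of the reported reductions (for example 25.0% rather than 25.3%), presumably because the table was computed from unrounded costs.

```python
# Average cost per router, copied from the results table (rounded values).
avg_cost = {
    "always_frontier": 1.04,
    "heuristic": 0.92,
    "ACO v8": 0.78,
    "always_cheap": 0.07,
}


def cost_reduction(cost: float, baseline: float) -> float:
    """Fractional cost reduction relative to the baseline router."""
    return (baseline - cost) / baseline


baseline = avg_cost["always_frontier"]
for router, cost in avg_cost.items():
    print(f"{router:16s} {cost_reduction(cost, baseline):6.1%}")

# Share of verification calls skipped: 238 run out of 2,000 traces.
skipped = 1 - 238 / 2000
print(f"verifications skipped: {skipped:.1%}")
```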

## The 10 Modules

1. **Cost Telemetry Collector** - Normalized JSON trace schema
2. **Task Cost Classifier** - Predicts task type, difficulty, risk
3. **Model Cascade Router** - Dynamic difficulty + ML confirmation + safety floors
4. **Context Budgeter** - Adaptive context allocation by task type
5. **Cache-Aware Prompt Layout** - Prefix-cache reuse optimization
6. **Tool-Use Cost Gate** - Skip/batch/cache tool calls
7. **Verifier Budgeter** - Selective verification (high-risk only)
8. **Retry/Recovery Optimizer** - Failure-specific recovery actions
9. **Meta-Tool Miner** - Compress repeated workflows
10. **Doom Detector** - Early termination for failing runs
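
To make the idea of selective verification (module 7) concrete, here is a minimal sketch. It is not ACO's actual policy: the function `should_verify`, the risk labels, and the `min_confidence` threshold of 0.9 are all assumptions chosen for illustration. The inputs mirror the `--risk` and `--confidence` flags shown in the Quick Start.

```python
RISK_LEVELS = ("low", "medium", "high")  # hypothetical label set


def should_verify(risk: str, confidence: float,
                  min_confidence: float = 0.9) -> bool:
    """Assumed policy: verify only high-risk tasks whose confidence
    falls below the threshold; skip verification for everything else."""
    if risk not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {risk!r}")
    return risk == "high" and confidence < min_confidence


# Mirrors `aco verify --risk high --confidence 0.7` under this assumed policy.
print(should_verify("high", 0.7))
```

Under a rule like this, low- and medium-risk tasks never pay for a verifier call, which is the kind of saving the verification counts in the Results section reflect.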

## Router Architecture (v8)

```
1. Dynamic difficulty = base(task_type) + adjust(request_keywords)
2. base_tier = min(difficulty + 1, 5)
3. base_tier = max(base_tier, TASK_FLOOR[task_type])
4. If P(success@base_tier) < 0.30 → ESCALATE (safety net)
5. If P(success@(base_tier - 1)) >= 0.90 → DOWNGRADE (cost saver)
6. Never below floor, never above 5
```

Per-task safety floors prevent unsafe cheap-model routing on critical tasks.
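
The routing steps above can be sketched in Python as follows. This is a minimal illustration, not the project's implementation. The lookup tables (`BASE_DIFFICULTY`, `KEYWORD_ADJUST`, and the values in `TASK_FLOOR`) are hypothetical, the success predictor is passed in as a callable, escalation is assumed to move up exactly one tier, and the escalate and downgrade checks are assumed to be mutually exclusive.

```python
from typing import Callable, Dict

MAX_TIER = 5

# Hypothetical tables: the README does not list the real values.
BASE_DIFFICULTY: Dict[str, int] = {"chat": 1, "research": 2, "debugging": 3}
TASK_FLOOR: Dict[str, int] = {"chat": 1, "research": 2, "debugging": 3}
KEYWORD_ADJUST: Dict[str, int] = {"critical": 1, "production": 1, "simple": -1}


def dynamic_difficulty(task_type: str, request: str) -> int:
    """Step 1: base difficulty for the task type plus keyword adjustments."""
    words = request.lower().split()
    adjust = sum(delta for kw, delta in KEYWORD_ADJUST.items() if kw in words)
    return max(BASE_DIFFICULTY[task_type] + adjust, 0)


def route(task_type: str, request: str,
          p_success: Callable[[str, int], float]) -> int:
    """Return a model tier in [floor, MAX_TIER] for this request."""
    floor = TASK_FLOOR[task_type]
    difficulty = dynamic_difficulty(task_type, request)
    tier = min(difficulty + 1, MAX_TIER)   # step 2
    tier = max(tier, floor)                # step 3
    if p_success(task_type, tier) < 0.30:
        tier += 1                          # step 4: escalate (assumed +1 tier)
    elif tier > floor and p_success(task_type, tier - 1) >= 0.90:
        tier -= 1                          # step 5: downgrade one tier
    return max(floor, min(tier, MAX_TIER))  # step 6: clamp to [floor, 5]


# Stub predictor standing in for the ML confirmation model.
confident = lambda task_type, tier: 0.95
print(route("debugging", "Debug this critical production bug", confident))
```

Because the floor is applied both before the downgrade check and in the final clamp, a confident predictor can lower the cost of a request but never push a critical task type below its minimum tier.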

## License

MIT

<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'narcolepticchicken/agent-cost-optimizer'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.