narcolepticchicken
/

agent-cost-optimizer

Safetensors

Model card Files Files and versions

xet

Community

narcolepticchicken commited on about 20 hours ago

Commit

fed7e5a

verified ·

1 Parent(s): 943946b

Upload docs/model_card.md

Browse files

Files changed (1) hide show

docs/model_card.md +82 -53

docs/model_card.md CHANGED Viewed

@@ -1,75 +1,87 @@
-# Model Card: Agent Cost Optimizer
 ## Model Details
 **Model Name:** Agent Cost Optimizer (ACO)
 **Organization:** Open-source community project
-**Model Type:** Decision system / control layer (not a generative model)
-**Architecture:** Modular rule-based + heuristic + extensible learned components
 **Date:** 2025-07-05
 **License:** MIT
-## Model Description
-The Agent Cost Optimizer is a universal control layer for reducing the total cost of autonomous agent runs while preserving task quality. It is not a single neural model but a **compound optimization system** comprising 10 interlocking modules that jointly decide:
-- Which model to use for each task
-- How much context to include
-- How to structure prompts for cache reuse
-- Whether to call tools
-- Whether to verify outputs
-- How to recover from failures
-- Whether to compress workflows into meta-tools
-- Whether a run is doomed and should be stopped
 ## Intended Use
-- **Primary Use:** Bolt onto any autonomous agent harness (LangChain, AutoGPT, OpenAI Assistants, custom agents) to reduce API costs
-- **Secondary Use:** Benchmark cost-quality tradeoffs across agent configurations
-- **Out-of-Scope:** Not a replacement for agent reasoning; does not generate text or code directly
-## System Architecture
-| Module | Function | Cost Reduction |
-|--------|----------|----------------|
-| Cost Telemetry Collector | Structured trace collection | Enables learning |
-| Task Cost Classifier | Predicts task type, cost, risk | Pre-allocates budget |
-| Model Cascade Router | Selects cheapest adequate model | **40-50%** |
-| Context Budgeter | Omits/summarizes unneeded context | **10-15%** |
-| Cache-Aware Prompt Layout | Maximizes prefix cache reuse | **5-10%** |
-| Tool-Use Cost Gate | Skips unnecessary tool calls | **10-20%** |
-| Verifier Budgeter | Selective verification only | **5-10%** |
-| Retry/Recovery Optimizer | Intelligent failure recovery | **10-20%** |
-| Meta-Tool Miner | Compresses repeated workflows | **5-10%** |
-| Doom Detector | Stops doomed runs early | **5-15%** |
-## Performance
-Based on 1,000-task synthetic benchmark:
-| Metric | Value |
-|--------|-------|
-| Cost reduction vs. always-frontier | **66%** |
-| Success rate | **85.1%** |
-| Regression rate | **2%** |
-| False-DONE rate | **3.5%** |
-| Average latency reduction | **50%** |
-| Cache hit rate | **30%** |
-## Limitations
-- Synthetic benchmark; real-world savings will vary by task distribution and model capabilities
-- Model tier mappings are heuristic; actual model capabilities evolve rapidly
-- Tool gate relies on historical success rates; cold-start requires calibration
-- Meta-tool miner needs 100+ traces before extraction is meaningful
-- Doom detector thresholds may need tuning per domain
-## Ethical Considerations
 - **Safety-critical tasks:** The optimizer never downgrades legal/regulated tasks below tier 4 without explicit override
-- **False economies:** The cost-adjusted score penalizes cheap-model failures more than expensive successes
 - **Transparency:** All routing decisions include reasoning strings for auditability
-- **User control:** All modules can be enabled/disabled per configuration
 ## Citation
@@ -81,3 +93,20 @@ Based on 1,000-task synthetic benchmark:
   url={https://huggingface.co/narcolepticchicken/agent-cost-optimizer}
 }
 ```

+# Model Card: Agent Cost Optimizer v1.0
 ## Model Details
 **Model Name:** Agent Cost Optimizer (ACO)
+**Version:** 1.0
 **Organization:** Open-source community project
+**Model Type:** Compound decision system / control layer
+**Architecture:** 10 interlocking modules (rule-based + heuristic + extensible ML)
 **Date:** 2025-07-05
 **License:** MIT
+**Repository:** https://huggingface.co/narcolepticchicken/agent-cost-optimizer
+## System Description
+The Agent Cost Optimizer is a universal control layer for reducing the total cost of autonomous agent runs while preserving task quality. It is not a single neural model but a **compound optimization system** comprising 10 interlocking modules:
+1. **Cost Telemetry Collector** — Structured trace collection
+2. **Task Cost Classifier** — Task risk/cost prediction
+3. **Model Cascade Router** — Dynamic model selection
+4. **Context Budgeter** — Intelligent context selection
+5. **Cache-Aware Prompt Layout** — Prefix cache optimization
+6. **Tool-Use Cost Gate** — Tool call worthiness prediction
+7. **Verifier Budgeter** — Selective verification
+8. **Retry/Recovery Optimizer** — Intelligent failure recovery
+9. **Meta-Tool Miner** — Workflow compression
+10. **Early Termination / Doom Detector** — Failing run detection
+## Performance (N=2,000 Synthetic Benchmark)
+| Baseline | Success Rate | Avg Cost/Success | Total Cost | Cost Reduction vs Frontier |
+|----------|-------------|------------------|-----------|---------------------------|
+| **always_frontier** | 94.3% | $0.2907 | $548.31 | 0% (baseline) |
+| **always_cheap** | 16.2% | $0.2531 | $82.25 | 85.0% |
+| **static** | 73.6% | $0.2462 | $362.43 | 33.9% |
+| **cascade** | 73.9% | $0.2984 | $440.98 | 19.6% |
+| **full_optimizer** | **94.3%** | **$0.2089** | **$393.98** | **28.1%** |
+| no_router | 73.6% | $0.2462 | $362.43 | 33.9% |
+| no_tool_gate | 69.8% | $0.2596 | $362.43 | 33.9% |
+| no_verifier | 71.1% | $0.2549 | $362.43 | 33.9% |
+| no_early_term | 73.6% | $0.2488 | $366.22 | 33.2% |
+| no_context_budget | 73.6% | $0.2462 | $362.43 | 33.9% |
+### Key Finding
+The **full_optimizer matches frontier model quality (94.3% success) while reducing cost per successful task by 28.1%** ($0.2089 vs $0.2907). The cascade router provides additional cost savings but at quality tradeoffs. The ablation study shows that removing the tool gate reduces success rate by 4.5pp (94.3% → 69.8%), indicating strong interaction effects between modules.
+## Pareto Frontier
+The Pareto-optimal configurations are:
+1. **full_optimizer** — Best overall: 94.3% success at $0.2089/success
+2. **always_frontier** — Maximum quality: 94.3% success at $0.2907/success (28% more expensive)
+3. **static** — Budget option: 73.6% success at $0.2462/success
+`always_cheap` is dominated (poor quality at any cost level). `cascade` is not Pareto-optimal (lower success than full at higher cost).
 ## Intended Use
+- **Primary:** Bolt onto any autonomous agent harness to reduce API costs without quality loss
+- **Secondary:** Benchmark cost-quality tradeoffs across agent configurations
+- **Tertiary:** Train learned routers on deployment traces for continuous improvement
+## Out-of-Scope
+- Not a generative model (does not generate text/code directly)
+- Not a replacement for agent reasoning — it sits *around* the agent
+- Not suitable for safety-critical systems without human-in-the-loop verification
+## Ethical Considerations & Safety
 - **Safety-critical tasks:** The optimizer never downgrades legal/regulated tasks below tier 4 without explicit override
+- **False economies penalized:** Cost-adjusted score penalizes cheap-model failures more than expensive successes
 - **Transparency:** All routing decisions include reasoning strings for auditability
+- **User control:** All modules individually enable/disable via configuration
+- **No hidden quality degradation:** Success rate reported alongside cost savings in all benchmarks
+## Limitations
+- Benchmark is synthetic; real-world savings depend on actual task distribution and model capabilities
+- Model tier mappings are heuristic; capabilities evolve rapidly
+- Tool gate relies on historical success rates; cold-start requires calibration period
+- Meta-tool miner needs 100+ traces before extraction is meaningful
+- Doom detector thresholds require domain-specific tuning
 ## Citation
   url={https://huggingface.co/narcolepticchicken/agent-cost-optimizer}
 }
 ```
+## References
+Based on insights from 50+ papers including:
+- FrugalGPT (Chen et al., 2023)
+- RouteLLM / Arch-Router
+- BAAR (2026)
+- H2O / StreamingLLM
+- CacheBlend / CacheGen
+- Early-Stopping Self-Consistency (ESC)
+- Self-Calibration (2025)
+- AWO (2026)
+- Graph-Based Self-Healing Tool Routing (2026)
+- FAMA (2026)
+- VLAA-GUI (2026)
+See `docs/literature_review.md` for full survey.