# Agent Cost Optimizer — Deployment Guide

## Quick Start

### Installation

```bash
pip install git+https://huggingface.co/narcolepticchicken/agent-cost-optimizer
```

Or clone and install locally:

```bash
git clone https://huggingface.co/narcolepticchicken/agent-cost-optimizer
cd agent-cost-optimizer
pip install -e .
```

### Basic Usage

```python
from aco import AgentCostOptimizer

# Load default configuration
optimizer = AgentCostOptimizer()

# Optimize a single agent request
result = optimizer.optimize(
    "Write a Python function to reverse a linked list",
    run_state={
        "trace_id": "run-001",
        "planned_tools": [("code_execution", {"code": "test"})],
    },
)

print(f"Model: {result.routing_decision.model_id}")
print(f"Tier: {result.routing_decision.tier}")
print(f"Estimated Cost: ${result.estimated_cost:.4f}")
print(f"Tool Decisions: {[d.decision.value for d in result.tool_decisions]}")
```

## Configuration

### Config File

Create a `config.yaml`:

```yaml
project_name: "my-agent-optimizer"
trace_storage_path: "./traces"

models:
  gpt-4o-mini:
    model_id: "gpt-4o-mini"
    provider: "openai"
    cost_per_1k_input: 0.00015
    cost_per_1k_output: 0.0006
    strength_tier: 2
    max_context: 128000
    cache_discount_rate: 0.5

  gpt-4o:
    model_id: "gpt-4o"
    provider: "openai"
    cost_per_1k_input: 0.0025
    cost_per_1k_output: 0.01
    strength_tier: 4
    max_context: 128000
    cache_discount_rate: 0.5

tools:
  search:
    tool_name: "search"
    cost_per_call: 0.002
    latency_ms_estimate: 500

  code_execution:
    tool_name: "code_execution"
    cost_per_call: 0.005
    latency_ms_estimate: 1000
    requires_verification: true

verifiers:
  verifier_medium:
    verifier_model_id: "gpt-4o-mini"
    cost_per_call: 0.005
    confidence_threshold: 0.8

# Enable/disable modules
enable_router: true
enable_context_budgeter: true
enable_cache_layout: true
enable_tool_gate: true
enable_verifier_budgeter: true
enable_retry_optimizer: true
enable_meta_tool_miner: true
enable_early_termination: true
```

Load it:

```python
optimizer = AgentCostOptimizer.from_config("config.yaml")
```
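
The same configuration can be built programmatically. A minimal sketch, assuming `ACOConfig` and `ModelConfig` accept keyword fields mirroring the YAML schema and that `AgentCostOptimizer` accepts a `config` argument (check the actual signatures in `aco.config`):

```python
from aco import AgentCostOptimizer
from aco.config import ACOConfig, ModelConfig

# Assumption: ACOConfig/ModelConfig mirror the YAML fields above, and
# AgentCostOptimizer takes a config object; verify both signatures.
config = ACOConfig(
    project_name="my-agent-optimizer",
    trace_storage_path="./traces",
    models={
        "gpt-4o-mini": ModelConfig(
            model_id="gpt-4o-mini",
            provider="openai",
            cost_per_1k_input=0.00015,
            cost_per_1k_output=0.0006,
            strength_tier=2,
            max_context=128000,
        ),
    },
)
optimizer = AgentCostOptimizer(config=config)
```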

## Integration with Agent Harness

### Generic Integration Pattern

```python
import uuid

# ModelCall and Outcome ship with the aco package; adjust the import
# path if they live in a submodule in your install.
from aco import AgentCostOptimizer, ModelCall, Outcome

class MyAgentHarness:
    def __init__(self):
        self.optimizer = AgentCostOptimizer.from_config("config.yaml")

    def execute(self, user_request: str, context: dict):
        # 1. Build run state
        run_state = {
            "trace_id": f"run-{uuid.uuid4()}",
            "planned_tools": self.plan_tools(user_request),
            "context_pieces": context,
            "current_cost": 0.0,
            "step_number": 1,
            "total_steps": self.estimate_steps(user_request),
            "is_irreversible": False,
        }

        # 2. Call optimizer BEFORE execution
        decision = self.optimizer.optimize(user_request, run_state)

        # 3. Apply optimizer decisions
        selected_model = decision.routing_decision.model_id

        # Apply tool gate
        approved_tools = [
            td for td in decision.tool_decisions
            if td.decision.value in ("use", "batch", "parallel")
        ]

        # Apply context budget
        if decision.context_budget:
            context = self._apply_context_budget(context, decision.context_budget)

        # Apply cache layout (fall back to the raw request if no layout is returned)
        prompt = user_request
        if decision.prompt_layout:
            prompt = self._apply_cache_layout(decision.prompt_layout)

        # Check doom assessment
        if decision.doom_assessment and decision.doom_assessment.action.value == "mark_blocked":
            return {"status": "BLOCKED", "reason": decision.doom_assessment.reasoning}

        # 4. Execute with optimized parameters
        result = self.llm_call(
            model=selected_model,
            prompt=prompt,
            tools=approved_tools,
            max_tokens=decision.routing_decision.max_tokens,
        )

        # 5. Record step
        self.optimizer.record_step(
            trace_id=decision.trace_id,
            model_call=ModelCall(
                model_id=selected_model,
                provider="openai",
                input_tokens=result.input_tokens,
                output_tokens=result.output_tokens,
                cost_per_1k_input=0.0025,  # use the selected model's cost metadata
                cost_per_1k_output=0.01,
            ),
            tool_calls=[...],
            context_size_tokens=len(prompt) // 4,
            step_outcome=Outcome.SUCCESS if result.success else Outcome.FAILURE,
        )

        # 6. Finalize trace
        self.optimizer.finalize_trace(
            trace_id=decision.trace_id,
            outcome=Outcome.SUCCESS if result.success else Outcome.FAILURE,
            user_satisfaction=1.0 if result.success else 0.0,
        )

        return result
```
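
The `_apply_context_budget` helper above is left to the harness. A minimal sketch to drop into `MyAgentHarness`, assuming the budget object exposes an `allocations` mapping of piece name to token allowance (that field name is hypothetical; adapt it to the real context-budget type):

```python
def _apply_context_budget(self, context: dict, budget) -> dict:
    """Trim each context piece to its budgeted token allowance.

    Assumes `budget.allocations` maps piece name -> max tokens; the
    field name is hypothetical and should be adapted to the actual
    context-budget object.
    """
    trimmed = {}
    for name, text in context.items():
        max_tokens = getattr(budget, "allocations", {}).get(name)
        if max_tokens is None:
            trimmed[name] = text  # no allowance recorded: keep as-is
        else:
            # Rough 4-characters-per-token heuristic, matching the
            # `len(prompt) // 4` estimate used elsewhere in this guide.
            trimmed[name] = text[: max_tokens * 4]
    return trimmed
```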

### LangChain Integration

```python
import uuid

from aco import AgentCostOptimizer
from langchain.agents import AgentExecutor

class ACOWrapper:
    def __init__(self, agent_executor: AgentExecutor, optimizer: AgentCostOptimizer):
        self.agent = agent_executor
        self.optimizer = optimizer

    def invoke(self, input_data):
        # Pre-optimize
        decision = self.optimizer.optimize(
            input_data["input"],
            run_state={
                "planned_tools": [(t.name, {}) for t in self.agent.tools],
                "trace_id": input_data.get("run_id", str(uuid.uuid4())),
            },
        )

        # Override agent LLM based on routing decision
        # (get_llm maps a model id to an LLM instance; harness-specific)
        self.agent.llm = self.get_llm(decision.routing_decision.model_id)

        # Filter tools based on tool gate
        self.agent.tools = [
            t for t in self.agent.tools
            if any(d.tool_name == t.name and d.decision.value == "use"
                   for d in decision.tool_decisions)
        ]

        # Execute
        result = self.agent.invoke(input_data)

        # Record and finalize
        # ... (see generic pattern above)

        return result
```

### OpenAI Assistants Integration

```python
from aco import AgentCostOptimizer
from openai import OpenAI

client = OpenAI()

class ACOAssistantWrapper:
    def __init__(self, assistant_id: str, optimizer: AgentCostOptimizer):
        self.assistant_id = assistant_id
        self.optimizer = optimizer

    def create_run(self, thread_id, instructions):
        # Optimize instructions (context budgeter)
        decision = self.optimizer.optimize(
            instructions,
            run_state={
                "trace_id": f"assistant-run-{thread_id}",
                "context_pieces": {"system_rules": instructions},
            },
        )

        # Use cache-aware prompt layout
        if decision.prompt_layout:
            optimized_instructions = decision.prompt_layout.prefix + "\n\n" + decision.prompt_layout.suffix
        else:
            optimized_instructions = instructions

        # Create run with optimized parameters
        return client.beta.threads.runs.create(
            thread_id=thread_id,
            assistant_id=self.assistant_id,
            instructions=optimized_instructions,
            model=decision.routing_decision.model_id,
        )
```
259
+
260
+ ## Multi-Provider Support
261
+
262
+ ACO supports any provider with cost metadata:
263
+
264
+ ```yaml
265
+ models:
266
+ claude-3-haiku:
267
+ model_id: "claude-3-haiku-20240307"
268
+ provider: "anthropic"
269
+ cost_per_1k_input: 0.00025
270
+ cost_per_1k_output: 0.00125
271
+ strength_tier: 2
272
+
273
+ claude-3-opus:
274
+ model_id: "claude-3-opus-20240229"
275
+ provider: "anthropic"
276
+ cost_per_1k_input: 0.015
277
+ cost_per_1k_output: 0.075
278
+ strength_tier: 4
279
+
280
+ gemini-pro:
281
+ model_id: "gemini-1.5-pro"
282
+ provider: "google"
283
+ cost_per_1k_input: 0.0035
284
+ cost_per_1k_output: 0.0105
285
+ strength_tier: 3
286
+
287
+ deepseek-chat:
288
+ model_id: "deepseek-chat"
289
+ provider: "deepseek"
290
+ cost_per_1k_input: 0.00014
291
+ cost_per_1k_output: 0.00028
292
+ strength_tier: 2
293
+ cache_discount_rate: 0.5
294
+ ```

## Local Model Support

For self-hosted models:

```yaml
models:
  llama-3.2-1b:
    model_id: "meta-llama/Llama-3.2-1B-Instruct"
    provider: "local"
    cost_per_1k_input: 0.0
    cost_per_1k_output: 0.0
    strength_tier: 1
    max_context: 128000

  qwen2.5-7b:
    model_id: "Qwen/Qwen2.5-7B-Instruct"
    provider: "local"
    cost_per_1k_input: 0.0
    cost_per_1k_output: 0.0
    strength_tier: 3
    max_context: 131072
```
318
+
319
+ Use `cost_per_1k_input: 0.0` for local models. ACO will still optimize latency and context size.
320
+
321
+ ## Benchmarking
322
+
323
+ Run the benchmark suite:
324
+
325
+ ```bash
326
+ python eval_runner.py --tasks 1000 --output ./eval_results
327
+ ```
328
+
329
+ With ablations:
330
+
331
+ ```bash
332
+ python eval_runner.py --tasks 1000 --ablations --output ./eval_results
333
+ ```
334
+
335
+ Generate report:
336
+
337
+ ```bash
338
+ python -m aco.cli report --input ./eval_results/baseline_results.json
339
+ ```

## Telemetry and Monitoring

Traces are stored as JSON in `trace_storage_path`:

```python
import json

# List all traces
traces = optimizer.telemetry.list_traces()

# Get statistics
stats = optimizer.telemetry.get_stats()
print(f"Total traces: {stats['count']}")
print(f"Avg cost: ${stats['avg_cost']:.4f}")
print(f"Success rate: {stats['success_rate']:.1%}")

# Full optimizer stats
all_stats = optimizer.get_stats()
print(json.dumps(all_stats, indent=2))
```
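
Because traces are plain JSON, they can also be inspected straight from disk. A short sketch, assuming one file per trace under `trace_storage_path` and top-level `total_cost`/`outcome` keys (the file layout and key names are assumptions):

```python
import json
from pathlib import Path

# Assumption: one JSON file per trace under trace_storage_path ("./traces"
# in the config above), with total_cost/outcome at the top level.
for path in sorted(Path("./traces").glob("*.json")):
    trace = json.loads(path.read_text())
    print(path.stem, trace.get("total_cost"), trace.get("outcome"))
```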

## Advanced: Training a Custom Router

To train a model-specific router using your trace data:

```python
from aco.optimizer import AgentCostOptimizer
from sklearn.ensemble import RandomForestClassifier

# 1. Collect traces
optimizer = AgentCostOptimizer()
# ... run agent tasks ...

# 2. Extract features and labels from traces
traces = [optimizer.telemetry.load_trace(tid) for tid in optimizer.telemetry.list_traces()]

# 3. Train a simple classifier (example with sklearn)
X = []
y = []
for trace in traces:
    # Features: hashed task type, request length, total cost
    # NOTE: built-in hash() is randomized per process; use a stable
    # hash (e.g. hashlib) if the model must survive restarts.
    features = [
        hash(trace["task_type"]) % 1000,
        len(trace["user_request"]),
        trace.get("total_cost", 0.01),
    ]
    # Label: optimal model tier (from oracle comparison)
    optimal_tier = trace.get("metadata", {}).get("optimal_tier", 3)
    X.append(features)
    y.append(optimal_tier)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)

# 4. Deploy: override router decisions
# In production, integrate the classifier into ModelCascadeRouter._route_learned()
```
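
One low-friction way to deploy the classifier without modifying ACO internals is to wrap `optimize()` and override the routed model when the classifier disagrees. A hedged sketch: the `routing_decision` fields follow the attributes used earlier in this guide, while the tier-to-model mapping and the in-place override are assumptions.

```python
# Hypothetical tier-to-model mapping; align it with your config's
# strength_tier values.
TIER_TO_MODEL = {2: "gpt-4o-mini", 3: "gemini-1.5-pro", 4: "gpt-4o"}

def optimize_with_learned_router(optimizer, clf, user_request, run_state):
    decision = optimizer.optimize(user_request, run_state)
    features = [[
        hash(run_state.get("task_type", "")) % 1000,  # same features as training
        len(user_request),
        decision.estimated_cost,  # proxy for the total_cost training feature
    ]]
    predicted_tier = int(clf.predict(features)[0])
    # Override only when the classifier disagrees with the heuristic router.
    # Assumes routing_decision is mutable; otherwise rebuild the decision.
    if predicted_tier != decision.routing_decision.tier:
        decision.routing_decision.model_id = TIER_TO_MODEL.get(
            predicted_tier, decision.routing_decision.model_id
        )
    return decision
```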

For RL-based routing (GRPO/DPO), see the literature review for the BAAR and xRouter approaches.

## Production Checklist

- [ ] Configure all models with accurate cost metadata
- [ ] Configure all tools with cost/latency estimates
- [ ] Set appropriate tier mappings for your use case
- [ ] Enable telemetry to collect traces for learning
- [ ] Set doom thresholds appropriate for your SLA
- [ ] Configure verifier thresholds for safety-critical tasks
- [ ] Test with a small synthetic benchmark before deployment
- [ ] Monitor regression rate and false-DONE rate (see the monitoring sketch below)
- [ ] Review and adjust routing policy monthly
- [ ] Mine meta-tools after collecting 100+ successful traces
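
For the monitoring item above, a hedged sketch built on the `get_stats()` call from the Telemetry section; the `regression_rate` and `false_done_rate` key names are hypothetical, so substitute whatever your stats dict actually exposes:

```python
# Assumption: get_stats() exposes regression/false-DONE metrics under
# these key names; they are hypothetical placeholders.
stats = optimizer.get_stats()
for metric, ceiling in [("regression_rate", 0.02), ("false_done_rate", 0.01)]:
    value = stats.get(metric)
    if value is not None and value > ceiling:
        print(f"ALERT: {metric} = {value:.2%} exceeds ceiling {ceiling:.0%}")
```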

## Troubleshooting

### High regression rate
- Check that model tier mappings match your actual model capabilities
- Increase `unsafe_cheap_model_penalty` in the config
- Enable the verifier on more task types

### Low cost savings
- Verify that cache layout is enabled (check the cache hit rate)
- Ensure the tool gate is catching repeated/unnecessary calls
- Check that the meta-tool miner is enabled and has enough traces

### High false-DONE rate
- Increase the verifier threshold for final-step verification
- Enable the doom detector with a stricter `doom_no_progress_steps`
- Add more failure patterns to the retry optimizer

### Slow routing decisions
- Use prompt-only or static routing instead of learned routing
- Cache classification results for repeated request patterns (see the sketch below)
- Pre-compute meta-tools during off-peak hours
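
A minimal sketch of the caching suggestion above. It assumes routing depends mainly on the request text, which holds for prompt-only or static routing but not for state-dependent routing, and uses a process-local dict as the cache:

```python
import hashlib

# Process-local cache of routed model ids, keyed by normalized request text.
_route_cache: dict = {}

def cached_model_for(optimizer, user_request: str, run_state: dict) -> str:
    key = hashlib.sha256(user_request.strip().lower().encode()).hexdigest()
    if key not in _route_cache:
        decision = optimizer.optimize(user_request, run_state)
        _route_cache[key] = decision.routing_decision.model_id
    return _route_cache[key]
```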

## Support

- Repository: https://huggingface.co/narcolepticchicken/agent-cost-optimizer
- Issues: Open a discussion on the Hugging Face Hub
- Literature Review: See `docs/literature_review.md`