Add cost_estimate and benchmark OpenClaw skills
openclaw/skills/cost_estimate/SKILL.md (added)
# cost_estimate

Estimate the cost of running queries through different LLM providers and models. Supports all 12 providers: OpenAI, Claude, Gemini, Mistral, Cohere, Ollama (free), OpenRouter, Groq, xAI, Together, HuggingFace, DeepSeek.

## Parameters
- `num_queries` (integer, required): Number of queries to project
- `provider` (string, optional): Specific provider to analyze (default: all)
- `pipeline` (string, optional): "baseline", "graphrag", or "both" (default: both)

## Returns
JSON with cost projections per provider including:
- Cost per query for baseline and graphrag pipelines
- Total cost for the specified number of queries
- Monthly and annual projections at 1K queries/day
- Provider comparison sorted by total cost
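
The projection arithmetic is simple multiplication over the per-query cost. A minimal sketch, assuming a 30-day month, a 365-day year, and made-up per-query costs (the skill's real figures come from provider pricing; this is not its actual code):

```python
# Illustrative sketch only: the per-query costs are placeholders, not real
# provider pricing, and this is not the skill's implementation.
QUERIES_PER_DAY = 1_000  # the skill projects at 1K queries/day

def project(cost_per_query: float, num_queries: int) -> dict:
    """Total cost for a run plus monthly/annual projections at 1K queries/day."""
    return {
        "total": cost_per_query * num_queries,
        "monthly": cost_per_query * QUERIES_PER_DAY * 30,   # assumes a 30-day month
        "annual": cost_per_query * QUERIES_PER_DAY * 365,
    }

# Hypothetical per-query costs in USD.
providers = {"ollama": 0.0, "openai": 0.002}

# Provider comparison sorted by total cost, mirroring the described output.
comparison = {name: project(cost, 10_000) for name, cost in providers.items()}
for name in sorted(comparison, key=lambda n: comparison[n]["total"]):
    print(name, comparison[name])
```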

## Example
```
cost_estimate 10000 --provider ollama --pipeline both
```

---

# benchmark

Run the HotpotQA benchmark suite through both pipelines and generate a full evaluation report with F1, Exact Match, Context Hit Rate, and cost analysis.

## Parameters
- `num_samples` (integer, optional, default=50): Number of HotpotQA questions to evaluate
- `provider` (string, optional, default="anthropic"): LLM provider
- `model` (string, optional): Specific model ID
- `output` (string, optional): File path to save results JSON

## Returns
JSON with:
- Per-query results with F1, EM, tokens, cost for both pipelines
- Aggregate metrics (avg F1, avg EM, context hit rate, win rate)
- Stratified results by question type (bridge vs comparison)
- Full text benchmark report
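
F1 and Exact Match here are the usual token-level answer metrics from SQuAD-style and HotpotQA evaluation. A minimal sketch of the standard definitions, for reference (the skill's own normalization rules may differ):

```python
# Standard SQuAD/HotpotQA-style answer metrics, shown for reference only;
# the skill's exact normalization rules may differ.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))

def f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1("Barack Obama", "President Barack Obama"))      # 0.8
print(exact_match("The Eiffel Tower", "eiffel tower"))   # 1.0
```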

## Example
```
benchmark 100 --provider openai --model gpt-4o-mini --output results.json
```