Add WildClawBench evaluation result
#69
by yuhangzang - opened
YAML Metadata Error: Invalid content in Eval Result file .eval_results/wildclawbench.yaml
Check out the documentation for more information.
Task ID "overall" does not match any task in dataset "internlm/WildClawBench". Available: none
Task ID "avg_time" does not match any task in dataset "internlm/WildClawBench". Available: none
Task ID "avg_cost" does not match any task in dataset "internlm/WildClawBench". Available: none
.eval_results/wildclawbench.yaml
ADDED
@@ -0,0 +1,29 @@
+- dataset:
+    id: internlm/WildClawBench
+    task_id: overall
+  value: 33.5
+  date: "2026-03-24"
+  source:
+    url: https://internlm.github.io/WildClawBench
+    name: WildClawBench
+    user: internlm
+
+- dataset:
+    id: internlm/WildClawBench
+    task_id: avg_time
+  value: 459
+  date: "2026-03-24"
+  source:
+    url: https://internlm.github.io/WildClawBench
+    name: WildClawBench
+    user: internlm
+
+- dataset:
+    id: internlm/WildClawBench
+    task_id: avg_cost
+  value: 22.33
+  date: "2026-03-24"
+  source:
+    url: https://internlm.github.io/WildClawBench
+    name: WildClawBench
+    user: internlm
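
The three metadata errors reported at the top of this PR all point to the same cause: each entry's `task_id` has to name a task that the benchmark dataset declares, and `internlm/WildClawBench` currently declares none. The following is a minimal sketch of that kind of check, written under assumptions: the helper names `find_unknown_task_ids` and `format_errors` are hypothetical, the entries are hard-coded from the file in this PR, and the logic is an illustration of the reported error, not the Hub's actual validator.

```python
# Hypothetical sketch of a task-ID check; NOT the Hub's real validator.
from typing import Iterable

# The three records added in .eval_results/wildclawbench.yaml (trimmed to the
# fields the check needs).
ENTRIES = [
    {"dataset": {"id": "internlm/WildClawBench", "task_id": "overall"}, "value": 33.5},
    {"dataset": {"id": "internlm/WildClawBench", "task_id": "avg_time"}, "value": 459},
    {"dataset": {"id": "internlm/WildClawBench", "task_id": "avg_cost"}, "value": 22.33},
]


def find_unknown_task_ids(entries: list[dict], available_tasks: Iterable[str]) -> list[str]:
    """Return the task IDs (in file order) that the dataset does not declare."""
    known = set(available_tasks)
    return [
        entry["dataset"]["task_id"]
        for entry in entries
        if entry["dataset"]["task_id"] not in known
    ]


def format_errors(entries: list[dict], available_tasks: Iterable[str]) -> list[str]:
    """Build one message per unknown task ID, in the style shown on this PR."""
    known = sorted(set(available_tasks))
    available = ", ".join(known) if known else "none"
    messages = []
    for entry in entries:
        dataset = entry["dataset"]
        if dataset["task_id"] not in known:
            messages.append(
                f'Task ID "{dataset["task_id"]}" does not match any task in '
                f'dataset "{dataset["id"]}". Available: {available}'
            )
    return messages


if __name__ == "__main__":
    # With no tasks declared by the dataset, every entry is rejected.
    for message in format_errors(ENTRIES, available_tasks=[]):
        print(message)
```

Under this reading, the entries in the file are well-formed and the errors would clear once the dataset itself declares `overall`, `avg_time`, and `avg_cost` as tasks, rather than by changing the values in this file.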