Add WildClawBench evaluation result

#69

by yuhangzang - opened 28 days ago

YAML Metadata Error: Invalid content in Eval Result file .eval_results/wildclawbench.yaml

Task ID "overall" does not match any task in dataset "internlm/WildClawBench". Available: none

Task ID "avg_time" does not match any task in dataset "internlm/WildClawBench". Available: none

Task ID "avg_cost" does not match any task in dataset "internlm/WildClawBench". Available: none

Files changed (1)

.eval_results/wildclawbench.yaml ADDED

+- dataset:
+    id: internlm/WildClawBench
+    task_id: overall
+  value: 33.5
+  date: "2026-03-24"
+  source:
+    url: https://internlm.github.io/WildClawBench
+    name: WildClawBench
+    user: internlm
+- dataset:
+    id: internlm/WildClawBench
+    task_id: avg_time
+  value: 459
+  date: "2026-03-24"
+  source:
+    url: https://internlm.github.io/WildClawBench
+    name: WildClawBench
+    user: internlm
+- dataset:
+    id: internlm/WildClawBench
+    task_id: avg_cost
+  value: 22.33
+  date: "2026-03-24"
+  source:
+    url: https://internlm.github.io/WildClawBench
+    name: WildClawBench
+    user: internlm
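
The three validation messages reported for this file all have the same shape: each entry's `task_id` is compared against the tasks declared by the referenced dataset, and "Available: none" suggests that `internlm/WildClawBench` declares no tasks at the moment. The following is a minimal sketch of that kind of check, not the Hub's actual validator. The function name `check_task_ids` and the `available_tasks` parameter are hypothetical, and the entries are written as plain Python dictionaries that mirror the YAML above.

```python
# Hypothetical re-creation of the check behind the reported messages.
# This is an illustration only; it is not the Hub's validation code.

def check_task_ids(entries, available_tasks):
    """Return one message per entry whose task_id is not declared by its dataset."""
    known = sorted(set(available_tasks))
    available = ", ".join(known) if known else "none"
    errors = []
    for entry in entries:
        dataset_id = entry["dataset"]["id"]
        task_id = entry["dataset"]["task_id"]
        if task_id not in known:
            errors.append(
                f'Task ID "{task_id}" does not match any task in '
                f'dataset "{dataset_id}". Available: {available}'
            )
    return errors


# Entries mirroring the three records added in .eval_results/wildclawbench.yaml.
entries = [
    {"dataset": {"id": "internlm/WildClawBench", "task_id": "overall"}, "value": 33.5},
    {"dataset": {"id": "internlm/WildClawBench", "task_id": "avg_time"}, "value": 459},
    {"dataset": {"id": "internlm/WildClawBench", "task_id": "avg_cost"}, "value": 22.33},
]

# Assumption: the dataset declares no tasks, as "Available: none" implies.
errors = check_task_ids(entries, available_tasks=[])
for message in errors:
    print(message)
```

Under this reading, the entries themselves may be well-formed, and the messages would disappear once the dataset declares tasks named `overall`, `avg_time`, and `avg_cost` (or the entries are changed to use task IDs the dataset does declare).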