Add GPQA Diamond evaluation result

by SaylorTwift HF Staff - opened Mar 2

YAML Metadata Error: Invalid content in Eval Result file .eval_results/mmlu_pro.yaml

Check out the documentation for more information.

Task ID "mmlu_pro" does not match any task in dataset "TIGER-Lab/MMLU-Pro". Available: none
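
The message above reads like a membership check of the entry's task_id against the tasks declared by the referenced dataset. A minimal sketch of such a check follows. The function name `check_task_id` and the empty task list are assumptions made for illustration; this is not the Hub's actual validation code.

```python
from typing import Iterable, Optional


def check_task_id(dataset_id: str, task_id: str, available: Iterable[str]) -> Optional[str]:
    """Return an error message if task_id is not among the dataset's tasks, else None.

    Hypothetical helper: it only mirrors the wording of the error shown above.
    """
    tasks = sorted(available)
    if task_id in tasks:
        return None
    listed = ", ".join(tasks) if tasks else "none"
    return (
        f'Task ID "{task_id}" does not match any task in dataset '
        f'"{dataset_id}". Available: {listed}'
    )


# The entry from .eval_results/mmlu_pro.yaml, written out as a plain dict.
entry = {"dataset": {"id": "TIGER-Lab/MMLU-Pro", "task_id": "mmlu_pro"}, "value": 82.5}

# Assumption: the dataset declares no tasks, as "Available: none" suggests.
message = check_task_id(
    entry["dataset"]["id"],
    entry["dataset"]["task_id"],
    available=[],
)
print(message)
```

Under this reading, the error comes from the dataset side declaring no tasks at all, so no task_id value in the file would pass the check until the dataset lists one.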

Files changed (2)

.eval_results/gpqa.yaml ADDED

+- dataset:
+    id: Idavidrein/gpqa
+    task_id: diamond
+  value: 81.7
+  source:
+    url: https://huggingface.co/Qwen/Qwen3.5-9B
+    name: Model Card
+    user: SaylorTwift

.eval_results/mmlu_pro.yaml ADDED

+- dataset:
+    id: TIGER-Lab/MMLU-Pro
+    task_id: mmlu_pro
+  value: 82.5
+  source:
+    url: https://huggingface.co/Qwen/Qwen3.5-9B
+    name: Model Card
+    user: SaylorTwift