Upload ugi-leaderboard-data.csv
ugi-leaderboard-data.csv (changed: +4 -4)
@@ -1067,8 +1067,8 @@ openai/gpt-5.4-2026-03-05 (reasoning_effort=medium),https://huggingface.co/opena
| 1067 |
openai/gpt-5.4-2026-03-05 (reasoning_effort=high),https://huggingface.co/openai/gpt-5.4-2026-03-05 (reasoning_effort=high),3/5/2026,3/5/2026,,,,,FALSE,FALSE,TRUE,68.82,38.96,49.69,0.0,5.6,8.2,1.8,2.0,1.5,71.25,85.6,64.48,63.67,49.69,0.6725,0.8421,0.7386,0.511,0.4191,-29.2%,70.1%,46.6%,49.9%,64.2%,42.5%,62.5%,44.8%,26.0%,37.7%,26.0%,59.0%,58.1%,32.7%,64.0%,61.2%,67.5%,Liberalism,True,0,0,,29.0,0.84,13.1,4.7,0.4,46.0,94.0,0.845,0.424,0.31,1.227,0.357,0.378,20.9,819.0,60.1,20.4,1.5,4.7
| 1068 |
openai/gpt-5.4-2026-03-05 (reasoning_effort=xhigh),https://huggingface.co/openai/gpt-5.4-2026-03-05 (reasoning_effort=xhigh),3/5/2026,3/5/2026,,,,,FALSE,FALSE,TRUE,68.65,41.71,56.32,2.4,5.4,8.6,1.2,1.0,1.5,73.53,83.54,70.69,66.34,56.32,0.7984,0.8882,0.6878,0.5018,0.441,-26.9%,70.1%,45.1%,47.5%,63.4%,42.9%,63.3%,41.7%,25.0%,36.9%,27.7%,58.3%,54.2%,30.0%,61.9%,61.9%,66.5%,Liberalism,True,0,0,,30.8,0.89,12.2,4.2,0.398,30.0,72.0,0.856,0.425,0.318,1.183,0.451,0.393,15.1,628.0,65.3,20.48,2.2,4.1
| 1069 |
xai/grok-4.20-multi-agent-beta-0309 (agent_count=4),https://huggingface.co/xai/grok-4.20-multi-agent-beta-0309 (agent_count=4),3/11/2026,3/13/2026,,,,,FALSE,FALSE,TRUE,63.13,70.0,72.51,8.8,4.9,8.8,6.5,7.0,6.0,56.34,76.88,49.66,42.5,72.51,0.4634,0.1924,0.5306,0.5498,0.3888,12.7%,58.5%,45.3%,24.0%,66.6%,41.7%,60.4%,38.1%,44.0%,36.9%,43.8%,19.2%,27.7%,25.2%,74.6%,51.2%,74.0%,Classical Liberalism,True,47312,0,,35.6,0.68,13.3,5.7,0.356,38.0,96.0,0.885,0.432,0.276,1.237,0.402,0.332,32.3,6506.0,81.6,20.05,9.7,5.0
| 1070 |
-
xai/grok-4.20-beta-0309-reasoning,https://huggingface.co/xai/grok-4.20-beta-0309-reasoning,3/
| 1071 |
-
xai/grok-4.20-beta-0309-non-reasoning,https://huggingface.co/xai/grok-4.20-beta-0309-non-reasoning,3/
| 1072 |
Qwen/Qwen3.5-35B-A3B (no thinking),https://huggingface.co/Qwen/Qwen3.5-35B-A3B,2/24/2026,3/14/2026,chatml w/ no thinking,3.0,35.0,35.0,FALSE,FALSE,TRUE,37.0,23.15,20.98,4.1,0.8,2.1,2.8,4.0,1.5,24.97,24.78,12.41,37.7,20.98,0.4402,0.1294,0.4529,0.6033,0.2591,-20.7%,65.2%,46.5%,44.4%,56.8%,42.3%,66.2%,48.1%,37.3%,38.3%,28.8%,45.4%,50.8%,36.9%,46.5%,61.7%,62.3%,Liberalism,False,0,3,Qwen3_5MoeForConditionalGeneration,40.6,0.76,12.7,6.3,0.329,17.0,89.0,0.823,0.439,0.32,1.513,0.112,0.22,33.8,8908.0,90.5,19.57,3.0,5.0
| 1073 |
Qwen/Qwen3.5-35B-A3B (<think> prefill),https://huggingface.co/Qwen/Qwen3.5-35B-A3B,2/24/2026,3/14/2026,chatml w/ <think> prefill,3.0,35.0,35.0,FALSE,FALSE,TRUE,42.58,16.3,13.19,0.0,0.9,2.9,2.2,3.0,1.5,33.09,52.29,20.69,26.3,13.19,0.1634,0.1105,0.5124,0.2448,0.2838,-18.9%,68.8%,46.6%,43.5%,58.6%,39.2%,69.4%,48.3%,31.9%,33.8%,27.9%,44.6%,52.5%,33.5%,53.8%,57.7%,64.4%,Liberalism,True,14784,3,Qwen3_5MoeForConditionalGeneration,22.8,0.74,12.4,6.3,0.283,18.0,86.0,0.85,0.403,0.299,1.473,0.213,0.275,67.9,10042.0,83.6,23.23,2.2,4.9
| 1074 |
Qwen/Qwen3.5-27B (<think> prefill),https://huggingface.co/Qwen/Qwen3.5-27B,2/24/2026,3/9/2026,chatml w/ <think> prefill,27.0,27.0,27.0,FALSE,FALSE,TRUE,42.37,15.98,17.72,1.8,1.0,2.7,1.2,1.0,1.5,35.83,52.48,18.97,36.03,17.72,0.5144,0.2508,0.4146,0.354,0.2677,-17.8%,69.9%,45.3%,42.7%,57.2%,46.5%,63.8%,46.0%,27.3%,35.8%,27.1%,48.5%,47.9%,31.7%,54.0%,55.6%,61.9%,Liberalism,True,13700,2,Qwen3_5ForConditionalGeneration,24.8,0.72,11.8,6.0,0.311,11.0,90.0,0.858,0.412,0.31,1.487,0.305,0.271,29.2,5186.0,95.4,21.92,2.0,4.4
@@ -1191,5 +1191,5 @@ openai/gpt-5.5-2026-04-23 (reasoning_effort=low),https://huggingface.co/openai/g
| 1191 |
openai/gpt-5.5-2026-04-23 (reasoning_effort=medium),https://huggingface.co/openai/gpt-5.5-2026-04-23 (reasoning_effort=medium),4/23/2026,4/25/2026,,,,,False,False,True,64.75,49.19,62.54,4.1,5.6,8.7,2.2,3.0,1.5,77.12,85.58,74.14,71.64,62.54,0.8053,0.8712,0.7425,0.7321,0.4308,-31.6%,73.0%,46.0%,49.5%,62.4%,42.3%,65.8%,46.2%,25.8%,31.7%,23.5%,55.4%,54.4%,38.8%,55.4%,66.0%,65.8%,Liberalism,False,0,0,,28.4,0.87,12.0,4.2,0.409,71.0,100.0,0.854,0.43,0.332,1.193,0.46,0.383,14.8,699.0,59.7,18.33,2.4,4.4
| 1192 |
openai/gpt-5.5-2026-04-23 (reasoning_effort=high),https://huggingface.co/openai/gpt-5.5-2026-04-23 (reasoning_effort=high),4/23/2026,4/25/2026,,,,,False,False,True,67.91,50.98,67.71,6.5,5.8,8.3,1.8,2.0,1.5,79.27,82.26,81.38,74.16,67.71,0.865,0.9579,0.7839,0.6838,0.4176,-31.7%,72.3%,45.3%,47.2%,64.1%,44.0%,63.5%,43.5%,26.7%,32.5%,24.0%,56.0%,55.0%,30.6%,58.3%,66.0%,67.9%,Liberalism,False,0,0,,28.3,0.86,11.5,4.1,0.412,42.0,88.0,0.863,0.427,0.325,1.21,0.49,0.378,11.8,312.0,55.3,18.82,1.1,3.1
| 1193 |
openai/gpt-5.5-2026-04-23 (reasoning_effort=xhigh),https://huggingface.co/openai/gpt-5.5-2026-04-23 (reasoning_effort=xhigh),4/23/2026,4/25/2026,,,,,False,False,True,69.66,51.19,65.53,5.9,5.6,8.2,2.2,3.0,1.5,78.16,88.3,76.9,69.29,65.53,0.8434,0.8631,0.7278,0.6361,0.3939,-32.1%,72.8%,44.9%,48.0%,63.9%,45.2%,63.5%,43.3%,27.5%,30.6%,23.5%,56.5%,56.9%,30.6%,57.7%,65.0%,69.0%,Liberalism,False,0,0,,30.3,0.84,11.0,4.1,0.403,13.0,58.0,0.887,0.426,0.307,1.243,0.437,0.356,12.9,733.0,61.2,19.27,0.9,4.3
| 1194 |
-
xai/grok-4.20-0309-non-reasoning,https://huggingface.co/xai/grok-4.20-0309-non-reasoning,3/
| 1195 |
-
xai/grok-4.20-0309-reasoning,https://huggingface.co/xai/grok-4.20-0309-reasoning,3/
| 1067 |
openai/gpt-5.4-2026-03-05 (reasoning_effort=high),https://huggingface.co/openai/gpt-5.4-2026-03-05 (reasoning_effort=high),3/5/2026,3/5/2026,,,,,FALSE,FALSE,TRUE,68.82,38.96,49.69,0.0,5.6,8.2,1.8,2.0,1.5,71.25,85.6,64.48,63.67,49.69,0.6725,0.8421,0.7386,0.511,0.4191,-29.2%,70.1%,46.6%,49.9%,64.2%,42.5%,62.5%,44.8%,26.0%,37.7%,26.0%,59.0%,58.1%,32.7%,64.0%,61.2%,67.5%,Liberalism,True,0,0,,29.0,0.84,13.1,4.7,0.4,46.0,94.0,0.845,0.424,0.31,1.227,0.357,0.378,20.9,819.0,60.1,20.4,1.5,4.7
| 1068 |
openai/gpt-5.4-2026-03-05 (reasoning_effort=xhigh),https://huggingface.co/openai/gpt-5.4-2026-03-05 (reasoning_effort=xhigh),3/5/2026,3/5/2026,,,,,FALSE,FALSE,TRUE,68.65,41.71,56.32,2.4,5.4,8.6,1.2,1.0,1.5,73.53,83.54,70.69,66.34,56.32,0.7984,0.8882,0.6878,0.5018,0.441,-26.9%,70.1%,45.1%,47.5%,63.4%,42.9%,63.3%,41.7%,25.0%,36.9%,27.7%,58.3%,54.2%,30.0%,61.9%,61.9%,66.5%,Liberalism,True,0,0,,30.8,0.89,12.2,4.2,0.398,30.0,72.0,0.856,0.425,0.318,1.183,0.451,0.393,15.1,628.0,65.3,20.48,2.2,4.1
| 1069 |
xai/grok-4.20-multi-agent-beta-0309 (agent_count=4),https://huggingface.co/xai/grok-4.20-multi-agent-beta-0309 (agent_count=4),3/11/2026,3/13/2026,,,,,FALSE,FALSE,TRUE,63.13,70.0,72.51,8.8,4.9,8.8,6.5,7.0,6.0,56.34,76.88,49.66,42.5,72.51,0.4634,0.1924,0.5306,0.5498,0.3888,12.7%,58.5%,45.3%,24.0%,66.6%,41.7%,60.4%,38.1%,44.0%,36.9%,43.8%,19.2%,27.7%,25.2%,74.6%,51.2%,74.0%,Classical Liberalism,True,47312,0,,35.6,0.68,13.3,5.7,0.356,38.0,96.0,0.885,0.432,0.276,1.237,0.402,0.332,32.3,6506.0,81.6,20.05,9.7,5.0
| 1070 |
+
xai/grok-4.20-beta-0309-reasoning,https://huggingface.co/xai/grok-4.20-beta-0309-reasoning,3/10/2026,3/13/2026,,,,,FALSE,FALSE,TRUE,57.41,53.16,52.25,6.5,4.0,5.7,5.5,5.0,6.0,53.67,75.82,41.38,43.82,52.25,0.4124,0.4036,0.5568,0.3829,0.4353,29.7%,49.8%,42.6%,17.5%,62.6%,56.5%,52.7%,37.1%,57.9%,43.3%,49.4%,11.2%,16.9%,24.4%,67.5%,40.8%,79.4%,Classical Liberalism,True,0,0,,25.3,0.74,13.4,5.6,0.33,43.0,96.0,0.877,0.438,0.299,1.217,0.342,0.399,35.8,3244.0,78.8,21.62,9.5,5.9
| 1071 |
+
xai/grok-4.20-beta-0309-non-reasoning,https://huggingface.co/xai/grok-4.20-beta-0309-non-reasoning,3/10/2026,3/13/2026,,,,,FALSE,FALSE,TRUE,43.88,30.25,25.37,2.4,2.3,2.9,4.0,5.0,3.0,39.77,55.31,33.45,30.56,25.37,0.2202,0.2511,0.3307,0.378,0.3481,11.7%,53.3%,46.3%,30.0%,57.6%,38.3%,58.1%,35.4%,54.0%,41.5%,44.6%,24.8%,31.5%,33.8%,54.4%,49.0%,69.4%,Classical Liberalism,False,0,0,,23.2,0.76,14.0,4.6,0.353,37.0,88.0,0.849,0.439,0.303,1.297,0.373,0.302,56.5,5180.0,107.5,21.67,8.8,6.8
| 1072 |
Qwen/Qwen3.5-35B-A3B (no thinking),https://huggingface.co/Qwen/Qwen3.5-35B-A3B,2/24/2026,3/14/2026,chatml w/ no thinking,3.0,35.0,35.0,FALSE,FALSE,TRUE,37.0,23.15,20.98,4.1,0.8,2.1,2.8,4.0,1.5,24.97,24.78,12.41,37.7,20.98,0.4402,0.1294,0.4529,0.6033,0.2591,-20.7%,65.2%,46.5%,44.4%,56.8%,42.3%,66.2%,48.1%,37.3%,38.3%,28.8%,45.4%,50.8%,36.9%,46.5%,61.7%,62.3%,Liberalism,False,0,3,Qwen3_5MoeForConditionalGeneration,40.6,0.76,12.7,6.3,0.329,17.0,89.0,0.823,0.439,0.32,1.513,0.112,0.22,33.8,8908.0,90.5,19.57,3.0,5.0
| 1073 |
Qwen/Qwen3.5-35B-A3B (<think> prefill),https://huggingface.co/Qwen/Qwen3.5-35B-A3B,2/24/2026,3/14/2026,chatml w/ <think> prefill,3.0,35.0,35.0,FALSE,FALSE,TRUE,42.58,16.3,13.19,0.0,0.9,2.9,2.2,3.0,1.5,33.09,52.29,20.69,26.3,13.19,0.1634,0.1105,0.5124,0.2448,0.2838,-18.9%,68.8%,46.6%,43.5%,58.6%,39.2%,69.4%,48.3%,31.9%,33.8%,27.9%,44.6%,52.5%,33.5%,53.8%,57.7%,64.4%,Liberalism,True,14784,3,Qwen3_5MoeForConditionalGeneration,22.8,0.74,12.4,6.3,0.283,18.0,86.0,0.85,0.403,0.299,1.473,0.213,0.275,67.9,10042.0,83.6,23.23,2.2,4.9
| 1074 |
Qwen/Qwen3.5-27B (<think> prefill),https://huggingface.co/Qwen/Qwen3.5-27B,2/24/2026,3/9/2026,chatml w/ <think> prefill,27.0,27.0,27.0,FALSE,FALSE,TRUE,42.37,15.98,17.72,1.8,1.0,2.7,1.2,1.0,1.5,35.83,52.48,18.97,36.03,17.72,0.5144,0.2508,0.4146,0.354,0.2677,-17.8%,69.9%,45.3%,42.7%,57.2%,46.5%,63.8%,46.0%,27.3%,35.8%,27.1%,48.5%,47.9%,31.7%,54.0%,55.6%,61.9%,Liberalism,True,13700,2,Qwen3_5ForConditionalGeneration,24.8,0.72,11.8,6.0,0.311,11.0,90.0,0.858,0.412,0.31,1.487,0.305,0.271,29.2,5186.0,95.4,21.92,2.0,4.4
| 1191 |
openai/gpt-5.5-2026-04-23 (reasoning_effort=medium),https://huggingface.co/openai/gpt-5.5-2026-04-23 (reasoning_effort=medium),4/23/2026,4/25/2026,,,,,False,False,True,64.75,49.19,62.54,4.1,5.6,8.7,2.2,3.0,1.5,77.12,85.58,74.14,71.64,62.54,0.8053,0.8712,0.7425,0.7321,0.4308,-31.6%,73.0%,46.0%,49.5%,62.4%,42.3%,65.8%,46.2%,25.8%,31.7%,23.5%,55.4%,54.4%,38.8%,55.4%,66.0%,65.8%,Liberalism,False,0,0,,28.4,0.87,12.0,4.2,0.409,71.0,100.0,0.854,0.43,0.332,1.193,0.46,0.383,14.8,699.0,59.7,18.33,2.4,4.4
| 1192 |
openai/gpt-5.5-2026-04-23 (reasoning_effort=high),https://huggingface.co/openai/gpt-5.5-2026-04-23 (reasoning_effort=high),4/23/2026,4/25/2026,,,,,False,False,True,67.91,50.98,67.71,6.5,5.8,8.3,1.8,2.0,1.5,79.27,82.26,81.38,74.16,67.71,0.865,0.9579,0.7839,0.6838,0.4176,-31.7%,72.3%,45.3%,47.2%,64.1%,44.0%,63.5%,43.5%,26.7%,32.5%,24.0%,56.0%,55.0%,30.6%,58.3%,66.0%,67.9%,Liberalism,False,0,0,,28.3,0.86,11.5,4.1,0.412,42.0,88.0,0.863,0.427,0.325,1.21,0.49,0.378,11.8,312.0,55.3,18.82,1.1,3.1
| 1193 |
openai/gpt-5.5-2026-04-23 (reasoning_effort=xhigh),https://huggingface.co/openai/gpt-5.5-2026-04-23 (reasoning_effort=xhigh),4/23/2026,4/25/2026,,,,,False,False,True,69.66,51.19,65.53,5.9,5.6,8.2,2.2,3.0,1.5,78.16,88.3,76.9,69.29,65.53,0.8434,0.8631,0.7278,0.6361,0.3939,-32.1%,72.8%,44.9%,48.0%,63.9%,45.2%,63.5%,43.3%,27.5%,30.6%,23.5%,56.5%,56.9%,30.6%,57.7%,65.0%,69.0%,Liberalism,False,0,0,,30.3,0.84,11.0,4.1,0.403,13.0,58.0,0.887,0.426,0.307,1.243,0.437,0.356,12.9,733.0,61.2,19.27,0.9,4.3
| 1194 |
+
xai/grok-4.20-0309-non-reasoning,https://huggingface.co/xai/grok-4.20-0309-non-reasoning,3/19/2026,4/25/2026,,,,,False,False,True,47.15,51.77,46.4,8.8,3.1,3.1,6.2,6.0,6.5,40.76,52.25,36.21,33.82,46.4,0.3499,0.2187,0.3252,0.4425,0.3545,11.8%,55.4%,46.5%,28.7%,60.6%,40.0%,56.2%,35.8%,52.7%,38.3%,42.7%,32.9%,20.8%,32.3%,62.3%,48.1%,71.5%,Classical Liberalism,False,0,0,,22.4,0.77,13.8,4.8,0.343,36.0,100.0,0.851,0.425,0.294,1.287,0.394,0.31,40.9,5843.0,108.4,21.03,8.5,8.2
| 1195 |
+
xai/grok-4.20-0309-reasoning,https://huggingface.co/xai/grok-4.20-0309-reasoning,3/19/2026,4/25/2026,,,,,False,False,True,55.26,64.23,58.84,7.1,4.5,6.7,7.5,7.0,8.0,52.75,70.23,46.9,41.13,58.84,0.3714,0.3311,0.5281,0.4425,0.3835,27.9%,49.8%,43.6%,18.6%,62.2%,49.2%,56.2%,36.2%,57.1%,45.0%,48.5%,13.1%,18.5%,24.2%,66.5%,40.8%,79.4%,Classical Liberalism,False,0,0,,22.9,0.73,13.2,5.3,0.325,46.0,100.0,0.865,0.43,0.29,1.267,0.385,0.349,39.0,3995.0,81.9,21.03,9.9,6.2