Commit · 68bc7c7
1 Parent(s): cbeb01c
update about.html
frontend/about.html CHANGED +4 -22
|
@@ -25,9 +25,6 @@
|
|
| 25 |
|
| 26 |
<div class="text-center mb-12">
|
| 27 |
<h1 class="text-4xl font-extrabold tracking-tight sm:text-5xl gradient-text mb-6">About QIMMA</h1>
|
| 28 |
-
<!-- <p class="text-lg text-slate-600 dark:text-slate-400 max-w-2xl mx-auto">
|
| 29 |
-
Understanding the methodology and metrics behind the Open Arabic LLM Leaderboard.
|
| 30 |
-
</p> -->
|
| 31 |
</div>
|
| 32 |
|
| 33 |
<div class="space-y-12 animate-fade-in">
|
|
@@ -43,7 +40,7 @@
|
|
| 43 |
</div>
|
| 44 |
<div class="prose dark:prose-invert max-w-none text-slate-600 dark:text-slate-300 leading-relaxed">
|
| 45 |
<p class="mb-4">
|
| 46 |
-
QIMMA قمّة (Summit in Arabic) is a quality-assured Arabic LLM evaluation leaderboard built on
|
| 47 |
</p>
|
| 48 |
<p>
|
| 49 |
QIMMA was constructed through a systematic benchmark curation process: candidate benchmarks were assessed using a multi-model quality validation pipeline that identified issues in the samples, including false, missing or invalid gold answers, textual encoding problems and many more. Only clean, validated samples made it into the final leaderboard. This process also revealed that quality problems are more pervasive across existing Arabic benchmarks than previously documented.
|
|
@@ -160,14 +157,6 @@
|
|
| 160 |
</div>
|
| 161 |
</div>
|
| 162 |
</div>
|
| 163 |
-
|
| 164 |
-
<!-- Pro-tip callout -->
|
| 165 |
-
<!-- <div class="flex gap-3 p-4 bg-indigo-50 dark:bg-indigo-900/20 border border-indigo-200 dark:border-indigo-700/50 rounded-xl">
|
| 166 |
-
<i data-lucide="lightbulb" class="w-5 h-5 text-indigo-500 dark:text-indigo-400 shrink-0 mt-0.5"></i>
|
| 167 |
-
<p class="text-sm text-slate-700 dark:text-slate-300 leading-relaxed">
|
| 168 |
-
<span class="font-semibold text-indigo-700 dark:text-indigo-300">Pro tip:</span> Combine benchmark domain filters, column visibility, and Filtered Average to instantly rank models on the exact subset of skills relevant to your use case.
|
| 169 |
-
</p>
|
| 170 |
-
</div> -->
|
| 171 |
</section>
|
| 172 |
|
| 173 |
<!-- Benchmarks & Metrics -->
|
|
@@ -192,14 +181,7 @@
|
|
| 192 |
<h4 class="font-bold text-slate-800 dark:text-slate-200">STEM</h4>
|
| 193 |
<span class="text-xs font-medium px-2 py-0.5 rounded-full bg-indigo-100 dark:bg-indigo-900/50 text-indigo-600 dark:text-indigo-400">MCQ</span>
|
| 194 |
</div>
|
| 195 |
-
<p class="text-sm text-slate-600 dark:text-slate-400">ArabicMMLU, 3LM STEM - covering science, mathematics, and technical
|
| 196 |
-
</div>
|
| 197 |
-
<div class="p-4 bg-slate-50 dark:bg-slate-700/30 rounded-xl border border-slate-100 dark:border-slate-700">
|
| 198 |
-
<div class="flex items-center justify-between mb-2">
|
| 199 |
-
<h4 class="font-bold text-slate-800 dark:text-slate-200">Language & Reasoning</h4>
|
| 200 |
-
<span class="text-xs font-medium px-2 py-0.5 rounded-full bg-indigo-100 dark:bg-indigo-900/50 text-indigo-600 dark:text-indigo-400">MCQ</span>
|
| 201 |
-
</div>
|
| 202 |
-
<p class="text-sm text-slate-600 dark:text-slate-400">GAT (Saudi Aptitude Test) - assessing verbal reasoning, language comprehension, and mathematical aptitude.</p>
|
| 203 |
</div>
|
| 204 |
<div class="p-4 bg-slate-50 dark:bg-slate-700/30 rounded-xl border border-slate-100 dark:border-slate-700">
|
| 205 |
<div class="flex items-center justify-between mb-2">
|
|
@@ -224,7 +206,7 @@
|
|
| 224 |
<div class="p-4 bg-slate-50 dark:bg-slate-700/30 rounded-xl border border-slate-100 dark:border-slate-700">
|
| 225 |
<div class="flex items-center justify-between mb-2">
|
| 226 |
<h4 class="font-bold text-slate-800 dark:text-slate-200">Poetry & Literature</h4>
|
| 227 |
-
<span class="text-xs font-medium px-2 py-0.5 rounded-full bg-
|
| 228 |
</div>
|
| 229 |
<p class="text-sm text-slate-600 dark:text-slate-400">FannOrFlop - assessing understanding of classical and modern Arabic poetry, literary devices, and cultural context.</p>
|
| 230 |
</div>
|
|
@@ -333,7 +315,7 @@
|
|
| 333 |
</div>
|
| 334 |
<pre id="citationCode"
|
| 335 |
class="bg-slate-100 dark:bg-slate-900/50 p-6 rounded-xl border border-slate-200 dark:border-slate-700 overflow-x-auto text-xs sm:text-sm text-slate-600 dark:text-slate-400 font-mono leading-relaxed">@misc{QIMMA,
|
| 336 |
-
author = {AlQadi, Leen and Alzubaidi, Ahmed and
|
| 337 |
title = {QIMMA Leaderboard},
|
| 338 |
year = {2026},
|
| 339 |
publisher = {QIMMA},
|
|
|
|
| 25 |
|
| 26 |
<div class="text-center mb-12">
|
| 27 |
<h1 class="text-4xl font-extrabold tracking-tight sm:text-5xl gradient-text mb-6">About QIMMA</h1>
|
| 28 |
</div>
|
| 29 |
|
| 30 |
<div class="space-y-12 animate-fade-in">
|
|
|
|
| 40 |
</div>
|
| 41 |
<div class="prose dark:prose-invert max-w-none text-slate-600 dark:text-slate-300 leading-relaxed">
|
| 42 |
<p class="mb-4">
|
| 43 |
+
QIMMA قمّة (Summit in Arabic) is a quality-assured Arabic LLM evaluation leaderboard built on 14 carefully chosen benchmarks spanning STEM, legal reasoning, medical knowledge, poetry, cultural understanding, and code generation. QIMMA includes over 52,000 quality-validated samples across multiple-choice, generative, and code evaluation tracks. Over 99% of QIMMA's content is native Arabic, ensuring authentic linguistic and cultural assessment rather than relying on translated materials.
|
| 44 |
</p>
|
| 45 |
<p>
|
| 46 |
QIMMA was constructed through a systematic benchmark curation process: candidate benchmarks were assessed using a multi-model quality validation pipeline that identified issues in the samples, including false, missing or invalid gold answers, textual encoding problems and many more. Only clean, validated samples made it into the final leaderboard. This process also revealed that quality problems are more pervasive across existing Arabic benchmarks than previously documented.
|
|
|
|
| 157 |
</div>
|
| 158 |
</div>
|
| 159 |
</div>
|
| 160 |
</section>
|
| 161 |
|
| 162 |
<!-- Benchmarks & Metrics -->
|
|
|
|
| 181 |
<h4 class="font-bold text-slate-800 dark:text-slate-200">STEM</h4>
|
| 182 |
<span class="text-xs font-medium px-2 py-0.5 rounded-full bg-indigo-100 dark:bg-indigo-900/50 text-indigo-600 dark:text-indigo-400">MCQ</span>
|
| 183 |
</div>
|
| 184 |
+
<p class="text-sm text-slate-600 dark:text-slate-400">ArabicMMLU, 3LM STEM, GAT (Saudi General Aptitude Test) - covering science, mathematics, verbal reasoning, and technical aptitude across diverse academic subjects.</p>
|
| 185 |
</div>
|
| 186 |
<div class="p-4 bg-slate-50 dark:bg-slate-700/30 rounded-xl border border-slate-100 dark:border-slate-700">
|
| 187 |
<div class="flex items-center justify-between mb-2">
|
|
|
|
| 206 |
<div class="p-4 bg-slate-50 dark:bg-slate-700/30 rounded-xl border border-slate-100 dark:border-slate-700">
|
| 207 |
<div class="flex items-center justify-between mb-2">
|
| 208 |
<h4 class="font-bold text-slate-800 dark:text-slate-200">Poetry & Literature</h4>
|
| 209 |
+
<span class="text-xs font-medium px-2 py-0.5 rounded-full bg-cyan-100 dark:bg-cyan-900/50 text-cyan-600 dark:text-cyan-400">QA</span>
|
| 210 |
</div>
|
| 211 |
<p class="text-sm text-slate-600 dark:text-slate-400">FannOrFlop - assessing understanding of classical and modern Arabic poetry, literary devices, and cultural context.</p>
|
| 212 |
</div>
|
|
|
|
| 315 |
</div>
|
| 316 |
<pre id="citationCode"
|
| 317 |
class="bg-slate-100 dark:bg-slate-900/50 p-6 rounded-xl border border-slate-200 dark:border-slate-700 overflow-x-auto text-xs sm:text-sm text-slate-600 dark:text-slate-400 font-mono leading-relaxed">@misc{QIMMA,
|
| 318 |
+
author = {AlQadi, Leen and Alzubaidi, Ahmed and Alyafeai, Mohammed and Alobeidli, Hamza and Alhammadi, Maitha and Alsuwaidi, Shaikha and Alkaabi, Omar and Boussaha, Basma El Amel and Hacid, Hakim},
|
| 319 |
title = {QIMMA Leaderboard},
|
| 320 |
year = {2026},
|
| 321 |
publisher = {QIMMA},
|