LeenAlQadi committed on
Commit ba42b92 · verified · 1 Parent(s): 3085adf

update about.html

Files changed (1)
  1. frontend/about.html +1 -1
frontend/about.html CHANGED
@@ -43,7 +43,7 @@
  </div>
  <div class="prose dark:prose-invert max-w-none text-slate-600 dark:text-slate-300 leading-relaxed">
  <p class="mb-4">
- QIMMA قمّة (Summit in Arabic) is a quality-assured Arabic LLM evaluation leaderboard built on 13 carefully chosen benchmarks spanning STEM, legal reasoning, medical knowledge, poetry, cultural understanding, and code generation. QIMMA includes over 52,000 quality-validated samples across multiple-choice, generative, and code evaluation tracks. Over 99% of QIMMA's content is native Arabic, ensuring authentic linguistic and cultural assessment rather than relying on translated materials.
+ QIMMA قمّة (Summit in Arabic) is a quality-assured Arabic LLM evaluation leaderboard built on 14 carefully chosen benchmarks spanning STEM, legal reasoning, medical knowledge, poetry, cultural understanding, and code generation. QIMMA includes over 52,000 quality-validated samples across multiple-choice, generative, and code evaluation tracks. Over 99% of QIMMA's content is native Arabic, ensuring authentic linguistic and cultural assessment rather than relying on translated materials.
  </p>
  <p>
  QIMMA was constructed through a systematic benchmark curation process: candidate benchmarks were assessed using a multi-model quality validation pipeline that identified issues in the samples, including false, missing or invalid gold answers, textual encoding problems and many more. Only clean, validated samples made it into the final leaderboard. This process also revealed that quality problems are more pervasive across existing Arabic benchmarks than previously documented.