---
title: Qimma Leaderboard
emoji: 📊
colorFrom: blue
colorTo: green
sdk: docker
python_version: 3.10.19
pinned: false
short_description: Qimma leaderboard
---

# Qimma Leaderboard

The Qimma Leaderboard is an open evaluation platform for Arabic Large Language Models (LLMs). It tracks the performance of various models across a suite of Arabic benchmarks.

## 🚀 Features

*   **Leaderboard**: Real-time ranking of models based on their performance on multiple datasets (AlGhafa, ArabicMMLU, EXAMS, etc.).
*   **Submission System**: Allows users to submit their models for evaluation.
*   **Queue Status**: Displays the current status of submitted models (Pending, Running, Finished, Failed).
*   **Automated Updates**: The system automatically fetches new results and updates the leaderboard.

## 🛠️ Installation & Setup

To run the leaderboard locally:

1.  **Clone the repository:**
    ```bash
    git clone https://huggingface.co/spaces/qimma/Qimma-Leaderboard
    cd Qimma-Leaderboard
    ```

2.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

3.  **Set up Environment Variables:**
    You need a Hugging Face API token to access the datasets. Set the `HF_API_TOKEN` environment variable.
    ```bash
    export HF_API_TOKEN="your_token_here"
    ```

4.  **Run the application:**
    ```bash
    python app.py
    ```
    The app will be available at `http://localhost:7860`.
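
For code that needs the token from step 3, a minimal sketch of reading it from the environment might look like the following. The helper name `get_hf_token` is hypothetical and not taken from the repository; only the variable name `HF_API_TOKEN` comes from the steps above.

```python
import os


def get_hf_token(env=None):
    """Return the token stored in HF_API_TOKEN, or raise if it is unset.

    `get_hf_token` is a hypothetical helper, not part of the repository.
    """
    # Allow a plain dict to be passed in so the helper is easy to test.
    env = os.environ if env is None else env
    token = env.get("HF_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_API_TOKEN is not set; export it before starting the app."
        )
    return token
```

Failing early with a clear message is one way to avoid a less obvious authentication error later, when the datasets are first requested.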

## 📂 Code Structure

The repository is split into a FastAPI backend and a frontend made up of HTML templates.

### Backend (`/backend`)
Handles data processing, API endpoints, and interaction with the Hugging Face Hub.

*   **`app.py`**: The main entry point. Initializes the FastAPI app, sets up routes, and manages background tasks for data synchronization.
*   **`backend/config.py`**: Configuration settings, including repository IDs and the list of evaluation tasks (`TASKS`).
*   **`backend/data_loader.py`**: Responsible for downloading dataset snapshots and loading leaderboard/queue data from the Hugging Face Hub.
*   **`backend/submission_handler.py`**: Logic for processing model submissions and validating input.
*   **`backend/helpers.py`**: Utility functions for data manipulation.
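
As an illustration of the kind of input validation `backend/submission_handler.py` is described as performing, here is a minimal sketch. The function name `validate_model_id` and the specific rule are assumptions made for this example, not the repository's actual logic.

```python
import re

# Assumed rule: a Hub model ID looks like "<owner>/<name>", where each part
# starts with a letter or digit and may also contain '.', '_' or '-'.
# The checks in the real submission handler may differ.
_MODEL_ID_RE = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*"
)


def validate_model_id(model_id):
    """Return the trimmed model ID, or raise ValueError if it is malformed."""
    cleaned = model_id.strip()
    if not _MODEL_ID_RE.fullmatch(cleaned):
        raise ValueError(f"Expected '<owner>/<name>', got {model_id!r}")
    return cleaned
```

Under this assumed rule, `validate_model_id(" org/model-7b ")` returns `"org/model-7b"`, while an ID without an owner prefix raises `ValueError`.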

### Frontend (`/frontend`)
Contains the HTML templates served by the application.

*   **`index.html`**: The landing page displaying the main leaderboard.
*   **`leaderboard.html`**: The table component for displaying model rankings.
*   **`submit.html`**: The form for users to submit new models.
*   **`about.html`**: Information about the project and methodology.
*   **`header.html`**: Common header component used across pages.

## ⚙️ Configuration

The evaluation tasks and model types are defined in `backend/config.py`. You can modify the `TASKS` list to add or remove benchmarks.

```python
TASKS = [
    ("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
    # ...
]
```
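
To illustrate how one such entry could be consumed, the sketch below reads each tuple as (task key, metric name, display name) and pulls the matching score out of a results dictionary. This reading of the tuple, the `extract_scores` helper, and the shape of `results` are assumptions for illustration; the score shown is a placeholder, not a real result.

```python
TASKS = [
    ("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
]


def extract_scores(results, tasks=TASKS):
    """Map each display name to its metric value, or None when missing."""
    scores = {}
    for task_key, metric, display_name in tasks:
        # A task or metric absent from the results yields None, not an error.
        scores[display_name] = results.get(task_key, {}).get(metric)
    return scores


# Placeholder numbers, not real evaluation results.
example_results = {"community|alghafa:_average|0": {"acc_norm": 0.5}}
print(extract_scores(example_results))
```

Returning `None` for a missing task keeps a partially evaluated model from breaking the whole table; whether the leaderboard handles gaps this way is not stated in this README.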

## Citation

Please cite our [work](https://arxiv.org/pdf/2604.03395) if you use this Space or its datasets.

```bibtex
@misc{alqadi2026arabicbenchmarksreliableqimmas,
      title={Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation}, 
      author={Leen AlQadi and Ahmed Alzubaidi and Mohammed Alyafeai and Hamza Alobeidli and Maitha Alhammadi and Shaikha Alsuwaidi and Omar Alkaabi and Basma El Amel Boussaha and Hakim Hacid},
      year={2026},
      eprint={2604.03395},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.03395}, 
}
```