---
title: Qimma Leaderboard
emoji: 📊
colorFrom: blue
colorTo: green
sdk: docker
python_version: 3.10.19
pinned: false
short_description: Qimma leaderboard
---
# Qimma Leaderboard
The Qimma Leaderboard is an open evaluation platform for Arabic Large Language Models (LLMs). It tracks the performance of various models across a suite of Arabic benchmarks.
## 🚀 Features
* **Leaderboard**: Real-time ranking of models based on their performance on multiple datasets (AlGhafa, ArabicMMLU, EXAMS, etc.).
* **Submission System**: Allows users to submit their models for evaluation.
* **Queue Status**: Displays the current status of submitted models (Pending, Running, Finished, Failed).
* **Automated Updates**: The system automatically fetches new results and updates the leaderboard.
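
The four queue states lend themselves to a small enumeration. The sketch below is illustrative only: the `QueueStatus` enum, the `count_by_status` helper, and the `status` field name are assumptions, not the Space's actual schema.

```python
from collections import Counter
from enum import Enum


class QueueStatus(str, Enum):
    """Hypothetical enum mirroring the states listed under Queue Status."""
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    FAILED = "Failed"


def count_by_status(entries: list[dict]) -> dict[str, int]:
    """Count queue entries per status, including statuses with no entries.

    Assumes each entry is a dict with a "status" key (an assumed field name).
    """
    counts = Counter(QueueStatus(entry["status"]) for entry in entries)
    return {status.value: counts.get(status, 0) for status in QueueStatus}
```

An unknown status string raises `ValueError` when it is converted to `QueueStatus`, which surfaces malformed queue entries early instead of silently miscounting them.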
## 🛠️ Installation & Setup
To run the leaderboard locally:
1. **Clone the repository:**
```bash
git clone https://huggingface.co/spaces/qimma/Qimma-Leaderboard
cd Qimma-Leaderboard
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Set up Environment Variables:**
You need a Hugging Face API token to access the datasets. Set the `HF_API_TOKEN` environment variable.
```bash
export HF_API_TOKEN="your_token_here"
```
4. **Run the application:**
```bash
python app.py
```
The app will be available at `http://localhost:7860`.
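
Before launching, it can help to confirm that the token from step 3 is visible to the Python process. The following is a minimal sketch, not code from the repository; the helper name `require_token` is hypothetical.

```python
import os


def require_token(env_var: str = "HF_API_TOKEN") -> str:
    """Return the token stored in `env_var`, or raise if it is missing or empty.

    `require_token` is a hypothetical helper, not part of the repository.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export it before running app.py")
    return token
```

Failing fast with a clear message is usually easier to debug than an authentication error raised later, during the first Hub request.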
## 📂 Code Structure
The repository follows a frontend-backend layout built on FastAPI.
### Backend (`/backend`)
Handles data processing, API endpoints, and interaction with the Hugging Face Hub.
* **`app.py`**: The main entry point. Initializes the FastAPI app, sets up routes, and manages background tasks for data synchronization.
* **`backend/config.py`**: Configuration settings, including repository IDs and the list of evaluation tasks (`TASKS`).
* **`backend/data_loader.py`**: Responsible for downloading dataset snapshots and loading leaderboard/queue data from the Hugging Face Hub.
* **`backend/submission_handler.py`**: Logic for processing model submissions and validating input.
* **`backend/helpers.py`**: Utility functions for data manipulation.
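
As a rough illustration of what the data-loading step could look like, the sketch below downloads a dataset snapshot with `huggingface_hub.snapshot_download` and gathers the JSON files it contains. The function names, the `repo_id` you pass in, and the assumption that results are stored as `.json` files each holding one JSON object are all hypothetical; the actual `backend/data_loader.py` may work differently.

```python
import json
import os
from pathlib import Path


def download_snapshot(repo_id: str, local_dir: str) -> Path:
    """Download a dataset repo from the Hub and return its local path."""
    # Imported lazily so the JSON helper below works without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        token=os.environ.get("HF_API_TOKEN"),
    )
    return Path(path)


def load_json_records(root: Path) -> list[dict]:
    """Read every *.json file under `root` (recursively) into a list."""
    records = []
    for file in sorted(root.rglob("*.json")):
        with file.open(encoding="utf-8") as fh:
            records.append(json.load(fh))
    return records
```

Keeping the download and the parsing in separate functions means the parsing step can be exercised against a local directory without network access.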
### Frontend (`/frontend`)
Contains the HTML templates served by the application.
* **`index.html`**: The landing page displaying the main leaderboard.
* **`leaderboard.html`**: The table component for displaying model rankings.
* **`submit.html`**: The form for users to submit new models.
* **`about.html`**: Information about the project and methodology.
* **`header.html`**: Common header component used across pages.
## ⚙️ Configuration
The evaluation tasks and model types are defined in `backend/config.py`. You can modify the `TASKS` list to add or remove benchmarks.
```python
TASKS = [
("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
# ...
]
```
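
To make the tuple layout concrete, here is a sketch of how such entries might be turned into leaderboard columns. It assumes each tuple is `(task key, metric name, display name)` and that a model's raw results form a nested dict keyed by task and then by metric. The result layout, the `extract_scores` helper, and the sample numbers are assumptions for illustration, not the repository's actual code or data.

```python
TASKS = [
    ("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
]


def extract_scores(results: dict, tasks=TASKS) -> dict:
    """Map each display name to its metric value, or None if it is missing."""
    row = {}
    for task_key, metric, display_name in tasks:
        row[display_name] = results.get(task_key, {}).get(metric)
    return row


# Illustrative input with made-up numbers (not real evaluation results).
raw = {"community|alghafa:_average|0": {"acc_norm": 0.5, "acc": 0.4}}
print(extract_scores(raw))  # {'AlGhafa': 0.5}
```

Under this layout, adding a benchmark would only require appending another tuple to `TASKS`; the extraction logic stays unchanged.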
## Citation
If you use this Space or its datasets, please cite our [work](https://arxiv.org/pdf/2604.03395):
```bibtex
@misc{alqadi2026arabicbenchmarksreliableqimmas,
title={Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation},
author={Leen AlQadi and Ahmed Alzubaidi and Mohammed Alyafeai and Hamza Alobeidli and Maitha Alhammadi and Shaikha Alsuwaidi and Omar Alkaabi and Basma El Amel Boussaha and Hakim Hacid},
year={2026},
eprint={2604.03395},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.03395},
}
```