| title: Qimma Leaderboard | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| python_version: 3.10.19 | |
| pinned: false | |
| short_description: Qimma leaderboard | |
| # Qimma Leaderboard | |
| The Qimma Leaderboard is an open evaluation platform for Arabic Large Language Models (LLMs). It tracks the performance of various models across a suite of Arabic benchmarks. | |
| ## Features | |
| * **Leaderboard**: Real-time ranking of models based on their performance on multiple datasets (AlGhafa, ArabicMMLU, EXAMS, etc.). | |
| * **Submission System**: Allows users to submit their models for evaluation. | |
| * **Queue Status**: Displays the current status of submitted models (Pending, Running, Finished, Failed). | |
| * **Automated Updates**: The system automatically fetches new results and updates the leaderboard. | |
| ## 🛠️ Installation & Setup | |
| To run the leaderboard locally: | |
| 1. **Clone the repository:** | |
| ```bash | |
| git clone https://huggingface.co/spaces/qimma/Qimma-Leaderboard | |
| cd Qimma-Leaderboard | |
| ``` | |
| 2. **Install dependencies:** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. **Set up Environment Variables:** | |
| You need a Hugging Face API token to access the datasets. Set the `HF_API_TOKEN` environment variable. | |
| ```bash | |
| export HF_API_TOKEN="your_token_here" | |
| ``` | |
| 4. **Run the application:** | |
| ```bash | |
| python app.py | |
| ``` | |
| The app will be available at `http://localhost:7860`. | |
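The token requirement in step 3 can be checked from Python before the app starts. This is a minimal sketch, assuming the token is read from `HF_API_TOKEN` as described above; the helper name `require_hf_token` is hypothetical and is not part of the repository.

```python
import os


def require_hf_token(env=None):
    """Return the token stored in HF_API_TOKEN, or raise a clear error.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    token = env.get("HF_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_API_TOKEN is not set; export it before running app.py"
        )
    return token
```

Failing early with an explicit message tends to be easier to debug than an authentication error raised later while the datasets are being fetched.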
| ## Code Structure | |
| The repository is split into a FastAPI backend and a frontend made of HTML templates. | |
| ### Backend (`/backend`) | |
| Handles data processing, API endpoints, and interaction with the Hugging Face Hub. | |
| * **`app.py`**: The main entry point, run directly with `python app.py`. Initializes the FastAPI app, sets up routes, and manages background tasks for data synchronization. | |
| * **`backend/config.py`**: Configuration settings, including repository IDs and the list of evaluation tasks (`TASKS`). | |
| * **`backend/data_loader.py`**: Responsible for downloading dataset snapshots and loading leaderboard/queue data from the Hugging Face Hub. | |
| * **`backend/submission_handler.py`**: Logic for processing model submissions and validating input. | |
| * **`backend/helpers.py`**: Utility functions for data manipulation. | |
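To illustrate the kind of input validation a submission handler might perform, here is a minimal sketch. It does not reproduce `backend/submission_handler.py`: the `Submission` class, the `validate_submission` function, and the model ID pattern are all hypothetical. Only the four queue states come from the Features list above.

```python
from dataclasses import dataclass
import re

# Queue states as listed under Features.
QUEUE_STATUSES = ("Pending", "Running", "Finished", "Failed")

# Hub model IDs look like "owner/name"; this pattern is a simplification
# chosen for the example, not the Hub's official rule.
MODEL_ID_RE = re.compile(r"^[A-Za-z0-9][\w.\-]*/[A-Za-z0-9][\w.\-]*$")


@dataclass
class Submission:
    """Hypothetical queue entry; the real schema may differ."""
    model_id: str
    status: str = "Pending"


def validate_submission(model_id, already_queued=()):
    """Return a new Pending Submission, or raise ValueError on bad input."""
    model_id = model_id.strip()
    if not MODEL_ID_RE.match(model_id):
        raise ValueError(f"expected 'owner/name', got {model_id!r}")
    if model_id in already_queued:
        raise ValueError(f"{model_id} is already in the queue")
    return Submission(model_id=model_id)
```

Rejecting malformed or duplicate IDs before anything is queued keeps the queue free of entries that could only ever end in the Failed state.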
| ### Frontend (`/frontend`) | |
| Contains the HTML templates served by the application. | |
| * **`index.html`**: The landing page displaying the main leaderboard. | |
| * **`leaderboard.html`**: The table component for displaying model rankings. | |
| * **`submit.html`**: The form for users to submit new models. | |
| * **`about.html`**: Information about the project and methodology. | |
| * **`header.html`**: Common header component used across pages. | |
| ## ⚙️ Configuration | |
| The evaluation tasks and model types are defined in `backend/config.py`. You can modify the `TASKS` list to add or remove benchmarks. | |
| ```python | |
| TASKS = [ | |
| ("community|alghafa:_average|0", "acc_norm", "AlGhafa"), | |
| # ... | |
| ] | |
| ``` | |
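The sketch below shows one way a `TASKS` entry could be used to pull a score out of a results dictionary. It rests on assumptions that the source does not confirm: that each tuple is `(task key, metric name, display name)`, that results are keyed by task key and then by metric name, and that a plain mean is a sensible summary. The function names and the example numbers are made up for illustration.

```python
# Assumed tuple layout: (task key, metric name, display name).
TASKS = [
    ("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
]


def extract_scores(results, tasks=TASKS):
    """Map each display name to its metric value, or None if it is missing."""
    scores = {}
    for task_key, metric, display_name in tasks:
        scores[display_name] = results.get(task_key, {}).get(metric)
    return scores


def average_score(scores):
    """Mean over the benchmarks that have a value; None if there are none."""
    values = [v for v in scores.values() if v is not None]
    return sum(values) / len(values) if values else None


# Made-up numbers for illustration only.
example_results = {"community|alghafa:_average|0": {"acc_norm": 0.5}}
scores = extract_scores(example_results)
```

Under these assumptions, adding a benchmark only requires a new tuple in `TASKS`; the extraction code itself does not change.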
| ## Citation | |
| Please cite our [work](https://arxiv.org/pdf/2604.03395) if you use this Space or its datasets. | |
| ```bibtex | |
| @misc{alqadi2026arabicbenchmarksreliableqimmas, | |
| title={Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation}, | |
| author={Leen AlQadi and Ahmed Alzubaidi and Mohammed Alyafeai and Hamza Alobeidli and Maitha Alhammadi and Shaikha Alsuwaidi and Omar Alkaabi and Basma El Amel Boussaha and Hakim Hacid}, | |
| year={2026}, | |
| eprint={2604.03395}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL}, | |
| url={https://arxiv.org/abs/2604.03395}, | |
| } | |
| ``` |