title: Qimma Leaderboard
emoji: 📊
colorFrom: blue
colorTo: green
sdk: docker
python_version: 3.10.19
pinned: false
short_description: Qimma leaderboard

Qimma Leaderboard

The Qimma Leaderboard is an open evaluation platform for Arabic Large Language Models (LLMs). It tracks the performance of various models across a suite of Arabic benchmarks.

🚀 Features

  • Leaderboard: Real-time ranking of models based on their performance on multiple datasets (AlGhafa, ArabicMMLU, EXAMS, etc.).
  • Submission System: Allows users to submit their models for evaluation.
  • Queue Status: Displays the current status of submitted models (Pending, Running, Finished, Failed).
  • Automated Updates: The system automatically fetches new results and updates the leaderboard.
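As a rough illustration of the ranking feature, the sketch below orders models by their mean score across benchmarks. The unweighted mean and the `rank_models` helper are assumptions made for this example; the README does not state how the leaderboard actually aggregates scores.

```python
from statistics import mean


def rank_models(scores_by_model):
    """Return (model, average) pairs sorted from best to worst.

    `scores_by_model` maps a model name to a dict of benchmark -> score.
    Assumption: the ranking uses a plain unweighted mean over benchmarks.
    """
    rows = [
        (model, mean(scores.values()))
        for model, scores in scores_by_model.items()
    ]
    # Highest average first
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical scores, for illustration only
example = {
    "org-a/model-a": {"AlGhafa": 50.0, "EXAMS": 70.0},
    "org-b/model-b": {"AlGhafa": 80.0, "EXAMS": 60.0},
}
ranking = rank_models(example)
```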

πŸ› οΈ Installation & Setup

To run the leaderboard locally:

  1. Clone the repository:

    git clone https://huggingface.co/spaces/qimma/Qimma-Leaderboard
    cd Qimma-Leaderboard
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Set up environment variables: a Hugging Face API token is required to access the datasets. Export it as the HF_API_TOKEN environment variable.

    export HF_API_TOKEN="your_token_here"
    
  4. Run the application:

    python app.py
    

    The app will be available at http://localhost:7860.
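If the token is missing, the app may fail later with a less obvious error. A minimal sketch of a fail-fast check is shown below; `get_hf_token` is a hypothetical helper written for this example and is not necessarily how app.py reads the variable.

```python
import os


def get_hf_token(env=None):
    """Return the token stored in HF_API_TOKEN, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_API_TOKEN is not set; export it before starting the app."
        )
    return token
```

Calling such a helper once at startup surfaces a missing token immediately rather than on the first dataset download.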

📂 Code Structure

The repository is split into a FastAPI backend and a set of HTML templates that make up the frontend.

Backend (/backend)

Handles data processing, API endpoints, and interaction with the Hugging Face Hub.

  • app.py (repository root): The main entry point. Initializes the FastAPI app, sets up routes, and manages background tasks for data synchronization.
  • backend/config.py: Configuration settings, including repository IDs and the list of evaluation tasks (TASKS).
  • backend/data_loader.py: Responsible for downloading dataset snapshots and loading leaderboard/queue data from the Hugging Face Hub.
  • backend/submission_handler.py: Logic for processing model submissions and validating input.
  • backend/helpers.py: Utility functions for data manipulation.
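To give a feel for the kind of input validation a submission handler performs, here is a minimal sketch. The `Submission` class, the `validate_submission` function, the repo ID pattern, and the duplicate check are all assumptions for illustration; the actual logic in backend/submission_handler.py may differ.

```python
import re
from dataclasses import dataclass

# Simplified pattern for Hub repo IDs of the form "org/name" (an assumption;
# the Hub's real naming rules are stricter).
MODEL_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


@dataclass
class Submission:
    model_id: str
    revision: str = "main"
    status: str = "Pending"  # new submissions start in the Pending state


def validate_submission(model_id, revision="main", already_queued=()):
    """Validate user input and return a Submission, or raise ValueError."""
    model_id = model_id.strip()
    revision = revision.strip() or "main"
    if not MODEL_ID_RE.fullmatch(model_id):
        raise ValueError(f"Invalid model ID: {model_id!r} (expected 'org/name')")
    if (model_id, revision) in already_queued:
        raise ValueError(f"{model_id}@{revision} is already in the queue")
    return Submission(model_id=model_id, revision=revision)
```

Rejecting malformed or duplicate entries before they reach the queue keeps the Pending list clean and avoids wasted evaluation runs.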

Frontend (/frontend)

Contains the HTML templates served by the application.

  • index.html: The landing page displaying the main leaderboard.
  • leaderboard.html: The table component for displaying model rankings.
  • submit.html: The form for users to submit new models.
  • about.html: Information about the project and methodology.
  • header.html: Common header component used across pages.

βš™οΈ Configuration

The evaluation tasks and model types are defined in backend/config.py. You can modify the TASKS list to add or remove benchmarks.

TASKS = [
    ("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
    # ...
]
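Judging from the example entry, each tuple appears to hold a task key, a metric name, and a display name; that reading is an inference, not something the README states. Under that assumption, and assuming results arrive as a nested dict keyed by task and then metric, a leaderboard row could be built roughly like this (`extract_scores` is a hypothetical helper):

```python
TASKS = [
    ("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
]


def extract_scores(results, tasks=TASKS):
    """Map each display name to its score in percent, or None if missing.

    Assumes `results` looks like {task_key: {metric_name: value_in_0_to_1}}.
    """
    row = {}
    for task_key, metric, display_name in tasks:
        value = results.get(task_key, {}).get(metric)
        row[display_name] = None if value is None else round(value * 100, 2)
    return row


# Hypothetical result payload, for illustration only
sample = {"community|alghafa:_average|0": {"acc_norm": 0.5}}
row = extract_scores(sample)  # {"AlGhafa": 50.0}
```

With this layout, adding a benchmark only requires a new tuple in TASKS; the extraction code stays unchanged.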

Citation

If you use this Space or its datasets, please cite our work:

@misc{alqadi2026arabicbenchmarksreliableqimmas,
      title={Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation}, 
      author={Leen AlQadi and Ahmed Alzubaidi and Mohammed Alyafeai and Hamza Alobeidli and Maitha Alhammadi and Shaikha Alsuwaidi and Omar Alkaabi and Basma El Amel Boussaha and Hakim Hacid},
      year={2026},
      eprint={2604.03395},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.03395}, 
}