	---
	title: Qimma Leaderboard
	emoji: 📊
	colorFrom: blue
	colorTo: green
	sdk: docker
	python_version: 3.10.19
	pinned: false
	short_description: Qimma leaderboard
	---

	# Qimma Leaderboard

	The Qimma Leaderboard is an open evaluation platform for Arabic Large Language Models (LLMs). It tracks the performance of various models across a suite of Arabic benchmarks.

	## 🚀 Features

	* Leaderboard: Real-time ranking of models based on their performance on multiple datasets (AlGhafa, ArabicMMLU, EXAMS, etc.).
	* Submission System: Allows users to submit their models for evaluation.
	* Queue Status: Displays the current status of submitted models (Pending, Running, Finished, Failed).
	* Automated Updates: The system automatically fetches new results and updates the leaderboard.
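
The four queue states listed above can be modeled as a small enumeration. The sketch below is illustrative only: `QueueStatus`, `summarize_queue`, and the entry layout (a dict with a `status` key) are assumptions, not names taken from this repository.

```python
from collections import Counter
from enum import Enum


class QueueStatus(Enum):
    """The four states named in the feature list."""
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    FAILED = "Failed"


def summarize_queue(entries):
    """Count submissions per status; `entries` is a list of dicts with a 'status' key."""
    # QueueStatus(...) raises ValueError for any status outside the four above.
    counts = Counter(QueueStatus(entry["status"]) for entry in entries)
    # Report every state, including those with zero submissions.
    return {status.value: counts.get(status, 0) for status in QueueStatus}


# Hypothetical queue entries, for illustration only.
queue = [
    {"model": "org-a/model-1", "status": "Finished"},
    {"model": "org-b/model-2", "status": "Pending"},
    {"model": "org-c/model-3", "status": "Pending"},
]
summary = summarize_queue(queue)
print(summary)  # {'Pending': 2, 'Running': 0, 'Finished': 1, 'Failed': 0}
```

Using an enumeration means an unexpected status string fails loudly instead of silently showing up as a fifth category.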

	## 🛠️ Installation & Setup

	To run the leaderboard locally:

	1. Clone the repository:
	```bash
	git clone https://huggingface.co/spaces/qimma/Qimma-Leaderboard
	cd Qimma-Leaderboard
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Set up Environment Variables:
	You need a Hugging Face API token to access the datasets. Set the `HF_API_TOKEN` environment variable.
	```bash
	export HF_API_TOKEN="your_token_here"
	```

	4. Run the application:
	```bash
	python app.py
	```
	The app will be available at `http://localhost:7860`.
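
Since step 3 makes `HF_API_TOKEN` a prerequisite, a startup check can fail fast with a clear message when the variable is missing. This is a minimal sketch; `require_hf_token` is a hypothetical helper and is not claimed to exist in `app.py`.

```python
import os


def require_hf_token(env=os.environ):
    """Return the token stored in HF_API_TOKEN, or raise with a clear message."""
    token = env.get("HF_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_API_TOKEN is not set; export it before running app.py."
        )
    return token


# Usage with an explicit mapping, so the example does not depend on the real environment.
print(require_hf_token({"HF_API_TOKEN": "your_token_here"}))  # your_token_here
```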

	## 📂 Code Structure

	The repository is split into a backend and a frontend, both served by a FastAPI application.

	### Backend (`/backend`)
	Handles data processing, API endpoints, and interaction with the Hugging Face Hub.

	* `app.py`: The main entry point, launched with `python app.py` from the repository root. Initializes the FastAPI app, sets up routes, and manages background tasks for data synchronization.
	* `backend/config.py`: Configuration settings, including repository IDs and the list of evaluation tasks (`TASKS`).
	* `backend/data_loader.py`: Responsible for downloading dataset snapshots and loading leaderboard/queue data from the Hugging Face Hub.
	* `backend/submission_handler.py`: Logic for processing model submissions and validating input.
	* `backend/helpers.py`: Utility functions for data manipulation.
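
As an illustration of the kind of input validation a submission handler performs, the sketch below checks a model ID and rejects duplicates. Everything here is an assumption made for the example: the `Submission` fields, the `validate_submission` function, and the simplified ID pattern are hypothetical and do not describe the actual contents of `backend/submission_handler.py`.

```python
import re
from dataclasses import dataclass

# Hub repository IDs have the form "namespace/name". This pattern is a
# simplified approximation for the example, not the Hub's exact rule.
MODEL_ID_PATTERN = re.compile(r"^[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*$")


@dataclass
class Submission:
    model_id: str
    revision: str = "main"


def validate_submission(submission, already_queued):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not MODEL_ID_PATTERN.match(submission.model_id):
        problems.append("model_id must look like 'namespace/name'")
    if not submission.revision.strip():
        problems.append("revision must not be empty")
    if (submission.model_id, submission.revision) in already_queued:
        problems.append("this model and revision are already in the queue")
    return problems


# Hypothetical queue contents, for illustration only.
queued = {("org-a/model-1", "main")}
print(validate_submission(Submission("org-b/model-2"), queued))  # []
print(validate_submission(Submission("org-a/model-1"), queued))
```

Returning a list of problems rather than raising on the first one lets a form report every issue to the user at once.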

	### Frontend (`/frontend`)
	Contains the HTML templates served by the application.

	* `index.html`: The landing page displaying the main leaderboard.
	* `leaderboard.html`: The table component for displaying model rankings.
	* `submit.html`: The form for users to submit new models.
	* `about.html`: Information about the project and methodology.
	* `header.html`: Common header component used across pages.

	## ⚙️ Configuration

	The evaluation tasks and model types are defined in `backend/config.py`. You can modify the `TASKS` list to add or remove benchmarks.

	```python
	TASKS = [
	    # Plain "|" separators; a "\|" inside a Python string would add a literal backslash.
	    ("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
	    # ...
	]
	```
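
Judging by the example entry, each tuple appears to hold a task key, a metric name, and a display name; that reading is an assumption. Under it, a results dictionary could be turned into leaderboard columns as sketched below. The `extract_scores` helper and the shape and values of `example_results` are hypothetical.

```python
TASKS = [
    ("community|alghafa:_average|0", "acc_norm", "AlGhafa"),
]


def extract_scores(results, tasks=TASKS):
    """Map each display name to its metric value, or None when it is missing."""
    scores = {}
    for task_key, metric, display_name in tasks:
        scores[display_name] = results.get(task_key, {}).get(metric)
    return scores


# Hypothetical results payload with made-up numbers; the real layout may differ.
example_results = {"community|alghafa:_average|0": {"acc_norm": 0.5, "acc": 0.4}}
scores = extract_scores(example_results)
print(scores)  # {'AlGhafa': 0.5}
```

Returning `None` for a missing task keeps a partially evaluated model visible instead of raising a `KeyError`.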

	## Citation

	If you use this Space or its datasets, please cite our [work](https://arxiv.org/pdf/2604.03395).

	```bibtex
	@misc{alqadi2026arabicbenchmarksreliableqimmas,
	  title={Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation},
	  author={Leen AlQadi and Ahmed Alzubaidi and Mohammed Alyafeai and Hamza Alobeidli and Maitha Alhammadi and Shaikha Alsuwaidi and Omar Alkaabi and Basma El Amel Boussaha and Hakim Hacid},
	  year={2026},
	  eprint={2604.03395},
	  archivePrefix={arXiv},
	  primaryClass={cs.CL},
	  url={https://arxiv.org/abs/2604.03395},
	}
	```