	TITLE = """
	<h1 align="center" id="space-title">
	LLM Leaderboard de Lenguas de Iberoamérica y sus Variedades
	<br>
	("La Leaderboard")
	</h1>
	"""

	INTRODUCTION_TEXT = """
	EN: Open-Source LLM Leaderboard of Languages and Language Varieties of Ibero America: All the evaluation datasets have been originally created or manually translated into the corresponding languages. The paper presenting the results of the first version was published at [ACL Main 2025](https://aclanthology.org/2025.acl-long.1561) 🎉 This is the v2 update with an upgraded evaluation framework (lm-eval-harness ≥0.4.11), expanded language support (Valencian, Portuguese), and an open roadmap for culturally-relevant datasets.
	"""

	INTRODUCTION_TEXT_ES = """
	ES: Leaderboard Abierta de LLM en Lenguas y Variedades Lingüísticas de Iberoamérica: Todas las bases de datos de evaluación han sido originalmente creadas o manualmente traducidas a las correspondientes lenguas. El paper presentando los resultados de la primera versión fue publicado en [ACL Main 2025](https://aclanthology.org/2025.acl-long.1561) 🎉 Esta es la v2 con el framework de evaluación actualizado (lm-eval-harness ≥0.4.11), soporte ampliado de lenguas (valenciano, portugués) y una hoja de ruta abierta para datasets culturalmente relevantes.
	"""

	# TODO: Update number of benchmarks
	LLM_BENCHMARKS_TEXT = """
	## 💡 About "La Leaderboard" v2

	- Paper published at [ACL Main 2025](https://aclanthology.org/2025.acl-long.1561)
	- Press releases: [SomosNLP](https://somosnlp.org/blog/en/la-leaderboard), [ILENIA](https://proyectoilenia.es/en/la-leaderboard-is-born-the-first-leaderboard-for-language-models-in-spanish-and-official-languages/), [IIC (ES)](https://www.iic.uam.es/procesamiento-del-lenguaje-natural/primera-leaderboard-publica-llms-espanol/), [UPM (ES)](https://short.upm.es/6bfnk)
	- YouTube: [LaHoraMaker (ES)](https://www.youtube.com/watch?v=ATa-fYZJ1yw)

	### 🤗 How it works

	Submit a model for automated evaluation on our clusters on the "Submit here" tab!

	### 📈 Tasks

	We evaluate models on key benchmarks using the modern EleutherAI lm-evaluation-harness v0.4.11+, a unified framework to test generative language models. You can find more information about the tasks in the "Tasks" tab.

	v2 Updates:
	- Upgraded to lm-eval-harness ≥0.4.11 with YAML-based task definitions and improved CLI (`lm-eval run`)
	- Added Valencian (VA) and Portuguese (PT) language tabs
	- Expanded precision support: 8bit, 4bit, GPTQ
	- Modern dependency stack (Transformers 4.51, Gradio 5.25, Datasets 3.5)

	Notes:
	- The evaluations are all 5-shot.
	- The results are normalized with the following formula: `normalized_value = (raw_value - random_baseline) / (max_value - random_baseline)`, where `random_baseline` is `0` for generative tasks and `1/n` for multi-choice QA with `n` choices.
	- Results are aggregated by calculating the average of all the tasks for a given language.
	- Base and instruction-tuned models are both accepted.
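As an illustration of the normalization and aggregation described in the notes above, here is a minimal sketch. It is not the leaderboard's own code: the function names are hypothetical, the example scores are made up, and `max_value = 1.0` assumes metrics reported on a 0 to 1 scale.

```python
def normalize(raw_value, random_baseline, max_value=1.0):
    # Rescale a raw score so that random guessing maps to 0 and a perfect score to 1.
    # max_value=1.0 is an assumption (metrics reported on a 0-1 scale).
    return (raw_value - random_baseline) / (max_value - random_baseline)


def language_score(normalized_scores):
    # Aggregate by averaging the normalized scores of all tasks for one language.
    return sum(normalized_scores) / len(normalized_scores)


# Multiple-choice QA with n = 4 choices: the random baseline is 1/n.
mcqa = normalize(0.55, random_baseline=1 / 4)
# Generative task: the random baseline is 0.
generative = normalize(0.40, random_baseline=0.0)

average = language_score([mcqa, generative])
print(round(mcqa, 2), round(generative, 2), round(average, 2))
```

A raw accuracy equal to the random baseline normalizes to 0, so a model that only guesses on multiple-choice tasks does not inflate the language average.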

	### 🔎 Results

	You can find:

	- Detailed numerical results in the [results dataset](https://huggingface.co/datasets/pauvanbr/la-leaderboard-v2-results)
	- Community queries and running status in the [requests dataset](https://huggingface.co/datasets/pauvanbr/la-leaderboard-v2-requests)

	### ✅ Reproducibility

	To reproduce our results, install the modern lm-eval-harness:

	```bash
	pip install "lm-eval>=0.4.11"
	```

	Then run the evaluation with the command below. For custom tasks, create a YAML task definition in a local directory and add `--include_path <directory>` to the command:

	```bash
	lm-eval run --model hf \
	--model_args "pretrained=<your_model>,revision=<rev>,dtype=<dtype>" \
	--tasks=laleaderboard \
	--num_fewshot=5 \
	--device="cuda:0" \
	--batch_size=auto \
	--output_path=<output_path>
	```

	You can also run the tasks for a single language by changing the `tasks` argument to `laleaderboard_<language>`, where `<language>` is `es` for Spanish, `ca` for Catalan, `eu` for Basque, `gl` for Galician, `va` for Valencian, or `pt` for Portuguese.
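If you script your runs, the command can be assembled programmatically. The following is a minimal sketch, not part of the leaderboard code: `build_lm_eval_args` and the `LANGUAGES` mapping are hypothetical names introduced here, and the flags are only the ones already shown in the command above.

```python
# Language codes taken from the list above; the mapping itself is illustrative.
LANGUAGES = {
    "es": "Spanish",
    "ca": "Catalan",
    "eu": "Basque",
    "gl": "Galician",
    "va": "Valencian",
    "pt": "Portuguese",
}


def build_lm_eval_args(model, revision, dtype, output_path, language=None):
    # Hypothetical helper: returns the argument list for the command shown above.
    # With language=None it selects the full `laleaderboard` task group.
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unknown language code: {language!r}")
    tasks = "laleaderboard" if language is None else f"laleaderboard_{language}"
    return [
        "lm-eval", "run", "--model", "hf",
        "--model_args", f"pretrained={model},revision={revision},dtype={dtype}",
        f"--tasks={tasks}",
        "--num_fewshot=5",
        "--device=cuda:0",
        "--batch_size=auto",
        f"--output_path={output_path}",
    ]


args = build_lm_eval_args("<your_model>", "<rev>", "<dtype>", "<output_path>", language="eu")
print(" ".join(args))
# To launch the run: import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the comma-separated `--model_args` value.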

	## 🙌 Acknowledgements

	This leaderboard was developed as part of the [#Somos600M Project](https://somosnlp.org/somos600m) led by [SomosNLP](https://somosnlp.org) thanks to the donation of high-quality evaluation datasets by:
	- [Instituto de Ingeniería del Conocimiento (IIC)](https://www.iic.uam.es/)
	- [LenguajeNaturalAI](https://lenguajenatural.ai/)
	- [HiTZ Basque Center for Language Technology](https://hitz.eus)
	- [Next Generation Internet Group (GING) from Universidad Politécnica de Madrid (UPM)](https://ging.github.io/)
	- [Language Technologies Unit at Barcelona Supercomputing Center (BSC)](https://www.bsc.es/discover-bsc/organisation/research-departments/language-technologies-unit)
	- [Universidad de Santiago de Compostela (USC)](https://www.usc.gal): [CiTIUS (Centro Singular de investigación en Tecnologías Inteligentes)](https://citius.gal/es/) and [ILG (Instituto da Lingua Galega)](https://ilg.usc.gal/es)
	- [Grupo de Ingeniería Lingüística de la Universidad Nacional Autónoma de México (GIL-UNAM)](http://grupos.iingen.unam.mx/iling/es-mx/Paginas/default.aspx)
	- [Grupo de investigación en Sistemas Inteligentes de Acceso a la Información (SINAI) de la Universidad de Jaén](https://sinai.ujaen.es/)
	- [Universidad Nacional de Córdoba (UNC)](https://www.unc.edu.ar/) and [Fundación Vía Libre](https://www.vialibre.org.ar/)

	The entities above are ordered chronologically by the date they joined the project. However, the logos in the footer are ordered by the number of datasets donated.

	Thank you in particular to:
	- Task implementation: Anna Sallés, Irene Baucells, Javier Aula-Blasco, Julen Etxaniz, Gonzalo Santamaría, Alejandro Vaca, María Grandury, Gonzalo Martínez, Miguel González, Iker García-Ferrero, Guido Ivetta
	- Leaderboard implementation: María Grandury, Clémentine Fourrier
	- Model evaluation: María Grandury, Miguel González, Gonzalo Martínez, Javier Aula-Blasco, Bram Vanroy
	- Communication: Maria Sayavera, Florent Daudens
	- Organization & colab leads: María Grandury, Javier Aula-Blasco, Javier Conde, Alejandro Vaca, Marta Guerrero, Álvaro Barbero, Rodrigo Agerri, Maite Martín, Helena Gómez, Luciana Benotti, Iker García-Ferrero

	For information about the dataset authors, please check the corresponding Dataset Cards (linked in the "Tasks" tab) and papers (included in the "Citation" section below). We would especially like to thank the teams that created or open-sourced their datasets specifically for the leaderboard (in chronological order):
	- AQuAS and RagQuAS: Carmen Muñoz, Helena Montoro, Leire Rosado, Marta Guerrero, Nuria Aldama, Natàlia Fuertes
	- MedicalExpertES, SpaLawEx and HumorQA: Alejandro Vaca
	- TELEIA: Marina Mayor-Rocher, Nina Melero, Elena Merino-Gómez, Miguel González, Raquel Ferrando, Javier Conde, Gonzalo Martínez, Pedro Reviriego

	We also thank Barcelona Supercomputing Center, Universidad Politécnica de Madrid, HuggingFace, María Grandury, Gonzalo Martínez and The Flemish Supercomputer Center for sponsoring the inference GPUs.

	## 🚀 Collaborate!

	We want the leaderboard to be as diverse as possible. Reach out if you would like us to include your evaluation dataset!

	Comments and suggestions are more than welcome! Visit the [👏 Community](https://huggingface.co/spaces/pauvanbr/la-leaderboard-v2/discussions) page, tell us what you think about La Leaderboard and how we can improve it, or go ahead and open a PR!

	Thank you very much! 💛
	"""
	# TODO: Of the first release. Now all the evaluations run on the Mare Nostrum cluster at BSC.

	# TODO: Include link to Argilla space
	# We have launched a community effort to validate the machine-translated Spanish versions of three of the most used English benchmarks. Join the effort!

	LLM_BENCHMARKS_TEXT_ES = """
	## 💡 Sobre "La Leaderboard" v2

	- Paper publicado en [ACL Main 2025](https://aclanthology.org/2025.acl-long.1561)
	- Notas de prensa: [SomosNLP](https://somosnlp.org/blog/la-leaderboard), [ILENIA](https://proyectoilenia.es/nace-la-leaderboard-la-primera-tabla-de-clasificacion-para-modelos-de-lenguaje-en-espanol-y-las-lenguas-oficiales/), [IIC](https://www.iic.uam.es/procesamiento-del-lenguaje-natural/primera-leaderboard-publica-llms-espanol/), [UPM](https://short.upm.es/6bfnk)
	- YouTube: [LaHoraMaker](https://www.youtube.com/watch?v=ATa-fYZJ1yw)

	### 🤗 Cómo funciona

	Envía un modelo en la pestaña "Submit here" para que sea evaluado automáticamente en nuestros clústeres.

	### 📈 Tareas

	Evaluamos los modelos en tareas clave utilizando el moderno EleutherAI lm-evaluation-harness v0.4.11+, una librería para evaluar modelos de lenguaje generativos. Puedes encontrar más información sobre las tareas en la pestaña "Tasks".

	Actualizaciones v2:
	- Actualizado a lm-eval-harness ≥0.4.11 con definiciones de tareas en YAML y CLI mejorado (`lm-eval run`)
	- Añadidas pestañas de valenciano (VA) y portugués (PT)
	- Soporte ampliado de precisiones: 8bit, 4bit, GPTQ
	- Stack de dependencias moderno (Transformers 4.51, Gradio 5.25, Datasets 3.5)

	Notas:
	- Todas las evaluaciones son 5-shot.
	- Los resultados han sido normalizados con la siguiente fórmula: `valor_normalizado = (valor_original - valor_aleatorio) / (valor_máximo - valor_aleatorio)`, donde `valor_aleatorio` es `0` para tareas generativas y `1/n` para tareas de opción múltiple con `n` opciones.
	- Los resultados se agregan calculando el promedio de los resultados de las tareas para cada lengua.
	- Se aceptan modelos base y modelos ajustados para seguir instrucciones.

	### 🔎 Resultados

	Puedes encontrar:

	- Resultados numéricos detallados en el [results dataset](https://huggingface.co/datasets/pauvanbr/la-leaderboard-v2-results)
	- Peticiones de evaluación de la comunidad y estado de ejecución en el [requests dataset](https://huggingface.co/datasets/pauvanbr/la-leaderboard-v2-requests)

	### ✅ Reproducibilidad

	Para reproducir los resultados, instala el lm-eval-harness moderno:

	```bash
	pip install "lm-eval>=0.4.11"
	```

	Después ejecuta la evaluación con el siguiente comando. Para tareas personalizadas, crea una definición YAML en un directorio local y añade `--include_path <directorio>` al comando:

	```bash
	lm-eval run --model hf \
	--model_args "pretrained=<tu_modelo>,revision=<rev>,dtype=<dtype>" \
	--tasks=laleaderboard \
	--num_fewshot=5 \
	--device="cuda:0" \
	--batch_size=auto \
	--output_path=<ruta_salida>
	```

	También puedes ejecutar las tareas solo para una lengua cambiando el argumento `tasks` a `laleaderboard_<lengua>`, siendo `<lengua>` `es` para español, `ca` para catalán, `eu` para euskera, `gl` para gallego, `va` para valenciano y `pt` para portugués.

	## 🙌 Agradecimientos

	Esta leaderboard ha sido desarrollada como parte del [Proyecto #Somos600M](https://somosnlp.org/somos600m) liderado por [SomosNLP](https://somosnlp.org) gracias a la donación de bases de datos de evaluación de alta calidad de:
	- [Instituto de Ingeniería del Conocimiento (IIC)](https://www.iic.uam.es/)
	- [LenguajeNaturalAI](https://lenguajenatural.ai/)
	- [HiTZ Basque Center for Language Technology](https://hitz.eus)
	- [Next Generation Internet Group (GING) from Universidad Politécnica de Madrid (UPM)](https://ging.github.io/)
	- [Language Technologies Unit at Barcelona Supercomputing Center (BSC)](https://www.bsc.es/discover-bsc/organisation/research-departments/language-technologies-unit)
	- [Universidad de Santiago de Compostela (USC)](https://www.usc.gal): [CiTIUS (Centro Singular de investigación en Tecnologías Inteligentes)](https://citius.gal/es/) e [ILG (Instituto da Lingua Galega)](https://ilg.usc.gal/es)
	- [Grupo de Ingeniería Lingüística de la Universidad Nacional Autónoma de México (GIL-UNAM)](http://grupos.iingen.unam.mx/iling/es-mx/Paginas/default.aspx)
	- [Grupo de investigación en Sistemas Inteligentes de Acceso a la Información (SINAI) de la Universidad de Jaén](https://sinai.ujaen.es/)
	- [Universidad Nacional de Córdoba (UNC)](https://www.unc.edu.ar/) y [Fundación Vía Libre](https://www.vialibre.org.ar/)

	Entidades ordenadas cronológicamente por fecha de incorporación al proyecto. Los logos están ordenados por número de datasets donados.

	Gracias en particular a:
	- Implementación de tareas: Anna Sallés, Irene Baucells, Javier Aula-Blasco, Julen Etxaniz, Gonzalo Santamaría, Alejandro Vaca, María Grandury, Gonzalo Martínez, Miguel González, Iker García-Ferrero, Guido Ivetta
	- Implementación de la leaderboard: María Grandury, Clémentine Fourrier
	- Evaluación de modelos: María Grandury, Miguel González, Gonzalo Martínez, Javier Aula-Blasco, Bram Vanroy
	- Comunicación: Maria Sayavera, Florent Daudens
	- Organización y gestión de colaboraciones: María Grandury, Javier Aula-Blasco, Javier Conde, Alejandro Vaca, Marta Guerrero, Álvaro Barbero, Rodrigo Agerri, Maite Martín, Helena Gómez, Luciana Benotti, Iker García-Ferrero

	Para información sobre autores y autoras de los datasets, por favor consulta las Dataset Cards (enlaces en la pestaña "Tasks") y los papers (incluidos a continuación en la sección "Citation"). Queremos agradecer especialmente a los equipos que han creado o abierto datasets específicamente para la leaderboard (en orden cronológico):
	- AQuAS y RagQuAS: Carmen Muñoz, Helena Montoro, Leire Rosado, Marta Guerrero, Nuria Aldama, Natàlia Fuertes
	- MedicalExpertES, SpaLawEx y HumorQA: Alejandro Vaca
	- TELEIA: Marina Mayor-Rocher, Nina Melero, Elena Merino-Gómez, Miguel González, Raquel Ferrando, Javier Conde, Gonzalo Martínez, Pedro Reviriego

	También queremos agradecer a Barcelona Supercomputing Center, Universidad Politécnica de Madrid, HuggingFace, María Grandury, Gonzalo Martínez y The Flemish Supercomputer Center por proporcionar las GPUs de inferencia.

	## 🚀 ¡Colabora!

	Queremos crear una leaderboard lo más diversa posible, contáctanos si te gustaría incluir tu base de datos de evaluación.

	Además, ¡los comentarios y sugerencias son más que bienvenidos! Visita la página [👏 Community](https://huggingface.co/spaces/pauvanbr/la-leaderboard-v2/discussions), cuéntanos qué te parece La Leaderboard y cómo podemos mejorarla, o abre directamente una PR.

	¡Muchas gracias! 💛
	"""

	# TODO: Add link to FAQ and docs
	# > Important: Don't forget to read the [FAQ](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/faq) and [documentation](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about) for more information! 📄
	EVALUATION_QUEUE_TEXT = """
	## Evaluation queue for "La Leaderboard" v2

	- Models added here will be automatically evaluated on all the tasks on the Mare Nostrum 5 supercomputer at the Barcelona Supercomputing Center (BSC).
	- Results might take some time to appear on the leaderboard, thanks for your patience.
	- When submitting, git branches and tags will be strictly tied to the specific commit present at the time of submission to ensure revision consistency.

	### Submission disclaimer

	By submitting a model, you acknowledge that:
	- We store information about who submitted each model in the [requests dataset](https://huggingface.co/datasets/pauvanbr/la-leaderboard-v2-requests).
	- This practice helps maintain the integrity of our leaderboard, prevent spam, and ensure responsible submissions.
	- Your submission will be visible to the community and you may be contacted regarding your model.
	- Please submit carefully and responsibly 💛

	### First steps before submitting a model

	#### 1. Ensure your model loads with `AutoClasses`
	Verify that you can load your model and tokenizer using `AutoClasses`:

	```python
	from transformers import AutoConfig, AutoModel, AutoTokenizer
	revision = "main"  # branch, tag, or commit hash you plan to submit
	config = AutoConfig.from_pretrained("your model name", revision=revision)
	model = AutoModel.from_pretrained("your model name", revision=revision)
	tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
	```

	Note:
	- If this step fails, debug your model before submitting.
	- Ensure your model is public.
	- We are working on adding support for models requiring `trust_remote_code=True`.

	#### 2. Convert the weights to `Safetensors`
	[`Safetensors`](https://huggingface.co/docs/safetensors/index) is a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!

	#### 3. Verify your model's open license
	This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗

	#### 4. Complete your Model Card
	When we add extra information about models to the leaderboard, it will be automatically taken from the Model Card. You can find a template [here]().

	#### 5. Select the correct precision
	Choose the right precision to avoid evaluation errors:
	- Not all models convert properly from `float16` to `bfloat16`.
	- Incorrect precision can cause issues (e.g., loading a `bf16` model in `fp16` may generate `NaNs`).
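To see why the precision choice matters: `bfloat16` keeps the 8-bit exponent of `float32`, while `float16` has a 5-bit exponent and cannot represent finite values above 65504. The sketch below uses only the Python standard library to illustrate that range gap. It shows the number formats, not how the leaderboard loads models, and `to_bfloat16` and `fits_in_float16` are hypothetical helper names.

```python
import struct


def to_bfloat16(x):
    # Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding
    # (simple truncation; real hardware usually rounds to nearest).
    top_two_bytes = struct.pack(">f", x)[:2]
    return struct.unpack(">f", top_two_bytes + bytes(2))[0]


def fits_in_float16(x):
    # struct's "e" format is IEEE 754 half precision (float16);
    # packing raises OverflowError when the value is out of range.
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


value = to_bfloat16(100000.0)   # representable in bfloat16
print(value, fits_in_float16(value))
print(65504.0, fits_in_float16(65504.0))  # largest finite float16 value
```

In a deep learning framework, an out-of-range cast typically produces `inf` instead of raising an error, and later operations on `inf` are what yield `NaN` values.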

	> Important: Base and instruction-tuned models are both accepted in v2!
	"""


	EVALUATION_QUEUE_TEXT_ES = """
	## Lista de evaluación de "La Leaderboard" v2

	- Los modelos enviados serán evaluados automáticamente en todas las tareas de La Leaderboard en el Mare Nostrum 5 del Barcelona Supercomputing Center (BSC).
	- Los resultados pueden tardar un tiempo en aparecer en La Leaderboard, gracias por tu paciencia.
	- Al enviar un modelo, las ramas y etiquetas de git quedarán ligadas al commit presente en el momento del envío para garantizar la consistencia de las versiones.

	### Disclaimer

	Al enviar un modelo, ten en cuenta que:
	- Almacenamos información sobre quién envió cada modelo en el [requests dataset](https://huggingface.co/datasets/pauvanbr/la-leaderboard-v2-requests).
	- Esto nos ayuda a mantener la integridad de nuestra leaderboard, evitar spam y garantizar envíos responsables.
	- Tu envío será visible para la comunidad y se te podrá contactar respecto a tu modelo.
	- Por favor, envía modelos de manera responsable 💛

	### Primeros pasos antes de enviar un modelo

	#### 1. Asegúrate de que tu modelo se pueda cargar con `AutoClasses`
	Verifica que puedes cargar tu modelo y tokenizador usando `AutoClasses`:

	```python
	from transformers import AutoConfig, AutoModel, AutoTokenizer
	revision = "main"  # rama, etiqueta o hash del commit que vas a enviar
	config = AutoConfig.from_pretrained("your model name", revision=revision)
	model = AutoModel.from_pretrained("your model name", revision=revision)
	tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
	```

	Nota:
	- Si este paso falla, arregla tu modelo antes de enviarlo.
	- Asegúrate de que tu modelo sea público.
	- Estamos trabajando para admitir modelos que requieran `trust_remote_code=True`.

	#### 2. Convierte los pesos a `Safetensors`
	[`Safetensors`](https://huggingface.co/docs/safetensors/index) es un nuevo formato para almacenar pesos que es más seguro y más rápido de cargar y usar. También nos permitirá añadir el número de parámetros de tu modelo a la `Extended Viewer`.

	#### 3. Verifica que tu modelo tiene una licencia abierta
	Esta es una leaderboard para modelos abiertos y nos gustaría que la mayor cantidad de gente pueda usar tu modelo 🤗

	#### 4. Completa la documentación de tu modelo
	Cuando añadamos a La Leaderboard información extra sobre los modelos, se tomará automáticamente de la Model Card.

	#### 5. Selecciona la precisión correcta
	Elige la precisión adecuada para evitar errores de evaluación:
	- No todos los modelos se convierten correctamente de `float16` a `bfloat16`.
	- Una precisión incorrecta puede causar problemas (por ejemplo, cargar un modelo `bf16` en `fp16` puede generar `NaNs`).

	> Importante: En v2 se aceptan modelos base y modelos ajustados para seguir instrucciones.
	"""

	LOGOS = [
	"https://somosnlp.github.io/assets/logo_somosnlp.png",
	"https://somosnlp.github.io/assets/images/patrocinios/BSC.png",
	"https://somosnlp.github.io/assets/images/patrocinios/HiTZ.png",
	"https://somosnlp.github.io/assets/images/patrocinios/USC.jpeg",
	"https://somosnlp.github.io/assets/images/patrocinios/IIC.png",
	"https://somosnlp.github.io/assets/images/patrocinios/LenguajeNaturalAI.jpeg",
	"https://somosnlp.github.io/assets/images/patrocinios/UPM.jpeg",
	"https://somosnlp.github.io/assets/images/patrocinios/GIL_UNAM.jpeg",
	"https://somosnlp.github.io/assets/images/patrocinios/SINAI.png",
	"https://somosnlp.github.io/assets/images/patrocinios/UNC.png",
	"https://somosnlp.github.io/assets/images/patrocinios/FundacionViaLibre.png",
	"https://somosnlp.github.io/assets/images/patrocinios/ILENIA.png",
	"https://somosnlp.github.io/assets/images/patrocinios/VlaanderenIsSupercomputing.png",
	"https://somosnlp.github.io/assets/images/patrocinios/HuggingFace.png",
	]

	CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
	CITATION_BUTTON_LABEL_ES = "Copia el siguiente fragmento para citar estos resultados"
	CITATION_BUTTON_TEXT = r"""@inproceedings{grandury-etal-2025-la,
	title = "La Leaderboard: A Large Language Model Leaderboard for {S}panish Varieties and Languages of {S}pain and {L}atin {A}merica",
	author = "Grandury, Mar{\'i}a and
	Aula-Blasco, Javier and
	Falc{\~a}o, J{\'u}lia and
	Fourrier, Cl{\'e}mentine and
	Saiz, Miguel Gonz{\'a}lez and
	Mart{\'i}nez, Gonzalo and
	Gomez, Gonzalo Santamaria and
	Agerri, Rodrigo and
	Garc{\'i}a, Nuria Aldama and
	Chiruzzo, Luis and
	Conde, Javier and
	Gomez Adorno, Helena and
	Nieto, Marta Guerrero and
	Ivetta, Guido and
	Fuertes, Nat{\`a}lia L{\'o}pez and
	Plaza-del-Arco, Flor Miriam and
	Mart{\'i}n-Valdivia, Mar{\'i}a-Teresa and
	Zamorano, Helena Montoro and
	Sanz, Carmen Mu{\~n}oz and
	Reviriego, Pedro and
	Plaza, Leire Rosado and
	Vaca Serrano, Alejandro and
	Vallecillo-Rodr{\'i}guez, Estrella and
	Vallego, Jorge and
	Zubiaga, Irune",
	editor = "Che, Wanxiang and
	Nabende, Joyce and
	Shutova, Ekaterina and
	Pilehvar, Mohammad Taher",
	booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
	month = jul,
	year = "2025",
	address = "Vienna, Austria",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2025.acl-long.1561/",
	doi = "10.18653/v1/2025.acl-long.1561",
	pages = "32482--32524",
	ISBN = "979-8-89176-251-0",
	abstract = "Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Catalan, Basque, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community."
	}

	@software{eval-harness,
	author = {Gao, Leo and
	Tow, Jonathan and
	Biderman, Stella and
	Black, Sid and
	DiPofi, Anthony and
	Foster, Charles and
	Golding, Laurence and
	Hsu, Jeffrey and
	McDonell, Kyle and
	Muennighoff, Niklas and
	Phang, Jason and
	Reynolds, Laria and
	Tang, Eric and
	Thite, Anish and
	Wang, Ben and
	Wang, Kevin and
	Zou, Andy},
	title = {A framework for few-shot language model evaluation},
	month = sep,
	year = 2021,
	publisher = {Zenodo},
	version = {v0.0.1},
	doi = {10.5281/zenodo.5371628},
	url = {https://doi.org/10.5281/zenodo.5371628}
	}
	"""