Does the imatrix dataset contain enough Arabic data?

#1
by mahfuud33 - opened

This is mainly an Arabic LLM. Did you use a balanced mix of Arabic and English data in the calibration dataset?

We always use the same imatrix dataset, which is mainly English. A research paper has shown that, even for non-English models, computing the imatrix on an English dataset often yields better quality than using a dataset in the model's intended target language. This is likely due to the language distribution of the base model's initial training data.
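For context on what the calibration text actually feeds into: as far as I understand llama.cpp's imatrix tool, it records, for each weight matrix, the mean squared activation of every input channel over the calibration data, and quantization then weights errors in high-importance channels more heavily. Below is a toy sketch of that accumulation under that assumption; the function name `importance_matrix` and the "English"/"Arabic" activation vectors are hypothetical and not taken from any real model or from llama.cpp's code.

```python
def importance_matrix(activations):
    """Return the per-input-channel mean of squared activations.

    Hypothetical helper: `activations` is a list of activation vectors
    (one per token) feeding a single weight matrix; all vectors must
    have the same length.
    """
    if not activations:
        raise ValueError("need at least one activation vector")
    n_cols = len(activations[0])
    sums = [0.0] * n_cols
    for x in activations:
        if len(x) != n_cols:
            raise ValueError("inconsistent activation width")
        # Accumulate x_j^2 for every input channel j.
        for j, v in enumerate(x):
            sums[j] += v * v
    # Average over the number of tokens seen.
    return [s / len(activations) for s in sums]


# Hypothetical toy activations for two calibration texts.
english_tokens = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
arabic_tokens = [[0.0, 2.0, 0.0]]

print(importance_matrix(english_tokens))  # [5.0, 0.0, 2.0]
print(importance_matrix(english_tokens + arabic_tokens))
```

In this toy example, channel 1 only receives signal from the hypothetical Arabic tokens, so an English-only calibration set leaves its importance at zero. Whether that matters for a real model is exactly the empirical question the reply above addresses.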
