Does the imatrix dataset contain enough Arabic data?

by mahfuud33 - opened 22 days ago

Discussion

mahfuud33

22 days ago

This is a mainly Arabic LLM. Did you use a balanced mix of Arabic and English in the calibration dataset?

nicoboss

20 days ago

We always use the same imatrix dataset, which is mainly English. There is a research paper showing that, even for non-English models, using an English dataset for imatrix computation often results in better quality than using one in the intended target language. This is likely due to the language distribution of the initial base training.
