Add a new language: Persian (Farsi)

#41
by M-sh2025 - opened

Hello and thank you for your great work on Pixtral.

I’d like to suggest adding support for the Persian language in future versions of Pixtral. Here are a few reasons why I believe this would be valuable:

  1. Large Persian-speaking community: Over 100 million people speak Persian across Iran, Afghanistan, Tajikistan, and diaspora communities worldwide.
  2. Demand for localized NLP models: Many NLP applications, such as chatbots, summarization, translation, and sentiment analysis, still lack strong Persian-language support.
  3. Availability of Persian textual data: Resources like Persian Wikipedia, digital libraries, and social media provide rich corpora for pretraining or fine-tuning.
  4. Proven feasibility: Projects like PersianBERT and FarsiGPT have shown that Persian language modeling is both feasible and impactful.

If needed, I’d be happy to contribute to data preparation, tokenizer design, or evaluation benchmarks for Persian.

Best regards,
Mehrad
(Researcher and developer of Persian language models)
