Moroccan Darija → English Translation (NLLB-200)

This model is a fine-tuned version of facebook/nllb-200-distilled-600M for machine translation from Moroccan Darija to English.

It is intended for informal, conversational Darija rather than Modern Standard Arabic.

Languages

  • Source: Moroccan Darija (ary_Arab)
  • Target: English (eng_Latn)
  • Primary script: Latin

Training Data

The model was fine-tuned on a custom English–Darija parallel dataset compiled from the Darija Open Dataset.
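
The card does not describe how the parallel pairs were preprocessed, so the following is only a sketch of one common way to tokenize such pairs for seq2seq fine-tuning with an NLLB tokenizer. The function name `preprocess_pairs`, the column names `darija` and `english`, and the `max_length` of 128 are assumptions made for illustration, not details taken from the actual training setup.

```python
from typing import Any, Dict, List


def preprocess_pairs(
    batch: Dict[str, List[str]],
    tokenizer: Any,
    max_length: int = 128,  # assumed value, not from the model card
) -> Any:
    """Tokenize a batch of Darija/English sentence pairs.

    `batch` is expected to hold two hypothetical columns, "darija" (source)
    and "english" (target), each a list of strings of equal length.
    """
    # Tell the NLLB tokenizer which language codes to attach to each side.
    tokenizer.src_lang = "ary_Arab"
    tokenizer.tgt_lang = "eng_Latn"
    # `text_target` makes the tokenizer return the target ids under "labels".
    return tokenizer(
        batch["darija"],
        text_target=batch["english"],
        max_length=max_length,
        truncation=True,
    )
```

A function of this shape could be passed to `datasets.Dataset.map` with `batched=True` and `fn_kwargs={"tokenizer": tokenizer}`, where `tokenizer` is loaded from the base checkpoint.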

Usage

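from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hub repository id. The model page spells it "nllb-200-darjia-en";
# try that spelling if this one fails to load.
model_name = "mwkhettab/nllb-200-darija-en"

# Tag the input as Moroccan Darija so the tokenizer prepends the matching language code.
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="ary_Arab")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "شنو الطقس دابا برا؟"

# truncation=True is needed for max_length to take effect.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Force the English language code as the first generated token.
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=128,
    num_beams=5,
)

translation = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
).strip()

print(translation)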
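
To translate several sentences at once, the same calls can be wrapped in a small batching helper. This is a minimal sketch rather than part of the model's own tooling: `chunked`, `translate_batch`, and the default `batch_size` of 8 are names and values introduced here for illustration, and the generation settings simply mirror the single-sentence example.

```python
from typing import Any, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(
    texts: Sequence[str],
    tokenizer: Any,
    model: Any,
    batch_size: int = 8,  # assumed default, tune to available memory
) -> List[str]:
    """Translate Darija sentences to English, `batch_size` sentences at a time."""
    tokenizer.src_lang = "ary_Arab"
    english_id = tokenizer.convert_tokens_to_ids("eng_Latn")
    translations: List[str] = []
    for batch in chunked(list(texts), batch_size):
        # Pad to the longest sentence in the batch and cut overly long inputs.
        inputs = tokenizer(
            list(batch),
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=512,
        )
        outputs = model.generate(
            **inputs,
            forced_bos_token_id=english_id,
            max_new_tokens=128,
            num_beams=5,
        )
        decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        translations.extend(s.strip() for s in decoded)
    return translations
```

With `tokenizer` and `model` loaded as in the example above, `translate_batch(sentences, tokenizer, model)` returns one English string per input sentence, in the original order.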