NLLB-Distilled-600M: Kobani Kurdish β†’ English

Fine-tuned version of facebook/nllb-200-distilled-600M for Kobani Kurdish β†’ English neural machine translation (kob_Latn β†’ eng_Latn).

This model adapts the multilingual NLLB-200 architecture (NLLB Team et al., 2022) to support translation from the Kobani dialect of Kurmanji Kurdish – a severely under-resourced language variety with extremely limited parallel data and almost no representation in existing MT systems – into English.

Model Details

  • Base model: facebook/nllb-200-distilled-600M
  • Architecture: Encoder-decoder Transformer (12 encoder layers, 12 decoder layers, ~600M parameters)
  • New language token: kob_Latn (custom token ID: 267756)
  • Forced BOS token during generation: 267756
  • Training objective: Supervised sequence-to-sequence fine-tuning on parallel sentence pairs
  • Direction: Kobani Kurdish (kob_Latn) β†’ English (eng_Latn)
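A minimal inference sketch using the `transformers` library is shown below. The example sentence and the `forced_bos_token_id` lookup are assumptions: NLLB models conventionally force the decoder to start with the *target*-language token (here `eng_Latn`), while this card lists 267756 (the custom `kob_Latn` token) as the forced BOS ID, so confirm which ID the published checkpoint actually expects.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "RamanAhmad/nllb-distilled-600m-kob_Latn-to-eng"

def translate(text: str, tokenizer, model, max_new_tokens: int = 128) -> str:
    """Translate one Kobani Kurdish sentence into English."""
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # NLLB convention: force the decoder to begin with the target-language
        # token. Verify against the card's stated forced BOS ID (267756).
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

if __name__ == "__main__":
    # src_lang must match the custom kob_Latn token added during fine-tuning.
    tok = AutoTokenizer.from_pretrained(MODEL_ID, src_lang="kob_Latn")
    mdl = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    print(translate("Tu çawa yî?", tokenizer=tok, model=mdl))
```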

Training Hyperparameters

Training was performed using the Hugging Face Trainer API with the following configuration:

  • Number of epochs: 10
  • Per-device train batch size: 4
  • Gradient accumulation steps: 8
    β†’ Effective batch size: 32
  • Learning rate: 3.0 Γ— 10⁻⁡
  • Optimizer: AdamW (β₁=0.9, Ξ²β‚‚=0.999, Ξ΅=1e-8)
  • Warmup steps: 500
  • Weight decay: 0.01
  • Evaluation steps: 2000
  • Save steps: 2000
  • Maximum saved checkpoints: 3
  • Early stopping: Patience = 3 (based on validation loss)
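The configuration above maps onto Hugging Face `TrainingArguments` keywords roughly as follows. This is a sketch, not the authors' script: the output directory is a placeholder, and the `load_best_model_at_end`/`metric_for_best_model` wiring (required for the `EarlyStoppingCallback` with patience 3) is an assumption.

```python
# Training configuration from the card, expressed as TrainingArguments kwargs.
training_config = dict(
    output_dir="nllb-kob-eng",           # placeholder path
    num_train_epochs=10,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,       # effective batch = 4 * 8 = 32
    learning_rate=3e-5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    warmup_steps=500,
    weight_decay=0.01,
    eval_steps=2000,
    save_steps=2000,
    save_total_limit=3,
    load_best_model_at_end=True,         # assumed: needed for early stopping
    metric_for_best_model="eval_loss",   # early stopping tracks validation loss
)

# Effective batch size = per-device batch * gradient accumulation steps
# (times the number of devices, assumed to be 1 here).
effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
```

Pass these kwargs to `Seq2SeqTrainingArguments` and attach `EarlyStoppingCallback(early_stopping_patience=3)` to the `Trainer` to reproduce the early-stopping behaviour described above.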

Training Data

The model was fine-tuned on parallel sentence pairs derived from aligned corpora with dialect-specific lexical and morphosyntactic perturbations.

Intended Use & Limitations

Intended use:

  • Research in low-resource and dialectal machine translation
  • Assistive translation support for Kobani Kurdish speakers
  • Bootstrapping additional NLP resources for Kurmanji varieties

Known limitations:

  • Performance heavily depends on domain similarity to training data
  • Potential exposure bias and hallucination in out-of-domain text
  • Zero-shot capability for Kobani remains limited without fine-tuning
  • Early evaluations indicated possible data leakage; results should be interpreted cautiously until confirmed on fully isolated test sets

Citation

If you use this model in your research, please cite:

@misc{ahmad2026kobani-nllb-600m-reverse,
  author       = {Raman Ahmad},
  title        = {nllb-distilled-600m-kob\_Latn-to-eng},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/RamanAhmad/nllb-distilled-600m-kob_Latn-to-eng}}
}

Please also cite the original NLLB paper:

@article{cost2022nllb,
  title     = {No Language Left Behind: Scaling Human-Centered Machine Translation},
  author    = {NLLB Team and others},
  journal   = {arXiv preprint arXiv:2207.04672},
  year      = {2022},
  doi       = {10.48550/arXiv.2207.04672}
}