NLLB-Distilled-600M: Kobani Kurdish β†’ English

Fine-tuned version of facebook/nllb-200-distilled-600M for Kobani Kurdish β†’ English neural machine translation (kob_Latn β†’ eng_Latn).

This model adapts the multilingual NLLB-200 architecture (NLLB Team et al., 2022) to support translation from the Kobani dialect of Kurmanji Kurdish – a severely under-resourced language variety with extremely limited parallel data and almost no representation in existing MT systems – into English.

Model Details

  • Base model: facebook/nllb-200-distilled-600M
  • Architecture: Encoder-decoder Transformer (12 encoder layers, 12 decoder layers, ~600M parameters)
  • New language token: kob_Latn (custom token ID: 267756)
  • Forced BOS token during generation: 267756
  • Training objective: Supervised sequence-to-sequence fine-tuning on parallel sentence pairs
  • Direction: Kobani Kurdish (kob_Latn) β†’ English (eng_Latn)
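A minimal inference sketch using the `transformers` library is shown below. The example sentence and the `forced_bos_token_id` lookup are assumptions: NLLB models conventionally force the decoder to start with the *target*-language token (here `eng_Latn`), while this card lists 267756 (the custom `kob_Latn` token) as the forced BOS ID, so confirm which ID the published checkpoint actually expects.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "RamanAhmad/nllb-distilled-600m-kob_Latn-to-eng"

def translate(text: str, tokenizer, model, max_new_tokens: int = 128) -> str:
    """Translate one Kobani Kurdish sentence into English."""
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # NLLB convention: force the decoder to begin with the target-language
        # token. Verify against the card's stated forced BOS ID (267756).
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

if __name__ == "__main__":
    # src_lang must match the custom kob_Latn token added during fine-tuning.
    tok = AutoTokenizer.from_pretrained(MODEL_ID, src_lang="kob_Latn")
    mdl = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    print(translate("Tu çawa yî?", tokenizer=tok, model=mdl))
```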

Training Hyperparameters

Training was performed using the Hugging Face Trainer API with the following configuration:

  • Number of epochs: 10
  • Per-device train batch size: 4
  • Gradient accumulation steps: 8
    β†’ Effective batch size: 32
  • Learning rate: 3.0 Γ— 10⁻⁡
  • Optimizer: AdamW (β₁=0.9, Ξ²β‚‚=0.999, Ξ΅=1e-8)
  • Warmup steps: 500
  • Weight decay: 0.01
  • Evaluation steps: 2000
  • Save steps: 2000
  • Maximum saved checkpoints: 3
  • Early stopping: Patience = 3 (based on validation loss)
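The configuration above maps onto Hugging Face `TrainingArguments` keywords roughly as follows. This is a sketch, not the authors' script: the output directory is a placeholder, and the `load_best_model_at_end`/`metric_for_best_model` wiring (required for the `EarlyStoppingCallback` with patience 3) is an assumption.

```python
# Training configuration from the card, expressed as TrainingArguments kwargs.
training_config = dict(
    output_dir="nllb-kob-eng",           # placeholder path
    num_train_epochs=10,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,       # effective batch = 4 * 8 = 32
    learning_rate=3e-5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    warmup_steps=500,
    weight_decay=0.01,
    eval_steps=2000,
    save_steps=2000,
    save_total_limit=3,
    load_best_model_at_end=True,         # assumed: needed for early stopping
    metric_for_best_model="eval_loss",   # early stopping tracks validation loss
)

# Effective batch size = per-device batch * gradient accumulation steps
# (times the number of devices, assumed to be 1 here).
effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
```

Pass these kwargs to `Seq2SeqTrainingArguments` and attach `EarlyStoppingCallback(early_stopping_patience=3)` to the `Trainer` to reproduce the early-stopping behaviour described above.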

Training Data

The model was fine-tuned on parallel sentence pairs derived from aligned corpora with dialect-specific lexical and morphosyntactic perturbations.

Intended Use & Limitations

Intended use:

  • Research in low-resource and dialectal machine translation
  • Assistive translation support for Kobani Kurdish speakers
  • Bootstrapping additional NLP resources for Kurmanji varieties

Known limitations:

  • Performance heavily depends on domain similarity to training data
  • Potential exposure bias and hallucination in out-of-domain text
  • Zero-shot capability for Kobani remains limited without fine-tuning
  • Early evaluations indicated possible data leakage; results should be interpreted cautiously until confirmed on fully isolated test sets

Citation

If you use this model in your research, please cite:

@misc{ahmad2026kobani-nllb-600m-reverse,
  author       = {Raman Ahmad},
  title        = {nllb-distilled-600m-kob\_Latn-to-eng},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/RamanAhmad/nllb-distilled-600m-kob_Latn-to-eng}}
}

Please also cite the original NLLB paper:

@article{cost2022nllb,
  title     = {No Language Left Behind: Scaling Human-Centered Machine Translation},
  author    = {NLLB Team and others},
  journal   = {arXiv preprint arXiv:2207.04672},
  year      = {2022},
  doi       = {10.48550/arXiv.2207.04672}
}