Fully Fine-tuned NLLB Model for Bidirectional Odia ↔ German Translation

This is a fine-tuned version of facebook/nllb-200-distilled-600M specialized for bidirectional translation between Odia (ory_Orya) and German (deu_Latn).

This model was developed as part of a thesis project focused on effective fine-tuning strategies for low-resource language pairs within the journalistic domain. It was fine-tuned on a carefully constructed hybrid dataset, combining a larger set of high-quality, human-validated translations with a smaller set of machine-translated sentences to expand lexical, contextual and grammatical coverage.

Live Demo:

Model Details

  • Base Model: facebook/nllb-200-distilled-600M
  • Languages: Odia (or), German (de)
  • Fine-tuning Domain: Journalistic text sourced from contemporary Odia newspapers (Dharitri & Sambad).
  • Developed by: Abhinandan Samal
  • Thesis: Enhancing Contextual Understanding in Low-Resource Languages Using Multilingual Transformers
  • University: IU International University of Applied Sciences
  • Date: Aug 26, 2025

Fine-tuning Details

Training and Evaluation Data

The model was fine-tuned on a meticulously prepared parallel corpus. Initially, 3,676 unique parallel line pairs were collected. Each "line" in the corpus was designed to provide contextual information for the model, typically containing 2-3 sentences, although some lines consist of a single sentence.

The data originates from two specific Odia newspapers and encompasses a diverse range of news domains, including National, International, Lifestyle, Sports, Trade, Environmental, Science and Technology, Leisure, Commerce, Metro, State, and Editorial.

The curation process involved distinct quality control steps for each language:

  • Odia Corpus Validation: All 3,676 lines on the Odia side of the parallel corpus underwent thorough evaluation and validation by a native Odia speaker (the author), ensuring high linguistic fidelity.
  • German Corpus Curation:
    • A high-quality subset of 2,000 German lines (corresponding to 2,000 of the original parallel pairs) was meticulously human-evaluated and corrected by a native German speaker. This segment forms a core, highly accurate dataset.
    • The remaining 1,676 German lines (corresponding to the other original parallel pairs) were generated using Google Translate. These lines were utilized to broaden the model's exposure to a wider range of vocabulary and grammatical structures.

Following this rigorous curation, the corpus was transformed into a final bidirectional training dataset, resulting in 7,352 distinct training instances. This was achieved by creating two training examples from each parallel pair, utilizing task-specific prefixes (translate Odia to German: and translate German to Odia:). The overall size of this dataset was carefully managed and selected as a practical upper limit dictated by the memory and computational constraints of the available single-GPU training environment (NVIDIA A100 on Google Colab Pro).
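The pair-to-instance expansion described above can be sketched as follows; the dictionary keys and function name are illustrative assumptions, not the exact schema used in the thesis:

```python
# Sketch: turn each parallel (Odia, German) pair into two prefixed
# training instances, one per translation direction.
PREFIX_OR_DE = "translate Odia to German: "
PREFIX_DE_OR = "translate German to Odia: "

def make_bidirectional(pairs):
    """Expand (odia, german) pairs into two directional training examples each."""
    instances = []
    for odia, german in pairs:
        instances.append({"input": PREFIX_OR_DE + odia, "target": german})
        instances.append({"input": PREFIX_DE_OR + german, "target": odia})
    return instances

pairs = [("ଆଜି ପାଗ ଭଲ ଅଛି।", "Das Wetter ist heute gut.")]
dataset = make_bidirectional(pairs)
# Each parallel pair yields two instances, so 3,676 pairs -> 7,352 examples.
```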

You can check the dataset here.

Training Procedure

The model was fine-tuned using PyTorch and the Hugging Face Seq2SeqTrainer.

Key Hyperparameters:

  • Learning Rate: 2e-5
  • Number of Epochs: 3
  • Effective Batch Size: 16 (per_device_train_batch_size=4 with gradient_accumulation_steps=4)
  • Optimizer: adafactor
  • Precision: Mixed Precision (fp16=True)
  • Memory Optimization: gradient_checkpointing=True
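The hyperparameters above map onto a Seq2SeqTrainingArguments configuration roughly as follows; this is a minimal sketch, and output_dir plus any settings not listed above are illustrative assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Configuration mirroring the hyperparameters listed in this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-odia-german-finetuned",  # assumed path
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size = 4 * 4 = 16
    optim="adafactor",
    fp16=True,                       # mixed precision (requires a GPU)
    gradient_checkpointing=True,     # trades compute for memory
)
```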

Evaluation Results

The fine-tuned model's performance was rigorously evaluated against the original facebook/nllb-200-distilled-600M baseline on a held-out test set, 77% of which consists of human-validated sentence pairs. I report scores across four standard machine translation metrics: BLEU (higher is better), chrF++ (higher is better), TER (Translation Edit Rate; lower is better), and COMET (higher is better).

| Metric | Odia → German (Baseline) | Odia → German (Fine-Tuned) | German → Odia (Baseline) | German → Odia (Fine-Tuned) |
|--------|--------------------------|----------------------------|--------------------------|----------------------------|
| BLEU   | 22.0355 | 27.1802 | 9.3467  | 14.8624 |
| chrF++ | 43.3357 | 54.3083 | 38.3720 | 43.4127 |
| TER    | 82.7669 | 64.5270 | 97.6340 | 74.4360 |
| COMET  | -0.0285 | 0.5479  | 0.1876  | 0.8167  |

Interpretation of Results

The evaluation demonstrates substantial and consistent improvements across both translation directions following the full fine-tuning process.

1. Odia → German (Generating the High-Resource Language)

The fine-tuning process unlocked massive gains in generating German, transforming the model from a weak baseline into a much more competent translation system.

  • BLEU (22.04 -> 27.18):
    • Result: An improvement of +5.14 points.
    • In Machine Translation, a gain of >1 BLEU is noticeable, and >5 is considered a major quality shift. The model is now matching the reference phrasing and vocabulary much more accurately.
  • chrF++ (43.34 -> 54.31):
    • Result: An improvement of +10.97 points.
    • A +11 point jump in chrF++ (Character F-score) is massive. It indicates that even when the model misses the exact word, it is getting the stems and suffixes correct. It shows the Fine-Tuned model has learned the morphology of German much better than the baseline, successfully generating correct complex words rather than fragmented sub-words.
  • TER (82.77 -> 64.53):
    • Result: A reduction of 18.24 points (lower is better).
    • The "Translation Edit Rate" measures how much a human would need to edit the output to match the reference. A drop this large suggests the model has stopped hallucinating or producing "broken" grammar, requiring far fewer corrections.
  • COMET (-0.029 -> 0.548):
    • Result: A drastic shift from negative to positive.
    • COMET measures semantic meaning (using embeddings). A negative score typically implies the translation was unrelated to the source or nonsensical. A score of 0.548 indicates high-quality, semantically accurate translation. This is the most significant indicator of success.

2. German → Odia (Generating the Low-Resource Language)

Generating Odia remains a challenging task due to its morphological richness, but the fine-tuned model shows clear progress over the zero-shot baseline.

  • BLEU (9.35 -> 14.86):
    • Result: An improvement of +5.51 points.
    • Translating into Odia is harder due to its rich morphology. While the absolute score (14.86) appears low compared to the Odia → German direction, the relative improvement is actually larger. The fine-tuning successfully adapted the model to the Odia target space.
  • chrF++ (38.37 -> 43.41):
    • Result: An improvement of +5.04 points.
    • While BLEU (word-level) improved by 5 points, the chrF++ improvement confirms that the model isn't just guessing words; it is learning to construct Odia words correctly (spelling and inflections). This metric is often more reliable than BLEU for Indian languages because it gives credit for getting the "root" of the word right, even if the suffix is slightly off.
  • TER (97.63 -> 74.44):
    • Result: A reduction of 23.19 points (lower is better).
    • Dropping to 74.44 is a significant usability improvement. It suggests that while the translations are still far from perfect (Odia generation is hard), the "post-editing effort" required has dropped by nearly 25%. The model is hallucinating less and producing output that is structurally closer to the reference.
  • COMET (0.188 -> 0.817):
    • Result: A remarkably high score.
    • Despite the lower BLEU score (14.86), the COMET score is very high (0.817). This discrepancy is common for morphologically rich languages such as Odia. It suggests the model is generating valid, meaningful Odia sentences that use different synonyms or word orders than the reference (which BLEU punishes, but COMET recognizes as correct).

Summary

The Full Fine-Tuning approach has successfully adapted the NLLB model for this specific Odia-German bitext. The transition from negative/low COMET scores to positive, high-confidence scores confirms that the model is now effectively aligning the semantics of both languages, rather than just relying on random token matches.

How to Use

The easiest way to use this model is with the translation pipeline from the transformers library. The model was trained to be bidirectional, and you can control the translation direction by specifying the src_lang and tgt_lang during the call.

from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load the translation pipeline with your fine-tuned model
model_id = "abhinandansamal/nllb-200-distilled-600M-full-finetuned-odia-german-bidirectional"
translator = pipeline(
    "translation",
    model=model_id,
    dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto"
)

# --- Task Prefixes (as used during training) ---
PREFIX_ORI_TO_DEU = "translate Odia to German: "
PREFIX_DEU_TO_ORI = "translate German to Odia: "

# --- Example 1: Translate Odia to German ---
odia_text = "ଆଜି ପାଗ ବହୁତ ଭଲ ଅଛି।"
# We prepend the prefix to match training conditions
input_text_1 = PREFIX_ORI_TO_DEU + odia_text

german_translation = translator(
    input_text_1,
    src_lang="ory_Orya",
    tgt_lang="deu_Latn",
    max_length=512
)

print(f"--- Example 1 ---")
print(f"Odia Input: {odia_text}")
print(f"German Output: {german_translation[0]['translation_text']}")

# --- Example 2: Translate German to Odia (Simple) ---
german_text_2 = "Wie ist deine Gesundheit?"
input_text_2 = PREFIX_DEU_TO_ORI + german_text_2

odia_translation_2 = translator(
    input_text_2,
    src_lang="deu_Latn",
    tgt_lang="ory_Orya",
    max_length=512
)

print(f"\n--- Example 2 ---")
print(f"German Input: {german_text_2}")
print(f"Odia Output: {odia_translation_2[0]['translation_text']}")

Note: The model was fine-tuned with task prefixes (translate Odia to German: and translate German to Odia:), so the examples above prepend them to reproduce the training conditions. The src_lang and tgt_lang arguments set NLLB's own language codes and should be provided in either direction.

Intended Use

This model is primarily intended for translating journalistic text between Odia and German. Given its training on articles from various news domains (e.g., National, International, Lifestyle, Sports, Science and Technology), it is suitable for academic research, cross-lingual information retrieval from news sources, and as a supportive tool for language learners focusing on news-related content in this specific language pair.

Limitations & Bias

  • Domain Specificity: While encompassing various news domains, the model is not optimized for vastly different fields such as legal, medical, literary, or informal conversational text. Its performance is expected to be significantly lower on content outside the journalistic domain.
  • Data-Inherited Bias: The model inherits stylistic and topical biases from its training data sources. Despite covering multiple news domains, the primary sources are two specific Odia newspapers. Furthermore, the inclusion of Google Translate-generated German lines in a portion of the training data may introduce or reinforce specific stylistic patterns inherent to machine translation outputs.

Achievements with Current Data Constraints

Despite the constraints in computational resources (single-GPU training on NVIDIA A100 via Google Colab Pro) and the relatively small, specialized dataset size (7,352 bidirectional lines), this fine-tuning process has achieved significant positive outcomes:

  • Substantial Quality Improvement: The fine-tuned model demonstrates a marked improvement over the baseline, particularly evidenced by substantial gains in chrF++ and significant reductions in TER in both translation directions. This indicates translations that require less post-editing and exhibit better character-level accuracy, showcasing the effectiveness of fine-tuning even with limited data.
  • Practical Viability: The results highlight the practical feasibility of developing effective Neural Machine Translation systems for under-resourced language pairs like Odia-German, even when operating with initial data limitations and constrained resources.

Areas for Future Improvement

To further enhance the model's performance, generalizability, and address existing limitations, the following factors are key considerations for future development:

  • Expanded High-Quality Data: Increasing the size and diversity of the human-validated parallel corpus, particularly from domains beyond journalism, would be crucial for improving robustness and reducing reliance on machine-translated data.
  • Refined German Corpus Curation: Exploring strategies to further reduce the dependency on machine-translated content for the German side, potentially through more extensive human validation or alternative data acquisition methods.
  • Addressing Directional Nuances: Further investigation into the specific performance characteristics of each translation direction (e.g., the BLEU score behavior in Odia → German) could lead to targeted optimizations for balanced bidirectional performance.
  • Advanced Data Augmentation: Exploring more sophisticated data augmentation techniques could effectively expand the training data's diversity without necessarily requiring more manual collection.
  • Model Architecture & Hyperparameter Optimization: Continued experimentation with different model architectures, fine-tuning strategies, and hyperparameter configurations could yield additional performance gains.
  • Bias Mitigation: Proactive strategies to identify and mitigate potential biases inherited from the training data sources could improve fairness and broader applicability.

Citation

If you use this model or the associated methodology in your research, please cite the following thesis:

@mastersthesis{SamalThesis2025,
  author = {Abhinandan Samal},
  title  = {Enhancing Contextual Understanding in Low-Resource Languages Using Multilingual Transformers},
  school = {IU International University of Applied Sciences},
  year   = {2025}
}