
translategemma-4b-it-nb-nn

A Gemma 3 4B Instruct model fine-tuned to translate Norwegian Bokmål (nb) into Norwegian Nynorsk (nn), intended for deployment testing in the NB-ASR beta program.

Uploaded: 07-04-2026

The immediate purpose of this release is to support:

- reproducible beta evaluation,
- loading and inference validation in realistic environments,
- and packaging of a reviewed checkpoint for Hugging Face distribution.

Confidential beta release: this model card and the associated weights are intended for approved evaluators and collaborators. Treat the checkpoint as beta material rather than a public production release.

Model Description

This model was fine-tuned from google/gemma-3-4b-it on the NbAiLab/merged_npk_ndla_parallel_paragraphs dataset.

Intended Use

  • Primary use: Translating Norwegian Bokmål text to Norwegian Nynorsk
  • Language pair: nb → nn

Training Data

The model was trained on NbAiLab/merged_npk_ndla_parallel_paragraphs, a merged corpus of parallel Bokmål–Nynorsk paragraphs from NPK and NDLA.
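The exact column names and training prompt for this dataset are not documented in this card. As a hedged sketch (the instruction wording and field layout below are assumptions, not confirmed training details), each parallel pair can be rendered into the Gemma turn format for supervised fine-tuning:

```python
def format_pair(nb_text: str, nn_text: str) -> str:
    """Render one Bokmål/Nynorsk pair as a Gemma chat training example.

    The instruction wording here is illustrative; the actual
    fine-tuning prompt may differ.
    """
    return (
        "<start_of_turn>user\n"
        "Please translate the following Norwegian text into Norwegian Nynorsk:\n\n"
        f"{nb_text}<end_of_turn>\n"
        "<start_of_turn>model\n"
        f"{nn_text}<end_of_turn>\n"
    )

example = format_pair(
    "Dette er en setning på bokmål.",   # Bokmål source
    "Dette er ei setning på nynorsk.",  # Nynorsk target
)
print(example)
```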

Training Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-3-4b-it |
| Epochs | 3 |
| Global steps | 46,728 |
| Precision | bfloat16 |
| Optimizer | AdamW (β1=0.9, β2=0.999, ε=1e-8) |
| Weight decay | |
| Warmup ratio | 0.1 |
| Eval strategy | Every 2,000 steps |
| Dataloader workers | 4 |
| Train samples/sec | 103.05 |
| Train runtime | ~8.1 hours |
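As a quick sanity check on the table, the global step count divides evenly across the three epochs:

```python
# Values taken directly from the training details table above.
global_steps = 46_728
epochs = 3

assert global_steps % epochs == 0  # the step count divides evenly
steps_per_epoch = global_steps // epochs
print(steps_per_epoch)  # 15576 steps per epoch
```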

Evaluation Results

Evaluated on two test sets at the end of training (epoch 3):

NbAiLab Test Set (in-domain)

| Metric | Score |
|---|---|
| BLEU | 89.02 |
| chrF | 95.37 |
| Loss | 0.0750 |

Tatoeba nb→nn (out-of-domain)

| Metric | Score |
|---|---|
| BLEU | 72.20 |
| chrF | 85.68 |
| Loss | 0.4106 |

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NbAiLab/translategemma-4b-it-nb-nn"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

def translate_nb_to_nn(text: str) -> str:
    # Build the Gemma chat prompt the model was fine-tuned with.
    prompt = (
        "<start_of_turn>user\n"
        "You are a professional Norwegian (no) to Norwegian Nynorsk (nn) translator. "
        "Your goal is to accurately convey the meaning and nuances of the original Norwegian text while "
        "adhering to Norwegian Nynorsk grammar, vocabulary, and cultural sensitivities. Produce only the "
        "Norwegian Nynorsk translation, without any additional explanations or commentary. Please translate "
        "the following Norwegian text into Norwegian Nynorsk:\n\n\n"
        f"{text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens (skip the prompt)
    generated = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

text = "Dette er en setning på bokmål som skal oversettes til nynorsk."
print(translate_nb_to_nn(text))
```

License

This model is subject to the Gemma Terms of Use.
