
translategemma-4b-it-nb-nn

A Gemma 3 4B Instruct model fine-tuned to translate Norwegian Bokmål (nb) into Norwegian Nynorsk (nn), intended for deployment testing in the NB-ASR beta program.

Uploaded: 07-04-2026

The immediate purpose of this release is to support:

- reproducible beta evaluation,
- loading and inference validation in realistic environments,
- and packaging of a reviewed checkpoint for Hugging Face distribution.

Confidential beta release: this model card and the associated weights are intended for approved evaluators and collaborators. Treat the checkpoint as beta material rather than a public production release.

Model Description

This model was fine-tuned from google/gemma-3-4b-it on the NbAiLab/merged_npk_ndla_parallel_paragraphs dataset.

Intended Use

  • Primary use: Translating Norwegian Bokmål text to Norwegian Nynorsk
  • Language pair: nb → nn

Training Data

The model was trained on NbAiLab/merged_npk_ndla_parallel_paragraphs, a merged corpus of parallel Bokmål–Nynorsk paragraphs from NPK and NDLA.
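The exact column names and training prompt for this dataset are not documented in this card. As a hedged sketch (the instruction wording and field layout below are assumptions, not confirmed training details), each parallel pair can be rendered into the Gemma turn format for supervised fine-tuning:

```python
def format_pair(nb_text: str, nn_text: str) -> str:
    """Render one Bokmål/Nynorsk pair as a Gemma chat training example.

    The instruction wording here is illustrative; the actual
    fine-tuning prompt may differ.
    """
    return (
        "<start_of_turn>user\n"
        "Please translate the following Norwegian text into Norwegian Nynorsk:\n\n"
        f"{nb_text}<end_of_turn>\n"
        "<start_of_turn>model\n"
        f"{nn_text}<end_of_turn>\n"
    )

example = format_pair(
    "Dette er en setning på bokmål.",   # Bokmål source
    "Dette er ei setning på nynorsk.",  # Nynorsk target
)
print(example)
```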

Training Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-3-4b-it |
| Epochs | 3 |
| Global steps | 46,728 |
| Precision | bfloat16 |
| Optimizer | AdamW (β1=0.9, β2=0.999, ε=1e-8) |
| Weight decay | |
| Warmup ratio | 0.1 |
| Eval strategy | Every 2,000 steps |
| Dataloader workers | 4 |
| Train samples/sec | 103.05 |
| Train runtime | ~8.1 hours |
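As a quick sanity check on the table, the global step count divides evenly across the three epochs:

```python
# Values taken directly from the training details table above.
global_steps = 46_728
epochs = 3

assert global_steps % epochs == 0  # the step count divides evenly
steps_per_epoch = global_steps // epochs
print(steps_per_epoch)  # 15576 steps per epoch
```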

Evaluation Results

Evaluated on two test sets at the end of training (epoch 3):

NbAiLab Test Set (in-domain)

| Metric | Score |
|---|---|
| BLEU | 89.02 |
| chrF | 95.37 |
| Loss | 0.0750 |

Tatoeba nb→nn (out-of-domain)

| Metric | Score |
|---|---|
| BLEU | 72.20 |
| chrF | 85.68 |
| Loss | 0.4106 |

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NbAiLab/translategemma-4b-it-nb-nn"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

def translate_nb_to_nn(text: str) -> str:
    # Build the Gemma chat prompt the model was fine-tuned with.
    prompt = (
        "<start_of_turn>user\n"
        "You are a professional Norwegian (no) to Norwegian Nynorsk (nn) translator. "
        "Your goal is to accurately convey the meaning and nuances of the original Norwegian text while "
        "adhering to Norwegian Nynorsk grammar, vocabulary, and cultural sensitivities. Produce only the "
        "Norwegian Nynorsk translation, without any additional explanations or commentary. Please translate "
        "the following Norwegian text into Norwegian Nynorsk:\n\n\n"
        f"{text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens (skip the prompt)
    generated = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

text = "Dette er en setning på bokmål som skal oversettes til nynorsk."
print(translate_nb_to_nn(text))
```

License

This model is subject to the Gemma Terms of Use.
