leesplank-noot-llama-3.2-3b-dpo-finetuned — Dutch text simplification (B1) with DPO

What is this? A Dutch text-simplification model tuned toward clear, accessible B1-level language in the Leesplank Noot style: simple and direct, but not childish.

This repo contains a merged checkpoint (the trained adapter folded into the base model weights), so you can load it like a regular transformers model.


Quick start (Transformers)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yusraghazi/leesplank-noot-llama-3.2-3b-dpo-finetuned"

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# "Simplify: If you do enter this area anyway, you are required to report this to the local authorities."
prompt = "Vereenvoudig: Indien u zich toch in dit gebied begeeft, bent u verplicht dit te melden aan de lokale autoriteiten."
inputs = tok(prompt, return_tensors="pt").to(mdl.device)

out = mdl.generate(**inputs, max_new_tokens=120, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

Recommended prompting (Leesplank style)

System:

Je bent een AI-assistent die Nederlandse teksten vereenvoudigt naar een helder, toegankelijk niveau voor iedereen, vergelijkbaar met de heldere taal die het Jeugdjournaal gebruikt. Behoud de betekenis en belangrijke informatie, maar gebruik eenvoudigere woorden en kortere zinnen. Schrijf niet kinderlijk, maar wel toegankelijk.

(English: You are an AI assistant that simplifies Dutch texts to a clear, accessible level for everyone, comparable to the plain language used by the Jeugdjournaal. Preserve the meaning and important information, but use simpler words and shorter sentences. Do not write childishly, but keep the text accessible.)

User:

Vereenvoudig:

If this repo includes a chat_template.jinja, you can use that as well.
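A chat-template call might look like the sketch below. This assumes the tokenizer ships a chat template that accepts a system role; the message contents reuse the recommended prompt above.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yusraghazi/leesplank-noot-llama-3.2-3b-dpo-finetuned"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "Je bent een AI-assistent die Nederlandse teksten vereenvoudigt naar een helder, toegankelijk niveau voor iedereen. Behoud de betekenis, maar gebruik eenvoudigere woorden en kortere zinnen."},
    {"role": "user", "content": "Vereenvoudig: Indien u zich toch in dit gebied begeeft, bent u verplicht dit te melden aan de lokale autoriteiten."},
]

# apply_chat_template renders the conversation with the model's own template
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(mdl.device)
out = mdl.generate(inputs, max_new_tokens=120, do_sample=False)

# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```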


What changed vs the base model?

Base model: UWV/leesplank-noot-llama-3.2-3b

On top of the base model, preference alignment with DPO was applied on prompt/chosen/rejected triplets. This teaches the model to prefer:

  • meaning preservation (faithfulness)
  • retaining key information
  • clear language (shorter sentences, simpler words)

…over common failure modes such as information loss, meaning drift, hallucinations, and overgeneralization.
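Numerically, the DPO objective rewards the policy for widening the chosen-vs-rejected log-probability margin relative to the reference model. A minimal per-example sketch (sequence log-probs as plain floats, beta = 0.1 as in the training config):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# At initialization (policy == reference) the loss is exactly log(2);
# it drops below log(2) once the policy prefers 'chosen' more strongly
# than the reference does.
loss_at_init = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```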


Training data

Trained on a Dutch DPO preference dataset with columns:

  • prompt: original sentence/text
  • chosen: preferred simplification
  • rejected: less preferred (errorful/weaker) simplification
  • error_type (optional): label describing the type of error in rejected

Dataset: yusraghazi/INT-Extended-DPO-v1
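An illustrative record in this schema might look like the following; the text values are made up for illustration, only the column names come from the list above.

```python
# Hypothetical preference record (values invented; columns match the schema above)
example = {
    "prompt": "Vereenvoudig: Indien u zich toch in dit gebied begeeft, bent u verplicht dit te melden.",
    "chosen": "Gaat u toch dit gebied in? Dan moet u dat melden.",
    "rejected": "U mag dit gebied nooit in.",  # meaning drift: turns an obligation into a prohibition
    "error_type": "meaning_drift",
}

# The trainer consumes prompt/chosen/rejected; error_type is optional metadata.
required = {"prompt", "chosen", "rejected"}
assert required.issubset(example)
```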


Training method

DPO (Direct Preference Optimization)

  • Trainer: trl.DPOTrainer
  • beta = 0.1

QLoRA / PEFT (adapter training)

Training used 4-bit quantization (QLoRA) + a LoRA adapter:

  • 4-bit: nf4, double quantization enabled
  • LoRA: r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
  • Precision: bf16 if supported, otherwise fp16
  • Gradient checkpointing: enabled

Optimization / schedule

  • epochs: 2
  • learning rate: 5e-5
  • batch size: 2 (per device)
  • grad accumulation: 8
  • max prompt length: 256
  • max sequence length: 512
  • scheduler: cosine
  • warmup ratio: 0.03
  • optimizer: paged_adamw_32bit
  • seed: 42
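Putting the hyperparameters above together, the training setup can be sketched roughly as below. Exact argument names depend on the installed trl / peft / transformers versions (e.g. recent trl moves length limits into DPOConfig), so treat this as a configuration sketch, not the exact training script.

```python
# Sketch of the QLoRA + DPO setup described above; argument names may
# differ across trl / peft versions.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")

args = DPOConfig(
    beta=0.1,
    num_train_epochs=2,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    max_prompt_length=256,
    max_length=512,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    seed=42,
)

# trainer = DPOTrainer(model, args=args, train_dataset=..., peft_config=lora)
```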

Evaluation

This release was evaluated primarily qualitatively on held-out examples, comparing base vs. tuned outputs against the chosen references.

Recommended next step: add reproducible metrics (e.g., SARI + meaning-preservation checks) on a fixed benchmark set.
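Until formal metrics are in place, a crude sanity check on the "shorter sentences, simpler words" goal can be scripted. This is an illustrative heuristic only (not SARI), and the example sentences are invented:

```python
import re

def readability_proxy(text):
    """Rough proxy: (avg sentence length in words, avg word length in chars)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    avg_sent_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return avg_sent_len, avg_word_len

original = ("Indien u zich toch in dit gebied begeeft, bent u verplicht "
            "dit te melden aan de lokale autoriteiten.")
simplified = "Gaat u toch dit gebied in? Dan moet u dat melden."

# A good simplification should score lower on both proxies than the original.
```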


Intended use

Good for:

  • simplifying Dutch texts to B1 for accessibility
  • rewriting letters/customer communication/web copy into clearer Dutch
  • plain-language editing while preserving meaning

Not recommended for:

  • high-stakes decisions without human review
  • medical/legal guidance without expert verification
  • domains where nuance loss can cause harm

Limitations & known failure modes

Even after DPO, the model can sometimes:

  • oversimplify (loss of nuance)
  • subtly shift meaning
  • hallucinate small details under weak prompting
  • reflect biases present in source data

Keep a human in the loop when accuracy matters.


License & attribution

  • Base model license: Apache-2.0 (UWV)
  • Training data license: CC BY-NC 4.0 (INT-Extended-DPO-v1)
  • This repo is released as CC BY-NC 4.0 to respect the non-commercial constraints of the training data.

Citation

If you use this model, please cite/mention:

  • Base model: UWV/leesplank-noot-llama-3.2-3b
  • Training dataset: yusraghazi/INT-Extended-DPO-v1

Contact

Maintainer: Yusraghazi
Feedback: open an issue/discussion on this Hugging Face repo.

Model details

  • Format: Safetensors
  • Size: 3B params
  • Tensor type: F16