leesplank-noot-llama-3.2-3b-dpo-finetuned — Dutch text simplification (B1) with DPO

What is this? A Dutch text-simplification model tuned toward clear, accessible B1-level language in the Leesplank Noot style: simple and direct, but not childish.

This repo contains a merged checkpoint (the trained adapter folded into the base model weights), so you can load it like a regular transformers model.


Quick start (Transformers)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yusraghazi/leesplank-noot-llama-3.2-3b-dpo-finetuned"

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# "Simplify: If you do enter this area anyway, you are required to report this to the local authorities."
prompt = "Vereenvoudig: Indien u zich toch in dit gebied begeeft, bent u verplicht dit te melden aan de lokale autoriteiten."
inputs = tok(prompt, return_tensors="pt").to(mdl.device)

out = mdl.generate(**inputs, max_new_tokens=120, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))

Recommended prompting (Leesplank style)

System:

Je bent een AI-assistent die Nederlandse teksten vereenvoudigt naar een helder, toegankelijk niveau voor iedereen, vergelijkbaar met de heldere taal die het Jeugdjournaal gebruikt. Behoud de betekenis en belangrijke informatie, maar gebruik eenvoudigere woorden en kortere zinnen. Schrijf niet kinderlijk, maar wel toegankelijk.

(English: You are an AI assistant that simplifies Dutch texts to a clear, accessible level for everyone, comparable to the plain language used by the Jeugdjournaal. Preserve the meaning and important information, but use simpler words and shorter sentences. Do not write childishly, but keep the text accessible.)

User:

Vereenvoudig:

If this repo includes a chat_template.jinja, you can use that as well.
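A chat-template call might look like the sketch below. This assumes the tokenizer ships a chat template that accepts a system role; the message contents reuse the recommended prompt above.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yusraghazi/leesplank-noot-llama-3.2-3b-dpo-finetuned"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "Je bent een AI-assistent die Nederlandse teksten vereenvoudigt naar een helder, toegankelijk niveau voor iedereen. Behoud de betekenis, maar gebruik eenvoudigere woorden en kortere zinnen."},
    {"role": "user", "content": "Vereenvoudig: Indien u zich toch in dit gebied begeeft, bent u verplicht dit te melden aan de lokale autoriteiten."},
]

# apply_chat_template renders the conversation with the model's own template
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(mdl.device)
out = mdl.generate(inputs, max_new_tokens=120, do_sample=False)

# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```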


What changed vs the base model?

Base model: UWV/leesplank-noot-llama-3.2-3b

On top of the base model, preference alignment with DPO was applied on prompt/chosen/rejected triplets. This teaches the model to prefer:

  • meaning preservation (faithfulness)
  • retaining key information
  • clear language (shorter sentences, simpler words)

…over common failure modes such as information loss, meaning drift, hallucinations, and overgeneralization.
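Numerically, the DPO objective rewards the policy for widening the chosen-vs-rejected log-probability margin relative to the reference model. A minimal per-example sketch (sequence log-probs as plain floats, beta = 0.1 as in the training config):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# At initialization (policy == reference) the loss is exactly log(2);
# it drops below log(2) once the policy prefers 'chosen' more strongly
# than the reference does.
loss_at_init = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```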


Training data

Trained on a Dutch DPO preference dataset with columns:

  • prompt: original sentence/text
  • chosen: preferred simplification
  • rejected: less preferred (errorful/weaker) simplification
  • error_type (optional): label describing the type of error in rejected

Dataset: yusraghazi/INT-Extended-DPO-v1
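An illustrative record in this schema might look like the following; the text values are made up for illustration, only the column names come from the list above.

```python
# Hypothetical preference record (values invented; columns match the schema above)
example = {
    "prompt": "Vereenvoudig: Indien u zich toch in dit gebied begeeft, bent u verplicht dit te melden.",
    "chosen": "Gaat u toch dit gebied in? Dan moet u dat melden.",
    "rejected": "U mag dit gebied nooit in.",  # meaning drift: turns an obligation into a prohibition
    "error_type": "meaning_drift",
}

# The trainer consumes prompt/chosen/rejected; error_type is optional metadata.
required = {"prompt", "chosen", "rejected"}
assert required.issubset(example)
```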


Training method

DPO (Direct Preference Optimization)

  • Trainer: trl.DPOTrainer
  • beta = 0.1

QLoRA / PEFT (adapter training)

Training used 4-bit quantization (QLoRA) + a LoRA adapter:

  • 4-bit: nf4, double quantization enabled
  • LoRA: r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
  • Precision: bf16 if supported, otherwise fp16
  • Gradient checkpointing: enabled

Optimization / schedule

  • epochs: 2
  • learning rate: 5e-5
  • batch size: 2 (per device)
  • grad accumulation: 8
  • max prompt length: 256
  • max sequence length: 512
  • scheduler: cosine
  • warmup ratio: 0.03
  • optimizer: paged_adamw_32bit
  • seed: 42
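Putting the hyperparameters above together, the training setup can be sketched roughly as below. Exact argument names depend on the installed trl / peft / transformers versions (e.g. recent trl moves length limits into DPOConfig), so treat this as a configuration sketch, not the exact training script.

```python
# Sketch of the QLoRA + DPO setup described above; argument names may
# differ across trl / peft versions.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")

args = DPOConfig(
    beta=0.1,
    num_train_epochs=2,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    max_prompt_length=256,
    max_length=512,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    seed=42,
)

# trainer = DPOTrainer(model, args=args, train_dataset=..., peft_config=lora)
```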

Evaluation

This release was evaluated primarily qualitatively on held-out examples, comparing base vs. tuned outputs against the chosen references.

Recommended next step: add reproducible metrics (e.g., SARI + meaning-preservation checks) on a fixed benchmark set.
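Until formal metrics are in place, a crude sanity check on the "shorter sentences, simpler words" goal can be scripted. This is an illustrative heuristic only (not SARI), and the example sentences are invented:

```python
import re

def readability_proxy(text):
    """Rough proxy: (avg sentence length in words, avg word length in chars)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    avg_sent_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return avg_sent_len, avg_word_len

original = ("Indien u zich toch in dit gebied begeeft, bent u verplicht "
            "dit te melden aan de lokale autoriteiten.")
simplified = "Gaat u toch dit gebied in? Dan moet u dat melden."

# A good simplification should score lower on both proxies than the original.
```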


Intended use

Good for:

  • simplifying Dutch texts to B1 for accessibility
  • rewriting letters/customer communication/web copy into clearer Dutch
  • plain-language editing while preserving meaning

Not recommended for:

  • high-stakes decisions without human review
  • medical/legal guidance without expert verification
  • domains where nuance loss can cause harm

Limitations & known failure modes

Even after DPO, the model can sometimes:

  • oversimplify (loss of nuance)
  • subtly shift meaning
  • hallucinate small details under weak prompting
  • reflect biases present in source data

Keep a human in the loop when accuracy matters.


License & attribution

  • Base model license: Apache-2.0 (UWV)
  • Training data license: CC BY-NC 4.0 (INT-Extended-DPO-v1)
  • This repo is released as CC BY-NC 4.0 to respect the non-commercial constraints of the training data.

Citation

If you use this model, please cite/mention:

  • Base model: UWV/leesplank-noot-llama-3.2-3b
  • Training dataset: yusraghazi/INT-Extended-DPO-v1

Contact

Maintainer: Yusraghazi
Feedback: open an issue/discussion on this Hugging Face repo.

Model details

  • Format: Safetensors
  • Size: 3B params
  • Tensor type: F16