# leesplank-noot-llama-3.2-3b-dpo-finetuned: Dutch text simplification (B1) with DPO

## What is this?

A Dutch text-simplification model tuned toward clear, accessible B1-level language in the Leesplank Noot style (clear and simple, but not childish).
This repo contains a merged checkpoint (base model + trained adapter merged into one set of weights), so you can load it like a regular transformers model.
## Quick start (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yusraghazi/leesplank-noot-llama-3.2-3b-dpo-finetuned"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Vereenvoudig: Indien u zich toch in dit gebied begeeft, bent u verplicht dit te melden aan de lokale autoriteiten."
inputs = tok(prompt, return_tensors="pt").to(mdl.device)
out = mdl.generate(**inputs, max_new_tokens=120, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```
## Recommended prompting (Leesplank style)

**System:**

> Je bent een AI-assistent die Nederlandse teksten vereenvoudigt naar een helder, toegankelijk niveau voor iedereen, vergelijkbaar met de heldere taal die het Jeugdjournaal gebruikt. Behoud de betekenis en belangrijke informatie, maar gebruik eenvoudigere woorden en kortere zinnen. Schrijf niet kinderlijk, maar wel toegankelijk.

**User:**

> Vereenvoudig:
If this repo includes a chat_template.jinja, you can use that as well.
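If a chat template is present, the recommended system/user prompts can be passed as structured messages. A minimal sketch, assuming the tokenizer is loaded as in the quick start and ships an `apply_chat_template`-compatible template; `build_messages` is a hypothetical helper, not part of this repo:

```python
# Leesplank system prompt from the "Recommended prompting" section above.
SYSTEM_PROMPT = (
    "Je bent een AI-assistent die Nederlandse teksten vereenvoudigt naar een "
    "helder, toegankelijk niveau voor iedereen, vergelijkbaar met de heldere "
    "taal die het Jeugdjournaal gebruikt. Behoud de betekenis en belangrijke "
    "informatie, maar gebruik eenvoudigere woorden en kortere zinnen. "
    "Schrijf niet kinderlijk, maar wel toegankelijk."
)

def build_messages(text: str) -> list:
    """Pair the Leesplank system prompt with a 'Vereenvoudig:' user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Vereenvoudig:\n" + text},
    ]

messages = build_messages(
    "Indien u zich toch in dit gebied begeeft, bent u verplicht dit te "
    "melden aan de lokale autoriteiten."
)
# With the tokenizer loaded, you would then render and generate:
# input_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
```

The commented-out call shows the standard `transformers` route; the exact rendering depends on the `chat_template.jinja` shipped with this repo.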
## What changed vs the base model?

Base model: `UWV/leesplank-noot-llama-3.2-3b`
On top of the base model, preference alignment with DPO was applied on prompt/chosen/rejected triplets. This teaches the model to prefer:
- meaning preservation (faithfulness)
- retaining key information
- clear language (shorter sentences, simpler words)
…over common failure modes such as information loss, meaning drift, hallucinations, and overgeneralization.
## Training data

Trained on a Dutch DPO preference dataset with the columns:

- `prompt`: original sentence/text
- `chosen`: preferred simplification
- `rejected`: less preferred (errorful/weaker) simplification
- `error_type` (optional): label describing the type of error in `rejected`

Dataset: `yusraghazi/INT-Extended-DPO-v1`
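For illustration, a hypothetical row in this column layout (the Dutch texts below are invented examples, not actual entries from the dataset):

```python
# Invented example row in the prompt/chosen/rejected/error_type layout.
example = {
    "prompt": "Vereenvoudig: Indien u zich toch in dit gebied begeeft, "
              "bent u verplicht dit te melden.",
    "chosen": "Gaat u toch naar dit gebied? Dan moet u dat melden.",
    # The rejected answer reverses the obligation: a meaning-drift error.
    "rejected": "U hoeft niets te melden als u het gebied bezoekt.",
    "error_type": "meaning_drift",
}

# The three preference columns are required; error_type is optional metadata.
required = {"prompt", "chosen", "rejected"}
assert required <= example.keys()
```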
## Training method

### DPO (Direct Preference Optimization)

- Trainer: `trl.DPOTrainer`
- `beta = 0.1`
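The per-example DPO objective at `beta = 0.1` can be sketched in plain Python. This is a minimal sketch of the loss formula only, not TRL's implementation; the log-probabilities are illustrative placeholders, and `dpo_loss` is a hypothetical helper:

```python
import math

def dpo_loss(logp_chosen_policy: float, logp_rejected_policy: float,
             logp_chosen_ref: float, logp_rejected_ref: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * (policy margin minus reference margin))."""
    margin = ((logp_chosen_policy - logp_chosen_ref)
              - (logp_rejected_policy - logp_rejected_ref))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy identical to the reference: no preference signal, loss = log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy favours 'chosen' over 'rejected' more than the reference does:
# the margin is positive and the loss drops below log(2).
improved = dpo_loss(-10.0, -14.0, -11.0, -12.0)
```

Minimizing this loss pushes the policy to widen the chosen-vs-rejected gap relative to the frozen reference model, with `beta` scaling how strongly deviations from the reference are rewarded.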
### QLoRA / PEFT (adapter training)

Training used 4-bit quantization (QLoRA) plus a LoRA adapter:

- 4-bit: `nf4`, double quantization enabled
- LoRA: `r=16`, `lora_alpha=32`, `lora_dropout=0.05`, `bias="none"`, `task_type="CAUSAL_LM"`
- Precision: `bf16` if supported, otherwise `fp16`
- Gradient checkpointing: enabled
## Optimization / schedule

- epochs: 2
- learning rate: 5e-5
- batch size: 2 (per device)
- gradient accumulation: 8
- max prompt length: 256
- max sequence length: 512
- scheduler: cosine
- warmup ratio: 0.03
- optimizer: `paged_adamw_32bit`
- seed: 42
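The batch settings above combine into the following effective batch size per optimizer step:

```python
# Settings from the schedule above.
per_device_batch_size = 2
gradient_accumulation_steps = 8

# Gradients accumulate over 8 micro-batches of 2 before each update,
# so each optimizer step sees 16 preference examples per device.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
```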
## Evaluation

This release was evaluated primarily qualitatively: outputs from the base and tuned models on held-out examples were compared against the `chosen` references.
Recommended next step: add reproducible metrics (e.g., SARI + meaning-preservation checks) on a fixed benchmark set.
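Until such metrics are in place, a crude content-word-retention check can flag simplifications that drop too much of the source. This is an illustrative stand-in written for this card, not SARI or any established metric:

```python
def content_word_retention(source: str, simplified: str) -> float:
    """Fraction of longer source words (>3 chars) kept in the simplification."""
    punct = ".,;:!?"
    src = {w.lower().strip(punct) for w in source.split() if len(w) > 3}
    simp = {w.lower().strip(punct) for w in simplified.split()}
    return len(src & simp) / len(src) if src else 1.0

# Low scores suggest information loss and deserve a human look.
score = content_word_retention(
    "Indien u zich toch in dit gebied begeeft, bent u verplicht dit te melden.",
    "Gaat u toch naar dit gebied? Dan moet u dat melden.",
)
```

A heuristic like this catches gross information loss but not meaning drift, which is why a proper benchmark with SARI and faithfulness checks remains the recommended next step.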
## Intended use
Good for:
- simplifying Dutch texts to B1 for accessibility
- rewriting letters/customer communication/web copy into clearer Dutch
- plain-language editing while preserving meaning
Not recommended for:
- high-stakes decisions without human review
- medical/legal guidance without expert verification
- domains where nuance loss can cause harm
## Limitations & known failure modes
Even after DPO, the model can sometimes:
- oversimplify (loss of nuance)
- subtly shift meaning
- hallucinate small details under weak prompting
- reflect biases present in source data
Keep a human in the loop when accuracy matters.
## License & attribution

- Base model license: Apache-2.0 (UWV)
- Training data license: CC BY-NC 4.0 (`INT-Extended-DPO-v1`)
- This repo is released as CC BY-NC 4.0 to respect the non-commercial constraints of the training data.
## Citation

If you use this model, please cite or mention:

- Base model: `UWV/leesplank-noot-llama-3.2-3b`
- Training dataset: `yusraghazi/INT-Extended-DPO-v1`
## Contact

- Maintainer: Yusraghazi
- Feedback: open an issue or discussion on this Hugging Face repo.