# About

A fine-tuned version of UWV/leesplank-noot-eurollm-1.7b, adapted to Dutch municipal texts.
## Model Details
- Base Model: UWV/leesplank-noot-eurollm-1.7b
- LoRA Adapter: uaebn/leesplank-municipal-lora
- Model Type: Merged (adapter weights integrated into base model)
- Precision: BFloat16
- Parameters: 1.7B
## Training Details

### Hyperparameters
- LoRA Rank: 32
- LoRA Alpha: 64
- LoRA Dropout: 0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Learning Rate: 7e-5
- Optimizer: AdamW (32-bit paged)
- LR Scheduler: Cosine with warmup (100 steps)
- Weight Decay: 0.01
- Gradient Clipping: 1.0
- NEFTune Noise Alpha: 5
- Epochs: 3
- Batch Size: 4 per device
- Gradient Accumulation: 4 steps
- Effective Batch Size: 16
- Training Precision: BFloat16
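The batch settings above combine into the listed effective batch size. A minimal arithmetic sketch (the single-device assumption and the training-set size passed to `total_steps` are assumptions, not values from this card):

```python
import math

# Values from the hyperparameter list above
per_device_batch = 4
grad_accum_steps = 4
num_devices = 1  # assumption: single-GPU run

# Effective batch size = samples consumed per optimizer step
effective_batch = per_device_batch * grad_accum_steps * num_devices
print(effective_batch)  # 16, matching the list above

def total_steps(num_samples: int, epochs: int = 3) -> int:
    """Optimizer steps for a hypothetical training-set size."""
    return math.ceil(num_samples / effective_batch) * epochs
```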
### Training Results
| Metric | Value |
|---|---|
| Final Training Loss | 1.320 |
| Final Validation Loss | 1.251 |
| Best Validation Loss | 1.251 (Step 200) |
| Training Time | 28 minutes 29 seconds |
| Samples/Second | 2.07 |
| Total Steps | 222 |
### Training Progress
| Step | Training Loss | Validation Loss | Token Accuracy |
|---|---|---|---|
| 50 | 1.427 | 1.427 | 72.4% |
| 100 | 1.233 | 1.297 | 73.8% |
| 150 | 1.155 | 1.252 | 74.3% |
| 200 | 1.025 | 1.251 | 74.5% |
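Since these are cross-entropy losses, perplexity follows as `exp(loss)`; a quick check on the best validation loss from the table:

```python
import math

best_val_loss = 1.251  # best validation loss (step 200) from the table
perplexity = math.exp(best_val_loss)
print(f"{perplexity:.2f}")  # roughly 3.49
```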
## Usage

### Simple Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the merged model in BFloat16
model = AutoModelForCausalLM.from_pretrained(
    "uaebn/leesplank-municipal-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("uaebn/leesplank-noot-eurollm-1.7b-municipal")

# Simplify text
text = "De gemeenteraad heeft besloten tot het instellen van een commissie."
messages = [{"role": "user", "content": f"Vereenvoudig: {text}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=150, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With 4-bit Quantization (Memory Efficient)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# NF4 quantization with BFloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "uaebn/leesplank-municipal-merged",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("uaebn/leesplank-noot-eurollm-1.7b-municipal")
```
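To see why 4-bit loading helps, a back-of-the-envelope estimate of weight memory for the 1.7B parameters (weights only; activations, KV cache, and quantization overhead come on top):

```python
params = 1.7e9  # parameter count from the model details above

bf16_gb = params * 2 / 1024**3   # BFloat16: 2 bytes per weight
nf4_gb = params * 0.5 / 1024**3  # NF4: 4 bits (0.5 bytes) per weight

print(f"bf16: {bf16_gb:.1f} GB, nf4: {nf4_gb:.1f} GB")
```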
## Limitations
- Optimized for municipal/government Dutch text
- May not generalize well to other domains (medical, legal, technical)
- Inherits base model limitations (Dutch only, context length 1024 tokens)
- Training dataset size: ~1,100 municipal simplification pairs
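Given the 1024-token context limit, longer municipal documents should be split before simplification. A minimal word-based chunker as a sketch (a hypothetical helper, not part of the model repo; word counts only approximate token counts, so the default margin is deliberately conservative):

```python
def chunk_words(text: str, max_words: int = 400) -> list[str]:
    """Split text into word-based chunks intended to stay well under
    the 1024-token context limit (words only approximate tokens)."""
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

# 1000 words -> 3 chunks of at most 400 words each
chunks = chunk_words("woord " * 1000, max_words=400)
print(len(chunks))
```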
## Citation

```bibtex
@software{leesplank_noot_2025,
  author = {UWV InnovatieHub},
  title = {Leesplank Noot: Dutch Text Simplification},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/UWV/leesplank-noot-eurollm-1.7b}
}
```
## Acknowledgments
Based on UWV Leesplank Noot, which in turn is based on EuroLLM-1.7B.