📄 Technical Report (SSRN) | 📊 NOR-CASEHOLD Benchmark | 💻 GitHub
Norwegian Legal BERT
A domain-adapted BERT model for Norwegian legal text, built by continuing masked language model pretraining of NbAiLab/nb-bert-base on 9,140 Norwegian legal documents.
To the author's knowledge, this is the first open-source domain-adapted legal language model for Norwegian.
Model Details
- Base model: NbAiLab/nb-bert-base
- Architecture: BERT-base (178M parameters, 119,547 token vocabulary)
- Training corpus: 9,140 Norwegian legal documents (~32.7M whitespace tokens)
- Sources: domstol.no (Supreme Court decisions), data.stortinget.no (parliamentary documents), lovdata.no (statutory texts)
- Training: Continued MLM pretraining, lr=1e-5, seq_len=512, batch_size=64, ~8,950 steps on A100 GPU
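As a rough sanity check on the schedule listed above, the step count, batch size, and sequence length can be multiplied out to estimate how many token positions the model processed. This is only a back-of-the-envelope sketch: it assumes every sequence is filled to the full 512 tokens and that `batch_size` counts sequences per optimizer step. The subword-per-word ratio is a hypothetical placeholder, not a figure reported for this model.

```python
# Back-of-the-envelope training budget from the hyperparameters listed above.
STEPS = 8_950                           # approximate optimizer steps
BATCH_SIZE = 64                         # sequences per step (assumed meaning)
SEQ_LEN = 512                           # maximum tokens per sequence
CORPUS_WHITESPACE_TOKENS = 32_700_000   # ~32.7M whitespace tokens

# ASSUMPTION: hypothetical subword-to-word ratio, not reported by the author.
ASSUMED_SUBWORDS_PER_WORD = 1.5


def training_budget(steps, batch_size, seq_len):
    """Return (sequences seen, upper bound on token positions seen)."""
    sequences = steps * batch_size
    return sequences, sequences * seq_len


sequences, token_positions = training_budget(STEPS, BATCH_SIZE, SEQ_LEN)
approx_corpus_subwords = CORPUS_WHITESPACE_TOKENS * ASSUMED_SUBWORDS_PER_WORD
approx_passes = token_positions / approx_corpus_subwords

print(f"sequences seen: {sequences:,}")
print(f"token positions (upper bound): {token_positions:,}")
print(f"approx. passes over corpus (under the assumed ratio): {approx_passes:.1f}")
```

The number of passes depends entirely on the assumed ratio, so it should be read as an order-of-magnitude estimate only.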
Fill-Mask: Legal BERT vs General NB-BERT
Norwegian Legal BERT predicts domain-appropriate legal terminology more reliably than the general NB-BERT model, most visibly on appeal terminology and statutory section numbers.
| | Sentence | Norwegian Legal BERT | NB-BERT-base |
|---|---|---|---|
| ⚖️ | Lagmannsretten forkastet [MASK] over tingrettens dom. | anken (88.1%), denne, og, dom, den | anken (35.8%), dom, sin, sitt, tross |
| 📜 | Tiltalte ble dømt for overtredelse av straffeloven § [MASK]. | §317 (20.7%), §282, §281, §323, §239 | ❌ ##veri, ##graf, ##86, ##itte, ##i |
| 💰 | Skattyters [MASK] ble ikke ansett som fradragsberettiget. | tap (64.7%), gjeld, bidrag, arbeid, krav | tap (56.2%), gjeld, bidrag, kapital, porto |
| 🏛️ | Høyesterett kom til at avtalen var i strid med [MASK]. | aml (65.8%), lov (20.3%), asl, kontrakt, art | aml (75.1%), asl (20.6%), asal, lov (0.4%), NL |
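Legal BERT is clearly ahead on legal procedure (anken at 88.1% vs 35.8%) and on statutory references, where it predicts plausible section numbers (§317, §282) while NB-BERT returns only subword fragments. On the tax and contract examples the two models agree on the top prediction (tap, aml), and NB-BERT is more confident on aml (75.1% vs 65.8%), so the gap is concentrated in the first two examples.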
from transformers import pipeline

# Load both models as fill-mask pipelines for a side-by-side comparison
legal = pipeline("fill-mask", model="bendik-eeg-henriksen/norwegian-legal-bert")
general = pipeline("fill-mask", model="NbAiLab/nb-bert-base")

# Each call returns a list of dicts, including "token_str" and "score"
text = "Lagmannsretten forkastet [MASK] over tingrettens dom."
print("Legal BERT:", legal(text, top_k=5))
print("NB-BERT: ", general(text, top_k=5))
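The raw pipeline output is a list of dictionaries, which is harder to scan than the comparison table above. A small helper along these lines can render it in a similar `token (score%)` style. The function name `format_predictions` is hypothetical. The sample input is a hand-written stand-in for real pipeline output, using only the `token_str` and `score` keys that the fill-mask pipeline returns.

```python
def format_predictions(predictions):
    """Render fill-mask output as 'token (score%)' strings, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return ", ".join(
        f'{p["token_str"]} ({p["score"] * 100:.1f}%)' for p in ranked
    )


# Hand-written stand-in for pipeline output. The "dom" score is made up
# purely for illustration; it is not a measured value.
example = [
    {"token_str": "dom", "score": 0.031},
    {"token_str": "anken", "score": 0.881},
]
print(format_predictions(example))
```

With the pipelines loaded as above, the same helper could be applied as `format_predictions(legal(text, top_k=5))`.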
MLM Evaluation
Evaluated on 25 held-out legal documents from Sivilombudet and Datatilsynet (not in training data).
| Metric | nb-bert-base | Norwegian Legal BERT |
|---|---|---|
| Top-1 Accuracy | 13.4% | 88.4% |
| Top-5 Accuracy | 22.3% | 95.9% |
| Perplexity | 1,592 | 1.65 |
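For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from per-position predictions. It assumes perplexity is the exponential of the mean negative log-likelihood of the true token at masked positions. The model card does not state the exact masking protocol or formula, so this may differ from the author's evaluation script. The function name and the toy inputs are hypothetical.

```python
import math


def mlm_metrics(examples, k=5):
    """Compute (top-1 accuracy, top-k accuracy, perplexity).

    Each example is a tuple:
      (true_token, ranked_predictions, probability_assigned_to_true_token)
    """
    n = len(examples)
    top1_hits = sum(1 for true, ranked, _ in examples if ranked[:1] == [true])
    topk_hits = sum(1 for true, ranked, _ in examples if true in ranked[:k])
    # Mean negative log-likelihood of the true token over masked positions.
    mean_nll = -sum(math.log(prob) for _, _, prob in examples) / n
    return top1_hits / n, topk_hits / n, math.exp(mean_nll)


# Toy inputs for illustration only (not taken from the evaluation set).
toy = [
    ("anken", ["anken", "dom"], 0.5),
    ("tap", ["gjeld", "tap"], 0.25),
]
top1, top5, perplexity = mlm_metrics(toy)
print(f"top-1={top1:.2f} top-5={top5:.2f} perplexity={perplexity:.3f}")
```

Under this definition, a perplexity close to 1 means the model assigns nearly all probability mass to the correct token at each masked position.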
NOR-CASEHOLD Benchmark Results (v2)
Evaluated on the NOR-CASEHOLD benchmark (233-document test set: 117 Høyesterett and 116 BFU documents).
| Method | Type | ROUGE-1 | 95% CI |
|---|---|---|---|
| TF-IDF | Sparse | 47.85 | [46.50, 49.23] |
| BM25 | Sparse | 47.49 | [46.16, 48.82] |
| Norwegian Legal BERT | Dense | 38.40 | [37.19, 39.64] |
| MiniLM | Dense (ST) | 37.47 | [36.28, 38.73] |
| mBERT | Dense | 37.34 | [36.28, 38.41] |
| NB-BERT-base | Dense | 37.28 | [36.18, 38.35] |
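Among the dense methods, Norwegian Legal BERT scores highest and significantly outperforms NB-BERT-base (p < 0.001) and mBERT (p = 0.001), which suggests that domain-specific pretraining improves dense representations of Norwegian legal text. The gain is modest (about 1.1 ROUGE-1 points over NB-BERT-base), and the sparse baselines TF-IDF and BM25 remain roughly 9 points ahead of the best dense method.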
Usage
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the encoder (without the MLM head)
tokenizer = AutoTokenizer.from_pretrained("bendik-eeg-henriksen/norwegian-legal-bert")
model = AutoModel.from_pretrained("bendik-eeg-henriksen/norwegian-legal-bert")

# Encode legal text
inputs = tokenizer("Høyesterett fant at skattyters fradragsrett var begrenset.", return_tensors="pt")
outputs = model(**inputs)
# outputs.last_hidden_state has shape (batch, seq_len, 768): one vector per token
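The snippet above yields one vector per token. For retrieval-style comparisons, a single vector per text is usually needed. One common choice is attention-masked mean pooling followed by cosine similarity, sketched below in plain Python so the logic is easy to follow. This is an assumed pooling strategy, not necessarily the one used in the benchmark. The function names and toy vectors are hypothetical.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy 2-dimensional "hidden states"; the last position is padding.
hidden = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(hidden, mask)
print(pooled)                              # [2.0, 1.0]
print(cosine_similarity(pooled, pooled))
```

With the objects from the usage snippet, the inputs could be obtained as `outputs.last_hidden_state[0].detach().tolist()` and `inputs["attention_mask"][0].tolist()`. For larger batches a tensor-based implementation would be much faster.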
Acknowledgments
Norwegian Legal BERT builds on NbAiLab/nb-bert-base.
Evaluated using the NOR-CASEHOLD benchmark, which was inspired by ITA-CASEHOLD (Licari et al., ICAIL 2023) and CaseHOLD (Zheng et al., 2021).
Related
- Benchmark: NOR-CASEHOLD
- Code: GitHub
Citation
If you use this model or the NOR-CASEHOLD benchmark in your research, please consider citing:
@misc{eeg-henriksen-2026-norwegian-legal-bert,
title = {Norwegian Legal {BERT}: A Domain-Adapted Language Model for Norwegian Law},
author = {Eeg-Henriksen, Bendik},
year = {2026},
howpublished = {HuggingFace Model Hub},
url = {https://huggingface.co/bendik-eeg-henriksen/norwegian-legal-bert}
}
