Evaluation paper: *NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark* (2504.07749)
GGUF quantized versions of m51Lab-NorskGemma4-31B for local inference with llama.cpp, Ollama, LM Studio, and other GGUF-compatible tools.
NorEval average: 0.836, the highest score among the open-source Norwegian models compared below.
| Model | Params | NorEval Avg |
|---|---|---|
| m51Lab-NorskGemma4-31B | 31B | 0.836 |
| m51Lab-NorskMistral-119B | 119B MoE | 0.764 |
| NorMistral-11B-thinking | 11B | 0.731 |
See full model card for complete benchmark details and training methodology.
| File | Quant | Size | RAM needed | Description |
|---|---|---|---|---|
| NorskGemma4-31B-Q4_K_M.gguf | Q4_K_M | 18 GB | ~22 GB | Recommended: good balance of quality and speed |
| NorskGemma4-31B-Q8_0.gguf | Q8_0 | 31 GB | ~35 GB | High quality, near-lossless |
| NorskGemma4-31B-F16.gguf | F16 | 58 GB | ~62 GB | Full precision, not quantized |
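The file sizes above roughly follow parameter count × bits per weight. A quick sanity check (the bits-per-weight figures are approximations for llama.cpp quant types, not values published for this repository):

```python
# Rough GGUF file-size estimate: params * bits-per-weight / 8, in GiB.
# Assumed bpw values (approximate, incl. quant metadata):
#   Q4_K_M ~4.8, Q8_0 ~8.5, F16 = 16.0
def gguf_size_gib(params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GiB for a quantized model."""
    return params * bits_per_weight / 8 / 2**30

for quant, bpw in {"Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}.items():
    print(f"{quant}: ~{gguf_size_gib(31e9, bpw):.0f} GiB")
```

Actual RAM use is a few GiB higher than file size because of the KV cache and runtime buffers, which is why the "RAM needed" column exceeds the file size.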
```bash
# Download
huggingface-cli download dervig/m51Lab-NorskGemma4-31B-GGUF \
  NorskGemma4-31B-Q4_K_M.gguf --local-dir .

# Run (GPU accelerated)
./llama-cli -m NorskGemma4-31B-Q4_K_M.gguf \
  -p "Kva er hovudstaden i Noreg?" \
  -n 256 -ngl 99 -c 4096 --jinja
```
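For programmatic access, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible HTTP API (start it with `./llama-server -m NorskGemma4-31B-Q4_K_M.gguf --port 8080`). A minimal standard-library client sketch (the system prompt mirrors the Ollama setup below; host/port are assumptions):

```python
# Sketch of a client for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint. Standard library only.
import json
import urllib.request

def build_chat_request(prompt: str, n_predict: int = 256) -> dict:
    """Build a /v1/chat/completions payload for llama-server."""
    return {
        "messages": [
            # "You are a helpful Norwegian AI assistant."
            {"role": "system", "content": "Du er ein hjelpsom norsk AI-assistent."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": n_predict,
        "temperature": 0.7,
    }

def chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST the payload to a running llama-server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage, with the server running: `chat("Kva er hovudstaden i Noreg?")`.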
```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./NorskGemma4-31B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM Du er ein hjelpsom norsk AI-assistent.
EOF

# Create and run
ollama create norskgemma4 -f Modelfile
ollama run norskgemma4 "Forklar kva Stortinget er."
```
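Once created, the model is also reachable over Ollama's REST API (`POST /api/generate` on the default port 11434). A standard-library sketch, assuming the model name `norskgemma4` from the `ollama create` step above:

```python
# Sketch of calling Ollama's /api/generate endpoint (non-streaming).
# Standard library only; host/port are Ollama's defaults.
import json
import urllib.request

def build_generate_request(prompt: str, model: str = "norskgemma4") -> dict:
    """Payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, base_url: str = "http://localhost:11434") -> str:
    """POST to a running Ollama daemon and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_generate_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

With the daemon running: `generate("Forklar kva Stortinget er.")`.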
Tested with llama.cpp on 1x NVIDIA H100:
| Quant | Prompt speed | Generation speed |
|---|---|---|
| Q4_K_M | 138 t/s | 66 t/s |
| Q8_0 | 202 t/s | 56 t/s |
| F16 | 7 t/s | 40 t/s |
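Prompt speed and generation speed combine additively: total latency is roughly prompt tokens divided by prompt speed plus generated tokens divided by generation speed. A back-of-envelope check using the Q4_K_M row (the 1000-token prompt length is an illustrative assumption):

```python
# Estimated end-to-end latency from the throughput table:
# prompt_toks / prompt_tps + gen_toks / gen_tps.
def latency_s(prompt_toks: int, gen_toks: int,
              prompt_tps: float, gen_tps: float) -> float:
    """Approximate request latency in seconds."""
    return prompt_toks / prompt_tps + gen_toks / gen_tps

# 1000-token prompt, 256 generated tokens, Q4_K_M (138 / 66 t/s):
print(f"{latency_s(1000, 256, 138, 66):.1f} s")  # → 11.1 s
```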
Built by m51.ai Lab. Based on Google Gemma 4 31B-it, fine-tuned with data from NbAiLab, evaluated on NorEval by LTG, University of Oslo.