# m51Lab-SeoGemma4-v2-31B-GGUF
GGUF quantizations of m51Lab-SeoGemma4-v2-31B for use with llama.cpp and compatible inference engines.
See the main model card for full details on training, evaluation, and usage.
## Available Quantizations
| File | Size | Description |
|---|---|---|
| m51Lab-SeoGemma4-v2-31B-F16.gguf | 58 GB | Full precision (F16) — reference quality |
| m51Lab-SeoGemma4-v2-31B-Q8_0.gguf | 31 GB | Recommended — best quality with quantization |
| m51Lab-SeoGemma4-v2-31B-Q4_K_M.gguf | 14 GB | Good balance — runs on 24 GB+ VRAM |
## Usage with llama-server

```shell
llama-server \
  --model m51Lab-SeoGemma4-v2-31B-Q8_0.gguf \
  --host 0.0.0.0 --port 8000 \
  --n-gpu-layers 999 \
  --ctx-size 50000 \
  --jinja \
  --reasoning-format deepseek-legacy
```
Critical flags:

- `--jinja` — required for the Gemma 4 chat template and native function calling
- `--reasoning-format deepseek-legacy` — preserves thinking in the `content` field
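With these flags, llama-server exposes llama.cpp's OpenAI-compatible chat endpoint (by default at `http://localhost:8000/v1/chat/completions` given the command above), and `deepseek-legacy` leaves the model's thinking inline in the `content` field wrapped in `<think>...</think>` tags. A minimal sketch of building a request payload and separating the reasoning from the final answer — the helper names, prompt, and temperature are illustrative, not part of this repo:

```python
import json
import re

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for llama-server.
    POST this as JSON to http://localhost:8000/v1/chat/completions."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def split_reasoning(content: str):
    """With --reasoning-format deepseek-legacy, thinking stays inline in the
    content field as <think>...</think>; split it from the visible answer."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", content, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return None, content.strip()

payload = build_chat_request("Summarize this page in one sentence.")
print(json.dumps(payload))

# Illustrative response content (not real model output):
sample = "<think>The user wants one sentence.</think>This repo hosts GGUF quants."
reasoning, answer = split_reasoning(sample)
print(answer)  # This repo hosts GGUF quants.
```

If you use the `deepseek` format instead of `deepseek-legacy`, the server moves the thinking into a separate `reasoning_content` field and no client-side splitting is needed.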
## Important Disclaimer
This model is released for research and educational purposes. See the main model card for details.
## Credits
- Base model: google/gemma-4-31B-it by Google DeepMind (Apache 2.0)
- Fine-tuning: m51 Lab
- Quantization: Produced using llama.cpp by Georgi Gerganov