m51Lab-SeoGemma4-v2-31B-GGUF

GGUF quantizations of m51Lab-SeoGemma4-v2-31B for use with llama.cpp and compatible inference engines.

See the main model card for full details on training, evaluation, and usage.
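
To fetch a single quantization without cloning the whole repository, you can use the Hugging Face CLI. A minimal sketch, assuming the repo id shown for this page (dervig/m51Lab-SeoGemma4-v2-31B-GGUF) and the Q8_0 file; swap in another filename from the table below as needed:

huggingface-cli download dervig/m51Lab-SeoGemma4-v2-31B-GGUF \
  m51Lab-SeoGemma4-v2-31B-Q8_0.gguf \
  --local-dir .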

Available Quantizations

File                                  Size    Description
m51Lab-SeoGemma4-v2-31B-F16.gguf      58 GB   Full precision (F16); reference quality
m51Lab-SeoGemma4-v2-31B-Q8_0.gguf     31 GB   Recommended; best quality of the quantized files
m51Lab-SeoGemma4-v2-31B-Q4_K_M.gguf   14 GB   Good quality/size balance; runs on 24 GB+ VRAM
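
As a quick sanity check after downloading, you can run a short one-off generation with llama-cli before standing up a server. A minimal sketch using the Q4_K_M file; the prompt and token budget are arbitrary placeholders:

llama-cli \
  --model m51Lab-SeoGemma4-v2-31B-Q4_K_M.gguf \
  --n-gpu-layers 999 \
  --prompt "Write one sentence about GGUF." \
  --n-predict 64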

Usage with llama-server

llama-server \
  --model m51Lab-SeoGemma4-v2-31B-Q8_0.gguf \
  --host 0.0.0.0 --port 8000 \
  --n-gpu-layers 999 \
  --ctx-size 50000 \
  --jinja \
  --reasoning-format deepseek-legacy

Critical flags:

  • --jinja: required for the Gemma 4 chat template and native function calling
  • --reasoning-format deepseek-legacy: preserves the model's thinking in the content field
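
Once the server is running, it exposes llama.cpp's OpenAI-compatible API. A minimal sketch of a chat completion request against the host and port configured above; the message content is a placeholder:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Summarize what GGUF is in one sentence."}
    ],
    "temperature": 0.7
  }'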

Important Disclaimer

This model is released for research and educational purposes. See the main model card for details.

Credits
