# m51Lab-SeoGemma4-v2-31B-GGUF
GGUF quantizations of m51Lab-SeoGemma4-v2-31B for use with llama.cpp and compatible inference engines.
See the main model card for full details on training, evaluation, and usage.
## Available Quantizations
| File | Size | Description |
|---|---|---|
| m51Lab-SeoGemma4-v2-31B-F16.gguf | 58 GB | Full precision (F16) — reference quality |
| m51Lab-SeoGemma4-v2-31B-Q8_0.gguf | 31 GB | Recommended — best quality with quantization |
| m51Lab-SeoGemma4-v2-31B-Q4_K_M.gguf | 14 GB | Good balance — runs on 24 GB+ VRAM |
## Usage with llama-server

```shell
llama-server \
  --model m51Lab-SeoGemma4-v2-31B-Q8_0.gguf \
  --host 0.0.0.0 --port 8000 \
  --n-gpu-layers 999 \
  --ctx-size 50000 \
  --jinja \
  --reasoning-format deepseek-legacy
```
Critical flags:

- `--jinja` — required for the Gemma 4 chat template and native function calling
- `--reasoning-format deepseek-legacy` — preserves thinking in the `content` field
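With these flags, llama-server exposes llama.cpp's OpenAI-compatible chat endpoint (by default at `http://localhost:8000/v1/chat/completions` given the command above), and `deepseek-legacy` leaves the model's thinking inline in the `content` field wrapped in `<think>...</think>` tags. A minimal sketch of building a request payload and separating the reasoning from the final answer — the helper names, prompt, and temperature are illustrative, not part of this repo:

```python
import json
import re

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for llama-server.
    POST this as JSON to http://localhost:8000/v1/chat/completions."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def split_reasoning(content: str):
    """With --reasoning-format deepseek-legacy, thinking stays inline in the
    content field as <think>...</think>; split it from the visible answer."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", content, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return None, content.strip()

payload = build_chat_request("Summarize this page in one sentence.")
print(json.dumps(payload))

# Illustrative response content (not real model output):
sample = "<think>The user wants one sentence.</think>This repo hosts GGUF quants."
reasoning, answer = split_reasoning(sample)
print(answer)  # This repo hosts GGUF quants.
```

If you use the `deepseek` format instead of `deepseek-legacy`, the server moves the thinking into a separate `reasoning_content` field and no client-side splitting is needed.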
## Important Disclaimer
This model is released for research and educational purposes. See the main model card for details.
## Credits
- Base model: google/gemma-4-31B-it by Google DeepMind (Apache 2.0)
- Fine-tuning: m51 Lab
- Quantization: Produced using llama.cpp by Georgi Gerganov