GGUF available: Cerebellum v3 (11 GB, ablation-guided mixed-precision)

For those looking for a GGUF version to run in llama.cpp / ollama:

deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v3-GGUF (11 GB)
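
If you'd rather load it from Python than the CLI, something like this works with llama-cpp-python (the filename below is a guess; use whatever the GGUF in the repo is actually called):

```python
from llama_cpp import Llama

# Filename is a guess -- point model_path at the GGUF you downloaded.
llm = Llama(
    model_path="./Gemma-4-26B-A4B-it-Cerebellum-v3.gguf",
    n_ctx=4096,        # 4K context
    n_gpu_layers=-1,   # offload all layers to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```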

This is an ablation-guided mixed-precision quant of the bf16 base model (not a conversion of NVFP4). We ran 90 individual ablation experiments to measure which tensors are sensitive vs. tolerant and assigned precision accordingly: 99 per-tensor overrides on top of Q3_K_M, using bartowski's imatrix.
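
For anyone curious what "ablation-guided" means in practice, the general recipe is: quantize one tensor (or group of tensors) at a time, measure the quality delta, and keep the sensitive ones at higher precision. A rough conceptual sketch, not the actual pipeline (tensor names, the threshold, and the quant types below are placeholders):

```python
# Conceptual sketch of ablation-guided mixed-precision assignment.
# Sensitivity scores, threshold, and quant type names are illustrative only.

def assign_quant_types(sensitivity: dict[str, float],
                       base_type: str = "Q3_K",
                       bump_type: str = "Q5_K",
                       threshold: float = 0.05) -> dict[str, str]:
    """Map each tensor name to a quant type.

    sensitivity[name] is the measured quality drop (e.g. delta perplexity)
    when that tensor alone is quantized to the base type in an ablation run.
    Tensors whose drop exceeds `threshold` get bumped to higher precision.
    """
    return {name: (bump_type if delta > threshold else base_type)
            for name, delta in sensitivity.items()}


# Example: two tensors measured as sensitive, one as tolerant.
scores = {
    "blk.0.attn_output.weight": 0.12,   # sensitive -> higher precision
    "blk.0.ffn_down.weight":    0.30,   # very sensitive -> higher precision
    "blk.7.ffn_gate.weight":    0.01,   # tolerant -> base precision
}
print(assign_quant_types(scores))
```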

Benchmarks (same hardware, RTX 3090):

| Benchmark | Cerebellum v3 (11 GB) | Q3_K_M (13 GB) | Q4_K_M (16 GB) |
| --- | --- | --- | --- |
| WikiText PPL | 19,826 | 42,369 | 27,362 |
| HumanEval pass@1 | 67.1% | 62.2% | 59.8% |
| ARC-Challenge | 95.5% | 95.2% | 96.7% |
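
If you want to reproduce the HumanEval row, pass@k is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch (the helper below is illustrative, not taken from any specific harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples generated per problem, c: samples that passed, k: budget.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 3 passing -> pass@1 estimate of 0.3.
print(pass_at_k(n=10, c=3, k=1))
```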

Fits on a 24 GB GPU with room for 4K context. Method details are in the model card.
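
Rough VRAM arithmetic behind the 24 GB claim, if you want to sanity-check your own context length (the architecture numbers below are placeholders, not the actual model config):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """Back-of-envelope KV cache size: K and V, per layer, per token, fp16."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1024**3

# Placeholder architecture values -- substitute the real ones from the config.
print(kv_cache_gib(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=4096))
# ~0.75 GiB of KV cache on top of the ~11 GB of weights, comfortably under 24 GB.
```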
