GGUF available – Cerebellum v3 (11 GB, ablation-guided mixed-precision)
For those looking for a GGUF version to run in llama.cpp / ollama:
deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v3-GGUF – 11 GB
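If you'd rather script it than call the llama.cpp CLI directly, here's a minimal Python sketch using huggingface_hub and llama-cpp-python. The .gguf filename is an assumption on my part, so check the repo's file list for the real one:

```python
# Minimal sketch: download the quant from the Hub and run it with
# llama-cpp-python. The filename below is an assumption -- check the
# repo's file list for the real one.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v3-GGUF",
    filename="Gemma-4-26B-A4B-it-Cerebellum-v3.gguf",  # hypothetical filename
)

llm = Llama(
    model_path=model_path,
    n_gpu_layers=-1,  # offload all layers; the 11 GB file fits a 24 GB card
    n_ctx=4096,       # 4K context, per the note at the end of this post
)

out = llm("Explain mixed-precision quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```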
This is an ablation-guided mixed-precision quant of the bf16 base model (not a conversion of NVFP4). We ran 90 individual ablation experiments to measure which tensors are sensitive to quantization and which are tolerant, then assigned precision accordingly: 99 per-tensor overrides on top of a Q3_K_M base, using bartowski's imatrix.
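If you're curious what "ablation-guided" means in practice, the general idea (sketched below; not the exact pipeline) is: requantize one tensor at a time to the low-precision candidate, measure the perplexity delta against a baseline, and keep higher precision wherever the delta spikes. `eval_ppl` and `quantize_one_tensor` here are hypothetical stand-ins for real quantize/eval tooling (e.g. llama-quantize plus llama-perplexity):

```python
# Rough sketch of ablation-guided precision assignment -- the general idea,
# not the exact pipeline. eval_ppl() and quantize_one_tensor() are
# hypothetical stand-ins for real quantize/eval tooling.

def eval_ppl(model) -> float:
    """Hypothetical: run a WikiText-style perplexity eval on the model."""
    raise NotImplementedError

def quantize_one_tensor(model, name: str, qtype: str):
    """Hypothetical: requantize one tensor to `qtype`, leaving the rest at bf16."""
    raise NotImplementedError

def assign_precisions(model, tensor_names, threshold: float = 0.05):
    baseline = eval_ppl(model)
    overrides = {}
    for name in tensor_names:
        # One ablation experiment per tensor: drop only this tensor to the
        # low-precision candidate and see how much perplexity degrades.
        delta = eval_ppl(quantize_one_tensor(model, name, "Q3_K")) - baseline
        if delta > threshold:
            overrides[name] = "Q6_K"  # sensitive tensor -> keep higher precision
    # Tolerant tensors get no entry and fall back to the Q3_K_M default.
    return overrides
```

The 99 per-tensor overrides mentioned above are the output of a loop like this, applied on top of the Q3_K_M defaults.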
Benchmarks (same hardware, RTX 3090):
| Benchmark | Cerebellum v3 (11 GB) | Q3_K_M (13 GB) | Q4_K_M (16 GB) |
|---|---|---|---|
| WikiText PPL (lower is better) | 19.826 | 42.369 | 27.362 |
| HumanEval pass@1 | 67.1% | 62.2% | 59.8% |
| ARC-Challenge | 95.5% | 95.2% | 96.7% |
Fits on a 24 GB GPU with room for 4K context. Method details are in the model card.