GGUF available: Cerebellum v3 (11 GB, ablation-guided mixed-precision)

For those looking for a GGUF version to run in llama.cpp / ollama:

deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v3-GGUF (11 GB)
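
If you'd rather load it from Python than the CLI, something like this works with llama-cpp-python (the filename below is a guess; use whatever the GGUF in the repo is actually called):

```python
from llama_cpp import Llama

# Filename is a guess -- point model_path at the GGUF you downloaded.
llm = Llama(
    model_path="./Gemma-4-26B-A4B-it-Cerebellum-v3.gguf",
    n_ctx=4096,        # 4K context
    n_gpu_layers=-1,   # offload all layers to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```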

This is an ablation-guided mixed-precision quant of the bf16 base model (not a conversion of NVFP4). We ran 90 individual ablation experiments to measure which tensors are sensitive vs. tolerant and assigned precision accordingly: 99 per-tensor overrides on top of Q3_K_M, using bartowski's imatrix.
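
For anyone curious what "ablation-guided" means in practice, the general recipe is: quantize one tensor (or group of tensors) at a time, measure the quality delta, and keep the sensitive ones at higher precision. A rough conceptual sketch, not the actual pipeline (tensor names, the threshold, and the quant types below are placeholders):

```python
# Conceptual sketch of ablation-guided mixed-precision assignment.
# Sensitivity scores, threshold, and quant type names are illustrative only.

def assign_quant_types(sensitivity: dict[str, float],
                       base_type: str = "Q3_K",
                       bump_type: str = "Q5_K",
                       threshold: float = 0.05) -> dict[str, str]:
    """Map each tensor name to a quant type.

    sensitivity[name] is the measured quality drop (e.g. delta perplexity)
    when that tensor alone is quantized to the base type in an ablation run.
    Tensors whose drop exceeds `threshold` get bumped to higher precision.
    """
    return {name: (bump_type if delta > threshold else base_type)
            for name, delta in sensitivity.items()}


# Example: two tensors measured as sensitive, one as tolerant.
scores = {
    "blk.0.attn_output.weight": 0.12,   # sensitive -> higher precision
    "blk.0.ffn_down.weight":    0.30,   # very sensitive -> higher precision
    "blk.7.ffn_gate.weight":    0.01,   # tolerant -> base precision
}
print(assign_quant_types(scores))
```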

Benchmarks (same hardware, RTX 3090):

| Benchmark | Cerebellum v3 (11 GB) | Q3_K_M (13 GB) | Q4_K_M (16 GB) |
| --- | --- | --- | --- |
| WikiText PPL | 19,826 | 42,369 | 27,362 |
| HumanEval pass@1 | 67.1% | 62.2% | 59.8% |
| ARC-Challenge | 95.5% | 95.2% | 96.7% |
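
If you want to reproduce the HumanEval row, pass@k is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch (the helper below is illustrative, not taken from any specific harness):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples generated per problem, c: samples that passed, k: budget.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 3 passing -> pass@1 estimate of 0.3.
print(pass_at_k(n=10, c=3, k=1))
```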

Fits on a 24 GB GPU with room for 4K context. Method details are in the model card.
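
Rough VRAM arithmetic behind the 24 GB claim, if you want to sanity-check your own context length (the architecture numbers below are placeholders, not the actual model config):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """Back-of-envelope KV cache size: K and V, per layer, per token, fp16."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 1024**3

# Placeholder architecture values -- substitute the real ones from the config.
print(kv_cache_gib(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=4096))
# ~0.75 GiB of KV cache on top of the ~11 GB of weights, comfortably under 24 GB.
```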
