# Gemma 4 Collection

RAM-optimised Gemma 4 models by baa.ai (11 items).
Mixed-precision quantized version of google/gemma-4-31B-it, optimised by baa.ai using its proprietary Black Sheep AI method. Bit-widths are allocated per tensor via sensitivity analysis, with an adjusted allocation for the vision encoder.
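The Black Sheep AI method itself is proprietary, so the following is only a generic sketch of the idea behind per-tensor bit-width allocation: estimate how sensitive each tensor is to quantization, then spend a bit budget greedily so the most sensitive tensors keep higher precision. All function names, the bit-width choices, and the sensitivity proxy (raw quantization MSE) are illustrative assumptions, not the actual method.

```python
# Illustrative sketch only -- baa.ai's Black Sheep method is proprietary.
# Idea: measure per-tensor quantization error, then greedily upgrade the
# bit-width of whichever tensor gains the most, until the parameter-
# weighted average bit-width hits the target budget.
import numpy as np

def quant_error(w, bits):
    """MSE of symmetric uniform quantization of `w` at `bits` bits."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).clip(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return float(np.mean((q * scale - w) ** 2))

def allocate_bits(tensors, target_avg_bits, choices=(4, 6, 8)):
    """Greedy per-tensor allocation toward a target average bit-width."""
    alloc = {name: choices[0] for name in tensors}  # start everything low
    total = sum(w.size for w in tensors.values())

    def avg_bits():
        return sum(alloc[n] * tensors[n].size for n in tensors) / total

    while avg_bits() < target_avg_bits:
        best, best_gain = None, 0.0
        for name, w in tensors.items():
            i = choices.index(alloc[name])
            if i + 1 >= len(choices):
                continue  # already at the highest precision
            # error reduction from upgrading this tensor one step
            gain = quant_error(w, alloc[name]) - quant_error(w, choices[i + 1])
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # nothing left to upgrade
        alloc[best] = choices[choices.index(alloc[best]) + 1]
    return alloc

rng = np.random.default_rng(0)
tensors = {f"layer{i}.weight": rng.normal(scale=1 + i, size=1000)
           for i in range(4)}
print(allocate_bits(tensors, target_avg_bits=6.0))
```

In this toy run, higher-variance tensors quantize with larger error, so they are upgraded first; a real pipeline would use a task-aware sensitivity signal (e.g. loss impact) rather than raw weight MSE.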
| Metric | Value |
|---|---|
| Size | 30 GB |
| Average bits | 7.8 |
| MMLU (relative to BF16) | 100% |
Usage with the mlx-lm package:

```python
from mlx_lm import load, generate

model, tokenizer = load("baa-ai/Gemma-4-31B-it-RAM-30GB-MLX")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
print(response)
```
Quantized by: baa.ai

Base model: google/gemma-4-31B-it