# Gemma 4 Collection

RAM-optimised Gemma 4 models by baa.ai · 11 items
A quantized build of google/gemma-4-31B-it produced by baa.ai. Retains the full vision tower, unlike other pre-quantized MLX variants of this model.
| Property | Value |
|---|---|
| Size on disk | 32.59 GB |
| Format | MLX |
| Base model | google/gemma-4-31B-it |
| Vision tower | Retained |
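As a sanity check, the size on disk is roughly what group-wise 8-bit quantization predicts. A minimal sketch, assuming ~31B parameters and a quantization group size of 64 with an fp16 scale and fp16 bias per group (these figures are assumptions for illustration, not taken from the card):

```python
# Rough disk-size estimate for group-wise 8-bit quantization.
# Assumptions (not from the model card): ~31e9 parameters,
# group size 64, fp16 scale + fp16 bias stored per group.
params = 31e9
bits_per_weight = 8 + (16 + 16) / 64   # ~8.5 effective bits per weight
est_gb = params * bits_per_weight / 8 / 1e9
print(f"~{est_gb:.1f} GB")  # ~32.9 GB, close to the reported 32.59 GB
```

The small remaining difference is plausible once non-quantized tensors (embeddings, norms) are accounted for.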
```python
from mlx_vlm import load, generate

# Download and load the quantized model plus its processor
model, processor = load("baa-ai/Gemma-4-31B-it-RAM-8bit-MLX")

# Build a chat-formatted prompt from a single user turn
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}],
    add_generation_prompt=True, tokenize=False,
)

result = generate(model, processor, prompt, max_tokens=512, verbose=True)
print(result.text)
```
Benchmarked on vanilla MMLU (80 questions) and MathVision multiple-choice (20 questions). Every RAM variant is smaller on disk than the corresponding Unsloth MLX variant. Note that the Unsloth Gemma 4 MLX variants strip the vision tower, so they cannot process images at all.
| Family | Bits | Unsloth size | RAM size | Unsloth MMLU | RAM MMLU | MMLU gap (pp) | RAM MathVision |
|---|---|---|---|---|---|---|---|
| 26B-A4B | 8bit | 26.81 GB | 26.52 GB | 56.2% | 90.0% | +33.8 | 35.0% |
| 26B-A4B | 4bit | 14.96 GB | 14.63 GB | 55.0% | 88.8% | +33.8 | 50.0% |
| 31B | 8bit | 32.62 GB | 32.59 GB | 70.7% | 90.0% | +19.3 | 55.0% |
| 31B | 4bit | 22.12 GB | 22.09 GB | 73.8% | 86.2% | +12.4 | 60.0% |
| 31B | 3bit | 19.52 GB | 19.49 GB | 72.5% | 86.2% | +13.7 | 50.0% |
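The gap column is simply the RAM MMLU score minus the Unsloth MMLU score, in percentage points. A quick check of the table's arithmetic, with the numbers copied from the rows above:

```python
# Rows copied from the benchmark table:
# (family, bits, unsloth_mmlu, ram_mmlu, reported_gap)
rows = [
    ("26B-A4B", "8bit", 56.2, 90.0, 33.8),
    ("26B-A4B", "4bit", 55.0, 88.8, 33.8),
    ("31B", "8bit", 70.7, 90.0, 19.3),
    ("31B", "4bit", 73.8, 86.2, 12.4),
    ("31B", "3bit", 72.5, 86.2, 13.7),
]
for family, bits, unsloth, ram, gap in rows:
    # Gap is reported in percentage points, one decimal place
    assert round(ram - unsloth, 1) == gap, (family, bits)
print("all gaps check out")
```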
License: inherited from the upstream Gemma license.