Gemma-4-31B-it — RAM 8bit (MLX)

A quantized build of google/gemma-4-31B-it produced by baa.ai. Unlike other pre-quantized MLX variants of this model, it retains the full vision tower, so image inputs still work.

Property      Value
Size on disk  32.59 GB
Format        MLX
Base model    google/gemma-4-31B-it
Vision tower  Retained

Usage

from mlx_vlm import load, generate

# Load the quantized model and its processor from the Hub.
model, processor = load("baa-ai/Gemma-4-31B-it-RAM-8bit-MLX")

# Build a chat-formatted prompt (text-only here; the model also accepts images).
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}],
    add_generation_prompt=True, tokenize=False,
)

result = generate(model, processor, prompt, max_tokens=512, verbose=True)
print(result.text)
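Because the vision tower is retained, the same API can take image inputs. A minimal sketch, assuming the usual mlx_vlm pattern of an "image" entry in the chat message plus an image argument to generate; the helper names and the keyword-argument form of generate are illustrative and may differ across mlx_vlm versions:

```python
def build_vision_messages(question):
    # One image slot plus a text question; the processor's chat template
    # expands the "image" entry into the model's image placeholder token(s).
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }]

def describe_image(image_path, question="What is in this image?"):
    # Imported lazily so the sketch can be defined without mlx_vlm installed.
    from mlx_vlm import load, generate

    model, processor = load("baa-ai/Gemma-4-31B-it-RAM-8bit-MLX")
    prompt = processor.tokenizer.apply_chat_template(
        build_vision_messages(question),
        add_generation_prompt=True, tokenize=False,
    )
    # Pass the image(s) alongside the chat-formatted prompt.
    return generate(model, processor, prompt, image=[image_path],
                    max_tokens=512).text
```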

Benchmark results

Measured on vanilla MMLU (80 questions) and MathVision MCQ (20 questions). Every RAM variant is smaller than the corresponding Unsloth MLX variant.

Unsloth Gemma 4 MLX variants strip the vision tower, so they cannot process images.

Family   Bits  Unsloth Size  RAM Size  Unsloth MMLU  RAM MMLU  GAP    RAM Vision
26B-A4B  8bit  26.81 GB      26.52 GB  56.2%         90.0%     +33.8  35.0%
26B-A4B  4bit  14.96 GB      14.63 GB  55.0%         88.8%     +33.8  50.0%
31B      8bit  32.62 GB      32.59 GB  70.7%         90.0%     +19.3  55.0%
31B      4bit  22.12 GB      22.09 GB  73.8%         86.2%     +12.4  60.0%
31B      3bit  19.52 GB      19.49 GB  72.5%         86.2%     +13.7  50.0%
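The GAP column is simply RAM MMLU minus Unsloth MMLU, in percentage points. A quick sanity check over the rows above (values transcribed by hand from the table):

```python
# (label, unsloth_mmlu, ram_mmlu, reported_gap) per row of the table above.
rows = [
    ("26B-A4B 8bit", 56.2, 90.0, 33.8),
    ("26B-A4B 4bit", 55.0, 88.8, 33.8),
    ("31B 8bit",     70.7, 90.0, 19.3),
    ("31B 4bit",     73.8, 86.2, 12.4),
    ("31B 3bit",     72.5, 86.2, 13.7),
]

for label, unsloth, ram, gap in rows:
    # round() guards against float subtraction noise (e.g. 90.0 - 56.2).
    assert round(ram - unsloth, 1) == gap, label
```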

License

Inherited from the upstream Gemma license.

