Gemma-4-31B-it — RAM 8bit (MLX)

A quantized build of google/gemma-4-31B-it produced by baa.ai. Unlike other pre-quantized MLX variants of this model, it retains the full vision tower, so image inputs still work.

Property      Value
Size on disk  32.59 GB
Format        MLX
Base model    google/gemma-4-31B-it
Vision tower  Retained

Usage

from mlx_vlm import load, generate

# Load the quantized model and its processor from the Hub.
model, processor = load("baa-ai/Gemma-4-31B-it-RAM-8bit-MLX")

# Build a chat-formatted prompt (text-only here; the model also accepts images).
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": [{"type": "text", "text": "Hello!"}]}],
    add_generation_prompt=True, tokenize=False,
)

result = generate(model, processor, prompt, max_tokens=512, verbose=True)
print(result.text)
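Because the vision tower is retained, the same API can take image inputs. A minimal sketch, assuming the usual mlx_vlm pattern of an "image" entry in the chat message plus an image argument to generate; the helper names and the keyword-argument form of generate are illustrative and may differ across mlx_vlm versions:

```python
def build_vision_messages(question):
    # One image slot plus a text question; the processor's chat template
    # expands the "image" entry into the model's image placeholder token(s).
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }]

def describe_image(image_path, question="What is in this image?"):
    # Imported lazily so the sketch can be defined without mlx_vlm installed.
    from mlx_vlm import load, generate

    model, processor = load("baa-ai/Gemma-4-31B-it-RAM-8bit-MLX")
    prompt = processor.tokenizer.apply_chat_template(
        build_vision_messages(question),
        add_generation_prompt=True, tokenize=False,
    )
    # Pass the image(s) alongside the chat-formatted prompt.
    return generate(model, processor, prompt, image=[image_path],
                    max_tokens=512).text
```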

Benchmark results

Measured on vanilla MMLU (80 questions) and MathVision MCQ (20 questions). Every RAM variant is smaller than the corresponding Unsloth MLX variant.

Unsloth Gemma 4 MLX variants strip the vision tower, so they cannot process images.

Family   Bits  Unsloth Size  RAM Size  Unsloth MMLU  RAM MMLU  GAP    RAM Vision
26B-A4B  8bit  26.81 GB      26.52 GB  56.2%         90.0%     +33.8  35.0%
26B-A4B  4bit  14.96 GB      14.63 GB  55.0%         88.8%     +33.8  50.0%
31B      8bit  32.62 GB      32.59 GB  70.7%         90.0%     +19.3  55.0%
31B      4bit  22.12 GB      22.09 GB  73.8%         86.2%     +12.4  60.0%
31B      3bit  19.52 GB      19.49 GB  72.5%         86.2%     +13.7  50.0%
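The GAP column is simply RAM MMLU minus Unsloth MMLU, in percentage points. A quick sanity check over the rows above (values transcribed by hand from the table):

```python
# (label, unsloth_mmlu, ram_mmlu, reported_gap) per row of the table above.
rows = [
    ("26B-A4B 8bit", 56.2, 90.0, 33.8),
    ("26B-A4B 4bit", 55.0, 88.8, 33.8),
    ("31B 8bit",     70.7, 90.0, 19.3),
    ("31B 4bit",     73.8, 86.2, 12.4),
    ("31B 3bit",     72.5, 86.2, 13.7),
]

for label, unsloth, ram, gap in rows:
    # round() guards against float subtraction noise (e.g. 90.0 - 56.2).
    assert round(ram - unsloth, 1) == gap, label
```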

License

Inherited from the upstream Gemma license.

