Osaurus AI

Gemma 4 26B-A4B-it — MXFP4 (MLX)

Microscaling FP4 quantization with verified vision tower weights



Model Details

| Property | Value |
|---|---|
| Base Model | google/gemma-4-26B-A4B-it |
| Parameters | 26B total, 4B active (Mixture of Experts) |
| Quantization | MXFP4 (Microscaling FP4), mixed-precision |
| Avg Bits/Weight | 4.604 |
| Model Size | 14.8 GB |
| Architecture | Gemma 4 (text + vision) |
| Context Length | 128K tokens |
| Vocabulary | 262K tokens |
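As a quick sanity check (plain arithmetic, not part of any tooling), the reported average bits/weight and parameter count roughly reproduce the reported file size; the small gap is plausibly rounding in the reported figures.

```python
params = 26e9     # total parameters
avg_bits = 4.604  # reported average bits per weight

# bits -> bytes -> GB
size_gb = params * avg_bits / 8 / 1e9
print(f"{size_gb:.2f} GB")  # ~14.96 GB, in line with the reported 14.8 GB
```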

Weight Verification

Every tensor in both the vision tower and the language model was loaded and checked for max(abs(tensor)) > 0. No broken (all-zero) weights were found.

| Component | Tensor Count | Status |
|---|---|---|
| Vision Tower (SigLIP) | 355 | All non-zero |
| Language Model (MoE) | 1,135 | All non-zero |
| Total | 1,490 | All verified |
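The check above can be sketched as follows. In a real run the tensors would be loaded from the model's safetensors shards; here a plain dict of NumPy arrays stands in for the checkpoint.

```python
import numpy as np

def find_broken_tensors(tensors):
    """Return names of tensors failing the max(abs(tensor)) > 0 check,
    i.e. tensors whose weights are entirely zero."""
    return [name for name, t in tensors.items()
            if np.max(np.abs(t)) == 0]

# Illustrative stand-in for loaded checkpoint shards (names hypothetical).
tensors = {
    "vision_tower.patch_embed.weight": np.array([[0.02, -0.11], [0.3, 0.0]]),
    "language_model.experts.0.gate": np.zeros((2, 2)),  # a broken tensor
}
broken = find_broken_tensors(tensors)
print(broken)  # -> ['language_model.experts.0.gate']
```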

MXFP4 Quantization

MXFP4 (Microscaling FP4) uses block-scaled 4-bit floating point values, offering better precision characteristics than standard affine INT4 quantization at similar model sizes. MLP gate/up/down projections are kept at 8-bit for quality.
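As a rough illustration of the block-scaled scheme (a sketch, not the converter's actual implementation): weights are grouped into small blocks, each block shares a single power-of-two scale, and every element is rounded to the nearest FP4 (E2M1) grid value.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (sign stored separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_block(block):
    """Quantize one block: a shared power-of-two scale plus one 4-bit
    value per element. Returns (scale, quantized_values)."""
    amax = np.max(np.abs(block))
    if amax == 0.0:
        return 1.0, np.zeros_like(block)
    # Power-of-two scale placing the largest magnitude within the E2M1
    # range (max grid value 6.0 sits in the 2^2 binade).
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = block / scale
    # Round each element to the nearest grid magnitude, keeping the sign.
    nearest = FP4_GRID[np.abs(np.abs(scaled)[:, None] - FP4_GRID).argmin(axis=1)]
    return scale, np.sign(scaled) * nearest

weights = np.linspace(-1.0, 1.0, 32)  # one 32-element block
scale, q = mxfp4_block(weights)
reconstructed = scale * q             # dequantize
```

With a block of magnitudes up to 1.0, the shared scale comes out to 2^-2 and the worst-case rounding error is half the widest grid gap times the scale.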

Usage

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("OsaurusAI/gemma-4-26B-A4B-it-mxfp4")

# Text
prompt = apply_chat_template(processor, model.config, "Write a haiku about cats.")
output = generate(model, processor, prompt, max_tokens=200)
print(output.text)

# Vision
prompt = apply_chat_template(processor, model.config, "Describe this image.", num_images=1)
output = generate(model, processor, prompt, image="photo.jpg", max_tokens=200)
print(output.text)
```

Conversion

Converted from google/gemma-4-26B-A4B-it using mlx-vlm v0.4.4:

```shell
mlx_vlm.convert --hf-path google/gemma-4-26B-A4B-it \
  --mlx-path gemma-4-26b-a4b-it-mxfp4 \
  -q --q-mode mxfp4 --dtype bfloat16
```