Gemma-4 Assistant (MTP)
This model was converted to MLX format from google/gemma-4-E2B-it-assistant using mlx-vlm version 0.4.5.
Refer to the original model card for more details on the model.
```shell
pip install -U mlx-vlm
```
Single request (`--draft-block-size 6`):

```shell
python -m mlx_vlm generate \
  --model mlx-community/gemma-4-E2B-it-bf16 \
  --draft-model mlx-community/gemma-4-E2B-it-assistant-bf16 \
  --draft-kind mtp \
  --draft-block-size 6 \
  --prompt "Explain speculative decoding in 3 sentences." \
  --max-tokens 256 --temperature 0
```
Batched generation (`--draft-block-size 3`), using `batch_generate`:

```python
from mlx_vlm.utils import load
from mlx_vlm.generate import batch_generate
from mlx_vlm.speculative.drafters import load_drafter

model, processor = load("mlx-community/gemma-4-E2B-it-bf16")
drafter = load_drafter("mlx-community/gemma-4-E2B-it-assistant-bf16", kind="mtp")

prompts = [
    "Explain speculative decoding in 3 sentences.",
    "What is MLX?",
    "Summarize attention in one paragraph.",
    "List three prime numbers.",
]

response = batch_generate(
    model,
    processor,
    prompts=prompts,
    max_tokens=256,
    temperature=0.0,
    draft_model=drafter,
    draft_kind="mtp",
    draft_block_size=3,
)

for text in response.texts:
    print(text)
```
MLX port of Google's Gemma 4 Multi-Token Prediction (MTP) drafter for speculative decoding. A small 4-layer assistant model drafts several candidate tokens per round; the full Gemma 4 target model verifies them in a single forward pass. At temperature=0, output is byte-identical to generation without a drafter.
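The draft-then-verify loop can be sketched in plain Python as a toy token-level simulation (a sketch only: `speculative_step`, `target_next`, and `draft_next` are illustrative names, not part of the mlx-vlm API):

```python
def speculative_step(target_next, draft_next, context, block_size):
    """One round of greedy speculative decoding over integer token IDs."""
    # 1. The small drafter proposes block_size tokens autoregressively.
    draft = []
    ctx = list(context)
    for _ in range(block_size):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. The target verifies the whole block: accept the longest prefix it
    #    agrees with, then emit its own token at the first disagreement.
    accepted = []
    ctx = list(context)
    for t in draft:
        want = target_next(ctx)
        if want != t:
            accepted.append(want)  # target's correction replaces the miss
            return accepted
        accepted.append(t)
        ctx.append(t)

    # 3. Every drafted token matched: take one bonus token from the target.
    accepted.append(target_next(ctx))
    return accepted
```

Because every emitted token is one the greedy target would have produced anyway, this rule is why output at temperature=0 matches plain decoding exactly; the drafter only changes how many tokens each target pass yields.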
Recommended --draft-block-size: 6 for single requests, 3 for batched generation.
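The tradeoff behind a block-size choice can be illustrated with a back-of-envelope model (an assumption-laden sketch: the per-token acceptance probability `p` and draft-to-target cost ratio `r` are illustrative parameters, not measured numbers for this pairing):

```python
def tokens_per_target_pass(p, k):
    # Expected tokens emitted per round with block size k: the accepted
    # prefix plus one target token, i.e. 1 + p + p^2 + ... + p^k.
    return sum(p ** i for i in range(k + 1))

def speedup(p, k, r):
    # One round costs k draft steps (each r of a target forward pass) plus
    # one target pass, normalized against plain decoding's 1 token per pass.
    return tokens_per_target_pass(p, k) / (k * r + 1)
```

Under this toy model, larger blocks help while acceptance stays high but give diminishing returns as misses waste drafted work; batching shifts the balance further because each verification pass is already amortized across requests, which is consistent with the smaller recommended block size for batched generation.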
See the drafter docs for architecture, supported pairings, performance numbers, and caveats.
Base model: google/gemma-4-E2B-it-assistant