Youssofal's picture
Add files using upload-large-folder tool
cc1cd06 verified
metadata
license: apache-2.0
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
base_model:
  - google/gemma-4-31B-it
  - google/gemma-4-31B-it-assistant
library_name: mlx
tags:
  - mlx
  - gemma4
  - mtplx
  - speculative-decoding
  - apple-silicon
  - text-generation
pipeline_tag: text-generation

Gemma4 MTPLX Optimized Speed

This is an MTPLX pair bundle for Gemma 4 31B speculative decoding on Apple Silicon.

It is not a single vanilla Transformers model directory. The repository contains two MLX-format artifacts:

  • target/ - Gemma 4 31B IT target, MLX Q4 affine group-size 64
  • assistant/ - official Gemma 4 31B assistant drafter, MLX Q6 affine group-size 64

Use this pair when absolute throughput is the priority.

Source

  • Target source: google/gemma-4-31B-it
  • Target revision: 145dc2508c480a64b47242f160d286cff94a2343
  • Assistant source: google/gemma-4-31B-it-assistant
  • Assistant revision: cffbbd2cea41ea56a0fa5b0487e0d445121fd204

Both artifacts were converted locally to MLX format.

Quantization

Target:

bits: 4
group_size: 64
mode: affine

Assistant:

bits: 6
group_size: 64
mode: affine

MTPLX Usage

After downloading this repository, point MTPLX at the two subdirectories:

mtplx bench gemma-mtp \
  --target-model ./target \
  --assistant-model ./assistant \
  --prompt-suite mtplx/benchmarks/prompts/flappy.jsonl \
  --max-tokens 1000 \
  --draft-block-sizes 6 \
  --allow-unverified-gemma

The Gemma 4 assistant is a separate drafter model. MTPLX uses exact speculative sampling with target verification and residual correction.

Local Benchmark

Prompt: single-file HTML5 Canvas Flappy Bird game, capped at 1000 generated tokens.

Sampler:

temperature: 1.0
top_p: 0.95
top_k: 64
seed: 0

Best observed block size:

block_size: 6
acceptance: 830 / 846 = 98.11%

Observed MTPLX throughput samples:

43.56 tok/s
44.46 tok/s
44.07 tok/s

The bundled benchmark JSON files are in benchmarks/.

Notes

This release is optimized for MTPLX speed experiments. For a higher-precision target, use Youssofal/Gemma4-MTPLX-Optimized-Quality.

Gemma 4 is released by Google under the Gemma 4 license terms linked above.