--- license: apache-2.0 license_link: https://ai.google.dev/gemma/docs/gemma_4_license base_model: - google/gemma-4-31B-it - google/gemma-4-31B-it-assistant library_name: mlx tags: - mlx - gemma4 - mtplx - speculative-decoding - apple-silicon - text-generation pipeline_tag: text-generation --- # Gemma4 MTPLX Optimized Speed This is an **MTPLX pair bundle** for Gemma 4 31B speculative decoding on Apple Silicon. It is not a single vanilla Transformers model directory. The repository contains two MLX-format artifacts: - `target/` - Gemma 4 31B IT target, MLX Q4 affine group-size 64 - `assistant/` - official Gemma 4 31B assistant drafter, MLX Q6 affine group-size 64 Use this pair when absolute throughput is the priority. ## Source - Target source: `google/gemma-4-31B-it` - Target revision: `145dc2508c480a64b47242f160d286cff94a2343` - Assistant source: `google/gemma-4-31B-it-assistant` - Assistant revision: `cffbbd2cea41ea56a0fa5b0487e0d445121fd204` Both artifacts were converted locally to MLX format. ## Quantization Target: ```text bits: 4 group_size: 64 mode: affine ``` Assistant: ```text bits: 6 group_size: 64 mode: affine ``` ## MTPLX Usage After downloading this repository, point MTPLX at the two subdirectories: ```bash mtplx bench gemma-mtp \ --target-model ./target \ --assistant-model ./assistant \ --prompt-suite mtplx/benchmarks/prompts/flappy.jsonl \ --max-tokens 1000 \ --draft-block-sizes 6 \ --allow-unverified-gemma ``` The Gemma 4 assistant is a separate drafter model. MTPLX uses exact speculative sampling with target verification and residual correction. ## Local Benchmark Prompt: single-file HTML5 Canvas Flappy Bird game, capped at 1000 generated tokens. Sampler: ```text temperature: 1.0 top_p: 0.95 top_k: 64 seed: 0 ``` Best observed block size: ```text block_size: 6 acceptance: 830 / 846 = 98.11% ``` Observed MTPLX throughput samples: ```text 43.56 tok/s 44.46 tok/s 44.07 tok/s ``` The bundled benchmark JSON files are in `benchmarks/`. ## Notes This release is optimized for MTPLX speed experiments. For a higher-precision target, use `Youssofal/Gemma4-MTPLX-Optimized-Quality`. Gemma 4 is released by Google under the Gemma 4 license terms linked above.