--- license: apache-2.0 license_link: https://ai.google.dev/gemma/docs/gemma_4_license base_model: - google/gemma-4-31B-it - google/gemma-4-31B-it-assistant library_name: mlx tags: - mlx - gemma4 - mtplx - speculative-decoding - apple-silicon - text-generation pipeline_tag: text-generation --- # Gemma4 MTPLX Optimized Quality This is an **MTPLX pair bundle** for Gemma 4 31B speculative decoding on Apple Silicon. It is not a single vanilla Transformers model directory. The repository contains two MLX-format artifacts: - `target/` - Gemma 4 31B IT target, MLX Q8 affine group-size 64 - `assistant/` - official Gemma 4 31B assistant drafter, MLX Q8 affine group-size 64 Use this pair when target precision and high acceptance are the priority. ## Source - Target source: `google/gemma-4-31B-it` - Target revision: `145dc2508c480a64b47242f160d286cff94a2343` - Assistant source: `google/gemma-4-31B-it-assistant` - Assistant revision: `cffbbd2cea41ea56a0fa5b0487e0d445121fd204` Both artifacts were converted locally to MLX format. ## Quantization Target: ```text bits: 8 group_size: 64 mode: affine ``` Assistant: ```text bits: 8 group_size: 64 mode: affine ``` ## MTPLX Usage After downloading this repository, point MTPLX at the two subdirectories: ```bash mtplx bench gemma-mtp \ --target-model ./target \ --assistant-model ./assistant \ --prompt-suite mtplx/benchmarks/prompts/flappy.jsonl \ --max-tokens 1000 \ --draft-block-sizes 6 \ --allow-unverified-gemma ``` The Gemma 4 assistant is a separate drafter model. MTPLX uses exact speculative sampling with target verification and residual correction. ## Local Benchmark Prompt: single-file HTML5 Canvas Flappy Bird game, capped at 1000 generated tokens. Sampler: ```text temperature: 1.0 top_p: 0.95 top_k: 64 seed: 0 ``` Best observed block size: ```text block_size: 6 acceptance: 833 / 835 = 99.76% speedup_vs_ar: 2.49x ``` Observed MTPLX throughput samples: ```text 34.22 tok/s 32.88 tok/s 33.12 tok/s ``` The bundled benchmark JSON file is in `benchmarks/`. ## Notes This release is optimized for target precision and high acceptance. It is not the fastest absolute-TPS pair; for speed, use `Youssofal/Gemma4-MTPLX-Optimized-Speed`. Gemma 4 is released by Google under the Gemma 4 license terms linked above.