Instructions to use Youssofal/Gemma4-MTPLX-Optimized-Quality with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use Youssofal/Gemma4-MTPLX-Optimized-Quality with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# if on a CUDA device, also pip install mlx[cuda]

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Youssofal/Gemma4-MTPLX-Optimized-Quality")

prompt = "Once upon a time in"
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- MLX LM
How to use Youssofal/Gemma4-MTPLX-Optimized-Quality with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Generate some text
mlx_lm.generate --model "Youssofal/Gemma4-MTPLX-Optimized-Quality" --prompt "Once upon a time"
```
```json
{
  "format_version": 1,
  "name": "Gemma4 MTPLX Optimized Quality",
  "variant": "quality",
  "layout": {
    "target": "target",
    "assistant": "assistant"
  },
  "source": {
    "target_repo": "google/gemma-4-31B-it",
    "target_revision": "145dc2508c480a64b47242f160d286cff94a2343",
    "assistant_repo": "google/gemma-4-31B-it-assistant",
    "assistant_revision": "cffbbd2cea41ea56a0fa5b0487e0d445121fd204"
  },
  "target": {
    "role": "verifier",
    "model_type": "gemma4",
    "quantization": {
      "bits": 8,
      "group_size": 64,
      "mode": "affine"
    }
  },
  "assistant": {
    "role": "drafter",
    "model_type": "gemma4_assistant",
    "quantization": {
      "bits": 8,
      "group_size": 64,
      "mode": "affine"
    }
  },
  "benchmark": {
    "prompt_suite": "flappy",
    "max_tokens": 1000,
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
    "seed": 0,
    "best_block_size": 6,
    "acceptance": {
      "accepted": 833,
      "drafted": 835,
      "ratio": 0.9976047904191617
    },
    "observed_mtp_tok_s": [
      34.22416818179891,
      32.87803735799434,
      33.11645340400705
    ],
    "speedup_vs_ar": 2.491870791785778
  }
}
```
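The derived figures in the `benchmark` block can be recomputed from the raw counts. The following is a minimal sketch, not part of MTPLX or mlx-lm: the `summarize` helper and the embedded `MANIFEST` excerpt are hypothetical names introduced here for illustration, and only the field names and values are taken from the manifest above. It checks that `acceptance.ratio` equals `accepted / drafted`, averages the three `observed_mtp_tok_s` runs, and confirms that the verifier and drafter share the same quantization settings.

```python
import json
from statistics import mean

# Excerpt of the manifest above, embedded so the sketch is self-contained.
# In practice you would json.load() the manifest file from the repo instead.
MANIFEST = json.loads("""
{
  "target": {"role": "verifier",
             "quantization": {"bits": 8, "group_size": 64, "mode": "affine"}},
  "assistant": {"role": "drafter",
                "quantization": {"bits": 8, "group_size": 64, "mode": "affine"}},
  "benchmark": {
    "best_block_size": 6,
    "acceptance": {"accepted": 833, "drafted": 835, "ratio": 0.9976047904191617},
    "observed_mtp_tok_s": [34.22416818179891, 32.87803735799434, 33.11645340400705],
    "speedup_vs_ar": 2.491870791785778
  }
}
""")


def summarize(manifest: dict) -> dict:
    """Recompute derived benchmark figures from the raw manifest values.

    Hypothetical helper for illustration; not an MTPLX or mlx-lm API.
    """
    bench = manifest["benchmark"]
    acc = bench["acceptance"]
    ratio = acc["accepted"] / acc["drafted"]  # fraction of drafted tokens accepted
    runs = bench["observed_mtp_tok_s"]
    return {
        "ratio": ratio,
        "ratio_matches": abs(ratio - acc["ratio"]) < 1e-12,
        "mean_tok_s": mean(runs),
        "min_tok_s": min(runs),
        "max_tok_s": max(runs),
        "same_quantization": (
            manifest["target"]["quantization"]
            == manifest["assistant"]["quantization"]
        ),
    }


summary = summarize(MANIFEST)
print(summary)
```

The manifest does not state which statistic of the three runs `speedup_vs_ar` was computed from, so the sketch does not attempt to derive an autoregressive baseline from it.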