Any MTP upcoming?
#1
by lefromage - opened
Looking for any MTP support. This is what I get:
model: /Volumes/M2_4TB/huggingface/hub/models--Youssofal--Gemma4-MTPLX-Optimized-Speed/snapshots/cc1cd067badceda6babb46477df8785fed7bf777
tier: no-MTP
runtime: unsupported
reason: Model has no MTP head. MTPLX requires an MTP-equipped model.
try: mtplx inspect MODEL
Struggling with Gemma support. It should be in soon.
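
For anyone who wants a rough local check while waiting, here is a minimal sketch that lists tensor names containing "mtp" in a snapshot's sharded safetensors index. It rests on two assumptions: that the snapshot ships a `model.safetensors.index.json` with a `weight_map` (the usual layout for sharded Hugging Face checkpoints), and that MTP weights, if present, have "mtp" somewhere in their names. The helper `find_mtp_tensors` is hypothetical and is not how MTPLX itself decides the tier; `mtplx inspect MODEL` from the output above remains the authoritative check.

```python
import json
from pathlib import Path


def find_mtp_tensors(snapshot_dir, marker="mtp"):
    """List tensor names in a sharded safetensors index that contain `marker`.

    Assumption: MTP weights, if any, carry "mtp" in their tensor names.
    This is a naming heuristic only, not the check MTPLX performs.

    Returns None when there is no index file (for example a single-file
    checkpoint), because then this heuristic cannot tell either way.
    """
    index_path = Path(snapshot_dir) / "model.safetensors.index.json"
    if not index_path.is_file():
        return None
    with index_path.open() as f:
        weight_map = json.load(f).get("weight_map", {})
    # Keys of weight_map are tensor names; values are shard file names.
    return sorted(name for name in weight_map if marker in name.lower())


# Example call (pass the snapshot directory reported on the "model:" line):
# hits = find_mtp_tensors("/path/to/snapshot")
# An empty list would be consistent with the "no-MTP" tier reported above.
```

An empty list only suggests that no MTP head is present; a differently named head would be missed by this heuristic.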