Instructions to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed")

    prompt = "Write a story about Einstein"
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

    text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with Pi:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent

    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "mlx-lm": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed" }
          ]
        }
      }
    }
Run Pi
    # Start Pi in your project directory:
    pi
- Hermes Agent
How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with Hermes Agent:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup

    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed
Run Hermes
    hermes
- MLX LM
How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with MLX LM:
Generate or start a chat session
    # Install MLX LM
    uv tool install mlx-lm

    # Interactive chat REPL
    mlx_lm.chat --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"
Run an OpenAI-compatible server
    # Install MLX LM
    uv tool install mlx-lm

    # Start the server (listens on port 8080 unless --port is given)
    mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"

    # Calling the OpenAI-compatible server with curl
    curl -X POST "http://localhost:8080/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed",
        "messages": [
          {"role": "user", "content": "Hello"}
        ]
      }'
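The same request can be sent from Python using only the standard library. This is a minimal sketch, not part of the official snippets: it assumes the server is reachable at http://localhost:8080/v1 (the base URL used in the Pi configuration above) and that the response follows the usual OpenAI chat-completions shape. The helper names `build_chat_request` and `chat` are made up for illustration.

```python
import json
import urllib.request

# Assumption: the server runs locally on port 8080, matching the
# baseUrl in the Pi configuration above. Adjust if you pass --port.
BASE_URL = "http://localhost:8080/v1"
MODEL = "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the chat-completions endpoint (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt and return the assistant's reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply.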
How do I get the mtplx binary?
I've found
https://github.com/mlx-community/speculative-decoding
but I'm not sure whether it is equivalent to the mtplx binary tool.
Releasing later today! I have not released it yet.
When it is out, expect a 2-2.5x speed increase at temperature 0.6.
I'm using this test:
Given this PGN string of a chess game:
1. b3 e5 2. Nf3 h5 3. d4 exd4 4. Nxd4 Nf6 5. f4 Ke7 6. Qd3 d5 7. h4 *
Figure out the current state of the chessboard, create an image in SVG code, also highlight the last move.
This test makes the model draw, as SVG code, the chess position described by the PGN. It is harder than it sounds, and it shows whether a model starts to drift or forget details.
It takes around 13k context tokens for a model to produce an answer. Believe me, quantization erosion is quite real; this test made me stick to Q8 on MoE models, or at least Q6 on dense models.
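For anyone who wants to check an answer without trusting another model, here is a minimal sketch that replays the game and prints the expected piece placement. Two assumptions: I translated the PGN moves by hand into from-to squares (the `MOVES` list is mine, not part of the prompt), and the replay is deliberately naive, which is enough here because this game has no castling, en passant, or promotion.

```python
# Starting position, row 0 = rank 8, column 0 = file a.
START = [
    "rnbqkbnr",
    "pppppppp",
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",
    "RNBQKBNR",
]

# Assumption: hand translation of
# 1. b3 e5 2. Nf3 h5 3. d4 exd4 4. Nxd4 Nf6 5. f4 Ke7 6. Qd3 d5 7. h4
MOVES = "b2b3 e7e5 g1f3 h7h5 d2d4 e5d4 f3d4 g8f6 f2f4 e8e7 d1d3 d7d5 h2h4".split()


def square_to_index(square):
    """Convert a square like 'e4' to (row, column) in the board grid."""
    column = ord(square[0]) - ord("a")
    row = 8 - int(square[1])
    return row, column


def apply_moves(moves):
    """Replay from-to moves; a capture simply overwrites the target square."""
    board = [list(row) for row in START]
    for move in moves:
        r0, c0 = square_to_index(move[:2])
        r1, c1 = square_to_index(move[2:])
        assert board[r0][c0] != ".", f"no piece on {move[:2]}"
        board[r1][c1] = board[r0][c0]
        board[r0][c0] = "."
    return board


def to_fen_placement(board):
    """Encode the grid as the piece-placement field of a FEN string."""
    rows = []
    for row in board:
        out, empty = "", 0
        for cell in row:
            if cell == ".":
                empty += 1
            else:
                if empty:
                    out += str(empty)
                    empty = 0
                out += cell
        if empty:
            out += str(empty)
        rows.append(out)
    return "/".join(rows)


final_board = apply_moves(MOVES)
print(to_fen_placement(final_board))
# rnbq1b1r/ppp1kpp1/5n2/3p3p/3N1P1P/1P1Q4/P1P1P1P1/RNB1KB1R
print("last move:", MOVES[-1][:2], "->", MOVES[-1][2:])
```

Both white rooks are still on their home squares (a1 and h1) in this position, and the move to highlight is the pawn going from h2 to h4.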
Your custom Qwen 3.6 27B model did a good job! It was almost correct; it only forgot to generate one white rook.
mtplx serve --model Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed --port 8081 --max-tokens 262411 --mtp --depth 3
But you don't mention which quantization this variant has. Is it Q4?
After a second attempt, the model started to show quantization erosion issues:
That's why I believe this is Q4.
And this is the right position, generated with a frontier model:
Haha, I use a similar test: building an HTML chess game with an opponent AI, to see whether the model correctly implements the stalemate and checkmate rules.
This model is 4-bit with 16-bit MTP heads. I have another variant available at 4.75 bits, and I am releasing 6-bit and 8-bit variants soon.



