How to get the mtplx binary?

#1
by calvarado2004 - opened

I've found

https://github.com/mlx-community/speculative-decoding

but I'm not sure if it is equivalent to the mtplx binary tool.

Releasing later today! I have not released it yet.

When it is out, expect a 2-2.5x speed increase at temperature 0.6.

I'm using this test:

Given this PGN string of a chess game:

1. b3 e5 2. Nf3 h5 3. d4 exd4 4. Nxd4 Nf6 5. f4 Ke7 6. Qd3 d5 7. h4 *

Figure out the current state of the chessboard, create an image in SVG code, also highlight the last move.

This test makes the model draw the position of the described chess game in SVG code. It is harder than it sounds, and it shows whether a model starts to drift or forget details.
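For anyone who wants to check a model's SVG against the expected position, here is a minimal sketch that replays the game in plain Python. It assumes the SAN moves from the PGN have been resolved by hand into (from, to) squares (the `MOVES` list is my own transcription, not something the model or mtplx produces), and it only works because this game has no castling, en passant, or promotion. A library such as python-chess would be the more robust choice for arbitrary games.

```python
# Moves of the PGN above, resolved by hand into (from, to) squares.
# This transcription is an assumption made for illustration.
MOVES = [
    ("b2", "b3"), ("e7", "e5"),   # 1. b3 e5
    ("g1", "f3"), ("h7", "h5"),   # 2. Nf3 h5
    ("d2", "d4"), ("e5", "d4"),   # 3. d4 exd4
    ("f3", "d4"), ("g8", "f6"),   # 4. Nxd4 Nf6
    ("f2", "f4"), ("e8", "e7"),   # 5. f4 Ke7
    ("d1", "d3"), ("d7", "d5"),   # 6. Qd3 d5
    ("h2", "h4"),                 # 7. h4
]

FILES = "abcdefgh"


def start_position():
    """Return the standard starting position as {square: piece letter}."""
    board = {}
    back_rank = "rnbqkbnr"
    for i, f in enumerate(FILES):
        board[f + "1"] = back_rank[i].upper()  # white pieces are uppercase
        board[f + "2"] = "P"
        board[f + "7"] = "p"
        board[f + "8"] = back_rank[i]          # black pieces are lowercase
    return board


def apply_moves(board, moves):
    """Move pieces square to square; a capture simply overwrites the target."""
    for src, dst in moves:
        board[dst] = board.pop(src)
    return board


def to_fen_board(board):
    """Serialize the piece placement as the first field of a FEN string."""
    rows = []
    for rank in "87654321":
        row, empty = "", 0
        for f in FILES:
            piece = board.get(f + rank)
            if piece is None:
                empty += 1
            else:
                if empty:
                    row += str(empty)
                    empty = 0
                row += piece
        if empty:
            row += str(empty)
        rows.append(row)
    return "/".join(rows)


final = apply_moves(start_position(), MOVES)
last_from, last_to = MOVES[-1]  # squares to highlight for the last move
print(to_fen_board(final))
print("last move:", last_from, "->", last_to)
```

Comparing the resulting piece placement square by square against the generated SVG makes it easy to spot a missing or misplaced piece.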

It takes around 13k context tokens for a model to produce an answer. Believe me, quantization erosion is quite real; this test made me stick to Q8 on MoE models, or at least Q6 on dense models.

Your custom Qwen 3.6 27B model did a good job! Almost correct; it forgot to generate a white rook.

mtplx serve --model Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed --port 8081 --max-tokens 262411 --mtp --depth 3 

But you don't mention which quantization this variant has. Is it Q4?

Screenshot 2026-05-09 at 3.04.00 PM

Screenshot 2026-05-09 at 3.07.14 PM

After a second attempt, the model started to show quantization erosion issues:

Screenshot 2026-05-09 at 3.14.22 PM

That's why I believe this is Q4.

And this is the right position, generated with a frontier model:

Screenshot 2026-05-09 at 3.22.02 PM

Haha, I also use a similar test: building an HTML chess game with an AI opponent to see if the model correctly implements the stalemate and checkmate rules.

This model is 4-bit with 16-bit MTP heads. I have another variant available at 4.75 bits, and I am releasing 6-bit and 8-bit variants soon.
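As a rough guide to what those bit widths mean for memory, the arithmetic can be sketched as below. It assumes about 27 billion weights (taken from the "27B" in the model name) and counts only the quantized weights; the 16-bit MTP heads, the KV cache, and runtime overhead are not included, so real figures will be higher.

```python
# Back-of-the-envelope weight size per quantization level.
# PARAMS is an assumption based on the "27B" in the model name.
PARAMS = 27e9


def weight_gb(bits_per_weight, params=PARAMS):
    """Approximate size of the quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (4, 4.75, 6, 8):
    print(f"{bits} bits/weight -> ~{weight_gb(bits):.2f} GB")
```

By this estimate, going from 4 to 8 bits per weight doubles the weight footprint, which is the trade-off against the quantization erosion discussed above.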
