slower than non-LIMI

by bobig - opened Sep 28, 2025

Sep 28, 2025

Initial TPS is about 10% lower than a same-size model: unsloth-glm-4.5-air-mlx MXFP4.
LIMI's speed also seems to drop faster than the similar model's as context grows.

Maybe the extra LIMI thinking is filling up the context, causing the speed drop.
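One way to check that guess is to compare the two models at the same context length instead of at the same turn, so extra thinking tokens don't skew the comparison. A rough sketch in plain Python, assuming you have logged (generated tokens, seconds) per context length yourself; the function names and the dict layout are made up for illustration and are not part of any MLX or LM Studio API:

```python
def tokens_per_second(num_tokens, seconds):
    """Generation speed for one run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_tokens / seconds


def relative_slowdown(baseline_tps, candidate_tps):
    """Fraction by which the candidate is slower than the baseline (0.10 = 10% slower)."""
    return (baseline_tps - candidate_tps) / baseline_tps


def compare_at_matched_context(baseline, candidate):
    """Compare two models only at context lengths measured for both.

    Each argument maps context length (tokens already in context)
    to a (generated_tokens, seconds) tuple from your own logs.
    Returns {context_length: relative slowdown of candidate}.
    """
    shared = sorted(set(baseline) & set(candidate))
    return {
        ctx: relative_slowdown(
            tokens_per_second(*baseline[ctx]),
            tokens_per_second(*candidate[ctx]),
        )
        for ctx in shared
    }
```

If the slowdown stays near 10% at every matched context length, the faster drop per turn would come from LIMI simply reaching long contexts sooner. If the slowdown grows with context length, something else is going on.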

nightmedia

Owner Sep 28, 2025

Yup, I noticed the same thing, and uploaded the mxfp4 for comparison. Since the model is too big for my hardware, I can't run tests, so a vibe check is the only way to evaluate it. The qx quants mitigate the issue somewhat, but my take is that mxfp4 is too “noisy” for inference, and the model struggles. The hi quants think less (more self-confidence), and you see some differences in speed.
