A fun, quick little model!

#1
by phakio - opened

Always nice to see competition in the local LLM field. This model is quick, and since it isn't trained as a thinking model, it produces a reply much faster than the Qwen3.5 series. Sure, you can disable thinking in the Qwen3.5 series, but why not try out a different architecture!

Reading the release paper, this model has a little bit of everything: image analysis, text-to-speech, and speech-to-speech. This quant only exposes the text-generation side of it, but on my M1 Max MacBook Pro with 64 GB of RAM, I've found its generation adequate for my needs and quick enough not to feel sluggish.

Overall, generations are without fluff and to the point: short, concise replies, but the model will elaborate and explain things if needed. I like it.

((mlx_env) ) phone@phones-MacBook-Pro longcat-mlx % mlx_lm.generate \
    --model /Users/phone/.lmstudio/models/mlx-community/LongCat-Next-4bit \
    --prompt "Write a very long essay about probiscus monkeys." \
    --max-tokens 5000 \
    --trust-remote-code \
    --verbose VERBOSE
==========


--TRUNCATED TO NOT FILL THIS DISCUSSION WITH MONKEY FACTS--


==========
Prompt: 12 tokens, 2.615 tokens-per-sec
Generation: 2191 tokens, 59.284 tokens-per-sec
Peak memory: 38.674 GB
((mlx_env) ) phone@phones-MacBook-Pro longcat-mlx % 
Prompt: 1046 tokens, 156.851 tokens-per-sec
Generation: 186 tokens, 59.222 tokens-per-sec
Peak memory: 39.457 GB
((mlx_env) ) phone@phones-MacBook-Pro longcat-mlx % 
