Performance on Intel QYFS, 512GB DDR5 and 96GB VRAM
wow, talk about a blast from the past! lol I'll redownload it and give it a try with my current workflows. thanks for the update -
just a little update: seems coherent and able to accurately describe and explain entry-level medical student questions, which is my current area of study. seems like a great improvement, though admittedly in my initial post I believe I was using @ubergarm's Q3 quant, and now your Q4_0x quant (so I don't really have a true before/after reference)...
solid speeds, and able to use 100k context with q8_0 kv quant!
i'll keep using it for a bit. for use cases like mine (general explanations of concepts as I study), i've always found higher-parameter models to be more knowledgeable. I think the recent qwen models are very smart, but they reason a lot, and their outputs have a lot of "filler" text that isn't really needed and is ultimately distracting.
and here's my launch script for the curious ones (wrong alias, the model is indeed this one! lol)
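(The original script isn't reproduced here. For reference, a representative llama-server invocation matching the setup described above — 100k context with a q8_0 KV cache — might look like the sketch below; the model path, alias, GPU layer count, and thread count are placeholders, not the poster's actual values.)

```shell
# hypothetical llama-server launch sketch -- paths/alias/threads are placeholders;
# only the context size and q8_0 KV-cache flags reflect what's described above
llama-server \
  --model ./K2.5-Q4_0x.gguf \
  --alias k2.5 \
  --ctx-size 100000 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --flash-attn \          # quantized V cache generally requires flash attention
  --n-gpu-layers 99 \
  --threads 48
```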
Thank you for the feedback! The new Qwen3.5 models are nice too, but personally K2.5 is my preferred model for writing and chatting as well. Happy to hear that the quant is working nicely!
closing this thread with this definitive comment, and one more piece of feedback! the model was able to work in opencode, create a whole xcode project from scratch, and compile it. great quant, will definitely keep it around -
i had it code a native macOS llama.cpp endpoint chat client. simple app, but creating all of the various files and getting them to compile in xcode is no easy task! the ui could use work, but the app works great, and is only 1MB in size once compiled. (once again, wrong model name in opencode, I'm too lazy to always change the current active model config lmao)
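(For anyone curious what a "llama.cpp endpoint chat client" talks to: llama-server exposes an OpenAI-compatible chat completions endpoint, and a request like the one below is the kind of call such an app would make. The port, model alias, and prompt here are illustrative placeholders, not details from the project above.)

```shell
# example request against llama-server's OpenAI-compatible endpoint;
# host/port and model alias are placeholders for a locally running server
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "k2.5",
    "messages": [{"role": "user", "content": "hello"}]
  }'
```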