Request higher quants
Fantastic work! Thank you very much. Everything works well; I'd say on par with 27B Q4, but much, much faster.
I have a bunch of requests:
Q5, Q6, and Q8 versions
GGUF models
We have flipped the way the new quantization method works: you specify your memory budget, and we create the optimal quantized version to fit that memory footprint. So if you have a 64GB machine, this version works great.
If you let us know what your memory budget is, we will create those versions for the community.
I have to say, I'm very grateful for this specific size. Are the MTP heads still inside the model (for future use, when the feature finally arrives)?
Hey, we do not touch the MTP heads. Clearly there is some optimization that could shrink the model even further, but we want to keep the expected functionality for users who might need them in the future.
Glad you are finding it useful. We will put up versions that support 128GB, 192GB, and 256GB of unified RAM later today; each version is smarter than the previous one.
Sure, I think I'd love to see a
48GB Qwen 3.5 122B A10B GGUF, in whatever highest quant is possible; I don't mind.
GLM 5.1 at 48GB and 96GB, in GGUF and MLX if that is possible, would be a dream come true.
You won't be able to get GLM 5.1 down to even 96GB; its quality collapses at anything below 270GB.