Request higher quants

#2
by hugypufy - opened

Fantastic work! Thank you very much. Everything works well; I'd say it's on par with 27B Q4 but much, much faster.

I have a bunch of requests: Q5, Q6, and Q8 versions of the GGUF models.

baa.ai org

We have flipped the way the new quantization method works, you specify your memory budget, and we create the optimal quantization version to support that memory footprint. So if you have a 64GB machine this version works great.

If you let us know what your memory budget is, we will create those versions for the community.
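The budget-driven approach described above can be sketched roughly as follows. This is not the org's actual tooling, just an illustrative outline: given a model's parameter count and a memory budget, pick the highest-precision quant whose estimated weight size still fits. The bits-per-weight figures are approximate llama.cpp values, and the headroom constant is an assumption for OS and KV-cache overhead.

```python
# Hedged sketch (not the actual quantization pipeline): choose the
# highest-precision GGUF quant whose estimated weight size fits a budget.

# Approximate effective bits per weight for common llama.cpp quant types,
# ordered from highest precision to lowest.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q5_K_M": 5.5,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
    "Q2_K": 2.6,
}

def estimated_size_gb(n_params_b: float, quant: str) -> float:
    """Rough weight-only size in GB for a model with n_params_b billion parameters."""
    return n_params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9

def best_quant_for_budget(n_params_b: float, budget_gb: float,
                          headroom_gb: float = 8.0):
    """Return the highest-precision quant that fits the budget minus headroom
    (headroom covers OS and KV cache; 8 GB is an assumed placeholder)."""
    usable = budget_gb - headroom_gb
    for quant in BITS_PER_WEIGHT:  # dict preserves insertion order (Py 3.7+)
        if estimated_size_gb(n_params_b, quant) <= usable:
            return quant
    return None  # model does not fit at any supported quant

# Example: a 122B-parameter model on a 64GB machine
print(best_quant_for_budget(122, 64))  # -> Q2_K
```

In practice the real method is presumably finer-grained than a single bits-per-weight number, e.g. mixing quant levels per tensor, but the idea of optimizing precision against a fixed memory footprint is the same.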

I have to say, I'm very grateful for this specific size... are the MTP heads still inside the model (for future use, when the feature finally arrives)? ✌️😃

baa.ai org

Hey, we do not touch the MTP heads, clearly there is some optimization that could be done to shrink the model even smaller, but we want to keep expected functionality for users that might use them in the future.

Glad you are finding it useful. We will put up versions that support 128GB, 192GB, and 256GB unified RAM later today; each version is smarter than the previous.


Sure! I'd love to see:
a 48GB Qwen 3.5 122B A10B GGUF, at whatever the highest quant possible is
GLM 5.1 at 48GB and 96GB, in GGUF and MLX if possible; that would be a dream come true

baa.ai org

You won't be able to get GLM 5.1 down to even 96GB; its quality collapses at anything below 270GB.
