This quantization model is amazing

#1
by hyunw55 - opened

I tried the QuantTrio AWQ quant and the official GPTQ-INT4,
but this model fits my setup best (4x RTX 3090).
Speed and accuracy are both good for me.

Do you have any plans for a 35B A3B MXFP quant model?

4x 3090 with a 150 W power limit:

vllm-qwen35           | (APIServer pid=1) INFO 03-10 07:59:22 [loggers.py:259] Engine 000: Avg prompt throughput: 519.4 tokens/s, Avg generation throughput: 133.8 tokens/s, Running: 16 reqs, Waiting: 4 reqs, GPU KV cache usage: 63.1%, Prefix cache hit rate: 19.7%
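
For anyone comparing setups, here is a rough sketch of how the numbers in that log line could be pulled out and turned into a per-request figure. The helper name `parse_vllm_stats` and the regular expressions are my own assumptions based only on the line as posted; this is not a vLLM API, and the log format may differ between vLLM versions.

```python
import re

# The stats line from the post, without the container-name prefix.
LOG_LINE = (
    "(APIServer pid=1) INFO 03-10 07:59:22 [loggers.py:259] Engine 000: "
    "Avg prompt throughput: 519.4 tokens/s, "
    "Avg generation throughput: 133.8 tokens/s, "
    "Running: 16 reqs, Waiting: 4 reqs, "
    "GPU KV cache usage: 63.1%, Prefix cache hit rate: 19.7%"
)


def parse_vllm_stats(line: str) -> dict:
    """Extract numeric fields from one stats line (hypothetical helper;
    the patterns assume the exact wording shown in the post)."""
    patterns = {
        "prompt_tps": r"Avg prompt throughput: ([\d.]+) tokens/s",
        "generation_tps": r"Avg generation throughput: ([\d.]+) tokens/s",
        "running": r"Running: (\d+) reqs",
        "waiting": r"Waiting: (\d+) reqs",
        "kv_cache_pct": r"GPU KV cache usage: ([\d.]+)%",
        "prefix_hit_pct": r"Prefix cache hit rate: ([\d.]+)%",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, line)
        if match:  # skip fields that are missing from the line
            stats[key] = float(match.group(1))
    return stats


stats = parse_vllm_stats(LOG_LINE)

# Aggregate generation throughput divided by concurrent requests:
# only a rough average, since requests do not all progress equally.
per_request_tps = stats["generation_tps"] / stats["running"]
print(f"{per_request_tps:.2f} tokens/s per running request")  # 8.36
```

So the 133.8 tokens/s is the total across 16 concurrent requests, which works out to roughly 8.4 tokens/s per request at that moment.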

Totally agree with you, @hyunw55. I need a 35B A3B MXFP quant model too :)

Owner

Thanks! I'll upload it in the next 2 days.

Can you make Nemotron Super 3 120B with MXFP4 quantization, please?

Owner

Absolutely - you can expect it within the next 24 hours :)

At a 185 W power limit, the speed will improve a bit.
