This quantized model is amazing
#1
by hyunw55 - opened
I tried the QuantTrio AWQ quant and the official GPTQ-INT4,
but this model is the best fit for my setup (4x 3090).
Both speed and accuracy are good for me.
Do you have any plans for a 35B A3B MXFP quant model?
4x 3090 with the power limit set to 150 W:
vllm-qwen35 | (APIServer pid=1) INFO 03-10 07:59:22 [loggers.py:259] Engine 000: Avg prompt throughput: 519.4 tokens/s, Avg generation throughput: 133.8 tokens/s, Running: 16 reqs, Waiting: 4 reqs, GPU KV cache usage: 63.1%, Prefix cache hit rate: 19.7%
Thanks! I'll upload it in the next 2 days.
Can you make Nemotron Super 3 120B with MXFP4 quantization, please?
Absolutely - you can expect it within the next 24 hours :)
At 185 W the speed would improve a bit.