This quantized model is amazing

by hyunw55 - opened Mar 10

I tried the QuantTrio AWQ and the official GPTQ-INT4 quants,
but this model fits my setup (4x 3090) much better.
Speed and accuracy are both good for me.

Do you have any plans for an MXFP quant of the 35B A3B model?

4x 3090 with the power limit set to 150 W:

vllm-qwen35           | (APIServer pid=1) INFO 03-10 07:59:22 [loggers.py:259] Engine 000: Avg prompt throughput: 519.4 tokens/s, Avg generation throughput: 133.8 tokens/s, Running: 16 reqs, Waiting: 4 reqs, GPU KV cache usage: 63.1%, Prefix cache hit rate: 19.7%
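For anyone comparing numbers across setups, here is a rough sketch of how the log line above can be read. It pulls the throughput fields out with a regular expression and divides generation throughput by the number of running requests. The pattern and the `parse_stats` helper are my own assumptions written against this one line format, not part of vLLM, and the per-request figure is only a crude estimate because the log reports averages over a short window.

```python
import re

# Log line copied verbatim from the post above.
LOG_LINE = (
    "vllm-qwen35           | (APIServer pid=1) INFO 03-10 07:59:22 [loggers.py:259] "
    "Engine 000: Avg prompt throughput: 519.4 tokens/s, "
    "Avg generation throughput: 133.8 tokens/s, Running: 16 reqs, Waiting: 4 reqs, "
    "GPU KV cache usage: 63.1%, Prefix cache hit rate: 19.7%"
)

# Hypothetical pattern, written for this exact line format only.
PATTERN = re.compile(
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, Waiting: (?P<waiting>\d+) reqs"
)


def parse_stats(line):
    """Return the throughput fields of one stats line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    gen_tps = float(match.group("gen_tps"))
    running = int(match.group("running"))
    return {
        "prompt_tps": float(match.group("prompt_tps")),
        "gen_tps": gen_tps,
        "running": running,
        "waiting": int(match.group("waiting")),
        # Crude estimate: aggregate generation throughput split evenly
        # across the requests that were running at that moment.
        "gen_tps_per_request": gen_tps / running if running else None,
    }


stats = parse_stats(LOG_LINE)
print(stats)
```

With 16 requests running, the aggregate 133.8 tokens/s works out to roughly 8.4 tokens/s per request.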

jellyjel

Mar 11

Totally agree with you, @hyunw55. I need a 35B A3B MXFP quant too :)

olka-fi

Owner Mar 12

Thanks! I’ll upload it in the next 2 days.

triakmk

Mar 12

Can you make Nemotron Super 3 120B with MXFP4 quantization, please?

olka-fi

Owner Mar 12

Absolutely, you can expect it within the next 24 hours :)

SongXiaoMao

23 days ago

With the power limit at 185 W, the speed will improve a bit.
