GPQA diamond
Thanks Christian! I did one benchmark run using evalscope (vLLM with 32K context):
| Model | Dataset | Metric | Subset | Num | Score | Cat.0 |
|---|---|---|---|---|---|---|
| Qwen3-32B-AWQ | gpqa | AveragePass@1 | gpqa_diamond | 198 | 0.702 | default |
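For anyone unfamiliar with the metric: AveragePass@1 is, as I understand it, the per-question pass rate (fraction of sampled answers that are correct) averaged over all questions. A minimal sketch of that aggregation — a hypothetical helper, not evalscope's actual implementation:

```python
def average_pass_at_1(results):
    """Compute AveragePass@1 from per-question sample outcomes.

    results: list with one entry per question; each entry is a list of
    booleans, one per sampled answer (True = correct).
    """
    # pass@1 for a question = fraction of its samples that were correct
    per_question = [sum(samples) / len(samples) for samples in results]
    # dataset score = mean over all questions (198 for gpqa_diamond)
    return sum(per_question) / len(per_question)

# toy example: 4 questions, 1 sample each, 3 correct
print(round(average_pass_at_1([[True], [True], [True], [False]]), 3))  # 0.75
```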
```shell
evalscope eval \
  --eval-type service \
  --api-key dummy \
  --api-url http://localhost:8000/v1 \
  --model "Qwen3-32B-AWQ" \
  --generation-config temperature=0.6,top_p=0.95,top_k=20 \
  --datasets gpqa \
  --dataset-args '{"gpqa": {"subset_list": ["gpqa_diamond"], "few_shot_num": 0}}' \
  --eval-batch-size 2
```
With Qwen/QwQ-32B-AWQ I benchmarked 0.6313 on gpqa_diamond.
Nice. Now Qwen has also released its own AWQ quants. ;-) I assume they also used AutoAWQ, since they patched it themselves:
https://github.com/casper-hansen/AutoAWQ/pull/751
From what I have read, they are also about to add support for the MoE models, although the project has been adopted by the vllm-project and will not be continued.
Since I had the opportunity to run evalscope with your parameters against a non-quantized version on an H100, and I was curious, here are the results of my run:
| Model | Dataset | Metric | Subset | Num | Score | Cat.0 |
|---|---|---|---|---|---|---|
| Qwen3-32B | gpqa | AveragePass@1 | gpqa_diamond | 198 | 0.697 | default |