vLLM warning about "Using uncalibrated q_scale 1.0 and/or prob_scale 1.0 with fp8 attention"

#5
by androiddrew - opened

Is there something missing from the config.json for this?

(Worker_TP0 pid=269) WARNING 04-23 14:48:42 [kv_cache.py:162] Using uncalibrated q_scale 1.0 and/or prob_scale 1.0 with fp8 attention. This may cause accuracy issues. Please make sure q/prob scaling factors are available in the fp8 checkpoint.

vLLM version: 0.20.0rc1
OS: Linux toor-runcible 6.17.0-20-generic #20~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 19 01:28:37 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Devices: 2x AMD Radeon AI Pro 9700 (gfx1201)

Aren't you supposed to calibrate the KV cache scales using LLM Compressor?
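For reference, the LLM Compressor examples calibrate an FP8 KV cache by adding a `kv_cache_scheme` to the quantization recipe passed to `oneshot()`, which stores the calibrated k/v scaling factors in the exported checkpoint. A minimal recipe sketch along those lines (field names follow the upstream KV-cache quantization example; exact structure may vary between llmcompressor versions):

```yaml
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      ignore: ["lm_head"]
      # FP8 (E4M3) scales for the KV cache, calibrated on a sample dataset
      kv_cache_scheme:
        num_bits: 8
        type: float
        strategy: tensor
        dynamic: false
        symmetric: true
```

Note this calibrates k/v scales; whether a given checkpoint also carries the q/prob scaling factors that this particular warning checks for may depend on the llmcompressor and vLLM versions in use.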
