get_cuda_version in core.sageattn_qk_int8_pv_fp8_cuda causes severe slowdown on Windows

by pamparamm - opened Jun 27, 2025

Jun 27, 2025 (edited Jun 27, 2025)

I believe the output of core.get_cuda_version() should be cached. It causes a severe slowdown (about 10x slower than sageattention2 from GitHub) when using sageattn_qk_int8_pv_fp8_cuda.
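To illustrate the kind of caching suggested here, below is a minimal sketch using functools.lru_cache. It is not the library's actual code: _query_cuda_version, _SAMPLE_OUTPUT, and query_count are hypothetical stand-ins for whatever expensive lookup get_cuda_version performs, and the version string is made up for the example.

```python
import functools
import re

# Hypothetical sample of what a version query might print; not real tool output.
_SAMPLE_OUTPUT = "Cuda compilation tools, release 12.4, V12.4.131"

query_count = 0  # counts how often the expensive lookup actually runs


def _query_cuda_version():
    """Hypothetical stand-in for the expensive lookup (assumed, not the real one)."""
    global query_count
    query_count += 1
    match = re.search(r"release (\d+)\.(\d+)", _SAMPLE_OUTPUT)
    return int(match.group(1)), int(match.group(2))


@functools.lru_cache(maxsize=None)
def get_cuda_version():
    """Return (major, minor); the lookup runs once, later calls hit the cache."""
    return _query_cuda_version()


# Simulate a hot path that asks for the version on every attention call.
for _ in range(1000):
    version = get_cuda_version()

print(version, query_count)
```

Since the CUDA version should not change while the process is running, caching it for the process lifetime seems safe, and the per-call cost drops to a dictionary lookup.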

xiaomingxu1995

Jun 29, 2025

Good point, that’s a reasonable suggestion. I’ll check and update the code accordingly.

pamparamm

Jun 30, 2025

Thanks for the fix! After 3ea20a2, everything seems to work as expected.

pamparamm changed discussion status to closed Jun 30, 2025
