get_cuda_version in core.sageattn_qk_int8_pv_fp8_cuda causes severe slowdown on Windows
#1
by pamparamm - opened
I believe the output of core.get_cuda_version() should be cached. When using sageattn_qk_int8_pv_fp8_cuda, the uncached call causes a severe slowdown, roughly 10x compared to sageattention2 from GitHub.
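For illustration, here is a minimal sketch of what caching could look like, using functools.lru_cache from the standard library. It does not reproduce the actual body of core.get_cuda_version(): the helper _probe_cuda_version, the probe_calls counter and the returned (12, 4) value are all hypothetical stand-ins so the snippet runs without CUDA installed.

```python
import functools

probe_calls = 0  # counts how often the expensive lookup actually runs


def _probe_cuda_version():
    """Hypothetical stand-in for the real, potentially slow version lookup."""
    global probe_calls
    probe_calls += 1
    return (12, 4)  # placeholder value, not read from a real system


@functools.lru_cache(maxsize=None)
def get_cuda_version():
    """Return the CUDA version, running the lookup only on the first call."""
    return _probe_cuda_version()


# Simulate an attention function that asks for the version on every call.
for _ in range(1000):
    version = get_cuda_version()
```

With the decorator in place, the lookup runs once per process and later calls are served from the cache. That should be safe as long as the CUDA version cannot change while the process is running.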
Good point, that’s a reasonable suggestion. I’ll check and update the code accordingly.
pamparamm changed discussion status to closed