Model seems to run really slowly on the new Blackwell architecture

#2
by princemjp - opened

I was testing this model on my RTX 5090, and inference on a 3,000-token video takes 9 seconds.

On an RTX 4090, the same inference took 3 seconds.

However, when I test the original 7B model from Qwen directly, I get better inference times. I checked that bitsandbytes (bnb) and Unsloth both support Blackwell at this point with CUDA 12.8, so I'm not sure where this issue is coming from.
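Before comparing the two cards, it may be worth ruling out measurement noise. The first call on a GPU often includes one-time costs (kernel compilation, autotuning, memory allocation), and CUDA launches are asynchronous, so a naive timer can be misleading. Below is a minimal stdlib-only sketch, assuming the inference is wrapped in a zero-argument callable. The names `benchmark` and `fake_inference` are hypothetical. With PyTorch, the `sync` argument would presumably be `torch.cuda.synchronize`.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(run: Callable[[], object],
              warmup: int = 2,
              repeats: int = 5,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time (seconds) of `run` over `repeats` calls.

    `run` is a hypothetical zero-argument callable wrapping one full inference.
    `sync` should block until queued GPU work has finished (assumption: with
    PyTorch this would be torch.cuda.synchronize).
    """
    # Warm-up calls absorb one-time setup costs that would otherwise
    # inflate the first measurement.
    for _ in range(warmup):
        run()
    if sync is not None:
        sync()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        # Wait for asynchronous GPU work to finish before stopping the clock.
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to one-off stalls than the mean.
    return statistics.median(timings)


if __name__ == "__main__":
    # Hypothetical CPU stand-in workload so the sketch runs without a GPU.
    def fake_inference():
        return sum(i * i for i in range(100_000))

    print(f"median: {benchmark(fake_inference):.4f} s")
```

Running the same harness with identical inputs on both the 5090 and the 4090 would show whether the 9 s vs 3 s gap persists after warm-up.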
