Model seems to run really slowly on the new Blackwell architecture

by princemjp - opened Jul 27, 2025

I was testing this model on my 5090, and inference on a 3000-token video takes 9 seconds.

On a 4090, the same inference took 3 seconds.

But when I test the original 7B model from Qwen directly, I get better inference timings. I checked that bnb supports Blackwell, and unsloth also supports Blackwell at this stage with CUDA 12.8, so I'm not sure where this issue is coming from.
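
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the script behind the numbers above: `benchmark` and `run_inference` are hypothetical names, and `run_inference` is a placeholder that should be swapped for the real model call. The idea is to discard a few warm-up runs (the first calls can include one-time costs such as kernel compilation or caching) and report the median of the rest. On a GPU, `sync` would be something like `torch.cuda.synchronize`, so the clock is only read after queued work has finished.

```python
import statistics
import time


def benchmark(fn, warmup=2, runs=5, sync=None):
    """Return the median wall-clock time (seconds) of fn() over `runs` calls.

    `sync` is an optional zero-argument callable invoked right before the
    timer starts and right before it stops. Assumption: for GPU inference
    this would be torch.cuda.synchronize or an equivalent.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed

    timings = []
    for _ in range(runs):
        if sync is not None:
            sync()  # make sure earlier work is done before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this call's work before stopping the clock
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def run_inference():
    # Hypothetical placeholder: replace with the actual model call on the video.
    time.sleep(0.01)


if __name__ == "__main__":
    print(f"median latency: {benchmark(run_inference):.3f} s")
```

Running the same script on both cards, once with this model and once with the original 7B model, should make it clearer whether the slowdown is specific to this model or shows up for everything on the 5090.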
