# SageAttention 2.2.0 for RTX 5090 Blackwell (Windows)
Pre-built SageAttention 2.2.0 for NVIDIA RTX 5090 (sm_120 Blackwell) on Windows. INT8-quantized attention achieving a 2-3x speedup over FlashAttention2 with negligible accuracy loss.
## Wheel

```
sageattention-2.2.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl
```
## Requirements
| Component | Version |
|---|---|
| GPU | NVIDIA RTX 50 series (sm_120 Blackwell) |
| OS | Windows 10/11 x64 |
| Python | 3.12 |
| PyTorch | 2.12.0 nightly cu128 |
| Triton | 3.6.0 |
## Install PyTorch

```shell
pip install torch==2.12.0.dev20260402+cu128 --index-url https://download.pytorch.org/whl/nightly/cu128
pip install triton==3.6.0
```
## Installation

```shell
pip install sageattention-2.2.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl --no-deps
```

**Important:** Always use `--no-deps` to prevent pip from overwriting your PyTorch installation.
## Verify

```python
import torch
print(torch.__version__)  # 2.12.0.dev...
from sageattention import sageattn_varlen
print("SageAttention 2.2.0 OK")
```
## Features
- sageattn_varlen: Variable-length attention (used by SeedVR2, WanVideo, etc.)
- INT8 QK quantization with per-thread granularity
- FP16 PV accumulation for accuracy
- Supports sm_80 (Ampere), sm_89 (Ada), sm_120 (Blackwell) via included CUDA kernels + Triton fallback
- Native varlen API: no reshaping overhead
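A varlen API packs all sequences into one tensor and addresses them through cumulative sequence offsets (commonly called `cu_seqlens`). A minimal, pure-Python sketch of how such offsets are derived from per-sequence lengths (the helper name here is illustrative, not part of the SageAttention API):

```python
from itertools import accumulate

def make_cu_seqlens(seqlens):
    """Cumulative offsets for varlen attention: entry i is where sequence i
    starts in the packed token tensor; the last entry is the total length."""
    return [0] + list(accumulate(seqlens))

# Three sequences of 3, 5, and 2 tokens packed back to back:
print(make_cu_seqlens([3, 5, 2]))  # [0, 3, 8, 10]
```

Because every sequence is located by offset rather than padded to a common length, no reshaping or padding pass is needed before the attention call.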
## Performance (SeedVR2 7B fp16, RTX 5090)
| Resolution | DiT Time | Total | VRAM |
|---|---|---|---|
| 4K (2160p) | 2.59s | 5.7s | 19.1GB |
| 12MP (3000p) | 4.68s | 9.2s | 22.4GB |
## Build from Source

```shell
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
```

Open a fresh CMD window and activate the MSVC environment first:

```shell
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1
set TORCH_CUDA_ARCH_LIST=12.0
set DISTUTILS_USE_SDK=1
set MAX_JOBS=4
```
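The environment setup above can be collected into a single batch file, so each build starts from a fresh CMD window with one `call`. A sketch (paths mirror the steps above; adjust to your Visual Studio and CUDA install locations):

```shell
@echo off
REM setup_build_env.bat -- run from a fresh CMD window (illustrative sketch)
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
set "CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1"
set "TORCH_CUDA_ARCH_LIST=12.0"
set "DISTUTILS_USE_SDK=1"
set "MAX_JOBS=4"
echo Build environment ready.
```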
### Required PyTorch Header Patches

Add `#ifdef __CUDACC__` guards to three PyTorch header files to prevent CUDA/C++ compilation conflicts:

- `torch/include/torch/csrc/dynamo/compiled_autograd.h`: fixes C2872 "'std': ambiguous symbol"
- `torch/include/torch/csrc/autograd/custom_function.h`: fixes incomplete `PackedArgs` type
- `torch/include/torch/csrc/autograd/_functions.h`: fixes missing `autograd::Function`
### Required Source Patches

Remove the `torch/torch.h` and `torch/cuda.h` includes from the SageAttention `.cu` and `.cuh` source files (replace them with lighter headers).
### Build

```shell
pip install . --no-build-isolation
```
## Known Issues

- **PyTorch ABI:** Built against PyTorch 2.12 nightly. Other PyTorch versions may cause `ImportError: DLL load failed`. Rebuild from source if needed.
- **PATH overflow:** `vcvars64.bat` adds ~2000 characters to PATH on each call. Use a fresh CMD window to avoid "The input line is too long" errors.
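The "input line is too long" error comes from cmd.exe's 8191-character command-line limit. A quick stdlib sketch for checking how much headroom your PATH has left (the helper is illustrative, not part of the build tooling):

```python
import os

CMD_LINE_LIMIT = 8191  # cmd.exe's maximum command-line length

def path_headroom(path_value=None):
    """Characters remaining before PATH alone would exceed cmd.exe's
    command-line limit (negative means it is already over)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return CMD_LINE_LIMIT - len(path_value)

# Each vcvars64.bat call adds roughly 2000 characters:
print(path_headroom("x" * 6000))  # 2191 -- one more call would likely overflow
```

If the headroom is under ~2000 characters, open a fresh CMD window before calling `vcvars64.bat` again.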
## Source

[thu-ml/SageAttention](https://github.com/thu-ml/SageAttention) v2.2.0
## License

Apache 2.0