# SageAttention 2.2.0 for RTX 5090 Blackwell (Windows)
Pre-built SageAttention 2.2.0 for NVIDIA RTX 5090 (sm_120 Blackwell) on Windows. INT8-quantized attention achieving a 2-3x speedup over FlashAttention2 with negligible accuracy loss.
## Wheel

```
sageattention-2.2.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl
```
## Requirements
| Component | Version |
|---|---|
| GPU | NVIDIA RTX 50 series (sm_120 Blackwell) |
| OS | Windows 10/11 x64 |
| Python | 3.12 |
| PyTorch | 2.12.0 nightly cu128 |
| Triton | 3.6.0 |
## Install PyTorch

```shell
pip install torch==2.12.0.dev20260402+cu128 --index-url https://download.pytorch.org/whl/nightly/cu128
pip install triton==3.6.0
```
## Installation

```shell
pip install sageattention-2.2.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl --no-deps
```

**Important:** Always use `--no-deps` to prevent pip from overwriting your PyTorch installation.
## Verify

```python
import torch
print(torch.__version__)  # 2.12.0.dev...
from sageattention import sageattn_varlen
print("SageAttention 2.2.0 OK")
```
## Features
- sageattn_varlen: Variable-length attention (used by SeedVR2, WanVideo, etc.)
- INT8 QK quantization with per-thread granularity
- FP16 PV accumulation for accuracy
- Supports sm_80 (Ampere), sm_89 (Ada), sm_120 (Blackwell) via included CUDA kernels + Triton fallback
- Native varlen API: no reshaping overhead
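A varlen API packs all sequences into one tensor and addresses them through cumulative sequence offsets (commonly called `cu_seqlens`). A minimal, pure-Python sketch of how such offsets are derived from per-sequence lengths (the helper name here is illustrative, not part of the SageAttention API):

```python
from itertools import accumulate

def make_cu_seqlens(seqlens):
    """Cumulative offsets for varlen attention: entry i is where sequence i
    starts in the packed token tensor; the last entry is the total length."""
    return [0] + list(accumulate(seqlens))

# Three sequences of 3, 5, and 2 tokens packed back to back:
print(make_cu_seqlens([3, 5, 2]))  # [0, 3, 8, 10]
```

Because every sequence is located by offset rather than padded to a common length, no reshaping or padding pass is needed before the attention call.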
## Performance (SeedVR2 7B fp16, RTX 5090)
| Resolution | DiT Time | Total | VRAM |
|---|---|---|---|
| 4K (2160p) | 2.59s | 5.7s | 19.1GB |
| 12MP (3000p) | 4.68s | 9.2s | 22.4GB |
## Build from Source

```shell
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
```

Open a fresh CMD window and activate the MSVC environment first:

```shell
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1
set TORCH_CUDA_ARCH_LIST=12.0
set DISTUTILS_USE_SDK=1
set MAX_JOBS=4
```
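The environment setup above can be collected into a single batch file, so each build starts from a fresh CMD window with one `call`. A sketch (paths mirror the steps above; adjust to your Visual Studio and CUDA install locations):

```shell
@echo off
REM setup_build_env.bat -- run from a fresh CMD window (illustrative sketch)
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
set "CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1"
set "TORCH_CUDA_ARCH_LIST=12.0"
set "DISTUTILS_USE_SDK=1"
set "MAX_JOBS=4"
echo Build environment ready.
```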
### Required PyTorch Header Patches

Add `#ifdef __CUDACC__` guards to three PyTorch header files to prevent CUDA/C++ compilation conflicts:

- `torch/include/torch/csrc/dynamo/compiled_autograd.h`: fixes C2872 "'std': ambiguous symbol"
- `torch/include/torch/csrc/autograd/custom_function.h`: fixes incomplete `PackedArgs` type
- `torch/include/torch/csrc/autograd/_functions.h`: fixes missing `autograd::Function`
### Required Source Patches

Remove the `torch/torch.h` and `torch/cuda.h` includes from the SageAttention `.cu` and `.cuh` source files (replace them with lighter headers).
### Build

```shell
pip install . --no-build-isolation
```
## Known Issues

- **PyTorch ABI:** Built against PyTorch 2.12 nightly. Other PyTorch versions may cause `ImportError: DLL load failed`. Rebuild from source if needed.
- **PATH overflow:** `vcvars64.bat` adds ~2000 characters to PATH on each call. Use a fresh CMD window to avoid "The input line is too long" errors.
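The "input line is too long" error comes from cmd.exe's 8191-character command-line limit. A quick stdlib sketch for checking how much headroom your PATH has left (the helper is illustrative, not part of the build tooling):

```python
import os

CMD_LINE_LIMIT = 8191  # cmd.exe's maximum command-line length

def path_headroom(path_value=None):
    """Characters remaining before PATH alone would exceed cmd.exe's
    command-line limit (negative means it is already over)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return CMD_LINE_LIMIT - len(path_value)

# Each vcvars64.bat call adds roughly 2000 characters:
print(path_headroom("x" * 6000))  # 2191 -- one more call would likely overflow
```

If the headroom is under ~2000 characters, open a fresh CMD window before calling `vcvars64.bat` again.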
## Source

[thu-ml/SageAttention](https://github.com/thu-ml/SageAttention) v2.2.0
## License

Apache 2.0