SageAttention 2.2.0 - RTX 5090 Blackwell (Windows)

Pre-built SageAttention 2.2.0 wheel for the NVIDIA RTX 5090 (sm_120 Blackwell) on Windows. INT8-quantized attention achieving a 2-3x speedup over FlashAttention2 without accuracy loss.

Wheel

sageattention-2.2.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl

Requirements

Component   Version
GPU         NVIDIA RTX 50 series (sm_120 Blackwell)
OS          Windows 10/11 x64
Python      3.12
PyTorch     2.12.0 nightly (cu128)
Triton      3.6.0

Install PyTorch

pip install torch==2.12.0.dev20260402+cu128 --index-url https://download.pytorch.org/whl/nightly/cu128
pip install triton==3.6.0

Installation

pip install sageattention-2.2.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl --no-deps

Important: Always use --no-deps to prevent pip from overwriting your PyTorch installation.
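To confirm that --no-deps left the environment intact, you can list the installed versions afterwards; a quick sanity check using only the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

# Confirm pip did not replace the nightly torch build during install
for pkg in ("torch", "triton", "sageattention"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
```

If torch no longer reports 2.12.0.dev..., reinstall the nightly build before proceeding.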

Verify

import torch
print(torch.__version__)                    # 2.12.0.dev...
print(torch.version.cuda)                   # 12.8
print(torch.cuda.get_device_capability(0))  # (12, 0) on Blackwell

from sageattention import sageattn_varlen
print("SageAttention 2.2.0 OK")

Features

  • sageattn_varlen: Variable-length attention (used by SeedVR2, WanVideo, etc.)
  • INT8 QK quantization with per-thread granularity
  • FP16 PV accumulation for accuracy
  • Supports sm_80 (Ampere), sm_89 (Ada), sm_120 (Blackwell) via included CUDA kernels + Triton fallback
  • Native varlen API - no reshaping overhead
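sageattn_varlen appears to follow the flash-attn-style varlen convention (an assumption based on the upstream API, not stated on this card): Q/K/V are packed along the token dimension and sequence boundaries are passed as cumulative offsets. A minimal sketch of building those offsets:

```python
from itertools import accumulate

def make_cu_seqlens(seq_lens):
    """Cumulative offsets delimiting each sequence in a packed token dim.
    Sequences of length [3, 5, 2] pack into 10 tokens with
    boundaries [0, 3, 8, 10]."""
    return [0] + list(accumulate(seq_lens))

print(make_cu_seqlens([3, 5, 2]))  # [0, 3, 8, 10]
```

In practice this list is converted to an int32 CUDA tensor (cu_seqlens_q / cu_seqlens_k) and passed alongside the max sequence lengths; check the upstream SageAttention docstrings for the exact signature.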

Performance (SeedVR2 7B fp16, RTX 5090)

Resolution     DiT Time   Total   VRAM
4K (2160p)     2.59s      5.7s    19.1GB
12MP (3000p)   4.68s      9.2s    22.4GB

Build from Source

git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention

# Open fresh CMD, activate MSVC first
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"

set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1
set TORCH_CUDA_ARCH_LIST=12.0
set DISTUTILS_USE_SDK=1
set MAX_JOBS=4
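Before building, it is worth checking that nvcc is actually reachable from the configured CUDA_HOME. A small helper sketch (the fallback path is an assumption about the standard toolkit layout, not something this card specifies):

```python
import os
import shutil

def find_nvcc():
    """Locate nvcc via PATH, falling back to CUDA_HOME\\bin; None if absent."""
    found = shutil.which("nvcc")
    if found:
        return found
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        # Assumed standard toolkit layout: %CUDA_HOME%\bin\nvcc.exe
        candidate = os.path.join(cuda_home, "bin", "nvcc.exe")
        if os.path.exists(candidate):
            return candidate
    return None

print("nvcc:", find_nvcc() or "not found - check CUDA_HOME / PATH")
```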

Required PyTorch Header Patches

Add #ifdef __CUDACC__ guards to 3 PyTorch header files to prevent CUDA/C++ compilation conflicts:

  1. torch/include/torch/csrc/dynamo/compiled_autograd.h - fixes C2872 'std' ambiguous
  2. torch/include/torch/csrc/autograd/custom_function.h - fixes incomplete PackedArgs type
  3. torch/include/torch/csrc/autograd/_functions.h - fixes missing autograd::Function
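The guard edits can be scripted. Below is a hedged sketch of wrapping a problematic region in a preprocessor guard; the marker strings and the guard direction are illustrative only - inspect each header to see which declarations actually conflict under nvcc before patching:

```python
from pathlib import Path

def guard_region(header: Path, start_marker: str, end_marker: str,
                 guard: str = "#ifndef __CUDACC__") -> None:
    """Wrap the text between two unique markers in a preprocessor guard so
    nvcc (which defines __CUDACC__) skips host-only declarations.
    Marker strings are hypothetical; pick lines unique to each header."""
    text = header.read_text()
    i = text.index(start_marker)
    j = text.index(end_marker) + len(end_marker)
    patched = (text[:i] + guard + "\n" + text[i:j]
               + "\n#endif  // __CUDACC__ guard\n" + text[j:])
    header.write_text(patched)
```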

Required Source Patches

Remove torch/torch.h and torch/cuda.h includes from SA source .cu and .cuh files (replace with lighter headers).
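A scripted version of that include removal might look like the sketch below (the csrc glob is an assumption about the repo layout; which lighter headers to substitute depends on what each file actually uses, so only the removal is shown):

```python
import re
from pathlib import Path

# Matches the heavyweight umbrella includes that break the nvcc build
HEAVY_INCLUDES = re.compile(r'^\s*#include\s*<torch/(torch|cuda)\.h>\s*\n', re.M)

def strip_heavy_includes(src: Path) -> int:
    """Delete torch/torch.h and torch/cuda.h includes; returns count removed."""
    text = src.read_text()
    stripped, n = HEAVY_INCLUDES.subn("", text)
    if n:
        src.write_text(stripped)
    return n

# Hypothetical sweep (adjust the glob to the actual source tree):
# for f in Path("csrc").rglob("*.cu*"):
#     print(f, strip_heavy_includes(f))
```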

Build

pip install . --no-build-isolation

Known Issues

  • PyTorch ABI: Built against PyTorch 2.12 nightly. Different PyTorch versions may cause ImportError: DLL load failed. Rebuild from source if needed.
  • PATH overflow: vcvars64.bat adds ~2000 chars to PATH each call. Use a fresh CMD window to avoid "The input line is too long" errors.
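The overflow is easy to check for ahead of time: cmd.exe's command-line limit is 8191 characters, so you can estimate how close PATH is before calling vcvars64.bat again (the ~2000-character growth figure comes from the note above):

```python
import os

CMD_LINE_LIMIT = 8191  # cmd.exe's documented maximum command-line length
VCVARS_GROWTH = 2000   # approximate chars vcvars64.bat appends per call

path_len = len(os.environ.get("PATH", ""))
print(f"PATH length: {path_len}")
if path_len + VCVARS_GROWTH > CMD_LINE_LIMIT:
    print("Open a fresh CMD window before calling vcvars64.bat again")
```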

Source

thu-ml/SageAttention v2.2.0

License

Apache 2.0
