# SpargeAttn 0.1.0 for RTX 5090 Blackwell (Windows)

Pre-built SpargeAttn (Sparse SageAttention2) wheel for the NVIDIA RTX 5090 (sm_120 Blackwell) on Windows. Block-sparse attention built on top of SageAttention2 that accelerates models without training.
## Wheel

    spas_sage_attn-0.1.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl
## Requirements
| Component | Version |
|---|---|
| GPU | NVIDIA RTX 50 series (sm_120 Blackwell) |
| OS | Windows 10/11 x64 |
| Python | 3.12 |
| PyTorch | 2.12.0 nightly cu131 (matching the wheel's `cu131` tag) |
| Triton | 3.6.0 |
| SageAttention | 2.2.0 (install SA2 wheel first) |
## Installation

    # Install SA2 first (dependency)
    pip install sageattention-2.2.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl --no-deps

    # Install SpargeAttn
    pip install spas_sage_attn-0.1.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl --no-deps
**Important:** always pass `--no-deps` to keep pip from overwriting your PyTorch installation.
## Verify

    from spas_sage_attn import spas_sage2_attn_meansim_cuda
    print("SpargeAttn OK")
## Usage

    from spas_sage_attn import spas_sage2_attn_meansim_cuda

    # q, k, v: (batch, heads, seq_len, head_dim) in fp16/bf16
    output = spas_sage2_attn_meansim_cuda(
        q, k, v,
        is_causal=False,
        smooth_k=True,
        tensor_layout="HND",
        output_dtype=q.dtype,
    )
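To make the expected tensor layout concrete, here is a hedged NumPy sketch of the dense attention that the sparse kernel approximates. `dense_attention` is an illustrative helper, not part of the `spas_sage_attn` API; it only shows the `(batch, heads, seq_len, head_dim)` "HND" convention and the reference computation that block-sparse attention skips parts of.

```python
import numpy as np

# Reference (assumption for illustration, not SpargeAttn itself): dense
# softmax attention in the same (batch, heads, seq_len, head_dim) layout.
def dense_attention(q, k, v):
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)   # (b, h, s, s)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v                                  # (b, h, s, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 2, 8, 16)).astype(np.float32)
k = rng.standard_normal((1, 2, 8, 16)).astype(np.float32)
v = rng.standard_normal((1, 2, 8, 16)).astype(np.float32)
out = dense_attention(q, k, v)
print(out.shape)  # (1, 2, 8, 16)
```

The CUDA kernel produces output of the same shape; the sparse variant skips key/value blocks whose estimated contribution is negligible.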
## Performance Note

Benchmarks on SeedVR2 show pure SageAttention2 running faster than SpargeAttn on that workload: the cost of computing the sparse pattern exceeds the compute it saves on SeedVR2's window-attention layout. SpargeAttn is more likely to pay off on workloads with naturally sparse attention patterns.
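The trade-off above can be sketched with a back-of-envelope cost model. This is an assumption for intuition only, not a SpargeAttn benchmark: `sparse_speedup` and all numbers are hypothetical. Sparse attention pays a fixed pattern-estimation cost per call, so it wins only when the skipped blocks outweigh that overhead.

```python
# Hypothetical cost model: sparse runtime = (dense cost * fraction of
# blocks actually computed) + fixed pattern-estimation overhead.
def sparse_speedup(dense_cost, kept_fraction, pattern_overhead):
    """Ratio of dense runtime to sparse runtime under this toy model."""
    sparse_cost = dense_cost * kept_fraction + pattern_overhead
    return dense_cost / sparse_cost

# Naturally sparse workload: keep 30% of blocks, small overhead -> clear win.
print(round(sparse_speedup(100.0, 0.3, 5.0), 2))   # 2.86
# Dense-ish workload (e.g. compact window attention): keep 90% -> net slowdown.
print(round(sparse_speedup(100.0, 0.9, 15.0), 2))  # 0.95
```

A ratio below 1.0 matches the SeedVR2 observation: when little can be skipped, the overhead makes the sparse path slower than plain SageAttention2.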
## Build from Source

Built alongside SageAttention 2.2.0 from thu-ml/SageAttention, with the same build prerequisites and header patches as SA2. See the SA2 README for build instructions.
## Source

thu-ml/SageAttention (SpargeAttn module, ICML 2025)
## License

Apache 2.0