# SpargeAttn 0.1.0 for RTX 5090 Blackwell (Windows)

Pre-built SpargeAttn (Sparse SageAttention2) wheel for the NVIDIA RTX 5090 (sm_120 Blackwell) on Windows. Block-sparse attention built on top of SageAttention2 that accelerates models without training.
## Wheel

    spas_sage_attn-0.1.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl
## Requirements
| Component | Version |
|---|---|
| GPU | NVIDIA RTX 50 series (sm_120 Blackwell) |
| OS | Windows 10/11 x64 |
| Python | 3.12 |
| PyTorch | 2.12.0 nightly cu131 (matching the wheel's `cu131` tag) |
| Triton | 3.6.0 |
| SageAttention | 2.2.0 (install SA2 wheel first) |
## Installation

    # Install SA2 first (dependency)
    pip install sageattention-2.2.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl --no-deps

    # Install SpargeAttn
    pip install spas_sage_attn-0.1.0+cu131torch2.12.blackwell-cp312-cp312-win_amd64.whl --no-deps
**Important:** always pass `--no-deps` to keep pip from overwriting your PyTorch installation.
## Verify

    from spas_sage_attn import spas_sage2_attn_meansim_cuda
    print("SpargeAttn OK")
## Usage

    from spas_sage_attn import spas_sage2_attn_meansim_cuda

    # q, k, v: (batch, heads, seq_len, head_dim) in fp16/bf16
    output = spas_sage2_attn_meansim_cuda(
        q, k, v,
        is_causal=False,
        smooth_k=True,
        tensor_layout="HND",
        output_dtype=q.dtype,
    )
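To make the expected tensor layout concrete, here is a hedged NumPy sketch of the dense attention that the sparse kernel approximates. `dense_attention` is an illustrative helper, not part of the `spas_sage_attn` API; it only shows the `(batch, heads, seq_len, head_dim)` "HND" convention and the reference computation that block-sparse attention skips parts of.

```python
import numpy as np

# Reference (assumption for illustration, not SpargeAttn itself): dense
# softmax attention in the same (batch, heads, seq_len, head_dim) layout.
def dense_attention(q, k, v):
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)   # (b, h, s, s)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v                                  # (b, h, s, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 2, 8, 16)).astype(np.float32)
k = rng.standard_normal((1, 2, 8, 16)).astype(np.float32)
v = rng.standard_normal((1, 2, 8, 16)).astype(np.float32)
out = dense_attention(q, k, v)
print(out.shape)  # (1, 2, 8, 16)
```

The CUDA kernel produces output of the same shape; the sparse variant skips key/value blocks whose estimated contribution is negligible.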
## Performance Note

Benchmarks on SeedVR2 show pure SageAttention2 running faster than SpargeAttn on that workload: the cost of computing the sparse pattern exceeds the compute it saves on SeedVR2's window-attention layout. SpargeAttn is more likely to pay off on workloads with naturally sparse attention patterns.
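The trade-off above can be sketched with a back-of-envelope cost model. This is an assumption for intuition only, not a SpargeAttn benchmark: `sparse_speedup` and all numbers are hypothetical. Sparse attention pays a fixed pattern-estimation cost per call, so it wins only when the skipped blocks outweigh that overhead.

```python
# Hypothetical cost model: sparse runtime = (dense cost * fraction of
# blocks actually computed) + fixed pattern-estimation overhead.
def sparse_speedup(dense_cost, kept_fraction, pattern_overhead):
    """Ratio of dense runtime to sparse runtime under this toy model."""
    sparse_cost = dense_cost * kept_fraction + pattern_overhead
    return dense_cost / sparse_cost

# Naturally sparse workload: keep 30% of blocks, small overhead -> clear win.
print(round(sparse_speedup(100.0, 0.3, 5.0), 2))   # 2.86
# Dense-ish workload (e.g. compact window attention): keep 90% -> net slowdown.
print(round(sparse_speedup(100.0, 0.9, 15.0), 2))  # 0.95
```

A ratio below 1.0 matches the SeedVR2 observation: when little can be skipped, the overhead makes the sparse path slower than plain SageAttention2.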
## Build from Source

Built alongside SageAttention 2.2.0 from thu-ml/SageAttention, with the same build prerequisites and header patches as SA2. See the SA2 README for build instructions.
## Source

thu-ml/SageAttention (SpargeAttn module, ICML 2025)
## License

Apache 2.0