SageAttention 3: Precompiled Wheel for RTX 5090 (Blackwell)
This repository provides a precompiled Python wheel (.whl) of SageAttention 3 built specifically for the NVIDIA RTX 5090 (Blackwell architecture), CUDA 13.0, and PyTorch 2.10.0 on Linux.
Why RTX 5090 and NVFP4?
The NVIDIA RTX 50-series (Blackwell sm_120 architecture) introduces hardware-level support for NVFP4 (4-bit floating-point precision) via its new Tensor Cores.
If you are running heavy models (like Flux.1 or Wan 2.2) in ComfyUI or native PyTorch, utilizing NVFP4 instead of standard FP8 or FP16 gives you:
- ~2x Speedup in generation times.
- Massive VRAM savings, allowing you to run 30B+ parameter models on a single 32GB GPU.
- Zero perceived quality loss compared to FP8.
SageAttention 3 is explicitly optimized for these Blackwell NVFP4 capabilities, drastically outperforming standard scaled dot-product attention (SDPA) or older versions of FlashAttention/SageAttention.
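To see why NVFP4 is so compact: each element is an FP4 E2M1 value (1 sign, 2 exponent, 1 mantissa bit), so only the magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6} are representable, and a shared per-block scale maps real values into that range. Here is a toy round-to-nearest sketch of that idea; it is an illustration only, not SageAttention's actual kernel (which uses hardware Tensor Cores and FP8 block scales):

```python
# The 8 magnitudes representable in FP4 E2M1 (plus a sign bit = 16 codes).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values):
    """Toy NVFP4-style quantization: one shared scale per block,
    then round each element to the nearest E2M1 magnitude."""
    # Map the largest magnitude in the block onto +/-6 (the E2M1 maximum).
    scale = max(abs(v) for v in values) / 6.0 or 1.0
    def nearest(x):
        mag = min(E2M1_VALUES, key=lambda c: abs(abs(x) - c))
        return mag if x >= 0 else -mag
    return [nearest(v / scale) * scale for v in values]

print(quantize_block([0.1, -0.7, 2.4, -3.3]))
```

With only 8 magnitudes per sign, small values collapse to 0 or 0.5 times the scale, which is why the per-block scale (and, in real NVFP4, its FP8 precision) matters so much for quality.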
Quick Install
If your environment matches the requirements (Python 3.12, CUDA 13.0, PyTorch 2.10.0+cu130, Linux), you can install this wheel directly without compiling:
```bash
pip install https://huggingface.co/Seryoger/Sageattention-3-cu130-5090-endpoint/resolve/main/sageattn3-1.0.0-cp312-cp312-linux_x86_64.whl
```
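If pip rejects the file with a "not a supported wheel on this platform" error, check that your interpreter actually matches the `cp312-cp312-linux_x86_64` tag. A minimal check (the helper name is illustrative, not part of any library):

```python
import sys, platform

def matches_wheel_tag(version_info, system, machine):
    """True if this interpreter can host a cp312 / linux_x86_64 wheel."""
    return (tuple(version_info[:2]) == (3, 12)
            and system == "Linux"
            and machine == "x86_64")

if __name__ == "__main__":
    print(matches_wheel_tag(sys.version_info, platform.system(), platform.machine()))
```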
How to Compile It Yourself (RunPod Guide)
If you want to build this wheel from scratch (or update it to a newer commit), here is a step-by-step guide to doing it on a cloud GPU provider such as RunPod.
Step 1: Deploy a Builder Pod
Go to RunPod -> Pods and click Deploy.
Select an RTX 5090 GPU.
Click Customize Deployment and set the following:
- Container Image: `nvidia/cuda:13.0.2-cudnn-devel-ubuntu24.04` (must be the devel image so the `nvcc` compiler is included).
- Start Command: `sleep infinity` (prevents the container from exiting immediately).
- Container Disk: 20 GB
Deploy the Pod.
Step 2: Compile the Source Code
Once the Pod is running, click Connect -> Web Terminal and paste this entire bash script. It will set up the environment, patch a known Docker/CUDA compatibility error (Error 804), and compile the wheel.
```bash
# 1. Update the system and install basic build tools
apt-get update && apt-get install -y python3.12 python3.12-dev python3.12-venv build-essential git wget
ln -sf /usr/bin/python3.12 /usr/bin/python

# 2. Install the blazing-fast 'uv' package manager
wget -qO- https://astral.sh/uv/install.sh | sh
export PATH="/root/.local/bin:${PATH}"

# 3. Create a virtual environment
uv venv /opt/venv
export PATH="/opt/venv/bin:${PATH}"

# 4. Install PyTorch 2.10.0+cu130 and build dependencies
uv pip install torch==2.10.0 --index-url https://download.pytorch.org/whl/cu130
uv pip install ninja packaging wheel einops numpy

# 5. Clone the SageAttention repository
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention/sageattention3_blackwell

# 6. Fix "Error 804: forward compatibility" by hardcoding compute
#    capability 12.0 instead of polling the GPU at build time
sed -i 's/cc_major, cc_minor = torch.cuda.get_device_capability()/cc_major, cc_minor = 12, 0/g' setup.py

# 7. Target the Blackwell architectures (sm_100 and sm_120) and build
export TORCH_CUDA_ARCH_LIST="10.0;12.0"
python setup.py bdist_wheel
```
Compilation will take about 5-10 minutes. Ignore the C++ warnings.
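When the build completes, you can sanity-check the generated filename before downloading it. Wheel filenames follow the PEP 427 pattern `name-version-pythontag-abitag-platformtag.whl`; a minimal parser sketch (`parse_wheel_name` is just an illustration, not part of the project):

```python
def parse_wheel_name(filename):
    """Split a PEP 427 wheel filename into its five tag fields.
    Assumes no build tag and no '-' in the project name,
    which holds for this wheel."""
    stem = filename[: -len(".whl")]
    name, version, python, abi, plat = stem.split("-")
    return {"name": name, "version": version,
            "python": python, "abi": abi, "platform": plat}

tags = parse_wheel_name("sageattn3-1.0.0-cp312-cp312-linux_x86_64.whl")
print(tags)
```

If the `python` or `platform` fields differ from your target environment, the wheel was built against the wrong interpreter or OS and pip will refuse to install it.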
Step 3: Download your Compiled Wheel
Once finished, your .whl file will be located in /SageAttention/sageattention3_blackwell/dist/. Here are two ways to download it to your local machine:
Option A: Using RunPod Cloud Sync (Google Cloud Bucket / AWS S3)
If you have connected your cloud storage in RunPod settings:
Copy the file to the default workspace directory:
```bash
mkdir -p /workspace
cp /SageAttention/sageattention3_blackwell/dist/*.whl /workspace/
```
In the RunPod UI, click Cloud Sync on your Pod.
Select your Bucket, ensure the "Path to copy from" is set to /workspace, and click Copy to Cloud Storage.
Option B: Using a Terminal File Sharing Service
If you don't have cloud storage linked, you can upload the wheel directly from the terminal to get a temporary download link:
```bash
curl --upload-file /SageAttention/sageattention3_blackwell/dist/sageattn3-1.0.0-cp312-cp312-linux_x86_64.whl https://transfer.sh/sageattn3.whl
```
Copy the URL printed in the console, paste it into your local browser, and download the file.
Don't forget to Terminate your Pod after downloading to stop billing!
Plans
Quantize some models following the official NVIDIA guide (https://build.nvidia.com/spark/nvfp4-quantization/instructions) if it turns out to really matter; test on NVFP4 Klein first.