---
license: apache-2.0
library_name: pytorch
tags:
- super-resolution
- diffusion
- pixel-diffusion-decoder
- vae-decoder
pipeline_tag: image-to-image
---
# PiD: Pixel Diffusion Decoder
PiD reformulates the latent-to-pixel decoder as a conditional pixel-space diffusion model, unifying decoding and upsampling into a single generative module. It denoises directly in high-resolution pixel space and produces a super-resolved image in one pass. This repository hosts the released decoder checkpoints, plus the encoder/decoder ("VAE") weights they depend on.
All `PiD_*` checkpoints in this repo are 4-step distilled. The non-`PiD_*`
entries (`ae.safetensors`, `flux2_ae.safetensors`, `sd3_vae/`, `rae/`,
`scale_rae/`) are the corresponding encoder/decoder (VAE) weights that PiD
plugs into; they are not PiD checkpoints themselves.
## PiD checkpoints
Two variants are released for each diffusers-style backbone:
- `2k`: trained at 2048 px; used as a 4× decoder (512 LDM → 2048 px), or as an 8× decoder for the Scale-RAE backbone (256 → 2048).
- `2kto4k`: trained with multi-resolution data bucketing (2048–3840 px) and an SD3-style dynamic shift; designed for 1024 LDM → 4K (3840 px) decoding. Released only for the diffusers backbones.
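As a worked example of the arithmetic above, the decoded side length is the LDM image side times the SR factor. The helpers below are hypothetical (not part of the PiD codebase), and `pick_variant` encodes an assumed rule: use `2k` up to its 2048 px training size and `2kto4k` above it.

```python
# Hypothetical helpers for illustration; not part of the PiD codebase.

def pid_output_px(ldm_px: int, sr_factor: int) -> int:
    """Side length (px) PiD decodes to from an LDM image of side `ldm_px`."""
    if ldm_px <= 0 or sr_factor <= 0:
        raise ValueError("ldm_px and sr_factor must be positive")
    return ldm_px * sr_factor


def pick_variant(target_px: int) -> str:
    """Assumed rule: `2k` up to its 2048 px training size, `2kto4k` above it."""
    return "2k" if target_px <= 2048 else "2kto4k"


if __name__ == "__main__":
    print(pid_output_px(512, 4), pick_variant(pid_output_px(512, 4)))  # -> 2048 2k
    print(pid_output_px(256, 8), pick_variant(pid_output_px(256, 8)))  # -> 2048 2k
    print(pick_variant(3840))  # -> 2kto4k
```

Remember that `2kto4k` exists only for the diffusers backbones, so the resolution rule alone does not guarantee a checkpoint is available.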
| Path | Backbone (encoder side) | SR factor | Variant |
|---|---|---|---|
| `checkpoints/PiD_res2k_sr4x_official_flux_distill_4step` | Flux1-dev (16-ch VAE) | 4× | `2k` |
| `checkpoints/PiD_res2k_sr4x_official_flux2_distill_4step` | Flux2-dev (128-ch BN VAE) | 4× | `2k` |
| `checkpoints/PiD_res2k_sr4x_official_sd3_distill_4step` | SD3 medium (16-ch VAE) | 4× | `2k` |
| `checkpoints/PiD_res2k_sr4x_official_dinov2_distill_4step` | DINOv2-B + RAE ViT-XL (768-ch) | 4× | `2k` |
| `checkpoints/PiD_res2k_sr8x_official_siglip_distill_4step` | SigLIP-2 So400M + Scale-RAE ViT-XL (1152) | 8× | `2k` |
| `checkpoints/PiD_res2kto4k_sr4x_official_flux_distill_4step` | Flux1-dev (16-ch VAE) | 4× | `2kto4k` |
| `checkpoints/PiD_res2kto4k_sr4x_official_flux2_distill_4step` | Flux2-dev (128-ch BN VAE) | 4× | `2kto4k` |
| `checkpoints/PiD_res2kto4k_sr4x_official_sd3_distill_4step` | SD3 medium (16-ch VAE) | 4× | `2kto4k` |
Z-Image shares Flux1's VAE, so its inference path reuses the `flux` checkpoints
(both `2k` and `2kto4k`); no separate `zimage` checkpoint is shipped.
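The lookup the inference scripts perform can be pictured as a small table keyed by `(backbone, ckpt_type)`. The sketch below mirrors the table above, including the Z-Image alias; the backbone keys (`flux`, `flux2`, `sd3`, `dinov2`, `siglip`, `zimage`) and the function name are assumptions for illustration. The authoritative mapping is `pid/_src/inference/checkpoint_registry.py`.

```python
# Hypothetical mirror of the checkpoint table; the authoritative mapping lives in
# pid/_src/inference/checkpoint_registry.py. Backbone key names are assumptions.
from pathlib import PurePosixPath

_ROOT = PurePosixPath("checkpoints")

# (backbone, ckpt_type) -> checkpoint directory name, as listed in the table.
_PID_DIRS = {
    ("flux", "2k"): "PiD_res2k_sr4x_official_flux_distill_4step",
    ("flux2", "2k"): "PiD_res2k_sr4x_official_flux2_distill_4step",
    ("sd3", "2k"): "PiD_res2k_sr4x_official_sd3_distill_4step",
    ("dinov2", "2k"): "PiD_res2k_sr4x_official_dinov2_distill_4step",
    ("siglip", "2k"): "PiD_res2k_sr8x_official_siglip_distill_4step",
    ("flux", "2kto4k"): "PiD_res2kto4k_sr4x_official_flux_distill_4step",
    ("flux2", "2kto4k"): "PiD_res2kto4k_sr4x_official_flux2_distill_4step",
    ("sd3", "2kto4k"): "PiD_res2kto4k_sr4x_official_sd3_distill_4step",
}

# Z-Image shares Flux1's VAE, so it resolves to the flux checkpoints.
_ALIASES = {"zimage": "flux"}


def pid_checkpoint_dir(backbone: str, ckpt_type: str = "2k") -> PurePosixPath:
    """Return the checkpoint directory for a (backbone, ckpt_type) pair."""
    key = (_ALIASES.get(backbone, backbone), ckpt_type)
    if key not in _PID_DIRS:
        raise KeyError(f"no PiD checkpoint for backbone={backbone!r}, ckpt_type={ckpt_type!r}")
    return _ROOT / _PID_DIRS[key]
```

For example, `pid_checkpoint_dir("zimage", "2kto4k")` resolves to the `PiD_res2kto4k_sr4x_official_flux_distill_4step` directory, while `("dinov2", "2kto4k")` raises `KeyError` because that variant is not released.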
Each directory contains a single file, `model_ema_bf16.pth`: the EMA weights
cast to bfloat16, which is the format the inference scripts load by default.
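To inspect these weights outside the demo scripts, a minimal loading sketch follows. It assumes `model_ema_bf16.pth` was written with `torch.save` and holds a flat mapping of parameter names to tensors; the function name is hypothetical, and if the file nests the state dict under a key, the non-tensor entries are passed through unchanged, so check the layout against the PiD inference code.

```python
# Sketch only: assumes model_ema_bf16.pth is a torch.save'd dict of name -> tensor.
# Verify the actual layout against the PiD inference code before relying on it.
from __future__ import annotations

from pathlib import Path

import torch


def load_pid_ema_weights(
    ckpt_dir: str | Path, dtype: torch.dtype = torch.bfloat16
) -> dict[str, object]:
    """Load the bf16 EMA weights, optionally casting floating-point tensors."""
    path = Path(ckpt_dir) / "model_ema_bf16.pth"
    # weights_only=True restricts unpickling to tensors and plain containers.
    state = torch.load(path, map_location="cpu", weights_only=True)
    if not isinstance(state, dict):
        raise TypeError(f"expected a dict in {path}, got {type(state).__name__}")
    # Cast floating-point tensors to `dtype` (e.g. torch.float32 for CPU work);
    # integer tensors and non-tensor entries are returned as-is.
    return {
        name: t.to(dtype) if torch.is_tensor(t) and t.is_floating_point() else t
        for name, t in state.items()
    }
```

Keeping the default `dtype=torch.bfloat16` leaves the weights in the stored format; upcasting to `torch.float32` doubles memory but avoids bf16 kernels that may be missing on some CPUs.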
## VAE / encoder weights
These are the per-backbone encoder (and, where applicable, original decoder) weights that PiD pairs with. They're hosted here so a single download brings everything needed end-to-end.
| Path | Description |
|---|---|
| `checkpoints/ae.safetensors` | Flux1-dev / Z-Image 16-ch VAE (encoder + original Flux decoder). |
| `checkpoints/flux2_ae.safetensors` | Flux2-dev 128-ch BN VAE. |
| `checkpoints/sd3_vae/` | SD3 medium 16-ch VAE in diffusers format. |
| `checkpoints/rae/` | DINOv2-B image encoder + RAE ViT-XL decoder + ImageNet-512 normalization statistics. |
| `checkpoints/scale_rae/` | SigLIP-2 So400M encoder + Scale-RAE ViT-XL decoder + decoder config. |
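After downloading, a quick way to confirm that the encoder/VAE weights landed where the tables say is a plain existence check. This is a hypothetical helper; it only tests that each path exists and does not validate the contents of the directories.

```python
# Hypothetical post-download check; the path list mirrors the VAE / encoder table.
from __future__ import annotations

from pathlib import Path

EXPECTED_VAE_PATHS = [
    "checkpoints/ae.safetensors",        # Flux1-dev / Z-Image 16-ch VAE
    "checkpoints/flux2_ae.safetensors",  # Flux2-dev 128-ch BN VAE
    "checkpoints/sd3_vae",               # SD3 medium VAE (diffusers format)
    "checkpoints/rae",                   # DINOv2-B encoder + RAE decoder + stats
    "checkpoints/scale_rae",             # SigLIP-2 encoder + Scale-RAE decoder + config
]


def missing_vae_weights(root: str | Path = ".") -> list[str]:
    """Return the expected VAE/encoder paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_VAE_PATHS if not (root / p).exists()]
```

Run it from the repo root after the download; an empty list means every entry in the table is present.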
## Usage
The decoder checkpoints are loaded by the inference scripts in the PiD
codebase. The (backbone, ckpt_type) → path mapping is defined in one place,
`pid/_src/inference/checkpoint_registry.py`. Clone the repo, point it at this
snapshot, and the demos pick the right file automatically:
```bash
# Pull just the checkpoints/ tree into the repo root (skips this README and
# the teaser figure so they don't clobber the files in the source repo).
hf download nvidia/PiD --local-dir . --include "checkpoints/*"

# Then run any of the demos, e.g.:
PYTHONPATH=. python -m pid._src.inference.from_ldm_flux \
  --prompt "A photorealistic cat" \
  --ldm_inference_steps 28 --save_xt_steps 22 24 26 \
  --output_dir ./results/demo \
  --cfg_scale 1 --pid_inference_steps 4 --scale 4
```
Pick the `2kto4k` variant with `--pid_ckpt_type 2kto4k` when decoding at 4K.
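If you prefer to fetch the files from Python instead of the `hf download` command, `huggingface_hub.snapshot_download` with `allow_patterns` applies the same `checkpoints/*` filter. The wrapper name below is hypothetical; the import is done lazily so the module can be loaded without `huggingface_hub` installed.

```python
# Sketch of a Python equivalent of the `hf download` command shown in Usage.
from __future__ import annotations

ALLOW_PATTERNS = ["checkpoints/*"]  # same filter as --include "checkpoints/*"


def fetch_pid_checkpoints(local_dir: str = ".") -> str:
    """Download only the checkpoints/ tree of nvidia/PiD into `local_dir`."""
    # Lazy import: only needed when the download actually runs.
    from huggingface_hub import snapshot_download

    # Returns the local folder path that now contains checkpoints/.
    return snapshot_download(
        repo_id="nvidia/PiD",
        local_dir=local_dir,
        allow_patterns=ALLOW_PATTERNS,
    )
```

Calling `fetch_pid_checkpoints(".")` from the repo root should leave the same `checkpoints/` layout as the CLI command, so the demos find the files in the same place.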
## License
Released under the Apache License 2.0. Copyright 2026 NVIDIA Corporation
& Affiliates. See the LICENSE file in the source repository for the full
text.
The upstream encoder backbones (DINOv2, SigLIP-2, Flux, SD3, Z-Image) and their weights remain under their own original licenses; PiD's Apache-2.0 release covers only the PiD decoder weights and code.