@@ -6,6 +6,12 @@ tags:
 - pixel-diffusion-decoder
 - vae-decoder
 pipeline_tag: image-to-image
 ---
 # PiD — Pixel Diffusion Decoder
@@ -14,6 +20,18 @@ pipeline_tag: image-to-image
   <img src="figures/teaser.jpg" alt="PiD teaser" width="100%">
 </p>
 PiD reformulates the latent-to-pixel decoder as a conditional pixel-space
 diffusion model, unifying decoding and upsampling into a single generative
 module. It denoises directly in high-resolution pixel space and produces a
@@ -25,6 +43,15 @@ entries (`ae.safetensors`, `flux2_ae.safetensors`, `sd3_vae/`, `rae/`,
 `scale_rae/`) are **the corresponding encoder/decoder VAE weights** that PiD
 plugs into — they're not PiD checkpoints themselves.
 ## PiD checkpoints
 Two variants are released for each diffusers-style backbone:
@@ -88,4 +115,14 @@ PYTHONPATH=. python -m pid._src.inference.from_ldm_flux \
     --cfg_scale 1 --pid_inference_steps 4 --scale 4
 ```
-Pick the `2kto4k` variant via `--pid_ckpt_type 2kto4k` when decoding at 4K.

 - pixel-diffusion-decoder
 - vae-decoder
 pipeline_tag: image-to-image
+base_model:
+- Tongyi-MAI/Z-Image
+- black-forest-labs/FLUX.1-dev
+- black-forest-labs/FLUX.2-dev
+- nyu-visionx/Scale-RAE-Qwen7B_DiT9.8B
+- nvidia/PixelDiT-1300M-1024px
 ---
 # PiD — Pixel Diffusion Decoder
   <img src="figures/teaser.jpg" alt="PiD teaser" width="100%">
 </p>
+**[Paper](), [Project Page](https://research.nvidia.com/labs/sil/projects/pid/)**
+[Yifan Lu](https://yifanlu0227.github.io/)\*,
+[Qi Wu](https://wilsoncernwq.github.io/),
+[Jay Zhangjie Wu](https://zhangjiewu.github.io/),
+[Zian Wang](https://www.cs.toronto.edu/~zianwang/),
+[Huan Ling](https://www.cs.toronto.edu/~linghuan/),
+[Sanja Fidler](https://www.cs.utoronto.ca/~fidler/),
+[Xuanchi Ren](https://xuanchiren.com/)\* <br>
 PiD reformulates the latent-to-pixel decoder as a conditional pixel-space
 diffusion model, unifying decoding and upsampling into a single generative
 module. It denoises directly in high-resolution pixel space and produces a
 `scale_rae/`) are **the corresponding encoder/decoder VAE weights** that PiD
 plugs into — they're not PiD checkpoints themselves.
+### License/Terms of Use
+This model is released under the [NVIDIA Internal Scientific Research and Development Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/).
+Important Note: The Model and any Derivative Model may not be distributed, deployed, sublicensed, publicly displayed, or publicly performed. You may not use the Model or a Derivative Model in a production environment or for the purpose of generating works for sale or distribution. If you fail to comply with any of the terms in this Agreement, your rights under the NVIDIA Internal Scientific Research and Development Model License will automatically terminate.
+### Deployment Geography
+Global
 ## PiD checkpoints
 Two variants are released for each diffusers-style backbone:
     --cfg_scale 1 --pid_inference_steps 4 --scale 4
 ```
+Pick the `2kto4k` variant via `--pid_ckpt_type 2kto4k` when decoding at 4K.
+## Citation
+```
+@article{lu2026pid,
+  title={},
+  author={},
+  journal={},
+  year={2026}
+}
+```