Add model card with checkpoint inventory and teaser

Files changed (3)

.gitattributes +1 -0
README.md +101 -0
figures/teaser.jpg +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+figures/teaser.jpg filter=lfs diff=lfs merge=lfs -text

README.md ADDED

@@ -0,0 +1,101 @@

+---
+license: apache-2.0
+library_name: pytorch
+tags:
+  - super-resolution
+  - diffusion
+  - pixel-diffusion-decoder
+  - vae-decoder
+pipeline_tag: image-to-image
+---
+# PiD — Pixel Diffusion Decoder
+<p align="center">
+  <img src="figures/teaser.jpg" alt="PiD teaser" width="100%">
+</p>
+PiD reformulates the latent-to-pixel decoder as a conditional pixel-space
+diffusion model, unifying decoding and upsampling into a single generative
+module. It denoises directly in high-resolution pixel space and produces a
+super-resolved image in one pass. This repository hosts the released decoder
+checkpoints, plus the encoder/decoder ("VAE") weights they depend on.
+All `PiD_*` checkpoints in this repo are **4-step distilled**. The non-`PiD_*`
+entries (`ae.safetensors`, `flux2_ae.safetensors`, `sd3_vae/`, `rae/`,
+`scale_rae/`) are **the corresponding encoder/decoder VAE weights** that PiD
+plugs into — they're not PiD checkpoints themselves.
+## PiD checkpoints
+Two variants are released. `2k` covers all five backbones; `2kto4k` covers only
+the three diffusers-style backbones (Flux1, Flux2, SD3):
+- **`2k`** — trained at 2048px, used as a 4× decoder (512 LDM → 2048 px), or as
+  an 8× decoder for the Scale-RAE backbone (256 → 2048).
+- **`2kto4k`** — trained with multi-resolution data bucketing 2048→3840 and an
+  SD3-style dynamic shift; designed for 1024 LDM → 4K (3840 px) decoding. Only
+  released for the diffusers backbones.
+| Path                                                          | Backbone (encoder side)                    | SR factor | Variant   |
+|---------------------------------------------------------------|--------------------------------------------|-----------|-----------|
+| `checkpoints/PiD_res2k_sr4x_official_flux_distill_4step`      | Flux1-dev (16-ch VAE)                      | 4×        | 2k        |
+| `checkpoints/PiD_res2k_sr4x_official_flux2_distill_4step`     | Flux2-dev (128-ch BN VAE)                  | 4×        | 2k        |
+| `checkpoints/PiD_res2k_sr4x_official_sd3_distill_4step`       | SD3 medium (16-ch VAE)                     | 4×        | 2k        |
+| `checkpoints/PiD_res2k_sr4x_official_dinov2_distill_4step`    | DINOv2-B + RAE ViT-XL (768-ch)             | 4×        | 2k        |
+| `checkpoints/PiD_res2k_sr8x_official_siglip_distill_4step`    | SigLIP-2 So400M + Scale-RAE ViT-XL (1152)  | 8×        | 2k        |
+| `checkpoints/PiD_res2kto4k_sr4x_official_flux_distill_4step`  | Flux1-dev (16-ch VAE)                      | 4×        | 2kto4k    |
+| `checkpoints/PiD_res2kto4k_sr4x_official_flux2_distill_4step` | Flux2-dev (128-ch BN VAE)                  | 4×        | 2kto4k    |
+| `checkpoints/PiD_res2kto4k_sr4x_official_sd3_distill_4step`   | SD3 medium (16-ch VAE)                     | 4×        | 2kto4k    |
+Z-Image shares Flux1's VAE, so its inference path reuses the `flux` checkpoints
+(both `2k` and `2kto4k`) — no separate `zimage` checkpoint is shipped.
+Each checkpoint directory contains a single file, `model_ema_bf16.pth`, which
+holds the EMA weights cast to bfloat16. The inference scripts load this format
+by default.
+## VAE / encoder weights
+These are the per-backbone encoder (and, where applicable, original decoder)
+weights that PiD pairs with. They're hosted here so a single download brings
+everything needed end-to-end.
+| Path                            | Description                                                                          |
+|---------------------------------|--------------------------------------------------------------------------------------|
+| `checkpoints/ae.safetensors`    | Flux1-dev / Z-Image 16-ch VAE (encoder + original Flux decoder).                     |
+| `checkpoints/flux2_ae.safetensors` | Flux2-dev 128-ch BN VAE.                                                          |
+| `checkpoints/sd3_vae/`          | SD3 medium 16-ch VAE in diffusers format.                                            |
+| `checkpoints/rae/`              | DINOv2-B image encoder + RAE ViT-XL decoder + ImageNet-512 normalization statistics. |
+| `checkpoints/scale_rae/`        | SigLIP-2 So400M encoder + Scale-RAE ViT-XL decoder + decoder config.                 |
+## Usage
+The decoder checkpoints are loaded by the inference scripts in the PiD
+codebase. The exact `(backbone, ckpt_type) → path` mapping is defined in
+[`pid/_src/inference/checkpoint_registry.py`](https://github.com/), which is
+the single source of truth. Clone the repo, point it at this snapshot, and the
+demos pick the right file automatically:
+```bash
+# Download the whole snapshot into the current directory; the weights land in ./checkpoints
+hf download nvidia/PiD --local-dir .
+# Then run any of the demos, e.g.:
+PYTHONPATH=. python -m pid._src.inference.from_ldm_flux \
+    --prompt "A photorealistic cat" \
+    --ldm_inference_steps 28 --save_xt_steps 22 24 26 \
+    --output_dir ./results/demo \
+    --cfg_scale 1 --pid_inference_steps 4 --scale 4
+```
+Pick the `2kto4k` variant via `--pid_ckpt_type 2kto4k` when decoding at 4K.
+## License
+Released under the **Apache License 2.0**. Copyright 2026 NVIDIA Corporation
+& Affiliates. See the `LICENSE` file in the source repository for the full
+text.
+The upstream encoder backbones (DINOv2, SigLIP-2, Flux, SD3, Z-Image) and their
+weights remain under their own original licenses; PiD's Apache-2.0 release
+covers only the PiD decoder weights and code.
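
Review note: the `(backbone, ckpt_type) → path` mapping described under Usage can be illustrated with a small sketch built only from the two tables in the README. This is not the contents of `checkpoint_registry.py`, which is not shown in this commit. The function name `pid_checkpoint_path`, the dictionaries, and the lowercase backbone keys (taken from the directory names, with `zimage` as an alias for `flux`) are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical mirror of the README tables. The real mapping lives in
# pid/_src/inference/checkpoint_registry.py and may be organized differently.
_SR_FACTOR = {"flux": 4, "flux2": 4, "sd3": 4, "dinov2": 4, "siglip": 8}
_HAS_2KTO4K = {"flux", "flux2", "sd3"}  # 2kto4k is listed only for these
_ALIASES = {"zimage": "flux"}  # Z-Image shares Flux1's VAE per the README


def pid_checkpoint_path(backbone: str, ckpt_type: str = "2k",
                        root: str = "checkpoints") -> Path:
    """Build the expected path of a PiD checkpoint from the naming scheme."""
    name = _ALIASES.get(backbone, backbone)
    if name not in _SR_FACTOR:
        raise KeyError(f"unknown backbone: {backbone!r}")
    if ckpt_type not in ("2k", "2kto4k"):
        raise ValueError(f"unknown ckpt_type: {ckpt_type!r}")
    if ckpt_type == "2kto4k" and name not in _HAS_2KTO4K:
        raise ValueError(f"no 2kto4k checkpoint listed for {backbone!r}")
    dirname = (
        f"PiD_res{ckpt_type}_sr{_SR_FACTOR[name]}x_official_{name}_distill_4step"
    )
    # Each checkpoint directory holds a single EMA bf16 file.
    return Path(root) / dirname / "model_ema_bf16.pth"


if __name__ == "__main__":
    print(pid_checkpoint_path("zimage", "2kto4k"))
```

A lookup like this makes it easy to check, before running a demo, that the file a given `--pid_ckpt_type` would need is actually present in the downloaded snapshot.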

figures/teaser.jpg ADDED

Git LFS Details

SHA256: fb74f71364bd8fc0901650d6c7b5b8ef8efac751b7d248d2c9a3d7accf031d17
Pointer size: 132 Bytes
Size of remote file: 1.36 MB
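
Review note: a Git LFS object ID is the SHA-256 of the file's contents, so the hash listed above can be used to confirm that a downloaded `figures/teaser.jpg` is the real image rather than an unresolved pointer file. The sketch below uses only the Python standard library; the helper names `sha256_of_file` and `matches_lfs_oid` are hypothetical and not part of the PiD codebase.

```python
import hashlib

# SHA256 copied from the Git LFS details of figures/teaser.jpg in this commit.
EXPECTED_SHA256 = "fb74f71364bd8fc0901650d6c7b5b8ef8efac751b7d248d2c9a3d7accf031d17"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_oid(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 equals the expected LFS object ID."""
    return sha256_of_file(path) == expected.lower()
```

After downloading the snapshot, calling `matches_lfs_oid("figures/teaser.jpg")` should return `True` if the file was fetched through LFS. A file of roughly 132 bytes that fails the check is most likely the pointer itself.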