xrenaa committed · Commit 626adcb · verified · 1 Parent(s): ae5500a

Update README.md

Files changed (1)
  1. README.md +38 -1
README.md CHANGED
@@ -6,6 +6,12 @@ tags:
  - pixel-diffusion-decoder
  - vae-decoder
  pipeline_tag: image-to-image
+ base_model:
+ - Tongyi-MAI/Z-Image
+ - black-forest-labs/FLUX.1-dev
+ - black-forest-labs/FLUX.2-dev
+ - nyu-visionx/Scale-RAE-Qwen7B_DiT9.8B
+ - nvidia/PixelDiT-1300M-1024px
  ---

  # PiD — Pixel Diffusion Decoder
@@ -14,6 +20,18 @@ pipeline_tag: image-to-image
  <img src="figures/teaser.jpg" alt="PiD teaser" width="100%">
  </p>

+
+ **[Paper](), [Project Page](https://research.nvidia.com/labs/sil/projects/pid/)**
+
+ [Yifan Lu](https://yifanlu0227.github.io/)\*,
+ [Qi Wu](https://wilsoncernwq.github.io/),
+ [Jay Zhangjie Wu](https://zhangjiewu.github.io/),
+ [Zian Wang](https://www.cs.toronto.edu/~zianwang/),
+ [Huan Ling](https://www.cs.toronto.edu/~linghuan/),
+ [Sanja Fidler](https://www.cs.utoronto.ca/~fidler/),
+ [Xuanchi Ren](https://xuanchiren.com/)\* <br>
+
+
  PiD reformulates the latent-to-pixel decoder as a conditional pixel-space
  diffusion model, unifying decoding and upsampling into a single generative
  module. It denoises directly in high-resolution pixel space and produces a
@@ -25,6 +43,15 @@ entries (`ae.safetensors`, `flux2_ae.safetensors`, `sd3_vae/`, `rae/`,
  `scale_rae/`) are **the corresponding encoder/decoder VAE weights** that PiD
  plugs into — they're not PiD checkpoints themselves.

+ ### License/Terms of Use
+
+ This model is released under the [NVIDIA Internal Scientific Research and Development Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-internal-scientific-research-and-development-model-license/).
+
+ Important Note: The Model and any Derivative Model may not be distributed, deployed, sublicensed, publicly displayed, publicly performed, or sublicensed. You may not use the Model or a Derivative Model in a production environment or for the purpose of generating works for sale or distribution. If you fail to comply with any of the terms in this Agreement, your rights under the NVIDIA Internal Scientific Research and Development Model License will automatically terminate.
+
+ ### Deployment Geography:
+ Global
+
  ## PiD checkpoints

  Two variants are released for each diffusers-style backbone:
@@ -88,4 +115,14 @@ PYTHONPATH=. python -m pid._src.inference.from_ldm_flux \
  --cfg_scale 1 --pid_inference_steps 4 --scale 4
  ```

- Pick the `2kto4k` variant via `--pid_ckpt_type 2kto4k` when decoding at 4K.
+ Pick the `2kto4k` variant via `--pid_ckpt_type 2kto4k` when decoding at 4K.
+
+ ## Citation
+ ```
+ @article{lu2026pid,
+ title={},
+ author={},
+ journal={},
+ year={2026}
+ }
+ ```