Commit 0179f45 by SearchingMan (verified) · 1 Parent(s): 4c2a888

Z-Image-Turbo with student+adapter text encoder

.gitattributes CHANGED
@@ -33,3 +33,14 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
37
+ assets/DMDR.webp filter=lfs diff=lfs merge=lfs -text
38
+ assets/architecture.webp filter=lfs diff=lfs merge=lfs -text
39
+ assets/decoupled-dmd.webp filter=lfs diff=lfs merge=lfs -text
40
+ assets/reasoning.png filter=lfs diff=lfs merge=lfs -text
41
+ assets/showcase.jpg filter=lfs diff=lfs merge=lfs -text
42
+ assets/showcase_editing.png filter=lfs diff=lfs merge=lfs -text
43
+ assets/showcase_realistic.png filter=lfs diff=lfs merge=lfs -text
44
+ assets/showcase_rendering.png filter=lfs diff=lfs merge=lfs -text
45
+ assets/Z-Image-Gallery.pdf filter=lfs diff=lfs merge=lfs -text
46
+ assets/leaderboard.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,200 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-to-image
6
+ library_name: diffusers
7
+ ---
8
+
9
+
10
+ <h1 align="center">⚡️- Image<br><sub><sup>An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer</sup></sub></h1>
11
+
12
+ <div align="center">
13
+
14
+ [![Official Site](https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage)](https://tongyi-mai.github.io/Z-Image-blog/)&#160;
15
+ [![GitHub](https://img.shields.io/badge/GitHub-Z--Image-181717?logo=github&logoColor=white)](https://github.com/Tongyi-MAI/Z-Image)&#160;
16
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-Z--Image--Turbo-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)&#160;
17
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Online_Demo-Z--Image--Turbo-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo)&#160;
18
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Mobile_Demo-Z--Image--Turbo-red)](https://huggingface.co/spaces/akhaliq/Z-Image-Turbo)&#160;
19
+ [![ModelScope Model](https://img.shields.io/badge/🤖%20Checkpoint-Z--Image--Turbo-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo)&#160;
20
+ [![ModelScope Space](https://img.shields.io/badge/🤖%20Online_Demo-Z--Image--Turbo-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster)&#160;
21
+ [![Art Gallery PDF](https://img.shields.io/badge/%F0%9F%96%BC%20Art_Gallery-PDF-ff69b4)](assets/Z-Image-Gallery.pdf)&#160;
22
+ [![Web Art Gallery](https://img.shields.io/badge/%F0%9F%8C%90%20Web_Art_Gallery-online-00bfff)](https://modelscope.cn/studios/Tongyi-MAI/Z-Image-Gallery/summary)&#160;
23
+ <a href="https://arxiv.org/abs/2511.22699" target="_blank"><img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="21px"></a>
24
+
25
+
26
+ Welcome to the official repository for the Z-Image(造相)project!
27
+
28
+ </div>
29
+
30
+
31
+
32
+ ## ✨ Z-Image
33
+
34
+ Z-Image is a powerful and highly efficient image generation model family with **6B** parameters. Currently there are four variants:
35
+
36
+ - 🚀 **Z-Image-Turbo** – A distilled version of Z-Image that matches or exceeds leading competitors with only **8 NFEs** (Number of Function Evaluations). It offers **⚡️sub-second inference latency⚡️** on enterprise-grade H800 GPUs and fits comfortably on **consumer devices with 16 GB of VRAM**. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
37
+
38
+ - 🎨 **Z-Image** – The foundation model behind Z-Image-Turbo. Z-Image focuses on **high-quality generation**, **rich aesthetics**, **strong diversity**, and **controllability**, well-suited for creative generation, **fine-tuning**, and downstream development. It supports a wide range of artistic styles, effective negative prompting, and high diversity across identities, poses, compositions, and layouts.
39
+
40
+ - 🧱 **Z-Image-Omni-Base** – The versatile foundation model capable of both **generation and editing tasks**. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development, providing the most "raw" and diverse starting point for the open-source community.
41
+
42
+ - ✍️ **Z-Image-Edit** – A variant fine-tuned from Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
43
+
44
+ ### 📥 Model Zoo
45
+
46
+ | Model | Pre-Training | SFT | RL | Step | CFG | Task | Visual Quality | Diversity | Fine-Tunability | Hugging Face | ModelScope |
47
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
48
+ | **Z-Image-Omni-Base** | ✅ | ❌ | ❌ | 50 | ✅ | Gen. / Editing | Medium | High | Easy | *To be released* | *To be released* |
49
+ | **Z-Image** | ✅ | ✅ | ❌ | 50 | ✅ | Gen. | High | Medium | Easy | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint%20-Z--Image-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image) <br> [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Demo-Z--Image-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image) | [![ModelScope Model](https://img.shields.io/badge/🤖%20%20Checkpoint-Z--Image-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image) <br> [![ModelScope Space](https://img.shields.io/badge/%F0%9F%A4%96%20Demo-Z--Image-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=569345&modelType=Checkpoint&sdVersion=Z_IMAGE&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image%3Frevision%3Dmaster) |
50
+ | **Z-Image-Turbo** | ✅ | ✅ | ✅ | 8 | ❌ | Gen. | Very High | Low | N/A | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint%20-Z--Image--Turbo-yellow)](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) <br> [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Demo-Z--Image--Turbo-blue)](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo) | [![ModelScope Model](https://img.shields.io/badge/🤖%20%20Checkpoint-Z--Image--Turbo-624aff)](https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo) <br> [![ModelScope Space](https://img.shields.io/badge/%F0%9F%A4%96%20Demo-Z--Image--Turbo-17c7a7)](https://www.modelscope.cn/aigc/imageGeneration?tab=advanced&versionId=469191&modelType=Checkpoint&sdVersion=Z_IMAGE_TURBO&modelUrl=modelscope%3A%2F%2FTongyi-MAI%2FZ-Image-Turbo%3Frevision%3Dmaster) |
51
+ | **Z-Image-Edit** | ✅ | ✅ | ❌ | 50 | ✅ | Editing | High | Medium | Easy | *To be released* | *To be released* |
52
+
53
+ ### 🖼️ Showcase
54
+
55
+ 📸 **Photorealistic Quality**: **Z-Image-Turbo** delivers strong photorealistic image generation while maintaining excellent aesthetic quality.
56
+
57
+ ![Showcase of Z-Image on Photo-realistic image Generation](assets/showcase_realistic.png)
58
+
59
+ 📖 **Accurate Bilingual Text Rendering**: **Z-Image-Turbo** excels at accurately rendering complex Chinese and English text.
60
+
61
+ ![Showcase of Z-Image on Bilingual Text Rendering](assets/showcase_rendering.png)
62
+
63
+ 💡 **Prompt Enhancing & Reasoning**: Prompt Enhancer empowers the model with reasoning capabilities, enabling it to transcend surface-level descriptions and tap into underlying world knowledge.
64
+
65
+ ![Showcase of Z-Image on Prompt Enhancing and Reasoning](assets/reasoning.png)
66
+
67
+ 🧠 **Creative Image Editing**: **Z-Image-Edit** shows a strong understanding of bilingual editing instructions, enabling imaginative and flexible image transformations.
68
+
69
+ ![Showcase of Z-Image-Edit on Image Editing](assets/showcase_editing.png)
70
+
71
+ ### 🏗️ Model Architecture
72
+ We adopt a **Scalable Single-Stream DiT** (S3-DiT) architecture. In this setup, text, visual semantic tokens, and image VAE tokens are concatenated at the sequence level to serve as a unified input stream, maximizing parameter efficiency compared to dual-stream approaches.
73
+
74
+ ![Architecture of Z-Image and Z-Image-Edit](assets/architecture.webp)
75
+
76
+ ### 📈 Performance
77
+ According to the Elo-based Human Preference Evaluation (on [*Alibaba AI Arena*](https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I)), Z-Image-Turbo shows highly competitive performance against other leading models, while achieving state-of-the-art results among open-source models.
78
+
79
+ <p align="center">
80
+ <a href="https://aiarena.alibaba-inc.com/corpora/arena/leaderboard?arenaType=T2I">
81
+ <img src="assets/leaderboard.png" alt="Z-Image Elo Rating on AI Arena"/><br />
82
+ <span style="font-size:1.05em; cursor:pointer; text-decoration:underline;"> Click to view the full leaderboard</span>
83
+ </a>
84
+ </p>
85
+
86
+ ### 🚀 Quick Start
87
+ Install the latest version of diffusers from source with the following command:
88
+ <details>
89
+ <summary><sup>Click here for details on why you need to install diffusers from source</sup></summary>
90
+
91
+ We have submitted two pull requests ([#12703](https://github.com/huggingface/diffusers/pull/12703) and [#12715](https://github.com/huggingface/diffusers/pull/12715)) to the 🤗 diffusers repository to add support for Z-Image. Both PRs have been merged into the latest official diffusers release.
92
+ Therefore, you need to install diffusers from source for the latest features and Z-Image support.
93
+
94
+ </details>
95
+
96
+ ```bash
97
+ pip install git+https://github.com/huggingface/diffusers
98
+ ```
99
+
100
+ ```python
101
+ import torch
102
+ from diffusers import ZImagePipeline
103
+
104
+ # 1. Load the pipeline
105
+ # Use bfloat16 for optimal performance on supported GPUs
106
+ pipe = ZImagePipeline.from_pretrained(
107
+ "Tongyi-MAI/Z-Image-Turbo",
108
+ torch_dtype=torch.bfloat16,
109
+ low_cpu_mem_usage=False,
110
+ )
111
+ pipe.to("cuda")
112
+
113
+ # [Optional] Attention Backend
114
+ # Diffusers uses SDPA by default. Switch to Flash Attention for better efficiency if supported:
115
+ # pipe.transformer.set_attention_backend("flash") # Enable Flash-Attention-2
116
+ # pipe.transformer.set_attention_backend("_flash_3") # Enable Flash-Attention-3
117
+
118
+ # [Optional] Model Compilation
119
+ # Compiling the DiT model accelerates inference, but the first run will take longer to compile.
120
+ # pipe.transformer.compile()
121
+
122
+ # [Optional] CPU Offloading
123
+ # Enable CPU offloading for memory-constrained devices.
124
+ # pipe.enable_model_cpu_offload()
125
+
126
+ prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."
127
+
128
+ # 2. Generate Image
129
+ image = pipe(
130
+ prompt=prompt,
131
+ height=1024,
132
+ width=1024,
133
+ num_inference_steps=9, # This actually results in 8 DiT forwards
134
+ guidance_scale=0.0, # Guidance should be 0 for the Turbo models
135
+ generator=torch.Generator("cuda").manual_seed(42),
136
+ ).images[0]
137
+
138
+ image.save("example.png")
139
+ ```
140
+
141
+ ## 🔬 Decoupled-DMD: The Acceleration Magic Behind Z-Image
142
+
143
+ [![arXiv](https://img.shields.io/badge/arXiv-2511.22677-b31b1b.svg)](https://arxiv.org/abs/2511.22677)
144
+
145
+ Decoupled-DMD is the core few-step distillation algorithm that empowers the 8-step Z-Image model.
146
+
147
+ Our core insight in Decoupled-DMD is that the success of existing DMD (Distribution Matching Distillation) methods is the result of two independent, collaborating mechanisms:
148
+
149
+ - **CFG Augmentation (CA)**: The primary **engine** 🚀 driving the distillation process, a factor largely overlooked in previous work.
150
+ - **Distribution Matching (DM)**: Acts more as a **regularizer** ⚖️, ensuring the stability and quality of the generated output.
151
+
152
+ By recognizing and decoupling these two mechanisms, we were able to study and optimize them in isolation. This ultimately motivated us to develop an improved distillation process that significantly enhances the performance of few-step generation.
153
+
154
+ ![Diagram of Decoupled-DMD](assets/decoupled-dmd.webp)
155
+
156
+ ## 🤖 DMDR: Fusing DMD with Reinforcement Learning
157
+
158
+ [![arXiv](https://img.shields.io/badge/arXiv-2511.13649-b31b1b.svg)](https://arxiv.org/abs/2511.13649)
159
+
160
+ Building upon the strong foundation of Decoupled-DMD, our 8-step Z-Image model has already demonstrated exceptional capabilities. To achieve further improvements in terms of semantic alignment, aesthetic quality, and structural coherence—while producing images with richer high-frequency details—we present **DMDR**.
161
+
162
+ Our core insight behind DMDR is that Reinforcement Learning (RL) and Distribution Matching Distillation (DMD) can be synergistically integrated during the post-training of few-step models. We demonstrate that:
163
+
164
+ - **RL Unlocks the Performance of DMD** 🚀
165
+ - **DMD Effectively Regularizes RL** ⚖️
166
+
167
+ ![Diagram of DMDR](assets/DMDR.webp)
168
+
169
+ ## ⏬ Download
170
+ ```bash
171
+ pip install -U huggingface_hub
172
+ HF_XET_HIGH_PERFORMANCE=1 hf download Tongyi-MAI/Z-Image-Turbo
173
+ ```
174
+
175
+ ## 📜 Citation
176
+
177
+ If you find our work useful in your research, please consider citing:
178
+
179
+ ```bibtex
180
+ @article{team2025zimage,
181
+ title={Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer},
182
+ author={Z-Image Team},
183
+ journal={arXiv preprint arXiv:2511.22699},
184
+ year={2025}
185
+ }
186
+
187
+ @article{liu2025decoupled,
188
+ title={Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield},
189
+ author={Dongyang Liu and Peng Gao and David Liu and Ruoyi Du and Zhen Li and Qilong Wu and Xin Jin and Sihan Cao and Shifeng Zhang and Hongsheng Li and Steven Hoi},
190
+ journal={arXiv preprint arXiv:2511.22677},
191
+ year={2025}
192
+ }
193
+
194
+ @article{jiang2025distribution,
195
+ title={Distribution Matching Distillation Meets Reinforcement Learning},
196
+ author={Jiang, Dengyang and Liu, Dongyang and Wang, Zanyi and Wu, Qilong and Jin, Xin and Liu, David and Li, Zhen and Wang, Mengmeng and Gao, Peng and Yang, Harry},
197
+ journal={arXiv preprint arXiv:2511.13649},
198
+ year={2025}
199
+ }
200
+ ```
assets/DMDR.webp ADDED

Git LFS Details

  • SHA256: 2e6f3053b98d097f2aa11d3892bd9307326db41b65336bea54dc5825a0e03077
  • Pointer size: 131 Bytes
  • Size of remote file: 173 kB
assets/Z-Image-Gallery.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6f9895b3246d2547bac74bbe0be975da500eaae93f2cad4248ad3281786b1ac6
3
+ size 15767436
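
Note on the pointer above: large files in this commit are stored as Git LFS pointers, which are three `key value` lines (`version`, `oid`, `size`). The following is a minimal sketch of reading such a pointer; the helper name `parse_lfs_pointer` and the returned dictionary layout are illustrative assumptions, not part of Git LFS or of this repository.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer text copied from assets/Z-Image-Gallery.pdf in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6f9895b3246d2547bac74bbe0be975da500eaae93f2cad4248ad3281786b1ac6
size 15767436
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is the byte length of the actual asset, so the gallery PDF is roughly 15.8 MB even though the pointer itself is only a few lines long.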
assets/architecture.webp ADDED

Git LFS Details

  • SHA256: 261af62ecc7e9749ae28e1d3a84e2f70a6c192d2017b7d8f020c7bff982ef59c
  • Pointer size: 131 Bytes
  • Size of remote file: 422 kB
assets/decoupled-dmd.webp ADDED

Git LFS Details

  • SHA256: 4568ca559b997fc38f57dc1c3f5b1da3a3c144ae12419caa855ced972bf8c7aa
  • Pointer size: 131 Bytes
  • Size of remote file: 152 kB
assets/leaderboard.png ADDED

Git LFS Details

  • SHA256: e9fd4aa185bb7bff2b5515f2001b4d80df330595e78d6a098142e5a232bb4e4e
  • Pointer size: 132 Bytes
  • Size of remote file: 2.03 MB
assets/leaderboard.webp ADDED
assets/reasoning.png ADDED

Git LFS Details

  • SHA256: 96c16b2c8d8dc67bb92ecc22d54b9955ab55136977f515bb76f4b2eb42eb3cdb
  • Pointer size: 132 Bytes
  • Size of remote file: 7.7 MB
assets/showcase.jpg ADDED

Git LFS Details

  • SHA256: f6ee74e066e00596e429f5a08140aebae1678e5935ce1e11ca6c1c6cd72432ee
  • Pointer size: 132 Bytes
  • Size of remote file: 6.43 MB
assets/showcase_editing.png ADDED

Git LFS Details

  • SHA256: 7d720c3157fd0b0c1f07ac826c6d380b4bcb1b6933c64eb11bfe804ccf7c26f4
  • Pointer size: 132 Bytes
  • Size of remote file: 4.75 MB
assets/showcase_realistic.png ADDED

Git LFS Details

  • SHA256: 697e6f6857f619314173508df72a14314cbb43e67475de7494123bb8b4f4eb2c
  • Pointer size: 132 Bytes
  • Size of remote file: 6.26 MB
assets/showcase_rendering.png ADDED

Git LFS Details

  • SHA256: 3556dd66be2200d53f957424e12ecf914ddf3eded151cde86c7353f8b231284f
  • Pointer size: 132 Bytes
  • Size of remote file: 7.6 MB
custom_adapter/adapter_config.json ADDED
@@ -0,0 +1,20 @@
1
+ {
2
+ "base_student_model": "Qwen/Qwen3-1.7B",
3
+ "teacher_model": "Tongyi-MAI/Z-Image-Turbo",
4
+ "zimage_source": "/content/hf_models_unzipped/Z-Image-Turbo",
5
+ "student_hidden_size": 2048,
6
+ "teacher_hidden_size": 2560,
7
+ "adapter_dim": 1024,
8
+ "adapter_heads": 8,
9
+ "adapter_blocks": 2,
10
+ "adapter_ff_mult": 4,
11
+ "adapter_dropout": 0.1,
12
+ "hs_tap_index": -2,
13
+ "layer_mapping": [
14
+ [
15
+ -2,
16
+ -2
17
+ ]
18
+ ],
19
+ "zimage_enable_thinking": true
20
+ }
custom_adapter/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a6e198d801f05cd3d4b16232035b4603d48c957429564697340c9e2e8c73827f
3
+ size 128070720
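
Note on the adapter size: the values in `custom_adapter/adapter_config.json` are enough to estimate the adapter's parameter count by hand. The sketch below assumes the adapter has the layout of the `Adapter` class in `text_encoder/modeling_student_adapter.py` from this commit (two input projections, `adapter_blocks` cross-attention blocks with three LayerNorms each, and an output projection) and that PyTorch's `nn.MultiheadAttention` uses a packed input projection plus an output projection. The helper names are illustrative.

```python
def linear_params(n_in, n_out):
    # Weight matrix plus bias vector of a dense layer.
    return n_in * n_out + n_out


def layer_norm_params(dim):
    # Learnable scale and shift.
    return 2 * dim


def mha_params(dim):
    # Packed q/k/v in-projection (weights + biases) plus the output projection.
    return (3 * dim * dim + 3 * dim) + linear_params(dim, dim)


def adapter_params(s_dim, t_dim, dim, blocks, ff_mult):
    block = (
        3 * layer_norm_params(dim)            # norm_q, norm_kv, norm_ff
        + mha_params(dim)                     # cross-attention
        + linear_params(dim, dim * ff_mult)   # feed-forward up-projection
        + linear_params(dim * ff_mult, dim)   # feed-forward down-projection
    )
    return (
        2 * linear_params(s_dim, dim)         # q_proj and kv_proj
        + blocks * block
        + linear_params(dim, t_dim)           # proj_out
    )


# Values taken from adapter_config.json; the head count does not change the total.
total = adapter_params(s_dim=2048, t_dim=2560, dim=1024, blocks=2, ff_mult=4)
print(total, total * 4)
```

Under these assumptions the adapter has 32,016,896 parameters, or 128,067,584 bytes at 4 bytes per parameter. That is about 3 KB less than the 128,070,720-byte `adapter_model.safetensors`, which would be consistent with float32 weights plus a small file header, although the commit does not state the stored dtype.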
model_index.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "_class_name": "ZImagePipeline",
3
+ "_diffusers_version": "0.36.0.dev0",
4
+ "scheduler": [
5
+ "diffusers",
6
+ "FlowMatchEulerDiscreteScheduler"
7
+ ],
8
+ "text_encoder": [
9
+ "transformers",
10
+ "Qwen3Model"
11
+ ],
12
+ "tokenizer": [
13
+ "transformers",
14
+ "Qwen2Tokenizer"
15
+ ],
16
+ "transformer": [
17
+ "diffusers",
18
+ "ZImageTransformer2DModel"
19
+ ],
20
+ "vae": [
21
+ "diffusers",
22
+ "AutoencoderKL"
23
+ ]
24
+ }
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_class_name": "FlowMatchEulerDiscreteScheduler",
3
+ "_diffusers_version": "0.36.0.dev0",
4
+ "num_train_timesteps": 1000,
5
+ "use_dynamic_shifting": false,
6
+ "shift": 3.0
7
+ }
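
Note on `shift`: with `use_dynamic_shifting` disabled, a static `shift` in a flow-matching Euler scheduler is commonly applied as the remapping sigma' = shift * sigma / (1 + (shift - 1) * sigma), which pushes noise levels toward the high-noise end. The following is a minimal sketch of that formula only, assuming this is how the value 3.0 is used; the function name and the sample noise levels are hypothetical and are not the schedule the pipeline actually generates.

```python
def shift_sigma(sigma, shift=3.0):
    """Remap a noise level in [0, 1] with a static timestep shift."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Hypothetical evenly spaced noise levels, for illustration only.
sigmas = [1.0, 0.75, 0.5, 0.25]
shifted = [shift_sigma(s) for s in sigmas]
print([round(s, 4) for s in shifted])
```

The endpoints are preserved (1.0 stays 1.0 and 0.0 stays 0.0) while intermediate levels move upward, so more of the few sampling steps are spent at higher noise.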
student_adapter_build_meta.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "base_repo": "Tongyi-MAI/Z-Image-Turbo",
3
+ "student_source": "/content/drive/MyDrive/zimage_distillkit_contract_qwen17b/run_20260414_092303/phase2",
4
+ "adapter_source": "/content/drive/MyDrive/zimage_distillkit_contract_qwen17b/run_20260414_092303/phase2/custom_adapter",
5
+ "adapter_config": {
6
+ "base_student_model": "Qwen/Qwen3-1.7B",
7
+ "teacher_model": "Tongyi-MAI/Z-Image-Turbo",
8
+ "zimage_source": "/content/hf_models_unzipped/Z-Image-Turbo",
9
+ "student_hidden_size": 2048,
10
+ "teacher_hidden_size": 2560,
11
+ "adapter_dim": 1024,
12
+ "adapter_heads": 8,
13
+ "adapter_blocks": 2,
14
+ "adapter_ff_mult": 4,
15
+ "adapter_dropout": 0.1,
16
+ "hs_tap_index": -2,
17
+ "layer_mapping": [
18
+ [
19
+ -2,
20
+ -2
21
+ ]
22
+ ],
23
+ "zimage_enable_thinking": true
24
+ },
25
+ "created_at_unix": 1778152795.3211486
26
+ }
text_encoder/__init__.py ADDED
@@ -0,0 +1,4 @@
1
+ # Student Adapter Text Encoder — drop-in replacement for Qwen3 text encoder
2
+ # Supports ZImagePipeline via trust_remote_code=True auto_map.
3
+ from .configuration_student_adapter import StudentAdapterConfig
4
+ from .modeling_student_adapter import StudentAdapterTextEncoder
text_encoder/config.json ADDED
@@ -0,0 +1,141 @@
1
+ {
2
+ "adapter_blocks": 2,
3
+ "adapter_dim": 1024,
4
+ "adapter_dropout": 0.1,
5
+ "adapter_ff_mult": 4,
6
+ "adapter_heads": 8,
7
+ "architectures": [
8
+ "StudentAdapterTextEncoder"
9
+ ],
10
+ "dtype": "bfloat16",
11
+ "hs_tap_index": -2,
12
+ "model_type": "zimage_student_adapter",
13
+ "student_config_dict": {
14
+ "_name_or_path": "/content/drive/MyDrive/zimage_distillkit_contract_qwen17b/run_20260414_092303/phase2",
15
+ "add_cross_attention": false,
16
+ "architectures": [
17
+ "Qwen3ForCausalLM"
18
+ ],
19
+ "attention_bias": false,
20
+ "attention_dropout": 0.0,
21
+ "bad_words_ids": null,
22
+ "begin_suppress_tokens": null,
23
+ "bos_token_id": null,
24
+ "chunk_size_feed_forward": 0,
25
+ "cross_attention_hidden_size": null,
26
+ "decoder_start_token_id": null,
27
+ "diversity_penalty": 0.0,
28
+ "do_sample": false,
29
+ "dtype": "bfloat16",
30
+ "early_stopping": false,
31
+ "encoder_no_repeat_ngram_size": 0,
32
+ "eos_token_id": 151645,
33
+ "exponential_decay_length_penalty": null,
34
+ "finetuning_task": null,
35
+ "forced_bos_token_id": null,
36
+ "forced_eos_token_id": null,
37
+ "head_dim": 128,
38
+ "hidden_act": "silu",
39
+ "hidden_size": 2048,
40
+ "id2label": {
41
+ "0": "LABEL_0",
42
+ "1": "LABEL_1"
43
+ },
44
+ "initializer_range": 0.02,
45
+ "intermediate_size": 6144,
46
+ "is_decoder": false,
47
+ "is_encoder_decoder": false,
48
+ "label2id": {
49
+ "LABEL_0": 0,
50
+ "LABEL_1": 1
51
+ },
52
+ "layer_types": [
53
+ "full_attention",
54
+ "full_attention",
55
+ "full_attention",
56
+ "full_attention",
57
+ "full_attention",
58
+ "full_attention",
59
+ "full_attention",
60
+ "full_attention",
61
+ "full_attention",
62
+ "full_attention",
63
+ "full_attention",
64
+ "full_attention",
65
+ "full_attention",
66
+ "full_attention",
67
+ "full_attention",
68
+ "full_attention",
69
+ "full_attention",
70
+ "full_attention",
71
+ "full_attention",
72
+ "full_attention",
73
+ "full_attention",
74
+ "full_attention",
75
+ "full_attention",
76
+ "full_attention",
77
+ "full_attention",
78
+ "full_attention",
79
+ "full_attention",
80
+ "full_attention"
81
+ ],
82
+ "length_penalty": 1.0,
83
+ "max_length": 20,
84
+ "max_position_embeddings": 40960,
85
+ "max_window_layers": 28,
86
+ "min_length": 0,
87
+ "model_type": "qwen3",
88
+ "no_repeat_ngram_size": 0,
89
+ "num_attention_heads": 16,
90
+ "num_beam_groups": 1,
91
+ "num_beams": 1,
92
+ "num_hidden_layers": 28,
93
+ "num_key_value_heads": 8,
94
+ "num_return_sequences": 1,
95
+ "output_attentions": false,
96
+ "output_hidden_states": false,
97
+ "output_scores": false,
98
+ "pad_token_id": 151643,
99
+ "prefix": null,
100
+ "problem_type": null,
101
+ "pruned_heads": {},
102
+ "remove_invalid_values": false,
103
+ "repetition_penalty": 1.0,
104
+ "return_dict": true,
105
+ "return_dict_in_generate": false,
106
+ "rms_norm_eps": 1e-06,
107
+ "rope_parameters": {
108
+ "rope_theta": 1000000,
109
+ "rope_type": "default"
110
+ },
111
+ "rope_scaling": null,
112
+ "rope_theta": 10000.0,
113
+ "sep_token_id": null,
114
+ "sliding_window": null,
115
+ "suppress_tokens": null,
116
+ "task_specific_params": null,
117
+ "temperature": 1.0,
118
+ "tf_legacy_loss": false,
119
+ "tie_encoder_decoder": false,
120
+ "tie_word_embeddings": true,
121
+ "tokenizer_class": null,
122
+ "top_k": 50,
123
+ "top_p": 1.0,
124
+ "torchscript": false,
125
+ "transformers_version": "4.57.6",
126
+ "typical_p": 1.0,
127
+ "use_bfloat16": false,
128
+ "use_cache": false,
129
+ "use_sliding_window": false,
130
+ "vocab_size": 151936
131
+ },
132
+ "student_hidden_size": 2048,
133
+ "student_model_type": "qwen3",
134
+ "teacher_hidden_size": 2560,
135
+ "transformers_version": "4.57.6",
136
+ "auto_map": {
137
+ "AutoConfig": "configuration_student_adapter.StudentAdapterConfig",
138
+ "AutoModel": "modeling_student_adapter.StudentAdapterTextEncoder",
139
+ "AutoModelForCausalLM": "modeling_student_adapter.StudentAdapterTextEncoder"
140
+ }
141
+ }
text_encoder/configuration_student_adapter.py ADDED
@@ -0,0 +1,30 @@
+ from transformers import PretrainedConfig
+
+ class StudentAdapterConfig(PretrainedConfig):
+     model_type = "zimage_student_adapter"
+
+     def __init__(
+         self,
+         student_config_dict=None,
+         student_model_type=None,
+         hs_tap_index=-2,
+         adapter_dim=1024,
+         adapter_heads=8,
+         adapter_blocks=2,
+         adapter_ff_mult=4,
+         adapter_dropout=0.1,
+         teacher_hidden_size=None,
+         student_hidden_size=None,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+         self.student_config_dict = student_config_dict or {}
+         self.student_model_type = student_model_type
+         self.hs_tap_index = int(hs_tap_index)
+         self.adapter_dim = int(adapter_dim)
+         self.adapter_heads = int(adapter_heads)
+         self.adapter_blocks = int(adapter_blocks)
+         self.adapter_ff_mult = int(adapter_ff_mult)
+         self.adapter_dropout = float(adapter_dropout)
+         self.teacher_hidden_size = teacher_hidden_size
+         self.student_hidden_size = student_hidden_size
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1526ed816759565aa04aa5a33a847c6652ed6b765addeb5f7d8eaaf46d96ff15
3
+ size 3569259104
text_encoder/modeling_student_adapter.py ADDED
@@ -0,0 +1,137 @@
+ import torch
+ import torch.nn as nn
+ from transformers import AutoConfig, AutoModelForCausalLM, PreTrainedModel
+ from transformers.modeling_outputs import BaseModelOutputWithPast
+ from .configuration_student_adapter import StudentAdapterConfig
+
+
+ class XAttnBlock(nn.Module):
+     def __init__(self, dim, heads, ff_mult=4, dropout=0.1):
+         super().__init__()
+         self.norm_q = nn.LayerNorm(dim)
+         self.norm_kv = nn.LayerNorm(dim)
+         self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
+         self.norm_ff = nn.LayerNorm(dim)
+         self.ff = nn.Sequential(
+             nn.Linear(dim, dim * ff_mult),
+             nn.GELU(),
+             nn.Dropout(dropout),
+             nn.Linear(dim * ff_mult, dim),
+             nn.Dropout(dropout),
+         )
+
+     def forward(self, q, kv, key_padding_mask=None):
+         q = q + self.attn(
+             self.norm_q(q),
+             self.norm_kv(kv),
+             self.norm_kv(kv),
+             key_padding_mask=key_padding_mask,
+             need_weights=False,
+         )[0]
+         q = q + self.ff(self.norm_ff(q))
+         return q
+
+
+ class Adapter(nn.Module):
+     def __init__(self, s_dim, t_dim, dim=1024, heads=8, blocks=2, ff_mult=4, dropout=0.1):
+         super().__init__()
+         self.q_proj = nn.Linear(s_dim, dim)
+         self.kv_proj = nn.Linear(s_dim, dim)
+         self.blocks = nn.ModuleList([
+             XAttnBlock(dim, heads, ff_mult=ff_mult, dropout=dropout)
+             for _ in range(blocks)
+         ])
+         self.proj_out = nn.Linear(dim, t_dim)
+
+     def forward(self, student_hs, mask):
+         q = self.q_proj(student_hs)
+         kv = self.kv_proj(student_hs)
+         key_padding_mask = ~mask.bool()
+         for block in self.blocks:
+             q = block(q, kv, key_padding_mask=key_padding_mask)
+         out = self.proj_out(q)
+         out = out.masked_fill(~mask[..., None].bool(), 0)
+         return out
+
+
+ class StudentAdapterTextEncoder(PreTrainedModel):
+     config_class = StudentAdapterConfig
+     base_model_prefix = "student"
+
+     def __init__(self, config: StudentAdapterConfig):
+         super().__init__(config)
+         student_cfg_dict = dict(config.student_config_dict or {})
+         if not student_cfg_dict:
+             raise ValueError("StudentAdapterConfig.student_config_dict is required")
+
+         model_type = student_cfg_dict.get("model_type") or config.student_model_type
+         if model_type is None:
+             raise ValueError("Missing student model_type")
+
+         cfg_kwargs = dict(student_cfg_dict)
+         cfg_kwargs.pop("model_type", None)
+         student_cfg = AutoConfig.for_model(model_type, **cfg_kwargs)
+         self.student = AutoModelForCausalLM.from_config(student_cfg, trust_remote_code=True)
+
+         s_dim = int(getattr(self.student.config, "hidden_size", config.student_hidden_size))
+         t_dim = int(config.teacher_hidden_size)
+         self.adapter = Adapter(
+             s_dim=s_dim,
+             t_dim=t_dim,
+             dim=config.adapter_dim,
+             heads=config.adapter_heads,
+             blocks=config.adapter_blocks,
+             ff_mult=config.adapter_ff_mult,
+             dropout=config.adapter_dropout,
+         )
+         self.hs_tap_index = int(config.hs_tap_index)
+         self.post_init()
+
+     def _extract_hs(self, outputs, idx: int):
+         hs = outputs.hidden_states
+         if hs is None:
+             raise RuntimeError("Student output_hidden_states is required")
+         if not (-len(hs) <= idx < len(hs)):
+             raise IndexError(f"hidden-state index {idx} out of range for len={len(hs)}")
+         return hs[idx]
+
+     def forward(self, input_ids=None, attention_mask=None, output_hidden_states=True, return_dict=True, **kwargs):
+         if input_ids is None:
+             raise ValueError("input_ids is required")
+         if attention_mask is None:
+             attention_mask = torch.ones_like(input_ids, dtype=torch.long)
+         # Qwen3 student model expects long dtype; pipeline may pass bool masks
+         if attention_mask.dtype == torch.bool:
+             attention_mask = attention_mask.long()
+
+         out = self.student(
+             input_ids=input_ids,
+             attention_mask=attention_mask,
+             output_hidden_states=True,
+             return_dict=True,
+             **kwargs,
+         )
+
+         hs_list = list(out.hidden_states)
+         s_hs = self._extract_hs(out, self.hs_tap_index)
+
+         ad_dtype = next(self.adapter.parameters()).dtype
+         if s_hs.dtype != ad_dtype:
+             s_hs = s_hs.to(ad_dtype)
+
+         adapted = self.adapter(s_hs, attention_mask)
+
+         if len(hs_list) >= 2:
+             hs_list[-2] = adapted
+         else:
+             hs_list.append(adapted)
+
+         if not return_dict:
+             return (adapted, None, tuple(hs_list), None)
+
+         return BaseModelOutputWithPast(
+             last_hidden_state=adapted,
+             past_key_values=None,
+             hidden_states=tuple(hs_list),
+             attentions=None,
+         )
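
Note on the hidden-state swap: at the end of `forward`, the adapter output overwrites the second-to-last entry of the student's hidden-state tuple. This is presumably so that a caller indexing `hidden_states[-2]` receives teacher-sized embeddings without any change on the caller's side, though the commit does not say so. The following is a minimal sketch of just that list manipulation, using plain strings in place of tensors; the function name `swap_penultimate` is illustrative and does not exist in the repository.

```python
def swap_penultimate(hidden_states, adapted):
    """Return a copy of hidden_states with the second-to-last entry replaced."""
    hs_list = list(hidden_states)
    if len(hs_list) >= 2:
        hs_list[-2] = adapted
    else:
        # With fewer than two entries there is no penultimate slot, so append.
        hs_list.append(adapted)
    return tuple(hs_list)


# Strings stand in for per-layer tensors.
layers = ("embeddings", "layer_1", "layer_2", "layer_3")
result = swap_penultimate(layers, "ADAPTED")
print(result)
```

Note that the entries on either side of the replaced slot keep the student's width (2048 in this commit), while the replaced entry has the teacher's width (2560), so the returned tuple is not uniform in its last dimension.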
tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
3
+ size 11422654
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151665": {
+       "content": "<tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151666": {
+       "content": "</tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151667": {
+       "content": "<think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151668": {
+       "content": "</think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
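For the simplest case (one user message, no tools, no system prompt), the `chat_template` above reduces to a short ChatML string. The sketch below reproduces only that path by hand so the resulting layout is easy to see; `render_user_prompt` is a hypothetical helper, not part of this commit, and whether the pipeline passes `enable_thinking` at all is not shown in this diff.

```python
def render_user_prompt(content, enable_thinking=None):
    """Render one user turn plus the generation prompt in ChatML form.

    Covers only the single-user-message path of the template; tools,
    system prompts, and assistant/tool turns are deliberately omitted.
    """
    text = "<|im_start|>user\n" + content + "<|im_end|>\n"
    text += "<|im_start|>assistant\n"
    # The template emits an empty think block only when enable_thinking
    # is defined and explicitly false; None models "not defined" here.
    if enable_thinking is False:
        text += "<think>\n\n</think>\n\n"
    return text


prompt = render_user_prompt("a red fox in snow", enable_thinking=False)
print(repr(prompt))
```

In practice the tokenizer's own `apply_chat_template` method would normally do this rendering; the hand-written version is only meant to make the template's output visible.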
tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
transformer/config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_class_name": "ZImageTransformer2DModel",
+   "_diffusers_version": "0.36.0.dev0",
+   "all_f_patch_size": [
+     1
+   ],
+   "all_patch_size": [
+     2
+   ],
+   "axes_dims": [
+     32,
+     48,
+     48
+   ],
+   "axes_lens": [
+     1536,
+     512,
+     512
+   ],
+   "cap_feat_dim": 2560,
+   "dim": 3840,
+   "in_channels": 16,
+   "n_heads": 30,
+   "n_kv_heads": 30,
+   "n_layers": 30,
+   "n_refiner_layers": 2,
+   "norm_eps": 1e-05,
+   "qk_norm": true,
+   "rope_theta": 256.0,
+   "t_scale": 1000.0
+ }
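A quick consistency check on the config above: with `dim` 3840 and `n_heads` 30, each attention head is 128 wide, which equals the sum of `axes_dims` (32 + 48 + 48). The sketch below assumes, as is common for multi-axis rotary embeddings, that `axes_dims` partitions the per-head dimension across the positional axes; that interpretation is not stated in the diff itself.

```python
# Values copied from transformer/config.json in this commit.
config = {
    "dim": 3840,
    "n_heads": 30,
    "axes_dims": [32, 48, 48],
    "axes_lens": [1536, 512, 512],
}

# Per-head width of the attention layers.
head_dim = config["dim"] // config["n_heads"]

# Total rotary width, assuming axes_dims splits the head dimension per axis.
rope_dim = sum(config["axes_dims"])

print(head_dim)  # 128
print(rope_dim)  # 128
print(config["dim"] % config["n_heads"] == 0 and head_dim == rope_dim)  # True
```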
transformer/diffusion_pytorch_model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95facd593e2549e8252acb571c653d57f7ddb7f1060d4e81712f152555a88804
+ size 9973693184
transformer/diffusion_pytorch_model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4bbe43ee184a1fb5af4b412d27555f532893bdc3165b1149e304ed82b5d7015
+ size 9973714824
transformer/diffusion_pytorch_model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aba4e37a590e63210878160a718d916d80398f4e1f78ab6c9b2b2a00d92769fa
+ size 4672282880
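Each `.safetensors` entry above is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object id, and the byte size. As a rough sketch, the pointers can be parsed and their sizes summed to compare against the `total_size` of 24619634944 reported in the shard index of this commit; `parse_lfs_pointer` is a hypothetical helper, and attributing the small difference to per-file headers and metadata is an assumption.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from the three transformer shards in this commit.
pointers = [
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:95facd593e2549e8252acb571c653d57f7ddb7f1060d4e81712f152555a88804\n"
    "size 9973693184\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a4bbe43ee184a1fb5af4b412d27555f532893bdc3165b1149e304ed82b5d7015\n"
    "size 9973714824\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:aba4e37a590e63210878160a718d916d80398f4e1f78ab6c9b2b2a00d92769fa\n"
    "size 4672282880\n",
]

# Sum of on-disk shard sizes according to the pointers.
total_bytes = sum(int(parse_lfs_pointer(p)["size"]) for p in pointers)

# total_size from diffusion_pytorch_model.safetensors.index.json.
index_total_size = 24619634944

print(total_bytes)                     # 24619690888
print(total_bytes - index_total_size)  # 55944
```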
transformer/diffusion_pytorch_model.safetensors.index.json ADDED
@@ -0,0 +1,528 @@
+ {
+   "metadata": {
+     "total_size": 24619634944
+   },
+   "weight_map": {
+     "all_final_layer.2-1.adaLN_modulation.1.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "all_final_layer.2-1.adaLN_modulation.1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "all_final_layer.2-1.linear.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "all_final_layer.2-1.linear.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "all_x_embedder.2-1.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "all_x_embedder.2-1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "cap_embedder.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "cap_embedder.1.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "cap_embedder.1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "cap_pad_token": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.0.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "context_refiner.1.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.0.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.1.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+     "layers.10.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.10.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.11.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.12.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.13.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.14.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.15.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.16.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.17.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.18.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+     "layers.19.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
222
+ "layers.2.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
223
+ "layers.2.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
224
+ "layers.2.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
225
+ "layers.2.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
226
+ "layers.2.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
227
+ "layers.2.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
228
+ "layers.2.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
229
+ "layers.2.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
230
+ "layers.2.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
231
+ "layers.2.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
232
+ "layers.2.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
233
+ "layers.2.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
234
+ "layers.2.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
235
+ "layers.2.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
236
+ "layers.2.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.20.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.20.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.21.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.feed_forward.w2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.22.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.23.adaLN_modulation.0.bias": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.23.adaLN_modulation.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.23.attention.norm_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.23.attention.norm_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.23.attention.to_k.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.23.attention.to_out.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.23.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.23.attention.to_v.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.23.attention_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.23.attention_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.23.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.23.feed_forward.w2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.23.feed_forward.w3.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.23.ffn_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.23.ffn_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.adaLN_modulation.0.bias": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.adaLN_modulation.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.attention.norm_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.attention.norm_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.attention.to_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.attention.to_out.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.attention.to_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.attention.to_v.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.attention_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.attention_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.feed_forward.w1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.feed_forward.w2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.feed_forward.w3.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.ffn_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.24.ffn_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.adaLN_modulation.0.bias": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.adaLN_modulation.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.attention.norm_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.attention.norm_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.attention.to_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.attention.to_out.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.attention.to_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.attention.to_v.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.attention_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.attention_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.feed_forward.w1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.feed_forward.w2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.feed_forward.w3.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.ffn_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.25.ffn_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.adaLN_modulation.0.bias": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.adaLN_modulation.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.attention.norm_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.attention.norm_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.attention.to_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.attention.to_out.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.attention.to_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.attention.to_v.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.attention_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.attention_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.feed_forward.w1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.feed_forward.w2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.feed_forward.w3.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.ffn_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.26.ffn_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.adaLN_modulation.0.bias": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.adaLN_modulation.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.attention.norm_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.attention.norm_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.attention.to_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.attention.to_out.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.attention.to_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.attention.to_v.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.attention_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.attention_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.feed_forward.w1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.feed_forward.w2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.feed_forward.w3.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.ffn_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.27.ffn_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.adaLN_modulation.0.bias": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.adaLN_modulation.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.attention.norm_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.attention.norm_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.attention.to_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.attention.to_out.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.attention.to_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.attention.to_v.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.attention_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.attention_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.feed_forward.w1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.feed_forward.w2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.feed_forward.w3.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.ffn_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.28.ffn_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.adaLN_modulation.0.bias": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.adaLN_modulation.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.attention.norm_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.attention.norm_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.attention.to_k.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.attention.to_out.0.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.attention.to_q.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.attention.to_v.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.attention_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.attention_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.feed_forward.w1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.feed_forward.w2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.feed_forward.w3.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.ffn_norm1.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.29.ffn_norm2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
+ "layers.3.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.3.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.4.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.5.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.6.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.7.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.8.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.adaLN_modulation.0.bias": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.9.adaLN_modulation.0.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.9.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.attention_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.9.attention_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.9.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "layers.9.feed_forward.w3.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.9.ffn_norm1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "layers.9.ffn_norm2.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
+ "noise_refiner.0.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.0.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.adaLN_modulation.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.adaLN_modulation.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.attention.norm_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.attention.norm_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.attention.to_k.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.attention.to_out.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.attention.to_q.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.attention.to_v.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.attention_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.attention_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.feed_forward.w1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.feed_forward.w2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.feed_forward.w3.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.ffn_norm1.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "noise_refiner.1.ffn_norm2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "t_embedder.mlp.0.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "t_embedder.mlp.0.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "t_embedder.mlp.2.bias": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "t_embedder.mlp.2.weight": "diffusion_pytorch_model-00001-of-00003.safetensors",
+ "x_pad_token": "diffusion_pytorch_model-00001-of-00003.safetensors"
+ }
+ }
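The index above maps each tensor name to the shard file that stores it, and a single block can straddle two shards (for example, `layers.23` and `layers.9`). A minimal sketch of how such a map can be queried follows. It assumes the standard Hugging Face sharded-index layout with a top-level `weight_map` key, which is not visible in this excerpt; the helper names `group_by_shard` and `shards_for_prefix` are hypothetical, and only four entries are copied from the diff for brevity.

```python
import json
from collections import defaultdict

# A few entries copied from the index above; the real file lists every tensor.
# The enclosing "weight_map" key is an assumption about the part of the file not shown.
INDEX_JSON = """
{
  "weight_map": {
    "layers.23.attention.to_q.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
    "layers.23.feed_forward.w1.weight": "diffusion_pytorch_model-00002-of-00003.safetensors",
    "layers.23.feed_forward.w2.weight": "diffusion_pytorch_model-00003-of-00003.safetensors",
    "x_pad_token": "diffusion_pytorch_model-00001-of-00003.safetensors"
  }
}
"""


def group_by_shard(weight_map):
    """Invert tensor-name -> shard into shard -> sorted list of tensor names."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def shards_for_prefix(weight_map, prefix):
    """Return the sorted shard files holding any tensor whose name starts with prefix."""
    return sorted({shard for name, shard in weight_map.items() if name.startswith(prefix)})


weight_map = json.loads(INDEX_JSON)["weight_map"]
print(shards_for_prefix(weight_map, "layers.23."))
```

Keep the trailing dot in the prefix: `"layers.2"` would also match `layers.20` through `layers.29`, because the keys are sorted and matched as plain strings.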
vae/config.json ADDED
@@ -0,0 +1,38 @@
+ {
+ "_class_name": "AutoencoderKL",
+ "_diffusers_version": "0.36.0.dev0",
+ "_name_or_path": "flux-dev",
+ "act_fn": "silu",
+ "block_out_channels": [
+ 128,
+ 256,
+ 512,
+ 512
+ ],
+ "down_block_types": [
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D"
+ ],
+ "force_upcast": true,
+ "in_channels": 3,
+ "latent_channels": 16,
+ "latents_mean": null,
+ "latents_std": null,
+ "layers_per_block": 2,
+ "mid_block_add_attention": true,
+ "norm_num_groups": 32,
+ "out_channels": 3,
+ "sample_size": 1024,
+ "scaling_factor": 0.3611,
+ "shift_factor": 0.1159,
+ "up_block_types": [
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D"
+ ],
+ "use_post_quant_conv": false,
+ "use_quant_conv": false
+ }
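The `scaling_factor`, `shift_factor`, `latent_channels`, and `block_out_channels` values in this config determine how latents are normalized and shaped. The sketch below uses the convention of Flux-style pipelines in diffusers: subtract the shift and multiply by the scale after encoding, then invert that before decoding. Whether this repository's pipeline applies exactly these steps is an assumption, and the function names are hypothetical. The factor-of-8 downsampling follows from four encoder blocks, of which all but the last halve the resolution.

```python
SCALING_FACTOR = 0.3611   # "scaling_factor" from vae/config.json
SHIFT_FACTOR = 0.1159     # "shift_factor" from vae/config.json
LATENT_CHANNELS = 16      # "latent_channels" from vae/config.json
NUM_DOWN_BLOCKS = 4       # length of "block_out_channels"

# Every encoder block except the last downsamples by 2, giving 2 ** 3 = 8.
SPATIAL_FACTOR = 2 ** (NUM_DOWN_BLOCKS - 1)


def to_model_space(raw_latents):
    """Normalize VAE-encoder latents before they are given to the diffusion model."""
    return [(z - SHIFT_FACTOR) * SCALING_FACTOR for z in raw_latents]


def to_vae_space(model_latents):
    """Undo the normalization before passing latents to the VAE decoder."""
    return [z / SCALING_FACTOR + SHIFT_FACTOR for z in model_latents]


def latent_shape(height, width):
    """Shape (channels, height, width) of the latent for an image of the given size."""
    return (LATENT_CHANNELS, height // SPATIAL_FACTOR, width // SPATIAL_FACTOR)


print(latent_shape(1024, 1024))
```

The two transforms are exact inverses up to floating-point error, so a round trip through `to_model_space` and `to_vae_space` should return the original values.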
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5b59a26851551b67ae1fe58d32e76486e1e812def4696a4bea97f16604d40a3
+ size 167666902
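The three added lines are a Git LFS pointer, not the weights themselves: the pointer records the spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows, using only the fields shown above; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
# Pointer text copied from the diff above, without the leading "+ " markers.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f5b59a26851551b67ae1fe58d32e76486e1e812def4696a4bea97f16604d40a3
size 167666902
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the real file, comparing its byte length and `hashlib.sha256` digest against these values is one way to confirm the download is complete.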