---
license: apache-2.0
language:
- en
tags:
- text-to-image
- image-generation
- diffusion
- anime
- z-image
- z-anime
- comfyui
- gguf
- fp8
- bf16
- aio
pipeline_tag: text-to-image
library_name: diffusers
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: finetune
---
# 🌸 Z-Anime | Full Anime Fine-Tune on Z-Image Base
<div align="center">
<img src="images/cover.png" width="380" alt="Z-Anime" />
</div>
<div align="center">
**Full Fine-Tune • Rich Aesthetics • Strong Diversity • Full Negative Prompt Support**
**BF16 & FP8 & GGUF & AIO • Natural Language Prompts • 8GB VRAM**
</div>
---
## 🖼️ Preview Gallery
<table>
<tr>
<td><img src="images/1.png" alt="Z-Anime preview 1" /></td>
<td><img src="images/2.png" alt="Z-Anime preview 2" /></td>
<td><img src="images/3.png" alt="Z-Anime preview 3" /></td>
</tr>
<tr>
<td><img src="images/4.png" alt="Z-Anime preview 4" /></td>
<td><img src="images/5.png" alt="Z-Anime preview 5" /></td>
<td><img src="images/6.png" alt="Z-Anime preview 6" /></td>
</tr>
<tr>
<td><img src="images/7.png" alt="Z-Anime preview 7" /></td>
<td><img src="images/8.png" alt="Z-Anime preview 8" /></td>
<td><img src="images/9.png" alt="Z-Anime preview 9" /></td>
</tr>
</table>
---
## ✨ What is Z-Anime?
**Z-Anime** is a full fine-tune of Alibaba's **Z-Image Base** architecture - **not a LoRA merge**, but a fully trained anime-focused model family built from the ground up.
Built on **S3-DiT (Single-Stream Diffusion Transformer, 6B parameters)**, Z-Anime inherits the strong foundation of Z-Image Base - rich diversity, strong controllability, full negative prompt support, and a high ceiling for fine-tuning - now adapted for anime-style generation.
This repository contains the full **Z-Anime family**:
| Variant | Focus | Best For |
|---|---|---|
| 🌸 **Z-Anime Base** | Highest quality | Final renders, full control |
| ⚡ **Z-Anime Distill-8-Step** | Speed + quality balance | Everyday generation |
| 🚀 **Z-Anime Distill-4-Step** | Maximum speed | Fast iteration, batches |
| 📦 **GGUF Variants** | Lower memory usage | Low VRAM / CPU / AMD-friendly workflows |
| 📦 **AIO Variants** | Single-file convenience | Easy ComfyUI setup |
| 🐍 **Diffusers Folder** | `from_pretrained()` ready | Python pipelines, further fine-tuning |
---
## 🎯 Key Features
- ✅ Full fine-tune on Z-Image Base - **not** a LoRA merge
- ✅ Rich anime aesthetics with strong style diversity
- ✅ Natural language prompting - works best with descriptive prompts, not tag lists
- ✅ High diversity across characters, poses, compositions, and layouts
- ✅ LoRA training ready - strong base for further fine-tuning
- ✅ Partially NSFW capable
- ✅ 8GB VRAM compatible
- ✅ GGUF variants available
- ✅ AIO variants available (Base, 4-Step, 8-Step)
---
## 🗺️ Z-Anime Roadmap
### ✅ Released
#### 🌸 Z-Anime Base
Full fine-tune on Z-Image Base, available in **BF16 & FP8**
#### ⚡ Z-Anime Distill-8-Step
**BF16 & FP8** - fast anime generation in **8 steps** at **CFG 1.0**
#### 🚀 Z-Anime Distill-4-Step
**BF16 & FP8** - ultra-fast anime generation in **4 steps** at **CFG 1.0**
#### 📦 GGUF Variants
Available for **low VRAM**, **CPU inference**, and **AMD-friendly** workflows.
- **Z-Anime-Base-Q8_0**: Q8_0 quantization (**~6.73 GB**)
- **Z-Anime-Base-Q4_K_S**: Q4_K_S quantization (**~4.2 GB**)
#### 📦 AIO Variants
All-in-one checkpoints with **image model + VAE + Text Encoder** integrated in a single file.
Available for **Base**, **Distill-4-Step**, and **Distill-8-Step**, each in **BF16 & FP8**.
#### 🧩 VAE & Text Encoder
The required **VAE** (`ae.safetensors`) and **Text Encoder** (`qwen_3_4b.safetensors`) are also included in this repository for users running the standard (non-AIO) variants.
#### 🐍 Diffusers Folder
The full **Diffusers-format folder** (`diffusers/`) is included. It is drop-in compatible with `ZImagePipeline.from_pretrained()` for Python users who want to run inference outside ComfyUI, or to use Z-Anime as a starting point for further fine-tuning.
More updates coming - follow to stay notified! 🔔
---
## 📦 Versions Overview
### 🟢 BF16 (~12GB)
Maximum precision. **BFloat16** format with minimal quality compromise. Best for final renders, careful work, and LoRA training.
### 🟡 FP8 (~6GB)
Recommended for most users. Smaller files, faster downloads, and excellent quality with only minor tradeoffs compared to BF16.
### 🔵 GGUF
Optimized for lightweight inference setups, especially useful for low VRAM, CPU inference, or alternative backends.
### 🟣 AIO
All-in-one checkpoints with **image model + Text Encoder + VAE** integrated into a single file for the easiest setup. Available for Base, Distill-4-Step, and Distill-8-Step.
---
## 🌸 Z-Anime Base
The foundation of the Z-Anime family.
A full fine-tune with the **highest quality ceiling**, the **widest creative range**, and **full negative prompt support**.
### Recommended Settings
```yaml
steps: 28-50
cfg: 3.0-5.0 # up to 9.0 possible
sampler: euler_ancestral
scheduler: beta
negative_prompt: strongly recommended
```
### CFG Guide
- **3.0–5.0**: sweet spot for balanced quality and creativity
- **5.0–7.0**: tighter prompt adherence
- **7.0–9.0**: maximum control, but watch for oversaturation
- **Above 9.0**: not recommended
Negative prompts have **full effect** on Z-Anime Base and are highly recommended.
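For scripted workflows, the ranges above can be captured in a small helper that builds generation kwargs and rejects out-of-range values. This is a sketch; the function name and the sample negative prompt are illustrative, not part of the release:

```python
def base_pipe_kwargs(steps: int = 36, cfg: float = 4.0,
                     negative_prompt: str = "lowres, bad anatomy, blurry") -> dict:
    """Build generation kwargs for Z-Anime Base from the recommended ranges.

    Steps 28-50 and CFG up to 9.0, per the settings above; anything
    outside those ranges is rejected rather than silently clamped.
    """
    if not 28 <= steps <= 50:
        raise ValueError("Z-Anime Base is recommended at 28-50 steps")
    if not 1.0 <= cfg <= 9.0:
        raise ValueError("keep CFG between 1.0 and 9.0 for Z-Anime Base")
    return {
        "num_inference_steps": steps,
        "guidance_scale": cfg,
        "negative_prompt": negative_prompt,
    }
```

The returned dict can then be splatted into a pipeline call, e.g. `pipe(prompt, **base_pipe_kwargs(cfg=4.5))`.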
---
## ⚡ Z-Anime Distill-8-Step
The sweet spot of the family.
Distilled from Z-Anime Base, this version delivers strong anime results in just **8 steps** while keeping most of the quality.
### Recommended Settings
```yaml
steps: 8
cfg: 1.0 # max ~1.5
sampler: euler_ancestral
scheduler: beta
negative_prompt: limited effect
```
### CFG Guide
- Best at **CFG 1.0**
- Small increases to **1.3–1.5** are possible
- Do **not** go above **1.5** - artifacts may appear
Negative prompts have only **limited effect** at this distillation level. If your workflow includes **ConditioningZeroOut**, prefer that instead of a large negative prompt.
---
## 🚀 Z-Anime Distill-4-Step
The fastest Z-Anime variant.
Built for **maximum throughput**: ideal for rapid prototyping, quick batch generation, and speed-focused workflows.
### Recommended Settings
```yaml
steps: 4
cfg: 1.0 # max ~1.5
sampler: euler_ancestral
scheduler: beta
negative_prompt: limited effect
```
### Tips for 4-Step
- Stay at **CFG 1.0** for the most stable results
- Put the most important visual details **early** in the prompt
- An optional upscaler such as hires fix or SeedVR2 can help recover fine detail
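Both distilled variants follow the same pattern (fixed step count, CFG pinned near 1.0), which can be expressed as a tiny helper for scripted use. A sketch; the function name is an illustrative choice:

```python
def distill_pipe_kwargs(variant: str = "8step", cfg: float = 1.0) -> dict:
    """Build generation kwargs for the distilled Z-Anime variants.

    4-step and 8-step both run at CFG 1.0 (up to ~1.5); higher values
    risk artifacts, so they are rejected outright.
    """
    steps_by_variant = {"4step": 4, "8step": 8}
    if variant not in steps_by_variant:
        raise ValueError("variant must be '4step' or '8step'")
    if not 1.0 <= cfg <= 1.5:
        raise ValueError("keep CFG between 1.0 and ~1.5 for the distills")
    return {
        "num_inference_steps": steps_by_variant[variant],
        "guidance_scale": cfg,
    }
```

As with the Base helper, the dict is meant to be splatted into a pipeline call, e.g. `pipe(prompt, **distill_pipe_kwargs("4step"))`.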
---
## 📐 Resolution Guide
| Use Case | Resolution |
|---|---|
| Portrait / character art | **832 × 1216** |
| Landscape / scenes / backgrounds | **1216 × 832** |
| Square / general purpose | **1024 × 1024** |
| Tall / full body / wallpaper | **768 × 1344** |
| Cinematic / wide scenes | **1920 × 1088** |
| Detailed portraits | **1024 × 1536** |
**Supported range:** approximately **512 × 512 to 2048 × 2048**, any aspect ratio.
All main variants are designed to run on **8GB VRAM**.
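When choosing sizes programmatically, the supported range above can be enforced with a small helper. Rounding each side to a multiple of 16 is an assumption (common for diffusion-model latents), not an official constraint of the release:

```python
def snap_resolution(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Clamp a requested size to the ~512-2048 supported range and round
    each side to the nearest multiple of `multiple` (assumed to be 16)."""
    def snap(side: int) -> int:
        side = max(512, min(2048, side))          # clamp to the supported range
        return round(side / multiple) * multiple  # align to the latent grid
    return snap(width), snap(height)
```

For example, `snap_resolution(833, 1217)` yields the portrait preset `(832, 1216)` from the table above.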
---
## 💡 Prompting Guide
**Natural language works best - not tag lists.**
### ✅ Good
```text
A young anime girl with long silver hair and golden eyes, wearing a traditional shrine maiden outfit with white haori and red hakama. She stands in a sunlit bamboo forest, cherry blossoms falling softly around her. Warm afternoon light filtering through the trees, detailed fabric shading, expressive face, calm serene expression, high quality anime illustration with fine line work.
```
### ❌ Avoid
```text
anime girl, silver hair, shrine maiden, bamboo, cherry blossom, warm light
```
### Character Portraits
```text
Detailed anime portrait of [character], soft rim lighting, expressive eyes with detailed reflections, fine hair strands, clean linework, professional anime illustration quality.
```
### Action Scenes
```text
Dynamic anime [scene], dramatic angle, motion energy, speed lines, particle effects, cinematic composition, detailed shading, high quality anime art.
```
### Backgrounds & Landscapes
```text
Anime [location] at [time of day], [lighting], [atmosphere], beautiful background art, wallpaper quality, highly detailed environment.
```
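Templates like the ones above can also be filled in programmatically. A minimal sketch, where the field names and the default quality tail are illustrative choices rather than part of the model card:

```python
def build_prompt(subject: str, setting: str, lighting: str,
                 quality: str = "high quality anime illustration with fine line work") -> str:
    """Compose a descriptive, sentence-style prompt from a few fields,
    following the natural-language guidance above (no bare tag lists)."""
    return (
        f"{subject}, in {setting}. "
        f"{lighting.capitalize()}, detailed shading, expressive face, {quality}."
    )
```

Usage: `build_prompt("A young anime girl with long silver hair and golden eyes in a shrine maiden outfit", "a sunlit bamboo forest with falling cherry blossoms", "warm afternoon light filtering through the trees")`.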
---
## 🔧 Installation
### Step 1: Download the version you want
Choose between:
- **Standard / Distill models** in **BF16** or **FP8** (+ VAE + Text Encoder)
- **GGUF variants** for low VRAM / CPU / AMD-friendly inference (+ VAE + Text Encoder)
- **AIO variants** for single-file convenience (no extra VAE / Text Encoder needed)
### Step 2: Place the files
#### Standard BF16 / FP8 models
```text
ComfyUI/models/diffusion_models/
├── z-anime-base-bf16.safetensors
├── z-anime-base-fp8.safetensors
├── z-anime-distill-8step-bf16.safetensors
├── z-anime-distill-8step-fp8.safetensors
├── z-anime-distill-4step-bf16.safetensors
└── z-anime-distill-4step-fp8.safetensors
```
#### GGUF variants
```text
ComfyUI/models/unet/
├── z-anime-base-q8_0.gguf
└── z-anime-base-q4_k_s.gguf
```
#### Text Encoder
Two text encoders are included (each in BF16 and FP8) - pick **one** file:
```text
ComfyUI/models/clip/
├── qwen_3_4b-bf16.safetensors               # default (Z-Image standard, BF16)
├── qwen_3_4b-fp8.safetensors                # default (Z-Image standard, FP8)
├── qwen_3_4b-engineer-v4-bf16.safetensors   # alternative (Engineer V4, BF16)
└── qwen_3_4b-engineer-v4-fp8.safetensors    # alternative (Engineer V4, FP8)
```
- **Default (`qwen_3_4b-*`)**: the standard Z-Image text encoder, repackaged as a single `.safetensors` file (BF16 + FP8). This is what the model was trained against.
- **Engineer V4 (`qwen_3_4b-engineer-v4-*`)**: an alternative full fine-tune of the Z-Image text encoder by **BennyDaBall**, drop-in compatible. Often produces more varied outputs from the same seed. See *Credits* below for the original repo.
#### VAE
```text
ComfyUI/models/vae/
└── ae.safetensors
```
#### AIO variants
For the AIO versions, you only need the single checkpoint file; no extra VAE or Text Encoder is required:
```text
ComfyUI/models/checkpoints/
├── z-anime-base-aio-bf16.safetensors
├── z-anime-base-aio-fp8.safetensors
├── z-anime-distill-8step-aio-bf16.safetensors
├── z-anime-distill-8step-aio-fp8.safetensors
├── z-anime-distill-4step-aio-bf16.safetensors
└── z-anime-distill-4step-aio-fp8.safetensors
```
### Step 3: Load in ComfyUI
#### For standard BF16 / FP8 versions
Use:
- **Load Diffusion Model** for the model file
- **CLIP Loader** for the text encoder
- **VAE Loader** for the VAE
#### For GGUF versions
- Load the **GGUF model from the `models/unet/` folder**
- Use the same **CLIP** and **VAE** files as above
#### For AIO versions
Use a standard **Checkpoint Loader**; no extra CLIP or VAE loading is required.
---
## 📦 Custom Nodes
- **rgthree-comfy**
- **ComfyUI-Lora-Manager**
- **ComfyUI-GGUF** *(only for the GGUF variants)*
- **ComfyUI-SeedVR2_VideoUpscaler** *(optional, only for SeedVR2 upscale)*
---
## 🐍 Using the Diffusers Folder
For Python users, the full Diffusers-format folder is included under `diffusers/` and can be loaded directly with the `subfolder` argument:
```python
import torch
from diffusers import ZImagePipeline
pipe = ZImagePipeline.from_pretrained(
    "SeeSee21/Z-Anime",
    subfolder="diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A young anime girl with long silver hair and golden eyes, "
           "shrine maiden outfit, sunlit bamboo forest, cherry blossoms, "
           "professional anime illustration, fine line work.",
    num_inference_steps=40,
    guidance_scale=4.0,
).images[0]

image.save("z-anime-output.png")
```
This format is also a clean starting point for further fine-tuning (LoRA or full fine-tune) with frameworks like **OneTrainer**, **diffusers**, or **kohya-ss**.
---
## 🧩 Official Workflow
<div align="center">
<img src="images/workflow-cover.png" width="380" alt="Z-Anime Workflow" />
</div>
A ready-to-use ComfyUI workflow that supports **all variants** (Base / Distill-8 / Distill-4, BF16 / FP8 / GGUF / AIO) is included in [`workflows/Z-Anime-Workflow-v1.json`](workflows/Z-Anime-Workflow-v1.json).
It includes:
- 📦 Model switch (Diffusion / GGUF / AIO loaders; toggle one at a time)
- 🔁 Optional LoRA loader
- ✏️ Positive + Negative prompt nodes (with a default anime negative)
- 📐 Resolution presets
- 🎨 Generate + 🖼 Optional 1.5× upscale with side-by-side compare
- 📝 Built-in MarkdownNote guide with settings per variant
<div align="center">
<img src="images/workflow-overview.png" alt="Z-Anime Workflow overview" />
</div>
---
## 📁 Repository Structure
```text
Z-Anime/
├── README.md
├── config.json
│
├── diffusion_models/
│   ├── z-anime-base-bf16.safetensors
│   ├── z-anime-base-fp8.safetensors
│   ├── z-anime-distill-8step-bf16.safetensors
│   ├── z-anime-distill-8step-fp8.safetensors
│   ├── z-anime-distill-4step-bf16.safetensors
│   └── z-anime-distill-4step-fp8.safetensors
│
├── gguf/
│   ├── z-anime-base-q8_0.gguf
│   └── z-anime-base-q4_k_s.gguf
│
├── aio/
│   ├── z-anime-base-aio-bf16.safetensors
│   ├── z-anime-base-aio-fp8.safetensors
│   ├── z-anime-distill-8step-aio-bf16.safetensors
│   ├── z-anime-distill-8step-aio-fp8.safetensors
│   ├── z-anime-distill-4step-aio-bf16.safetensors
│   └── z-anime-distill-4step-aio-fp8.safetensors
│
├── text_encoder/
│   ├── qwen_3_4b-bf16.safetensors               # default
│   ├── qwen_3_4b-fp8.safetensors                # default
│   ├── qwen_3_4b-engineer-v4-bf16.safetensors   # alternative (BennyDaBall)
│   └── qwen_3_4b-engineer-v4-fp8.safetensors    # alternative (BennyDaBall)
│
├── vae/
│   └── ae.safetensors
│
├── diffusers/
│   ├── model_index.json
│   ├── scheduler/
│   ├── tokenizer/
│   ├── text_encoder/
│   ├── transformer/   (sharded safetensors + index)
│   └── vae/
│
├── images/
│   ├── cover.png
│   ├── workflow-cover.png
│   ├── workflow-overview.png
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   ├── 4.png
│   ├── 5.png
│   ├── 6.png
│   ├── 7.png
│   ├── 8.png
│   └── 9.png
└── workflows/
    └── Z-Anime-Workflow-v1.json
```
---
## 📜 Version History
### v1.0 β Initial Release
- **Z-Anime Base** released in **BF16 & FP8**
- **Z-Anime Distill-8-Step** released in **BF16 & FP8**
- **Z-Anime Distill-4-Step** released in **BF16 & FP8**
- **GGUF variants added**
  - **Z-Anime-Base-Q8_0**: Q8_0 quantization (~6.73 GB)
  - **Z-Anime-Base-Q4_K_S**: Q4_K_S quantization (~4.2 GB)
- **AIO variants added**: Base, Distill-4-Step, and Distill-8-Step (each in BF16 & FP8)
- **VAE** (`ae.safetensors`) and **Text Encoder** (`qwen_3_4b.safetensors`) included
- Optimized for **euler_ancestral**, **euler + beta**, and simple practical use across the family
---
## 🔗 Links
- **CivitAI Page:** [civitai.red/models/2483351](https://civitai.red/models/2483351)
- **Base Model:** [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image)
- **Author:** [SeeSee21 on Hugging Face](https://huggingface.co/SeeSee21)
---
## 🙏 Credits
- **Base Architecture:** Z-Image by Tongyi Lab (Alibaba)
- **Fine-Tune:** SeeSee21
- **License:** Apache 2.0
- **Architecture:** S3-DiT (Single-Stream Diffusion Transformer, 6B parameters)
- **Base Model:** [`Tongyi-MAI/Z-Image`](https://huggingface.co/Tongyi-MAI/Z-Image)
- **Engineer V4 Text Encoder:** [`BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4`](https://huggingface.co/BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4) - a full fine-tune with SMART training, included as an alternative text encoder
---
## ❤️ Notes
Z-Anime is an experimental anime-focused model family built to explore what a full fine-tune on Z-Image Base can achieve in this space.
It is already strong for anime aesthetics, character work, and fast iteration, and future versions will continue to improve diversity, character handling, prompting flexibility, and overall quality.
**Z-Anime: anime at its finest, powered by Z-Image Base. 🌸**