fix(controlnet): crop control image to multiple-of-16 dims
Same modulus-of-16 bug pattern as the upscale fix in 9514256, hit on the
ControlNet path instead. With an input image whose dims aren't already
multiples of 16, DiffSynth's VAE encode rounds the latent shape *down*
while the noise allocator for inpaint_mask rounds *up*, so the
torch.concat([control_latents, inpaint_mask, inpaint_latent], dim=1) in
the pipeline raises a shape mismatch (e.g. 45 vs 46 on dim 3).
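
A rough illustration of how floor and ceil rounding can disagree by one on an unaligned dimension. The downsample factor of 8 and the helper names are assumptions made for this sketch, not values taken from DiffSynth; they happen to reproduce a 45 vs 46 split for the 367-pixel side of the test image.

```python
# Sketch only: DOWNSAMPLE = 8 is an assumed factor, not read from DiffSynth.
DOWNSAMPLE = 8


def floor_latent(pixels: int, factor: int = DOWNSAMPLE) -> int:
    """Latent size when the pixel dimension is rounded down."""
    return pixels // factor


def ceil_latent(pixels: int, factor: int = DOWNSAMPLE) -> int:
    """Latent size when the pixel dimension is rounded up (integer ceil)."""
    return -(-pixels // factor)


def sizes_agree(pixels: int, factor: int = DOWNSAMPLE) -> bool:
    """True when both rounding strategies yield the same latent size."""
    return floor_latent(pixels, factor) == ceil_latent(pixels, factor)


if __name__ == "__main__":
    for side in (367, 352):
        print(side, floor_latent(side), ceil_latent(side), sizes_agree(side))
```

Under these assumptions the two strategies diverge for 367 and agree for 352, which is why cropping to an aligned size removes the mismatch.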
Crop the control image (or fallback raw input) to (W//16*16, H//16*16)
before handing it to the pipe, mirroring the upscale fix.
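
The crop arithmetic can be sketched as a standalone helper. The function name `aligned_crop_box` is hypothetical and does not exist in modes.py; it only mirrors the `(w // 16) * 16` computation described above and returns a box in the `(left, upper, right, lower)` form that PIL's `Image.crop` accepts.

```python
def aligned_crop_box(width: int, height: int, multiple: int = 16) -> tuple[int, int, int, int]:
    """Return a top-left-anchored crop box whose size is a multiple of `multiple`.

    Hypothetical helper for illustration; it rounds each dimension down,
    so at most `multiple - 1` pixels are trimmed from the right and bottom.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    aligned_w = (width // multiple) * multiple
    aligned_h = (height // multiple) * multiple
    return (0, 0, aligned_w, aligned_h)


if __name__ == "__main__":
    # Dimensions of the cn-input.webp test image mentioned in this commit.
    print(aligned_crop_box(618, 367))
```

For the 618x367 test image this trims 10 columns and 15 rows. An image whose dimensions are already multiples of 16 gets a box covering the full image, so the crop is a no-op.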
Verified live with cn-input.webp (618x367) + Toon5_E10 LoRA on Z-Image
Turbo locally.
modes.py
CHANGED
@@ -111,6 +111,14 @@ def call_controlnet(pipe: Any, params: dict[str, Any]) -> tuple[Image.Image, dic
     )
     control_image = input_image
 
+    # Same modulus-of-16 dance as call_upscale: DiffSynth's VAE encode rounds *down*
+    # for control_latents while the noise allocator rounds *up* for inpaint_mask, so
+    # an unaligned image makes torch.concat on control_context raise.
+    w, h = control_image.size
+    aligned_w, aligned_h = (w // 16) * 16, (h // 16) * 16
+    if (aligned_w, aligned_h) != (w, h):
+        control_image = control_image.crop((0, 0, aligned_w, aligned_h))
+
     _swap_transformer(pipe, "Turbo")
 
     cn_input = ControlNetInput(image=control_image, scale=float(params.get("controlnet_scale", 1.0)))