Kudos

by mcfadyeni - opened Nov 4, 2025


🙇‍♂️ Just wanted to say thank you for the model, but also for the nice writeup on training! The part on latent degradation is especially interesting.

What is the second image in the README showing? Is that a timeline of decoded latent samples over the training run?

spacepxl

Owner Nov 4, 2025

That's a PCA visualization of latents with increasing SDEdit (img2img) degradation strength from left to right. It shows how diffusion output latents are blurry/mushy compared to perfect encoded latents (leftmost).
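
For anyone who wants to make a similar plot, here is a minimal sketch of a PCA visualization of a latent tensor. It is not the code used for the README image. The function name `latent_pca_rgb`, the 16-channel shape, and the random stand-in data are all assumptions for illustration; in practice you would pass in real encoded or diffusion-output latents.

```python
import numpy as np

def latent_pca_rgb(latents):
    """Map a (C, H, W) latent to an (H, W, 3) image via its top 3 principal components.

    Hypothetical helper for illustration; requires C >= 3.
    """
    c, h, w = latents.shape
    # Treat every spatial position as one sample with C features.
    x = latents.reshape(c, h * w).T.astype(np.float64)  # (H*W, C)
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[:3].T  # (H*W, 3)
    # Min-max normalize each component into [0, 1] for display.
    lo = proj.min(axis=0, keepdims=True)
    hi = proj.max(axis=0, keepdims=True)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-12)
    return rgb.reshape(h, w, 3)

# Random stand-in for a real latent (assumed shape: 16 channels, 32x32).
rng = np.random.default_rng(0)
fake_latent = rng.normal(size=(16, 32, 32))
img = latent_pca_rgb(fake_latent)
```

When comparing several latents side by side (for example at different degradation strengths), it probably makes sense to fit the principal directions on one reference latent and reuse them for the others, so the colors stay comparable across panels.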

Urig

Nov 5, 2025

Great work.
I am trying to recreate this on Wan2.2-VAE with images, and I wonder if you have any intuition about whether it would work the same, since the last layer of Wan2.2-VAE already uses pixel shuffle. Also, would you be willing to share the training recipe so I can try it on Wan2.2-VAE?

spacepxl

Owner Nov 5, 2025

@Urig You could try increasing the patch size from 2 to 4 on the decoder, but I didn't get good results from 4x pixel shuffle on the Wan2.1 VAE in my testing.

Re: training code, I'll share it once I'm done with the video version. I think I put enough details in the README already that someone who is comfortable with writing VAE or image upscale training code could replicate it.
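
To make the patch size point concrete, here is a rough NumPy sketch of what pixel shuffle (depth-to-space) does with factor 2 versus 4. It is not taken from any Wan VAE code. The function `pixel_shuffle` is a hypothetical standalone helper that follows the same channel layout convention as PyTorch's `nn.PixelShuffle`, and the tensor shapes are made up for illustration.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Output pixel (c, h*r + i, w*r + j) comes from input channel c*r*r + i*r + j
    at position (h, w).
    """
    c_in, h, w = x.shape
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c = c_in // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# For 3 output (RGB) channels, the layer before the shuffle has to emit
# 3 * r * r channels: 12 for patch size 2, 48 for patch size 4.
out2 = pixel_shuffle(np.zeros((12, 8, 8)), 2)  # shape (3, 16, 16)
out4 = pixel_shuffle(np.zeros((48, 8, 8)), 4)  # shape (3, 32, 32)

# Tiny value check: four channels at one position become one 2x2 patch.
patch = pixel_shuffle(np.arange(4).reshape(4, 1, 1), 2)
```

Going from patch size 2 to 4 means each feature-map position has to predict a full 4x4 pixel patch at once instead of a 2x2 one.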
