Kudos

#3
by mcfadyeni - opened

🙇‍♂️ Just wanted to say thank you for the model, but also for the nice writeup on training! The part on latent degradation is especially interesting.

What is the second image in the README showing? Is that a timeline of decoded latent samples over the training run?

That's a PCA vis of latents with increasing sdedit (img2img) degradation strength from left to right, showing how diffusion output latents are blurry/mushy compared to perfect encoded latents (left).
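For anyone wanting to make a similar picture, here is a minimal sketch of one common way to do a PCA visualization of a latent: project the channels of a (C, H, W) latent onto its top three principal components and show them as RGB. This is not the code behind the README image; the function name `latent_pca_rgb`, the per-image min/max normalization, and the random stand-in latent are all assumptions for illustration.

```python
import numpy as np

def latent_pca_rgb(latent, basis=None):
    """Project a (C, H, W) latent onto 3 principal directions.

    Returns an (H, W, 3) array scaled to [0, 1] and the (3, C) basis used.
    Assumes C >= 3. Pass a previously returned basis to reuse the same
    directions across several latents.
    """
    c, h, w = latent.shape
    flat = latent.reshape(c, h * w).T          # (H*W, C): one row per latent pixel
    centered = flat - flat.mean(axis=0)        # center each channel
    if basis is None:
        # Rows of vt are the principal directions in channel space.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        basis = vt[:3]
    proj = centered @ basis.T                  # (H*W, 3)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-8)  # per-component min/max scaling
    return rgb.reshape(h, w, 3), basis

# Hypothetical stand-in for an encoded latent (16 channels, 8x8 spatial).
rng = np.random.default_rng(0)
clean = rng.normal(size=(16, 8, 8))
rgb, basis = latent_pca_rgb(clean)
```

To compare a clean encoded latent with degraded ones, one option is to fit the basis on the clean latent and pass it back in via `basis=` for the others, so all panels share the same color mapping.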

Great work.
I am trying to recreate this on Wan2.2-VAE with images, and I wonder if you have any intuition about whether it would work the same, since the last layer of Wan2.2-VAE already uses pixel shuffle. Also, would you be willing to share the training recipe so I can try it on Wan2.2-VAE?

@Urig You could try increasing the patch size from 2 to 4 on the decoder, but I didn't get good results from 4x pixelshuffle on the 2.1 vae in my testing.
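To make the patch size point concrete, here is a small pure-Python sketch of the pixel shuffle rearrangement, using the same channel layout as `torch.nn.PixelShuffle`. With patch size `r`, the layer before it has to emit `3 * r * r` channels for an RGB output: 12 at 2x and 48 at 4x. How either Wan VAE decoder is actually wired is not shown here; the helper `pixel_shuffle` and the toy inputs are assumptions for illustration only.

```python
def pixel_shuffle(x, r):
    """Rearrange nested lists shaped (C*r*r, H, W) into (C, H*r, W*r).

    Channel c*r*r + i*r + j of the input fills output pixel
    (y*r + i, x*r + j) of output channel c.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for col in range(w):
                        out[c][y * r + i][col * r + j] = src[y][col]
    return out

# Toy inputs: one latent pixel (H = W = 1), each channel holds its own index.
x2 = [[[ch]] for ch in range(12)]   # 3 * 2 * 2 channels for patch size 2
x4 = [[[ch]] for ch in range(48)]   # 3 * 4 * 4 channels for patch size 4
y2 = pixel_shuffle(x2, 2)           # 3 channels, 2x2 pixels each
y4 = pixel_shuffle(x4, 4)           # 3 channels, 4x4 pixels each
```

Going from 2x to 4x means each latent position has to predict 16 output pixels per channel instead of 4, from the same features.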

Re: training code, I'll share it once I'm done with the video version. I think I put enough details in the readme already that someone who is comfortable with writing vae or image upscale training code could replicate it.
