data-archetype/mdiffae-v1

@@ -11,6 +11,13 @@ library_name: mdiffae
 # mdiffae_v1
+> **[mDiffAE v2](https://huggingface.co/data-archetype/mdiffae-v2) is now available and is the recommended version.** It offers substantially better reconstruction (+1.7 dB mean PSNR) with the same or better downstream convergence.
+>
+> | Version | Mean PSNR (2k images) | Bottleneck | Decoder |
+> |---|---|---|---|
+> | [**mDiffAE v2**](https://huggingface.co/data-archetype/mdiffae-v2) (recommended) | **35.81 dB** | 96ch (8x) | 8 blocks (skip-concat) |
+> | mDiffAE v1 (this repo) | 34.15 dB | 64ch (12x) | 4 blocks (flat) |
 **mDiffAE** — **M**asked **Diff**usion **A**uto**E**ncoder.
 A fast, single-GPU-trainable diffusion autoencoder with a **64-channel**
 spatial bottleneck. Uses decoder token masking as an implicit regularizer