---
license: apache-2.0
pipeline_tag: text-to-image
---

# MVSplit-DiT (1000 layers)

This repository contains the weights for the 1000-layer Diffusion Transformer (DiT) presented in the paper [Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers](https://huggingface.co/papers/2605.06169).

[Project Page](https://erwold.github.io/mv-split/) | [GitHub Repository](https://github.com/erwold/mv-split)

## Introduction

Scaling Diffusion Transformers to extreme depths (hundreds or thousands of layers) exposes a structural vulnerability known as **Mean Mode Screaming (MMS)**: token representations homogenize, centered variation is suppressed, and the model collapses. MVSplit-DiT counters this with **Mean-Variance Split (MV-Split) Residuals**, which combine a separately gained centered residual update with a leaky trunk-mean replacement (an illustrative sketch appears at the end of this card). This design enables stable training of DiTs at such depths, including the 1000-layer model provided here.

## Usage

To use this model for image generation, see the official [GitHub repository](https://github.com/erwold/mv-split) for installation instructions and requirements.

### Sampling

You can generate images with the `sample.py` script. It requires the DiT checkpoint from this repository, a FLUX.2 VAE, and a Qwen3 text encoder.

```bash
# Custom prompt example
python sample.py \
    --checkpoint_path /path/to/model.pt \
    --flux_vae_path /path/to/flux2_ae.safetensors \
    --qwen_model_path Qwen/Qwen3-0.6B \
    --prompt "a red panda climbing a bamboo stalk" \
    --output_dir ./samples
```

### Key sampling flags

| Flag | Default | Meaning |
|---|---|---|
| `--image_size` | 256 | Side length of the square output image, in pixels. |
| `--num_inference_steps` | 35 | Number of Euler steps for the flow-matching ODE. |
| `--cfg_scale` | 2.0 | Classifier-free guidance scale. |
| `--time_shift_alpha` | 4.0 | Time shift in the flow schedule (must match the value used in training). |

An invocation that sets all four flags explicitly appears at the end of this card.

## Citation

```bibtex
@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026},
}
```
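## Example: sampling with explicit flags

The command below repeats the sampling example with all four flags from the table set explicitly. The values for `--num_inference_steps` and `--cfg_scale` are illustrative choices, not recommendations; `--time_shift_alpha` is left at its default of 4.0 because it must match training, and `--image_size` stays at 256 since other resolutions are not documented here.

```bash
# Illustrative variant: more Euler steps and stronger guidance
python sample.py \
    --checkpoint_path /path/to/model.pt \
    --flux_vae_path /path/to/flux2_ae.safetensors \
    --qwen_model_path Qwen/Qwen3-0.6B \
    --prompt "a red panda climbing a bamboo stalk" \
    --image_size 256 \
    --num_inference_steps 50 \
    --cfg_scale 3.0 \
    --time_shift_alpha 4.0 \
    --output_dir ./samples
```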
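## Illustrative sketch: MV-Split residuals

The paper and repository are the authoritative references; the snippet below is only a minimal sketch of how a residual update that combines a separately gained centered update with a leaky trunk-mean replacement *could* look. The function name, the gain `g_c`, the leak factor, and the choice of taking the mean over the token axis are assumptions made for illustration, not the released implementation.

```python
import torch

def mv_split_residual(x: torch.Tensor, branch: torch.Tensor,
                      g_c: float = 1.0, leak: float = 0.9) -> torch.Tensor:
    """Hypothetical MV-Split residual update (not the paper's code).

    x:      trunk state, shape (batch, tokens, channels)
    branch: output of the attention/MLP branch, same shape
    """
    # Split trunk and branch output into a token-mean ("mean mode")
    # and a centered remainder.
    x_mean = x.mean(dim=1, keepdim=True)
    b_mean = branch.mean(dim=1, keepdim=True)

    # Centered part: a standard residual addition with its own gain,
    # giving centered variation a dedicated path through the network.
    centered = (x - x_mean) + g_c * (branch - b_mean)

    # Mean mode: leaky replacement instead of accumulation, so the shared
    # mean is refreshed each layer rather than compounding over 1000 layers.
    mean = leak * x_mean + (1.0 - leak) * b_mean
    return centered + mean
```

The idea this sketch tries to capture is that the two statistics get different dynamics: the centered component follows an ordinary gained residual path, while the mean mode decays toward the current block's mean, which is one plausible way to keep token representations from homogenizing at extreme depth.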