# 🧪 LiquidGen: Liquid Neural Network Image Generator

**A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.**

LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free tier T4).

## 🏗️ Architecture

```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → Clean Latent → VAE Decoder → Output Image
```

### Key Components

| Component | What it does | Replaces |
|-----------|-------------|----------|
| **LiquidTimeConstant** | `α·x + (1-α)·stimulus` with learnable decay α = exp(-softplus(ρ)) | Residual connections |
| **GatedDepthwiseStimulusConv** | Local spatial context via gated DW-conv | Self-attention (local) |
| **ZigzagScan1D** | Global context via zigzag-ordered 1D conv | Self-attention (global) |
| **AdaptiveGroupNorm** | Timestep conditioning via scale/shift | AdaLN in DiT |
| **U-Net Long Skips** | Skip connections from shallow to deep blocks | Standard residual |

### Core Innovation: Liquid Time Constants

From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):

```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```

Our parallelizable version:

```python
α = exp(-softplus(ρ))                    # Per-channel learnable retention
output = α * state + (1 - α) * stimulus  # Exponential relaxation
```

**No sequential ODE solving.** No attention. Fully parallelizable.

## 📊 Model Sizes

| Model | Params | VRAM (train) | Best For |
|-------|--------|-------------|----------|
| **LiquidGen-S** | ~55M | ~4-6 GB | 256px, fast experiments |
| **LiquidGen-B** | ~140M | ~8-10 GB | 256/512px, balanced |
| **LiquidGen-L** | ~280M | ~12-14 GB | 512px, high quality |

All models fit comfortably in **16GB VRAM** (Colab free tier T4 GPU).
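To make the liquid time-constant blend from the Core Innovation section concrete, here is a minimal sketch in plain Python (standard library only). The helper names `softplus`, `retention`, and `liquid_blend` are illustrative assumptions and are not taken from `model.py`; the actual layer presumably applies the same arithmetic to tensors, with one ρ per channel.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def retention(rho: float) -> float:
    """Retention alpha = exp(-softplus(rho)); mathematically in (0, 1)."""
    return math.exp(-softplus(rho))


def liquid_blend(state: List[float], stimulus: List[float],
                 rho: List[float]) -> List[float]:
    """Per-channel blend alpha * state + (1 - alpha) * stimulus.

    Hypothetical helper: each list entry stands in for one channel.
    """
    if not (len(state) == len(stimulus) == len(rho)):
        raise ValueError("state, stimulus and rho must have the same length")
    out = []
    for s, u, r in zip(state, stimulus, rho):
        alpha = retention(r)
        out.append(alpha * s + (1.0 - alpha) * u)
    return out


if __name__ == "__main__":
    # Three channels with the same state/stimulus but different rho values.
    print(liquid_blend([1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [0.0, 20.0, -20.0]))
```

With ρ = 0 the retention is 0.5 (up to rounding), so state and stimulus are averaged. Larger ρ pushes the output toward the stimulus, and more negative ρ keeps more of the previous state. Because α does not depend on the input, no step-by-step ODE integration is needed.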
## 🚀 Quick Start

### Using the Colab Notebook

Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab and follow the steps. It includes:

- Complete model code (no external dependencies beyond PyTorch + diffusers)
- Configurable training on WikiArt dataset (artistic paintings)
- Support for 256px and 512px generation
- Class-conditional generation (27 art styles)
- Loss plotting and sample visualization

### Using the Python Scripts

```python
from model import liquidgen_base
import torch

# Create model
model = liquidgen_base(num_classes=27).cuda()
print(f"Parameters: {model.count_params()/1e6:.1f}M")

# Forward pass (predict velocity for flow matching)
x = torch.randn(4, 16, 32, 32).cuda()     # 256px latent
t = torch.rand(4).cuda()                  # Timesteps
labels = torch.randint(0, 27, (4,)).cuda()
v = model(x, t, labels)                   # Predicted velocity
```

## 🔧 Training

### Default Configuration

```python
from train import TrainConfig, train

config = TrainConfig(
    model_size="base",          # "small", "base", or "large"
    image_size=256,             # 256 or 512
    dataset_name="huggan/wikiart",
    label_column="style",       # 27 art styles
    num_classes=27,
    batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_epochs=50,
)
train(config)
```

### Training Details

- **VAE**: FLUX.1-schnell (frozen, 16-channel latent, 8x compression, Apache 2.0)
- **Objective**: Flow matching (velocity prediction), with target `v = noise - x_0`
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Gradient clipping**: 2.0 (critical for stability, from ZigMa paper)
- **EMA**: 0.9999 decay
- **Sampling**: Euler ODE, 50 steps, classifier-free guidance

## 📁 Files

```
├── model.py                        # Complete LiquidGen model architecture
├── train.py                        # Training pipeline with FlowMatching + EMA
├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
└── README.md                       # This file
```

## 🔬 Research Background

This architecture synthesizes ideas from multiple research lineages:

### Liquid Neural Networks

- **Liquid Time-constant Networks** (Hasani et al., AAAI 2021): ODE-based neurons with input-dependent τ
- **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022): Analytical solution eliminating ODE solvers
- **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020): Sparse wiring, sensory→inter→command→motor

### Attention-Free Image Generation

- **ZigMa** (ECCV 2024): Zigzag scanning for SSM-based diffusion (FID 14.27 CelebA-256)
- **DiMSUM** (NeurIPS 2024): Spatial-frequency Mamba (FID 2.11 ImageNet 256)
- **DiffuSSM** (2023): First attention-free diffusion model (FID 2.28 ImageNet 256)
- **DiM** (2024): Multi-directional Mamba with padding tokens

### Parallelization

- **LiquidTAD** (2025): Static decay α=exp(-softplus(ρ)) for fully parallel liquid dynamics (100× speedup vs ODE)

### Flow Matching

- **Flow Matching for Generative Modeling** (Lipman et al., 2023)
- **SiT** (2024): Scalable Interpolant Transformers

## 📐 Architecture Diagram

```
Input Latent [B, 16, H/8, W/8]
    │
    ├─── Patch Embed (Conv2d, stride=2) ──→ [B, D, H/16, W/16]
    ├─── + Learnable Position Embedding
    ├─── Input Projection (DW-Conv + PW-Conv + GELU)
    │
    ├─── LiquidBlock × (depth/2)  ←── save skip connections
    │    ├── AdaGN (timestep conditioned)
    │    ├── GatedDepthwiseStimulusConv (local spatial)
    │    ├── + ZigzagScan1D (global context)
    │    ├── LiquidTimeConstant #1 (CfC blend)
    │    ├── AdaGN (timestep conditioned)
    │    ├── ChannelMixMLP (GELU)
    │    └── LiquidTimeConstant #2 (CfC blend)
    │
    ├─── LiquidBlock × (depth/2)  ←── add skip connections
    │    └── (same structure as above)
    │
    ├─── GroupNorm + Conv + GELU
    └─── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
```

## ⚡ Key Design Decisions

1. **No Attention**: O(n) cost in sequence length instead of O(n²). This enables training on longer sequences and higher-resolution latents.
2. **Liquid Dynamics over Residual**: Instead of `x + f(x)`, we use `α·x + (1-α)·f(x)`, where α is learned per channel. This gives the model explicit control over how much old versus new information to retain.
3. **Zigzag Scanning**: Preserves spatial continuity (adjacent pixels stay adjacent in the sequence). A simple raster scan breaks this at row boundaries.
4. **Frozen Flux VAE**: 16-channel latent with best-in-class reconstruction quality. Only 160MB, ~1GB VRAM.
5. **Flow Matching**: Straighter ODE trajectories than DDPM, so fewer sampling steps are needed and quality improves.

## 📜 License

MIT

## 🙏 Acknowledgments

- MIT CSAIL for Liquid Neural Networks research
- Black Forest Labs for FLUX.1-schnell VAE (Apache 2.0)
- WikiArt dataset contributors
- ZigMa, DiMSUM, DiffuSSM, DiM authors for attention-free diffusion insights
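
As a footnote to the zigzag-scanning design decision, the sketch below shows one simple snake-style ordering and checks the continuity claim against a raster scan. It is only an illustration: `zigzag_order` and `max_jump` are hypothetical helpers, and the `ZigzagScan1D` layer in `model.py` may use a different path or several scan directions.

```python
from typing import List


def zigzag_order(height: int, width: int) -> List[int]:
    """Row-major indices visited left-to-right on even rows, right-to-left on odd rows."""
    order: List[int] = []
    for row in range(height):
        cols = range(width) if row % 2 == 0 else range(width - 1, -1, -1)
        order.extend(row * width + col for col in cols)
    return order


def max_jump(order: List[int], width: int) -> int:
    """Largest Manhattan distance between consecutive grid positions in a scan."""
    worst = 0
    for a, b in zip(order, order[1:]):
        (r1, c1), (r2, c2) = divmod(a, width), divmod(b, width)
        worst = max(worst, abs(r1 - r2) + abs(c1 - c2))
    return worst


if __name__ == "__main__":
    print(zigzag_order(2, 3))                 # [0, 1, 2, 5, 4, 3]
    print(max_jump(zigzag_order(3, 4), 4))    # 1: every step moves to a neighbor
    print(max_jump(list(range(3 * 4)), 4))    # 4: raster scan jumps at row ends
```

On a 3×4 grid the snake ordering never moves more than one cell per step, while the raster ordering jumps across the full row width at each row boundary.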