# LiquidGen: Liquid Neural Network Image Generator

**A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.**

LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free tier T4).

## Quick Start (Colab)

1. Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab
2. Select a dataset preset (see table below)
3. Run all cells: latents are pre-cached automatically, then training starts

**Training is optimized for the Colab free tier:**
- **Latent pre-caching**: encode all images with the VAE once → save to disk → train on pure tensors
- **No VAE during training**: saves ~1GB VRAM and enables larger batches (32+)
- **Small curated datasets** that download in seconds (not the full 5GB WikiArt!)

### Dataset Presets

| Preset | Images | Download | Classes | Description |
|--------|--------|----------|---------|-------------|
| `paintings_mini` | ~200 | 1.7MB | 27 styles | Instant smoke test |
| `paintings` | ~8K | 204MB | 27 styles | **Recommended**: best quality/speed tradeoff |
| `cartoon` | ~2.5K | 181MB | unconditional | Cartoon/anime images |
| `flowers` | ~8K | 331MB | unconditional | Flower photography |
| `wikiart_stream` | ~80K | streaming | 27 styles | Full WikiArt via streaming (set `max_images`) |

## Architecture

```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → VAE Decoder → Output
```

### Key Components

| Component | What it does | Replaces |
|-----------|--------------|----------|
| **LiquidTimeConstant** | `α·x + (1-α)·stimulus` with learnable decay α = exp(-softplus(τ)) | Residual connections |
| **GatedDepthwiseStimulusConv** | Local spatial context via gated DW-conv | Self-attention (local) |
| **ZigzagScan1D** | Global context via zigzag-ordered 1D conv | Self-attention (global) |
| **AdaptiveGroupNorm** | Timestep conditioning via scale/shift | AdaLN in DiT |
| **U-Net Long Skips** | Skip connections from shallow to deep blocks | Standard residual |
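
For intuition, here is a minimal, illustrative sketch of what the gated depthwise stimulus convolution could look like. The kernel size and gating path are assumptions, not the exact module in `model.py`:

```python
import torch
import torch.nn as nn

class GatedDepthwiseStimulusConv(nn.Module):
    """Illustrative sketch: local context from a depthwise conv, modulated by a learned gate."""

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        # Depthwise conv gathers a local neighbourhood per channel (no channel mixing).
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        # Pointwise conv predicts a per-position, per-channel gate.
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W] feature map; output has the same shape.
        return self.dw(x) * torch.sigmoid(self.gate(x))
```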

### Core Innovation: Liquid Time Constants

From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):
```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```

Our parallelizable version (inspired by LiquidTAD 2025):
```python
α = exp(-softplus(τ))                    # Per-channel learnable retention
output = α * state + (1 - α) * stimulus  # Exponential relaxation
```

**No sequential ODE solving.** No attention. Fully parallelizable.
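
As a concrete reference, here is a minimal PyTorch sketch of this blend as a module. The class name matches the component table above, but the per-channel parameter shape and the 4D broadcast are assumptions about how `model.py` applies it to feature maps:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidTimeConstant(nn.Module):
    """Parallel CfC-style blend: output = α·state + (1-α)·stimulus."""

    def __init__(self, channels: int):
        super().__init__()
        # One learnable time constant τ per channel.
        self.tau = nn.Parameter(torch.zeros(channels))

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        # state, stimulus: [B, C, H, W]
        # α = exp(-softplus(τ)) is always in (0, 1), so the blend is a stable
        # exponential relaxation toward the stimulus.
        alpha = torch.exp(-F.softplus(self.tau)).view(1, -1, 1, 1)
        return alpha * state + (1.0 - alpha) * stimulus
```

Because α is a static learned parameter rather than the output of a per-step ODE solve, the blend is a single elementwise operation that parallelizes across every spatial position.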

## Model Sizes

| Model | Params | VRAM (train) | Best For |
|-------|--------|--------------|----------|
| **LiquidGen-S** | ~55M | ~4-6 GB | 256px, fast experiments |
| **LiquidGen-B** | ~140M | ~8-10 GB | 256/512px, balanced |
| **LiquidGen-L** | ~280M | ~12-14 GB | 512px, high quality |

All fit in **16GB VRAM** (Colab free T4). Training on cached latents = no VAE overhead.

## Training

```python
from train import TrainConfig, train

config = TrainConfig(
    model_size="small",
    dataset_preset="paintings",  # 8K paintings, 204MB, 27 styles
    image_size=256,
    batch_size=32,               # Large batches OK with cached latents!
    num_epochs=100,
    learning_rate=1e-4,
)
train(config)
```

### Training Pipeline
1. **Pre-cache**: load the dataset → encode all images with the frozen Flux VAE → save latents to disk → unload the VAE (see the sketch after this list)
2. **Train**: load cached tensors → train the LiquidGen backbone with flow matching → fast iterations!
3. **Sample**: load the VAE only when generating sample images (lazy loading)
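
A hedged sketch of step 1, assuming the `diffusers` `AutoencoderKL` interface for the FLUX VAE; the function name, file layout, and the omission of any latent scaling/shift factors are simplifications rather than the exact code in `train.py`:

```python
import torch
from diffusers import AutoencoderKL

@torch.no_grad()
def cache_latents(image_batches, cache_path, device="cuda"):
    """Encode every image once with the frozen VAE, save the latents, then free the VAE."""
    vae = AutoencoderKL.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", subfolder="vae"
    ).to(device).eval()

    latents = []
    for batch in image_batches:                    # batch: [B, 3, H, W] in [-1, 1]
        posterior = vae.encode(batch.to(device)).latent_dist
        latents.append(posterior.sample().cpu())   # keep only the 16-channel latents

    torch.save(torch.cat(latents), cache_path)     # training reads these tensors directly
    del vae                                        # no VAE in memory while training
    torch.cuda.empty_cache()
```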

### Details
- **VAE**: FLUX.1-schnell (frozen, 16-channel latents, 8x compression, Apache 2.0)
- **Objective**: flow matching (velocity prediction), `v = noise - x_0` (see the sketch below)
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Gradient clipping**: 2.0 (critical for stability; from the ZigMa paper)
- **EMA**: 0.9999 decay
- **Sampling**: Euler ODE, 50 steps, classifier-free guidance
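
To make the objective and sampler concrete, here is a hedged sketch of one flow-matching training step and the plain Euler sampling loop. The function names and the `model(x_t, t)` call signature are assumptions, and classifier-free guidance and the EMA update are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def flow_matching_step(model, x0, optimizer):
    """One training step: regress the velocity v = noise - x0 at a random time t."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)            # t ~ U(0, 1)
    t_ = t.view(-1, 1, 1, 1)
    x_t = (1 - t_) * x0 + t_ * noise                          # linear interpolation path
    v_target = noise - x0                                     # constant velocity along that path
    loss = F.mse_loss(model(x_t, t), v_target)

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 2.0)   # gradient clipping at 2.0
    optimizer.step()
    return loss.item()

@torch.no_grad()
def euler_sample(model, shape, steps=50, device="cuda"):
    """Integrate dx/dt = v from t = 1 (pure noise) down to t = 0 (data) with Euler steps."""
    x = torch.randn(shape, device=device)
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
    for i in range(steps):
        v = model(x, ts[i].expand(shape[0]))
        x = x + (ts[i + 1] - ts[i]) * v                       # dt is negative, so x moves toward data
    return x
```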

## Files

```
├── model.py                         # LiquidGen model architecture (~55-280M params)
├── train.py                         # Training pipeline with latent pre-caching
├── LiquidGen_Colab_Notebook.ipynb   # Ready-to-run Colab notebook
└── README.md
```

## Architecture Diagram

```
Input Latent [B, 16, H/8, W/8]
      │
      ├── Patch Embed (Conv2d, stride=2) ──→ [B, D, H/16, W/16]
      ├── + Learnable Position Embedding
      ├── Input Projection (DW-Conv + PW-Conv + GELU)
      │
      ├── LiquidBlock × (depth/2) ──→ save skip connections
      │     ├── AdaGN (timestep conditioned)
      │     ├── GatedDepthwiseStimulusConv (local spatial)
      │     ├── + ZigzagScan1D (global context)
      │     ├── LiquidTimeConstant #1 (CfC blend)
      │     ├── AdaGN
      │     ├── ChannelMixMLP (GELU)
      │     └── LiquidTimeConstant #2 (CfC blend)
      │
      ├── LiquidBlock × (depth/2) ──→ add skip connections
      │
      ├── GroupNorm + Conv + GELU
      └── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
```
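
The AdaGN stages above condition each block on the timestep. A minimal sketch, assuming the standard scale/shift formulation used by AdaLN-style conditioning (the projection layer and group count here are assumptions, not the exact module in `model.py`):

```python
import torch
import torch.nn as nn

class AdaptiveGroupNorm(nn.Module):
    """GroupNorm whose scale and shift are predicted from the timestep embedding."""

    def __init__(self, channels: int, t_dim: int, groups: int = 8):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels, affine=False)
        self.to_scale_shift = nn.Linear(t_dim, 2 * channels)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W], t_emb: [B, t_dim]
        scale, shift = self.to_scale_shift(t_emb).chunk(2, dim=-1)
        scale = scale[:, :, None, None]
        shift = shift[:, :, None, None]
        return self.norm(x) * (1 + scale) + shift
```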

## Research Background

### Liquid Neural Networks
- **Liquid Time-constant Networks** (Hasani et al., AAAI 2021) – ODE-based neurons with input-dependent τ
- **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022) – Analytical solution eliminating ODE solvers
- **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020) – Sparse wiring: sensory→inter→command→motor
- **LiquidTAD** (2025) – Static decay α = exp(-softplus(τ)) for fully parallel liquid dynamics (100× speedup)

### Attention-Free Image Generation
- **ZigMa** (ECCV 2024) – Zigzag scanning for SSM-based diffusion
- **DiMSUM** (NeurIPS 2024) – Spatial-frequency Mamba (FID 2.11 on ImageNet 256)
- **DiffuSSM** (2023) – First attention-free diffusion model
- **DiM** (2024) – Multi-directional Mamba with padding tokens

### Flow Matching
- **Flow Matching for Generative Modeling** (Lipman et al., 2023)
- **SiT** (2024) – Scalable Interpolant Transformers

## Design Decisions

1. **No Attention** – O(n) complexity. Liquid dynamics + zigzag conv replace self-attention entirely.
2. **Liquid over Residual** – `α·x + (1-α)·f(x)` instead of `x + f(x)`. Explicit control over retention per channel.
3. **Zigzag Scanning** – Preserves spatial continuity at row boundaries (critical insight from ZigMa); see the sketch after this list.
4. **Latent Pre-caching** – Encode once, train forever. No VAE overhead during training.
5. **Flow Matching** – Straighter ODE trajectories → fewer sampling steps, better quality.
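
To illustrate decision 3, here is a small sketch of a zigzag (boustrophedon) flattening order. The helper is purely illustrative; the actual `ZigzagScan1D` may use additional scan directions:

```python
import torch

def zigzag_indices(h: int, w: int) -> torch.Tensor:
    """Row-major order with every other row reversed, so consecutive tokens in the
    1D sequence are always spatial neighbours (no jump at row boundaries)."""
    idx = torch.arange(h * w).view(h, w)
    idx[1::2] = idx[1::2].flip(-1)             # reverse odd rows
    return idx.flatten()

# Flatten a [B, C, H, W] feature map into a zigzag-ordered sequence, process it
# with any 1D operator, then scatter the result back to the original layout.
B, C, H, W = 2, 8, 4, 4
x = torch.randn(B, C, H, W)
order = zigzag_indices(H, W)
seq = x.flatten(2)[:, :, order]                # [B, C, H*W] in zigzag order
restored = seq[:, :, torch.argsort(order)].view(B, C, H, W)
assert torch.equal(restored, x)                # the scan is a pure permutation
```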

## License

MIT