
🧪 LiquidGen: Liquid Neural Network Image Generator

A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.

LiquidGen replaces self-attention in diffusion models with Closed-form Continuous-depth (CfC) liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free tier T4).

🏗️ Architecture

Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → Clean Latent → VAE Decoder → Output Image
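
The "Predicted Velocity → Euler ODE → Clean Latent" stage can be sketched as a fixed-step Euler loop. This is a minimal illustration, not the repository's sampler: it assumes the convention that t=1 is pure noise and t=0 is the clean latent, with velocity v = noise - x_0, and the names `euler_sample` and `guided_velocity` are hypothetical. The functions only use `+`, `-` and `*`, so they work on plain floats as well as tensors.

```python
from typing import Callable


def guided_velocity(v_cond, v_uncond, guidance_scale: float):
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def euler_sample(velocity_fn: Callable, noise, num_steps: int = 50):
    """Integrate dx/dt = v(x, t) from t=1 (noise) down to t=0 (clean latent).

    velocity_fn(x, t) returns the predicted velocity at state x and time t.
    """
    x = noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt                # current time, moving from 1 towards 0
        x = x - dt * velocity_fn(x, t)  # step backwards along v = noise - x_0
    return x


# Toy check with scalars: for a perfectly straight path the velocity is constant,
# so Euler integration recovers x_0 up to floating-point error.
x0, noise = 2.0, -1.0
recovered = euler_sample(lambda x, t: noise - x0, noise, num_steps=50)
```

With a real model, `velocity_fn` would wrap two forward passes (conditional and unconditional) combined by `guided_velocity`.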

Key Components

| Component | What it does | Replaces |
|---|---|---|
| LiquidTimeConstant | α·x + (1-α)·stimulus with learnable decay α = exp(-softplus(ρ)) | Residual connections |
| GatedDepthwiseStimulusConv | Local spatial context via gated DW-conv | Self-attention (local) |
| ZigzagScan1D | Global context via zigzag-ordered 1D conv | Self-attention (global) |
| AdaptiveGroupNorm | Timestep conditioning via scale/shift | AdaLN in DiT |
| U-Net Long Skips | Skip connections from shallow to deep blocks | Standard residual |
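
As an illustration of "timestep conditioning via scale/shift", here is a minimal PyTorch sketch of an adaptive GroupNorm layer. It shows the general pattern only: the class name `AdaptiveGroupNormSketch`, the `cond_dim` and `num_groups` arguments, and the zero initialization are assumptions, and the real `AdaptiveGroupNorm` in `model.py` may differ.

```python
import torch
import torch.nn as nn


class AdaptiveGroupNormSketch(nn.Module):
    """GroupNorm whose scale and shift are predicted from a conditioning vector."""

    def __init__(self, channels: int, cond_dim: int, num_groups: int = 8):
        super().__init__()
        # affine=False: scale/shift come from the conditioning, not fixed weights.
        self.norm = nn.GroupNorm(num_groups, channels, affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * channels)
        # Zero init (an assumed choice) makes the layer start as plain GroupNorm.
        nn.init.zeros_(self.to_scale_shift.weight)
        nn.init.zeros_(self.to_scale_shift.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W]; cond: [B, cond_dim], e.g. a timestep embedding.
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        scale = scale[:, :, None, None]  # broadcast over H and W
        shift = shift[:, :, None, None]
        return self.norm(x) * (1.0 + scale) + shift


layer = AdaptiveGroupNormSketch(channels=16, cond_dim=32)
out = layer(torch.randn(2, 16, 8, 8), torch.randn(2, 32))  # same shape as the input
```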

Core Innovation: Liquid Time Constants

From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):

x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)

Our parallelizable version:

α = exp(-softplus(ρ))              # Per-channel learnable retention
output = α * state + (1 - α) * stimulus  # Exponential relaxation

No sequential ODE solving. No attention. Fully parallelizable.
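
A minimal PyTorch sketch of the blend above, assuming one raw parameter ρ per channel and `[B, C, H, W]` inputs. The class name `LiquidTimeConstantSketch` and the `rho_init` default are hypothetical; the actual `LiquidTimeConstant` in `model.py` may be parameterized differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LiquidTimeConstantSketch(nn.Module):
    """Per-channel exponential relaxation: alpha * state + (1 - alpha) * stimulus."""

    def __init__(self, channels: int, rho_init: float = 0.0):
        super().__init__()
        # One raw parameter per channel; rho_init = 0 gives alpha = 0.5.
        self.rho = nn.Parameter(torch.full((channels,), rho_init))

    def alpha(self) -> torch.Tensor:
        # softplus(rho) > 0, so alpha = exp(-softplus(rho)) lies strictly in (0, 1).
        return torch.exp(-F.softplus(self.rho))

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        a = self.alpha().view(1, -1, 1, 1)  # broadcast over batch and spatial dims
        return a * state + (1.0 - a) * stimulus


ltc = LiquidTimeConstantSketch(channels=4)
out = ltc(torch.ones(2, 4, 3, 3), torch.zeros(2, 4, 3, 3))
```

Because α is a fixed function of ρ rather than of the input, the blend is a single elementwise operation over the whole feature map. A large positive ρ pushes α towards 0 (the block passes the stimulus through), and a large negative ρ pushes α towards 1 (the block retains its state).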

📊 Model Sizes

| Model | Params | VRAM (train) | Best For |
|---|---|---|---|
| LiquidGen-S | ~55M | ~4-6 GB | 256px, fast experiments |
| LiquidGen-B | ~140M | ~8-10 GB | 256/512px, balanced |
| LiquidGen-L | ~280M | ~12-14 GB | 512px, high quality |

All three sizes fit within the 16 GB of VRAM on a Colab free-tier T4 GPU, although LiquidGen-L at ~12-14 GB leaves little headroom.

🚀 Quick Start

Using the Colab Notebook

Open LiquidGen_Colab_Notebook.ipynb in Google Colab and follow the steps. It includes:

  • Complete model code (no external dependencies beyond PyTorch + diffusers)
  • Configurable training on WikiArt dataset (artistic paintings)
  • Support for 256px and 512px generation
  • Class-conditional generation (27 art styles)
  • Loss plotting and sample visualization

Using the Python Scripts

from model import liquidgen_base
import torch

# Create model
model = liquidgen_base(num_classes=27).cuda()
print(f"Parameters: {model.count_params()/1e6:.1f}M")

# Forward pass (predict velocity for flow matching)
x = torch.randn(4, 16, 32, 32).cuda()  # 256px latent
t = torch.rand(4).cuda()                 # Timesteps
labels = torch.randint(0, 27, (4,)).cuda()
v = model(x, t, labels)                  # Predicted velocity

🔧 Training

Default Configuration

from train import TrainConfig, train

config = TrainConfig(
    model_size="base",          # "small", "base", or "large"
    image_size=256,             # 256 or 512
    dataset_name="huggan/wikiart",
    label_column="style",       # 27 art styles
    num_classes=27,
    batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_epochs=50,
)
train(config)
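
With `batch_size=8` and `gradient_accumulation_steps=4`, each optimizer update sees 32 samples. The helper below spells out that arithmetic. The function name and the dataset size of 10,000 are hypothetical, and it assumes the last partial batch is kept and a final partial accumulation window still triggers an update.

```python
import math


def optimizer_steps_per_epoch(num_samples: int,
                              batch_size: int = 8,
                              gradient_accumulation_steps: int = 4):
    """Return (effective batch size, optimizer updates per epoch)."""
    effective_batch = batch_size * gradient_accumulation_steps
    micro_batches = math.ceil(num_samples / batch_size)
    steps = math.ceil(micro_batches / gradient_accumulation_steps)
    return effective_batch, steps


# Hypothetical dataset of 10,000 images:
effective_batch, steps = optimizer_steps_per_epoch(10_000)
# effective_batch is 32; steps is 313 (1250 micro-batches, 4 per update)
```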

Training Details

  • VAE: FLUX.1-schnell (frozen, 16-channel latent, 8x compression, Apache 2.0)
  • Objective: Flow matching (velocity prediction), with target v = noise - x_0
  • Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
  • Gradient clipping: 2.0 (critical for stability, from ZigMa paper)
  • EMA: 0.9999 decay
  • Sampling: Euler ODE, 50 steps, classifier-free guidance

📁 Files

├── model.py                    # Complete LiquidGen model architecture
├── train.py                    # Training pipeline with FlowMatching + EMA
├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
└── README.md                   # This file

🔬 Research Background

This architecture synthesizes ideas from multiple research lineages:

Liquid Neural Networks

  • Liquid Time-constant Networks (Hasani et al., AAAI 2021): ODE-based neurons with input-dependent τ
  • Closed-form Continuous-depth Models (Hasani et al., Nature Machine Intelligence 2022): Analytical solution eliminating ODE solvers
  • Neural Circuit Policies (Lechner et al., Nature Machine Intelligence 2020): Sparse wiring: sensory→inter→command→motor

Attention-Free Image Generation

  • ZigMa (ECCV 2024): Zigzag scanning for SSM-based diffusion (FID 14.27 CelebA-256)
  • DiMSUM (NeurIPS 2024): Spatial-frequency Mamba (FID 2.11 ImageNet 256)
  • DiffuSSM (2023): First attention-free diffusion model (FID 2.28 ImageNet 256)
  • DiM (2024): Multi-directional Mamba with padding tokens

Parallelization

  • LiquidTAD (2025): Static decay α=exp(-softplus(ρ)) for fully parallel liquid dynamics (100× speedup vs ODE)

Flow Matching

  • Flow Matching for Generative Modeling (Lipman et al., 2023)
  • SiT (2024): Scalable Interpolant Transformers

📐 Architecture Diagram

Input Latent [B, 16, H/8, W/8]
    │
    ├─── Patch Embed (Conv2d, stride=2) ──→ [B, D, H/16, W/16]
    ├─── + Learnable Position Embedding
    ├─── Input Projection (DW-Conv + PW-Conv + GELU)
    │
    ├─── LiquidBlock × (depth/2)  ←── save skip connections
    │       ├── AdaGN (timestep conditioned)
    │       ├── GatedDepthwiseStimulusConv (local spatial)
    │       ├── + ZigzagScan1D (global context)
    │       ├── LiquidTimeConstant #1 (CfC blend)
    │       ├── AdaGN (timestep conditioned)
    │       ├── ChannelMixMLP (GELU)
    │       └── LiquidTimeConstant #2 (CfC blend)
    │
    ├─── LiquidBlock × (depth/2)  ←── add skip connections
    │       └── (same structure as above)
    │
    ├─── GroupNorm + Conv + GELU
    └─── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
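
The shapes in the diagram follow from two integer divisions: the VAE's 8x compression and the stride-2 patch embedding. The helper below makes that concrete. Its name and the backbone width `width=512` (the diagram's D) are hypothetical placeholders, since the per-model widths are not listed here.

```python
def latent_shapes(image_size: int, batch: int = 4, latent_channels: int = 16,
                  vae_factor: int = 8, patch: int = 2, width: int = 512):
    """Trace tensor shapes for a square image through the VAE and patch embed."""
    side = image_size // vae_factor  # Flux VAE: 8x spatial compression
    grid = side // patch             # patch embed: Conv2d with stride 2
    return {
        "latent": (batch, latent_channels, side, side),  # [B, 16, H/8, W/8]
        "tokens_2d": (batch, width, grid, grid),         # [B, D, H/16, W/16]
        "sequence_length": grid * grid,                  # positions seen by the 1D scan
    }


shapes_256 = latent_shapes(256)  # latent (4, 16, 32, 32), 16 x 16 grid, 256 positions
shapes_512 = latent_shapes(512)  # latent (4, 16, 64, 64), 32 x 32 grid, 1024 positions
```

Doubling the image side from 256px to 512px therefore quadruples the sequence length from 256 to 1024 positions.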

⚑ Key Design Decisions

  1. No Attention: O(n) cost in sequence length instead of O(n²), which enables training on longer sequences and higher-resolution latents.
  2. Liquid Dynamics over Residual: Instead of x + f(x), we use α·x + (1-α)·f(x), where α is learned per channel. This gives the model explicit control over how much old vs new information to retain.
  3. Zigzag Scanning: Preserves spatial continuity (adjacent pixels stay adjacent in sequence). Simple raster scan breaks this at row boundaries.
  4. Frozen Flux VAE: 16-channel latent with strong reconstruction quality. The VAE is only about 160 MB and uses roughly 1 GB of VRAM.
  5. Flow Matching: Straighter ODE trajectories than DDPM, so fewer sampling steps are needed for good sample quality.
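
The row-boundary point in item 3 can be checked with a few lines of plain Python. The sketch below uses the simplest serpentine ordering (alternate rows reversed), which is only one possible zigzag pattern. The function names are hypothetical, and the repository's `ZigzagScan1D` may use a different or additional scan path.

```python
def raster_order(h: int, w: int):
    """Row-major order: every row is read left to right."""
    return [(r, c) for r in range(h) for c in range(w)]


def zigzag_order(h: int, w: int):
    """Serpentine order: even rows left to right, odd rows right to left."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def max_jump(order):
    """Largest Manhattan distance between consecutive positions in the sequence."""
    return max(abs(r1 - r0) + abs(c1 - c0)
               for (r0, c0), (r1, c1) in zip(order, order[1:]))


zigzag_jump = max_jump(zigzag_order(4, 4))  # 1: neighbours in sequence are neighbours in space
raster_jump = max_jump(raster_order(4, 4))  # 4: jump from the end of one row to the start of the next
```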

📜 License

MIT

🙏 Acknowledgments

  • MIT CSAIL for Liquid Neural Networks research
  • Black Forest Labs for FLUX.1-schnell VAE (Apache 2.0)
  • WikiArt dataset contributors
  • ZigMa, DiMSUM, DiffuSSM, DiM authors for attention-free diffusion insights