# 🧪 LiquidGen: Liquid Neural Network Image Generator

**A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.**

LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free tier T4).

## 🚀 Quick Start (Colab)

1. Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab
2. Select a dataset preset (see table below)
3. Run all cells – latents are pre-cached automatically, then training starts

**Training is optimized for Colab free tier:**
- **Latent pre-caching**: Encode all images with VAE once → save to disk → train on pure tensors
- **No VAE during training** → saves ~1GB VRAM, enables larger batches (32+)
- **Small curated datasets** that download in seconds (not 5GB WikiArt!)

### Dataset Presets

| Preset | Images | Download | Classes | Description |
|--------|--------|----------|---------|-------------|
| `paintings_mini` | ~200 | 1.7MB | 27 styles | Instant smoke test |
| `paintings` | ~8K | 204MB | 27 styles | **Recommended** – best quality/speed tradeoff |
| `cartoon` | ~2.5K | 181MB | unconditional | Cartoon/anime images |
| `flowers` | ~8K | 331MB | unconditional | Flower photography |
| `wikiart_stream` | ~80K | streaming | 27 styles | Full WikiArt via streaming (set `max_images`) |

## πŸ—οΈ Architecture

```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → VAE Decoder → Output
```

### Key Components

| Component | What it does | Replaces |
|-----------|-------------|----------|
| **LiquidTimeConstant** | `α·x + (1-α)·stimulus` with learnable decay α = exp(-softplus(ρ)) | Residual connections |
| **GatedDepthwiseStimulusConv** | Local spatial context via gated DW-conv | Self-attention (local) |
| **ZigzagScan1D** | Global context via zigzag-ordered 1D conv (sketch below) | Self-attention (global) |
| **AdaptiveGroupNorm** | Timestep conditioning via scale/shift | AdaLN in DiT |
| **U-Net Long Skips** | Skip connections from shallow to deep blocks | Standard residual |
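
The zigzag ordering itself fits in a few lines. A minimal sketch, assuming a boustrophedon (row-reversing) scan; `zigzag_indices` is our illustrative name, not necessarily what `model.py` uses:

```python
import torch

def zigzag_indices(h: int, w: int) -> torch.Tensor:
    # Boustrophedon order: left-to-right on even rows, right-to-left on odd rows,
    # so consecutive 1D positions are always 2D spatial neighbors (no jump at row ends).
    idx = torch.arange(h * w).view(h, w)
    idx[1::2] = idx[1::2].flip(-1)   # reverse every other row
    return idx.flatten()

# Flatten [B, C, H, W] features into a spatially continuous sequence for a 1D conv:
x = torch.randn(2, 8, 4, 4)
order = zigzag_indices(4, 4)
seq = x.flatten(2)[:, :, order]      # [B, C, H*W]
```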

### Core Innovation: Liquid Time Constants

From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):
```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```

Our parallelizable version (inspired by LiquidTAD 2025):
```python
α = exp(-softplus(ρ))              # Per-channel learnable retention
output = α * state + (1 - α) * stimulus  # Exponential relaxation
```

**No sequential ODE solving.** No attention. Fully parallelizable.
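
A self-contained PyTorch sketch of this update as a module (class and parameter names are our assumptions, not necessarily those in `model.py`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidTimeConstant(nn.Module):
    """Parallel liquid blend: α·state + (1-α)·stimulus with learnable per-channel α."""
    def __init__(self, channels: int):
        super().__init__()
        self.rho = nn.Parameter(torch.zeros(channels))  # α = exp(-softplus(ρ)) ∈ (0, 1)

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        # Per-channel retention, broadcast over spatial dims of [B, C, H, W] tensors
        alpha = torch.exp(-F.softplus(self.rho)).view(1, -1, 1, 1)
        # Exponential relaxation toward the stimulus -- no sequential ODE solve
        return alpha * state + (1 - alpha) * stimulus
```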

## 📊 Model Sizes

| Model | Params | VRAM (train) | Best For |
|-------|--------|-------------|----------|
| **LiquidGen-S** | ~55M | ~4-6 GB | 256px, fast experiments |
| **LiquidGen-B** | ~140M | ~8-10 GB | 256/512px, balanced |
| **LiquidGen-L** | ~280M | ~12-14 GB | 512px, high quality |

All fit in **16GB VRAM** (Colab free T4). Training on cached latents = no VAE overhead.

## 🔧 Training

```python
from train import TrainConfig, train

config = TrainConfig(
    model_size="small",
    dataset_preset="paintings",   # 8K paintings, 204MB, 27 styles
    image_size=256,
    batch_size=32,                # Large batches OK with cached latents!
    num_epochs=100,
    learning_rate=1e-4,
)
train(config)
```

### Training Pipeline
1. **Pre-cache**: Load dataset → encode all images with frozen Flux VAE → save latents to disk → unload VAE (sketched below)
2. **Train**: Load cached tensors → train LiquidGen backbone with flow matching → fast iterations!
3. **Sample**: Load VAE only when generating sample images (lazy loading)
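
A hedged sketch of step 1. The Hub id points at the FLUX.1-schnell VAE; `dataloader` and the output path are placeholders, and production code would also apply the VAE's scaling/shift factors:

```python
import torch
from diffusers import AutoencoderKL

# Load the frozen FLUX VAE once, encode everything, then drop it from memory.
vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.float16
).to("cuda").eval()

latents = []
with torch.no_grad():
    for images, _ in dataloader:   # images in [-1, 1], shape [B, 3, H, W]
        z = vae.encode(images.half().to("cuda")).latent_dist.sample()
        latents.append(z.cpu())

torch.save(torch.cat(latents), "cached_latents.pt")  # 16-ch latents, 8x downsampled
del vae
torch.cuda.empty_cache()   # training now touches only the cached tensors
```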

### Details
- **VAE**: FLUX.1-schnell (frozen, 16ch latent, 8x compression, Apache 2.0)
- **Objective**: Flow matching (velocity prediction) – `v = noise - x_0` (see the sketch after this list)
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Gradient clipping**: 2.0 (critical for stability, from ZigMa paper)
- **EMA**: 0.9999 decay
- **Sampling**: Euler ODE, 50 steps, classifier-free guidance
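
Putting the objective and sampler together, a minimal sketch; the `model(x_t, t)` call signature is our assumption, and classifier-free guidance is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x0):
    # Linear interpolant x_t = (1-t)·x0 + t·noise, whose velocity is v = noise - x0.
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)
    t_ = t.view(-1, 1, 1, 1)
    x_t = (1 - t_) * x0 + t_ * noise
    return F.mse_loss(model(x_t, t), noise - x0)

@torch.no_grad()
def euler_sample(model, shape, steps=50, device="cuda"):
    # Integrate dx/dt = v from t=1 (pure noise) back to t=0 (data).
    x = torch.randn(shape, device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), 1.0 - i * dt, device=device)
        x = x - dt * model(x, t)
    return x
```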

## πŸ“ Files

```
├── model.py                        # LiquidGen model architecture (~55-280M params)
├── train.py                        # Training pipeline with latent pre-caching
├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
└── README.md
```

## πŸ“ Architecture Diagram

```
Input Latent [B, 16, H/8, W/8]
    │
    ├─── Patch Embed (Conv2d, stride=2) ──→ [B, D, H/16, W/16]
    ├─── + Learnable Position Embedding
    ├─── Input Projection (DW-Conv + PW-Conv + GELU)
    │
    ├─── LiquidBlock × (depth/2)  ←── save skip connections
    │       ├── AdaGN (timestep conditioned)
    │       ├── GatedDepthwiseStimulusConv (local spatial)
    │       ├── + ZigzagScan1D (global context)
    │       ├── LiquidTimeConstant #1 (CfC blend)
    │       ├── AdaGN
    │       ├── ChannelMixMLP (GELU)
    │       └── LiquidTimeConstant #2 (CfC blend)
    │
    ├─── LiquidBlock × (depth/2)  ←── add skip connections
    │
    ├─── GroupNorm + Conv + GELU
    └─── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
```

## 🔬 Research Background

### Liquid Neural Networks
- **Liquid Time-constant Networks** (Hasani et al., NeurIPS 2020) – ODE-based neurons with input-dependent τ
- **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022) – Analytical solution eliminating ODE solvers
- **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020) – Sparse wiring: sensory → inter → command → motor
- **LiquidTAD** (2025) – Static decay α = exp(-softplus(ρ)) for fully parallel liquid dynamics (100× speedup)

### Attention-Free Image Generation
- **ZigMa** (ECCV 2024) – Zigzag scanning for SSM-based diffusion
- **DiMSUM** (NeurIPS 2024) – Spatial-frequency Mamba (FID 2.11 ImageNet 256)
- **DiffuSSM** (2023) – First attention-free diffusion model
- **DiM** (2024) – Multi-directional Mamba with padding tokens

### Flow Matching
- **Flow Matching for Generative Modeling** (Lipman et al., 2023)
- **SiT** (2024) – Scalable Interpolant Transformers

## ⚡ Design Decisions

1. **No Attention** – O(n) complexity. Liquid dynamics + zigzag conv replace self-attention entirely.
2. **Liquid over Residual** – `α·x + (1-α)·f(x)` instead of `x + f(x)`. Explicit control over retention per channel.
3. **Zigzag Scanning** – Preserves spatial continuity at row boundaries (critical insight from ZigMa).
4. **Latent Pre-caching** – Encode once, train forever. No VAE overhead during training.
5. **Flow Matching** – Straighter ODE trajectories → fewer sampling steps, better quality.

## 📜 License

MIT