# 🧪 LiquidGen: Liquid Neural Network Image Generator
**A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.**
LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free-tier T4).
## 🚀 Quick Start (Colab)
1. Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab
2. Select a dataset preset (see table below)
3. Run all cells; latents are pre-cached automatically, then training starts
**Training is optimized for Colab free tier:**
- **Latent pre-caching**: Encode all images with the VAE once → save latents to disk → train on pure tensors (see the sketch below)
- **No VAE during training**: saves ~1GB VRAM and enables larger batches (32+)
- **Small curated datasets** that download in seconds (not 5GB WikiArt!)
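
A rough sketch of that pre-caching step, assuming a diffusers-style `AutoencoderKL` and a dataloader yielding images in `[-1, 1]` (names, dtype, and file layout are illustrative; the actual implementation lives in `train.py`):

```python
import torch
from diffusers import AutoencoderKL

@torch.no_grad()
def cache_latents(dataloader, out_path="latents_cache.pt", device="cuda"):
    """Encode every image once with the frozen Flux VAE and save the latents to disk."""
    vae = AutoencoderKL.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.float16
    ).to(device).eval()

    cached = []
    for images, _labels in dataloader:               # images: [B, 3, 256, 256] in [-1, 1]
        z = vae.encode(images.half().to(device)).latent_dist.sample()   # [B, 16, 32, 32]
        # Normalize with the VAE's own shift/scale so the backbone sees ~unit-variance latents
        z = (z - vae.config.shift_factor) * vae.config.scaling_factor
        cached.append(z.float().cpu())

    torch.save(torch.cat(cached), out_path)          # training reads only these tensors
    del vae
    torch.cuda.empty_cache()                         # the VAE's ~1 GB of VRAM is freed for training
```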
### Dataset Presets
| Preset | Images | Download | Classes | Description |
|--------|--------|----------|---------|-------------|
| `paintings_mini` | ~200 | 1.7MB | 27 styles | Instant smoke test |
| `paintings` | ~8K | 204MB | 27 styles | **Recommended**: best quality/speed tradeoff |
| `cartoon` | ~2.5K | 181MB | unconditional | Cartoon/anime images |
| `flowers` | ~8K | 331MB | unconditional | Flower photography |
| `wikiart_stream` | ~80K | streaming | 27 styles | Full WikiArt via streaming (set `max_images`) |
## 🏗️ Architecture
```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → VAE Decoder → Output
```
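
As a hedged sketch of that generation path (illustrative only, not the repo's API; it assumes the flow-matching convention `v = noise - x_0` described under Training and a diffusers-style `AutoencoderKL`; style conditioning and classifier-free guidance are omitted for brevity):

```python
import torch

@torch.no_grad()
def sample(model, vae, num_steps=50, shape=(1, 16, 32, 32), device="cuda"):
    """Euler integration of the learned velocity field from noise (t=1) back to data (t=0)."""
    x = torch.randn(shape, device=device)                    # start from pure Gaussian noise
    ts = torch.linspace(1.0, 0.0, num_steps + 1).tolist()
    for t, t_next in zip(ts[:-1], ts[1:]):
        t_batch = torch.full((shape[0],), t, device=device)
        v = model(x, t_batch)                                # predicted velocity ≈ noise - x_0
        x = x + (t_next - t) * v                             # Euler step (dt is negative)
    # Undo the latent normalization before decoding (Flux VAE uses shift + scale)
    x = x / vae.config.scaling_factor + vae.config.shift_factor
    return vae.decode(x).sample                              # decoded images, [B, 3, H, W]
```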
### Key Components
| Component | What it does | Replaces |
|-----------|-------------|----------|
| **LiquidTimeConstant** | `α·x + (1-α)·stimulus` with learnable decay α = exp(-softplus(τ)) | Residual connections |
| **GatedDepthwiseStimulusConv** | Local spatial context via gated DW-conv | Self-attention (local) |
| **ZigzagScan1D** | Global context via zigzag-ordered 1D conv (see the sketch below) | Self-attention (global) |
| **AdaptiveGroupNorm** | Timestep conditioning via scale/shift | AdaLN in DiT |
| **U-Net Long Skips** | Skip connections from shallow to deep blocks | Standard residual |
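
To make the zigzag idea concrete, here is a minimal sketch of a zigzag-ordered 1D convolution over a 2D feature map (the kernel size and depthwise choice are assumptions; the repo's `ZigzagScan1D` in `model.py` may differ):

```python
# Illustrative zigzag (boustrophedon) scan: flatten [B, C, H, W] into a 1D sequence
# where every other row is reversed, so neighbours in the sequence are also
# neighbours in the image, then mix along the sequence with a 1D conv.
import torch
import torch.nn as nn

class ZigzagScan1D(nn.Module):
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels)

    def forward(self, x):                                # x: [B, C, H, W]
        b, c, h, w = x.shape
        z = x.clone()
        z[:, :, 1::2, :] = z[:, :, 1::2, :].flip(-1)     # reverse odd rows
        z = z.reshape(b, c, h * w)                       # zigzag-ordered sequence
        z = self.conv(z)                                 # 1D mixing along the scan
        z = z.reshape(b, c, h, w)
        z[:, :, 1::2, :] = z[:, :, 1::2, :].flip(-1)     # undo the row reversal
        return z
```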
### Core Innovation: Liquid Time Constants
From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):
```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```
Our parallelizable version (inspired by LiquidTAD 2025):
```python
alpha = torch.exp(-F.softplus(tau))              # per-channel learnable retention α
output = alpha * state + (1 - alpha) * stimulus  # exponential relaxation
```
**No sequential ODE solving.** No attention. Fully parallelizable.
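Spelled out as a module, the same update could look like this minimal sketch (the parameter shape and initialization are assumptions; the repo's `model.py` may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidTimeConstant(nn.Module):
    """Parallel CfC-style blend: out = α·state + (1-α)·stimulus, with α = exp(-softplus(τ))."""
    def __init__(self, channels):
        super().__init__()
        # One learnable time constant per channel; τ = 0 gives α = exp(-ln 2) = 0.5 at init.
        self.tau = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, state, stimulus):                  # both [B, C, H, W]
        alpha = torch.exp(-F.softplus(self.tau))         # retention in (0, 1)
        return alpha * state + (1 - alpha) * stimulus
```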
## 📊 Model Sizes
| Model | Params | VRAM (train) | Best For |
|-------|--------|-------------|----------|
| **LiquidGen-S** | ~55M | ~4-6 GB | 256px, fast experiments |
| **LiquidGen-B** | ~140M | ~8-10 GB | 256/512px, balanced |
| **LiquidGen-L** | ~280M | ~12-14 GB | 512px, high quality |
All fit in **16GB VRAM** (Colab free T4). Training on cached latents = no VAE overhead.
## 🔧 Training
```python
from train import TrainConfig, train
config = TrainConfig(
    model_size="small",
    dataset_preset="paintings",   # 8K paintings, 204MB, 27 styles
    image_size=256,
    batch_size=32,                # Large batches OK with cached latents!
    num_epochs=100,
    learning_rate=1e-4,
)
train(config)
```
### Training Pipeline
1. **Pre-cache**: Load dataset → encode all images with the frozen Flux VAE → save latents to disk → unload the VAE
2. **Train**: Load cached tensors → train the LiquidGen backbone with flow matching → fast iterations!
3. **Sample**: Load the VAE only when generating sample images (lazy loading)
### Details
- **VAE**: FLUX.1-schnell (frozen, 16ch latent, 8x compression, Apache 2.0)
- **Objective**: Flow matching (velocity prediction), `v = noise - x_0` (see the sketch after this list)
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Gradient clipping**: 2.0 (critical for stability, following the ZigMa paper)
- **EMA**: 0.9999 decay
- **Sampling**: Euler ODE, 50 steps, classifier-free guidance
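
Putting those details together, one training step on cached latents could look like the following sketch (the `model(x_t, t)` signature, uniform timestep sampling, and function name are assumptions; the real loop, EMA update, and style conditioning live in `train.py`):

```python
# Illustrative flow-matching step: linearly interpolate between a clean latent x0
# and Gaussian noise, and regress the model's output onto v = noise - x0.
import torch
import torch.nn.functional as F

def flow_matching_step(model, x0, optimizer, max_grad_norm=2.0):
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device)                  # t ~ U(0, 1)
    noise = torch.randn_like(x0)
    t_ = t.view(b, 1, 1, 1)
    x_t = (1 - t_) * x0 + t_ * noise                     # straight-line interpolant
    v_target = noise - x0                                # velocity target
    loss = F.mse_loss(model(x_t, t), v_target)

    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # clip at 2.0
    optimizer.step()
    return loss.item()
```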
## 📁 Files
```
├── model.py                        # LiquidGen model architecture (~55-280M params)
├── train.py                        # Training pipeline with latent pre-caching
├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
└── README.md
```
## 📐 Architecture Diagram
```
Input Latent [B, 16, H/8, W/8]
  │
  ├── Patch Embed (Conv2d, stride=2) ──→ [B, D, H/16, W/16]
  ├── + Learnable Position Embedding
  ├── Input Projection (DW-Conv + PW-Conv + GELU)
  │
  ├── LiquidBlock × (depth/2) ──→ save skip connections
  │     ├── AdaGN (timestep conditioned)
  │     ├── GatedDepthwiseStimulusConv (local spatial)
  │     ├── + ZigzagScan1D (global context)
  │     ├── LiquidTimeConstant #1 (CfC blend)
  │     ├── AdaGN
  │     ├── ChannelMixMLP (GELU)
  │     └── LiquidTimeConstant #2 (CfC blend)
  │
  ├── LiquidBlock × (depth/2) ←── add skip connections
  │
  ├── GroupNorm + Conv + GELU
  └── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
```
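
The AdaGN blocks above inject the timestep through a scale/shift on normalized features; a minimal sketch of that conditioning (the embedding dimension, group count, and zero-init are assumptions, not the repo's exact code):

```python
import torch
import torch.nn as nn

class AdaptiveGroupNorm(nn.Module):
    """GroupNorm whose scale and shift are predicted from a timestep embedding."""
    def __init__(self, channels, cond_dim, num_groups=32):
        super().__init__()
        self.norm = nn.GroupNorm(num_groups, channels, affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * channels)
        nn.init.zeros_(self.to_scale_shift.weight)       # start as a plain GroupNorm
        nn.init.zeros_(self.to_scale_shift.bias)

    def forward(self, x, t_emb):                          # x: [B, C, H, W], t_emb: [B, cond_dim]
        scale, shift = self.to_scale_shift(t_emb).chunk(2, dim=-1)
        scale = scale[:, :, None, None]
        shift = shift[:, :, None, None]
        return self.norm(x) * (1 + scale) + shift
```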
## 🔬 Research Background
### Liquid Neural Networks
- **Liquid Time-constant Networks** (Hasani et al., AAAI 2021): ODE-based neurons with input-dependent time constants τ
- **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022): Analytical solution eliminating ODE solvers
- **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020): Sparse wiring (sensory → inter → command → motor)
- **LiquidTAD** (2025): Static decay α = exp(-softplus(τ)) for fully parallel liquid dynamics (100× speedup)
### Attention-Free Image Generation
- **ZigMa** (ECCV 2024): Zigzag scanning for SSM-based diffusion
- **DiMSUM** (NeurIPS 2024): Spatial-frequency Mamba (FID 2.11 on ImageNet 256)
- **DiffuSSM** (2023): First attention-free diffusion model
- **DiM** (2024): Multi-directional Mamba with padding tokens
### Flow Matching
- **Flow Matching for Generative Modeling** (Lipman et al., 2023)
- **SiT** (2024): Scalable Interpolant Transformers
## ⚡ Design Decisions
1. **No Attention**: O(n) complexity. Liquid dynamics + zigzag conv replace self-attention entirely.
2. **Liquid over Residual**: `α·x + (1-α)·f(x)` instead of `x + f(x)`. Explicit control over retention per channel.
3. **Zigzag Scanning**: Preserves spatial continuity at row boundaries (critical insight from ZigMa).
4. **Latent Pre-caching**: Encode once, train forever. No VAE overhead during training.
5. **Flow Matching**: Straighter ODE trajectories → fewer sampling steps, better quality.
## 📄 License
MIT