Add README with architecture docs and usage guide
# 🧪 LiquidGen: Liquid Neural Network Image Generator

**A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.**

LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (a Colab free-tier T4).

## 🏗️ Architecture

```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → Clean Latent → VAE Decoder → Output Image
```

### Key Components

| Component | What it does | Replaces |
|-----------|--------------|----------|
| **LiquidTimeConstant** | `α·x + (1-α)·stimulus` with learnable decay `α = exp(-softplus(τ))` | Residual connections |
| **GatedDepthwiseStimulusConv** | Local spatial context via gated depthwise conv | Self-attention (local) |
| **ZigzagScan1D** | Global context via zigzag-ordered 1D conv | Self-attention (global) |
| **AdaptiveGroupNorm** | Timestep conditioning via scale/shift | AdaLN in DiT |
| **U-Net Long Skips** | Skip connections from shallow to deep blocks | Standard residual |
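To make the scan order concrete, here is a minimal sketch of one simple zigzag (boustrophedon) variant; the `zigzag_order` helper is hypothetical and simpler than what `ZigzagScan1D` may actually do:

```python
import torch

def zigzag_order(h: int, w: int) -> torch.Tensor:
    """Boustrophedon ("zigzag") flattening of an h*w grid: every other
    row is reversed, so consecutive sequence positions stay spatially
    adjacent, unlike a raster scan that jumps at row boundaries."""
    idx = torch.arange(h * w).view(h, w)
    idx[1::2] = idx[1::2].flip(-1)  # reverse odd rows
    return idx.flatten()

order = zigzag_order(3, 4)
print(order.tolist())  # [0, 1, 2, 3, 7, 6, 5, 4, 8, 9, 10, 11]

# Flatten a feature map in zigzag order, mix along the sequence with a
# 1D conv, then scatter the result back to the original raster layout.
x = torch.randn(1, 16, 3, 4)                # [B, C, H, W]
seq = x.flatten(2)[:, :, order]             # [B, C, H*W] in zigzag order
conv = torch.nn.Conv1d(16, 16, kernel_size=5, padding=2)
y = conv(seq)[:, :, torch.argsort(order)]   # undo the permutation
y = y.view_as(x)                            # back to [B, C, H, W]
```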

### Core Innovation: Liquid Time Constants

From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):

```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```

Our parallelizable version:
```python
α = exp(-softplus(τ))                            # per-channel learnable retention
output = α * state + (1 - α) * stimulus          # exponential relaxation
```

**No sequential ODE solving.** No attention. Fully parallelizable.
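A runnable sketch of this blend, assuming the per-channel parameterization above (the class name mirrors the component table; the real module in `model.py` may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidTimeConstant(nn.Module):
    """Blend previous state and new stimulus with a learnable
    per-channel retention factor alpha = exp(-softplus(tau))."""

    def __init__(self, channels: int):
        super().__init__()
        # tau = 0 gives alpha = exp(-log 2) = 0.5, an even blend
        self.tau = nn.Parameter(torch.zeros(channels))

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        alpha = torch.exp(-F.softplus(self.tau)).view(1, -1, 1, 1)  # in (0, 1)
        return alpha * state + (1 - alpha) * stimulus

blend = LiquidTimeConstant(64)
x = torch.randn(2, 64, 8, 8)
out = blend(x, torch.zeros_like(x))  # at init: 0.5 * x
print(out.shape)  # torch.Size([2, 64, 8, 8])
```

Because `alpha` is computed pointwise, every spatial position is updated simultaneously, which is what makes the dynamics parallel rather than sequentially ODE-solved.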

## 📊 Model Sizes

| Model | Params | VRAM (train) | Best For |
|-------|--------|--------------|----------|
| **LiquidGen-S** | ~55M | ~4–6 GB | 256px, fast experiments |
| **LiquidGen-B** | ~140M | ~8–10 GB | 256/512px, balanced |
| **LiquidGen-L** | ~280M | ~12–14 GB | 512px, high quality |

All models fit comfortably in the **16 GB** of VRAM on a Colab free-tier T4 GPU.

## 🚀 Quick Start

### Using the Colab Notebook

Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab and follow the steps. It includes:

- Complete model code (no external dependencies beyond PyTorch + diffusers)
- Configurable training on the WikiArt dataset (artistic paintings)
- Support for 256px and 512px generation
- Class-conditional generation (27 art styles)
- Loss plotting and sample visualization

### Using the Python Scripts

```python
import torch

from model import liquidgen_base

# Create the model
model = liquidgen_base(num_classes=27).cuda()
print(f"Parameters: {model.count_params()/1e6:.1f}M")

# Forward pass (predict velocity for flow matching)
x = torch.randn(4, 16, 32, 32).cuda()       # 16-channel latent for a 256px image (256/8 = 32)
t = torch.rand(4).cuda()                    # timesteps in [0, 1)
labels = torch.randint(0, 27, (4,)).cuda()  # art-style class labels
v = model(x, t, labels)                     # predicted velocity, same shape as x
```

## 🔧 Training

### Default Configuration

```python
from train import TrainConfig, train

config = TrainConfig(
    model_size="base",        # "small", "base", or "large"
    image_size=256,           # 256 or 512
    dataset_name="huggan/wikiart",
    label_column="style",     # 27 art styles
    num_classes=27,
    batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_epochs=50,
)
train(config)
```

### Training Details

- **VAE**: FLUX.1-schnell (frozen, 16-channel latent, 8× compression, Apache 2.0)
- **Objective**: flow matching (velocity prediction) with target `v = noise - x_0`
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Gradient clipping**: 2.0 (critical for stability, following the ZigMa paper)
- **EMA**: 0.9999 decay
- **Sampling**: Euler ODE, 50 steps, classifier-free guidance
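A sketch of one training step under these settings; the linear interpolant and the `model(x, t, labels)` call signature are inferred from this README, not copied from `train.py`:

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x0, labels):
    """One flow-matching step on clean latents x0 with the linear
    interpolant x_t = (1 - t)*x0 + t*noise, whose velocity target is
    v = noise - x0 (constant along each interpolation path)."""
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device)       # uniform timesteps in [0, 1)
    noise = torch.randn_like(x0)
    t_ = t.view(b, 1, 1, 1)
    x_t = (1 - t_) * x0 + t_ * noise          # noisy latent
    v_pred = model(x_t, t, labels)
    return F.mse_loss(v_pred, noise - x0)

# Toy check with a stand-in "model" of the same call signature:
dummy = lambda x, t, y: torch.zeros_like(x)
x0 = torch.randn(4, 16, 32, 32)
labels = torch.randint(0, 27, (4,))
loss = flow_matching_loss(dummy, x0, labels)
print(loss.shape)  # torch.Size([])
```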

## 📁 Files

```
├── model.py                        # Complete LiquidGen model architecture
├── train.py                        # Training pipeline with FlowMatching + EMA
├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
└── README.md                       # This file
```

## 🔬 Research Background

This architecture synthesizes ideas from multiple research lineages:

### Liquid Neural Networks

- **Liquid Time-constant Networks** (Hasani et al., AAAI 2021) – ODE-based neurons with input-dependent τ
- **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022) – analytical solution eliminating ODE solvers
- **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020) – sparse wiring: sensory → inter → command → motor

### Attention-Free Image Generation

- **ZigMa** (ECCV 2024) – zigzag scanning for SSM-based diffusion (FID 14.27, CelebA 256)
- **DiMSUM** (NeurIPS 2024) – spatial-frequency Mamba (FID 2.11, ImageNet 256)
- **DiffuSSM** (2023) – first attention-free diffusion model (FID 2.28, ImageNet 256)
- **DiM** (2024) – multi-directional Mamba with padding tokens

### Parallelization

- **LiquidTAD** (2025) – static decay `α = exp(-softplus(τ))` for fully parallel liquid dynamics (100× speedup vs. ODE solving)

### Flow Matching

- **Flow Matching for Generative Modeling** (Lipman et al., 2023)
- **SiT** (2024) – Scalable Interpolant Transformers

## 📐 Architecture Diagram

```
Input Latent [B, 16, H/8, W/8]
    ↓
├── Patch Embed (Conv2d, stride=2) ──→ [B, D, H/16, W/16]
├── + Learnable Position Embedding
├── Input Projection (DW-Conv + PW-Conv + GELU)
    ↓
├── LiquidBlock × (depth/2) ──→ save skip connections
│     ├── AdaGN (timestep conditioned)
│     ├── GatedDepthwiseStimulusConv (local spatial)
│     ├── + ZigzagScan1D (global context)
│     ├── LiquidTimeConstant #1 (CfC blend)
│     ├── AdaGN (timestep conditioned)
│     ├── ChannelMixMLP (GELU)
│     └── LiquidTimeConstant #2 (CfC blend)
    ↓
├── LiquidBlock × (depth/2) ──→ add skip connections
│     └── (same structure as above)
    ↓
├── GroupNorm + Conv + GELU
└── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
```
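The AdaGN steps in the diagram can be sketched as follows (a hypothetical implementation in the spirit of AdaLN; the real `AdaptiveGroupNorm` in `model.py` may differ in group count and conditioning details):

```python
import torch
import torch.nn as nn

class AdaptiveGroupNorm(nn.Module):
    """GroupNorm whose scale/shift are regressed from a conditioning
    vector (e.g. a timestep embedding) instead of being fixed weights."""

    def __init__(self, channels: int, cond_dim: int, groups: int = 32):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels, affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        scale = scale.view(x.shape[0], -1, 1, 1)
        shift = shift.view(x.shape[0], -1, 1, 1)
        return self.norm(x) * (1 + scale) + shift

agn = AdaptiveGroupNorm(channels=64, cond_dim=128)
x = torch.randn(2, 64, 8, 8)
cond = torch.randn(2, 128)
print(agn(x, cond).shape)  # torch.Size([2, 64, 8, 8])
```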

## ⚡ Key Design Decisions

1. **No Attention** – O(n) vs O(n²). Enables training on longer sequences / higher-resolution latents.
2. **Liquid Dynamics over Residual** – Instead of `x + f(x)`, we use `α·x + (1-α)·f(x)` where α is learned per channel. This gives the model explicit control over how much old vs. new information to retain.
3. **Zigzag Scanning** – Preserves spatial continuity (adjacent pixels stay adjacent in the sequence). A simple raster scan breaks this at row boundaries.
4. **Frozen Flux VAE** – 16-channel latent with best-in-class reconstruction quality. Only 160MB, ~1GB VRAM.
5. **Flow Matching** – Straighter ODE trajectories than DDPM, so fewer sampling steps are needed for the same quality.
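The Euler sampler with classifier-free guidance mentioned in the training details can be sketched as below; this is a hedged illustration, and the `null_label` index, guidance scale, and latent shape are assumptions, not values from the repo:

```python
import torch

@torch.no_grad()
def sample(model, labels, num_steps=50, cfg_scale=4.0,
           null_label=27, shape=(16, 32, 32)):
    """Euler integration of dx/dt = v(x, t) from t=1 (pure noise) down
    to t=0 (data), mixing conditional and unconditional velocity
    predictions for classifier-free guidance."""
    b = labels.shape[0]
    x = torch.randn(b, *shape, device=labels.device)
    ts = torch.linspace(1.0, 0.0, num_steps + 1, device=labels.device)
    for i in range(num_steps):
        t = ts[i].expand(b)
        v_cond = model(x, t, labels)
        v_uncond = model(x, t, torch.full_like(labels, null_label))
        v = v_uncond + cfg_scale * (v_cond - v_uncond)
        x = x + (ts[i + 1] - ts[i]) * v   # negative dt: step toward t=0
    return x

# Toy run with a stand-in velocity field of the same call signature:
dummy = lambda x, t, y: -x
labels = torch.zeros(2, dtype=torch.long)
latents = sample(dummy, labels, num_steps=8)
print(latents.shape)  # torch.Size([2, 16, 32, 32])
```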

## 📄 License

MIT

## 🙏 Acknowledgments

- MIT CSAIL for Liquid Neural Networks research
- Black Forest Labs for the FLUX.1-schnell VAE (Apache 2.0)
- WikiArt dataset contributors
- The ZigMa, DiMSUM, DiffuSSM, and DiM authors for attention-free diffusion insights