Update README with Colab-optimized training workflow and dataset presets
README.md CHANGED
LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free tier T4).

## 🚀 Quick Start (Colab)

1. Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab
2. Select a dataset preset (see table below)
3. Run all cells – latents are pre-cached automatically, then training starts

**Training is optimized for the Colab free tier:**

- **Latent pre-caching**: encode all images with the VAE once → save to disk → train on pure tensors
- **No VAE during training** – saves ~1GB VRAM, enables larger batches (32+)
- **Small curated datasets** that download in seconds (not 5GB WikiArt!)

### Dataset Presets

| Preset | Images | Download | Classes | Description |
|--------|--------|----------|---------|-------------|
| `paintings_mini` | ~200 | 1.7MB | 27 styles | Instant smoke test |
| `paintings` | ~8K | 204MB | 27 styles | **Recommended** – best quality/speed tradeoff |
| `cartoon` | ~2.5K | 181MB | unconditional | Cartoon/anime images |
| `flowers` | ~8K | 331MB | unconditional | Flower photography |
| `wikiart_stream` | ~80K | streaming | 27 styles | Full WikiArt via streaming (set `max_images`) |
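For a first run, the `paintings_mini` preset is small enough to verify the whole pipeline end to end. A minimal sketch, assuming the same `TrainConfig` fields used in the Training section below (the epoch count here is only an illustrative smoke-test value):

```python
from train import TrainConfig, train

# Smoke test on the tiny preset: ~200 images, 1.7MB download.
config = TrainConfig(
    model_size="small",
    dataset_preset="paintings_mini",   # see the preset table above
    image_size=256,
    batch_size=32,
    num_epochs=2,                      # illustrative value, just to confirm the loss decreases
    learning_rate=1e-4,
)
train(config)
```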
## 🏗️ Architecture

```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → VAE Decoder → Output
```
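The last two stages of this pipeline are plain Euler integration of the predicted velocity followed by VAE decoding. The sketch below covers the Euler part only; it assumes the backbone is called as `model(latent, timestep, labels)` and returns a velocity following the `v = noise - x_0` convention from the Training section (the function name, step count, and latent size are illustrative).

```python
import torch

@torch.no_grad()
def euler_sample(model, labels, num_steps=50, latent_hw=32, device="cuda"):
    # Start from pure noise at t=1 and integrate toward data at t=0.
    # With the linear path x_t = (1-t)*x_0 + t*noise, the velocity is
    # dx/dt = noise - x_0 = v, so each Euler step subtracts dt * v.
    x = torch.randn(labels.shape[0], 16, latent_hw, latent_hw, device=device)
    dt = 1.0 / num_steps
    for i in range(num_steps, 0, -1):
        t = torch.full((labels.shape[0],), i / num_steps, device=device)
        v = model(x, t, labels)   # predicted velocity
        x = x - dt * v            # one Euler step toward t=0
    return x                      # final latent; decode with the Flux VAE for pixels
```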
### Key Components
### Core Innovation: Liquid Time Constants

From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):

```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```

Our parallelizable version (inspired by LiquidTAD 2025):

```python
α = exp(-softplus(τ))                     # Per-channel learnable retention
output = α * state + (1 - α) * stimulus   # Exponential relaxation
```
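A minimal PyTorch sketch of this blend, matching the formula above (the class name mirrors `LiquidTimeConstant` from the architecture diagram below, but the exact shapes and initialization in `model.py` may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidTimeConstant(nn.Module):
    """Parallel CfC-style blend: output = α·state + (1-α)·stimulus with α = exp(-softplus(τ))."""

    def __init__(self, channels: int):
        super().__init__()
        # One learnable time constant τ per channel; softplus keeps it positive,
        # so the retention α = exp(-softplus(τ)) always lies in (0, 1).
        self.tau = nn.Parameter(torch.zeros(channels))

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        # state, stimulus: [B, C, H, W] feature maps
        alpha = torch.exp(-F.softplus(self.tau)).view(1, -1, 1, 1)
        return alpha * state + (1.0 - alpha) * stimulus   # exponential relaxation
```

Because α is static per channel rather than input-dependent as in the original CfC, every spatial position can be updated in parallel; that is the LiquidTAD trick referenced above.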
| Model | Params | VRAM | Use case |
|-------|--------|------|----------|
| **LiquidGen-B** | ~140M | ~8-10 GB | 256/512px, balanced |
| **LiquidGen-L** | ~280M | ~12-14 GB | 512px, high quality |

All fit in **16GB VRAM** (Colab free T4). Training on cached latents = no VAE overhead.
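For scripting outside the notebook, the backbone can also be instantiated directly. This mirrors the `liquidgen_base` usage from an earlier revision of this README; treat the helper names (`liquidgen_base`, `count_params`) as assumptions about the current `model.py`:

```python
import torch
from model import liquidgen_base

model = liquidgen_base(num_classes=27).cuda()            # 27 WikiArt style classes
print(f"Parameters: {model.count_params()/1e6:.1f}M")

# Forward pass: predict the flow-matching velocity for a batch of latents.
x = torch.randn(4, 16, 32, 32).cuda()     # 256px images -> 32x32 latents (16 channels)
t = torch.rand(4).cuda()                  # timesteps in [0, 1]
labels = torch.randint(0, 27, (4,)).cuda()
v = model(x, t, labels)                   # predicted velocity, same shape as x
```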
## 🔧 Training

```python
from train import TrainConfig, train

config = TrainConfig(
    model_size="small",
    dataset_preset="paintings",   # 8K paintings, 204MB, 27 styles
    image_size=256,
    batch_size=32,                # Large batches OK with cached latents!
    num_epochs=100,
    learning_rate=1e-4,
)
train(config)
```
### Training Pipeline

1. **Pre-cache**: Load dataset → encode all images with frozen Flux VAE → save latents to disk → unload VAE (sketched below)
2. **Train**: Load cached tensors → train LiquidGen backbone with flow matching → fast iterations!
3. **Sample**: Load VAE only when generating sample images (lazy loading)
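A rough sketch of the pre-cache step using `diffusers` (the dataloader, cache path, and any latent scaling/shift handling are placeholders; `train.py` may organize this differently):

```python
import torch
from diffusers import AutoencoderKL

@torch.no_grad()
def cache_latents(dataloader, cache_path="latents.pt", device="cuda"):
    # Load the frozen FLUX.1-schnell VAE only for this one-off encoding pass.
    vae = AutoencoderKL.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.float16
    ).to(device).eval()

    latents, labels = [], []
    for images, targets in dataloader:                      # images in [-1, 1], [B, 3, H, W]
        dist = vae.encode(images.to(device, torch.float16)).latent_dist
        latents.append(dist.sample().float().cpu())         # [B, 16, H/8, W/8]
        labels.append(targets)

    torch.save({"latents": torch.cat(latents), "labels": torch.cat(labels)}, cache_path)
    del vae                                                  # free ~1GB of VRAM for training
```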
### Details

- **VAE**: FLUX.1-schnell (frozen, 16-channel latents, 8× compression, Apache 2.0)
- **Objective**: Flow matching (velocity prediction) – `v = noise - x_0`
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Gradient clipping**: 2.0 (critical for stability, following the ZigMa paper)
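Putting those settings together, a single optimization step might look like the sketch below (assuming the standard linear interpolation path between data and noise; variable names and the uniform timestep sampler are illustrative):

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, x0, labels):
    # x0: clean cached latents [B, 16, H/8, W/8]; labels: art-style class ids.
    t = torch.rand(x0.shape[0], device=x0.device)             # timesteps in [0, 1]
    tb = t.view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)

    xt = (1.0 - tb) * x0 + tb * noise                         # point on the data->noise path
    target_v = noise - x0                                      # flow-matching velocity target

    loss = F.mse_loss(model(xt, t, labels), target_v)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 2.0)    # clip at 2.0, as noted above
    optimizer.step()
    return loss.item()
```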
## 📁 Files

```
├── model.py                         # LiquidGen model architecture (~55-280M params)
├── train.py                         # Training pipeline with latent pre-caching
├── LiquidGen_Colab_Notebook.ipynb   # Ready-to-run Colab notebook
└── README.md
```
## 📐 Architecture Diagram

```
├─── LiquidBlock × (depth/2) ──→ save skip connections
│    ├── AdaGN (timestep conditioned)
│    ├── GatedDepthwiseStimulusConv (local spatial)
│    ├── + ZigzagScan1D (global context)
│    ├── LiquidTimeConstant #1 (CfC blend)
│    ├── AdaGN
│    ├── ChannelMixMLP (GELU)
│    └── LiquidTimeConstant #2 (CfC blend)
│
├─── LiquidBlock × (depth/2) ──→ add skip connections
│
├─── GroupNorm + Conv + GELU
└─── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
```
## 🔬 Research Background

### Liquid Neural Networks

- **Liquid Time-constant Networks** (Hasani et al., NeurIPS 2020) – ODE-based neurons with input-dependent τ
- **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022) – Analytical solution eliminating ODE solvers
- **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020) – Sparse wiring: sensory→inter→command→motor
- **LiquidTAD** (2025) – Static decay α = exp(-softplus(τ)) for fully parallel liquid dynamics (100× speedup)

### Attention-Free Image Generation

- **ZigMa** (ECCV 2024) – Zigzag scanning for SSM-based diffusion
- **DiMSUM** (NeurIPS 2024) – Spatial-frequency Mamba (FID 2.11 on ImageNet 256)
- **DiffuSSM** (2023) – First attention-free diffusion model
- **DiM** (2024) – Multi-directional Mamba with padding tokens

### Flow Matching

- **Flow Matching for Generative Modeling** (Lipman et al., 2023)
- **SiT** (2024) – Scalable Interpolant Transformers
## ⚡ Design Decisions

1. **No Attention** – O(n) complexity. Liquid dynamics + zigzag conv replace self-attention entirely.
2. **Liquid over Residual** – `α·x + (1-α)·f(x)` instead of `x + f(x)`. Explicit control over retention per channel.
3. **Zigzag Scanning** – Preserves spatial continuity at row boundaries (critical insight from ZigMa); see the sketch below.
4. **Latent Pre-caching** – Encode once, train forever. No VAE overhead during training.
5. **Flow Matching** – Straighter ODE trajectories → fewer sampling steps, better quality.
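To make decision 3 concrete, here is a minimal sketch of the zigzag (snake-order) flattening idea behind `ZigzagScan1D`: even rows are read left-to-right and odd rows right-to-left, so consecutive tokens in the 1D sequence stay spatial neighbours across row boundaries. The function names and the single scan direction shown here are assumptions; `model.py` may use a different or multi-directional pattern.

```python
import torch

def zigzag_flatten(x: torch.Tensor) -> torch.Tensor:
    # x: [B, C, H, W] feature map -> [B, H*W, C] sequence in snake order.
    b, c, h, w = x.shape
    rows = [x[:, :, i, :].flip(-1) if i % 2 else x[:, :, i, :] for i in range(h)]
    return torch.stack(rows, dim=2).flatten(2).transpose(1, 2)

def zigzag_unflatten(seq: torch.Tensor, h: int, w: int) -> torch.Tensor:
    # Inverse: [B, H*W, C] sequence -> [B, C, H, W] feature map.
    b, _, c = seq.shape
    x = seq.transpose(1, 2).reshape(b, c, h, w).clone()   # clone: don't mutate the input
    for i in range(1, h, 2):
        x[:, :, i, :] = x[:, :, i, :].flip(-1)            # undo the reversed rows
    return x
```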
## 📄 License

MIT