asdf98 committed
Commit 3063cf6 · verified · 1 Parent(s): cb6e243

Update README with Colab-optimized training workflow and dataset presets

Files changed (1)
  1. README.md +65 -83
README.md CHANGED
@@ -4,10 +4,31 @@
 
 LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics — making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free tier T4).
 
 ## 🏗️ Architecture
 
 ```
- Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → Clean Latent → VAE Decoder → Output Image
 ```
 
 ### Key Components
@@ -23,12 +44,11 @@ Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Pre
 ### Core Innovation: Liquid Time Constants
 
 From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):
-
 ```
 x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
 ```
 
- Our parallelizable version:
 ```python
 α = exp(-softplus(ρ))                    # Per-channel learnable retention
 output = α * state + (1 - α) * stimulus  # Exponential relaxation
@@ -44,57 +64,31 @@ output = α * state + (1 - α) * stimulus  # Exponential relaxation
 | **LiquidGen-B** | ~140M | ~8-10 GB | 256/512px, balanced |
 | **LiquidGen-L** | ~280M | ~12-14 GB | 512px, high quality |
 
- All models fit comfortably in **16GB VRAM** (Colab free tier T4 GPU).
-
- ## 🚀 Quick Start
-
- ### Using the Colab Notebook
- Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab and follow the steps. It includes:
- - Complete model code (no external dependencies beyond PyTorch + diffusers)
- - Configurable training on the WikiArt dataset (artistic paintings)
- - Support for 256px and 512px generation
- - Class-conditional generation (27 art styles)
- - Loss plotting and sample visualization
-
- ### Using the Python Scripts
-
- ```python
- from model import liquidgen_base
- import torch
-
- # Create model
- model = liquidgen_base(num_classes=27).cuda()
- print(f"Parameters: {model.count_params()/1e6:.1f}M")
-
- # Forward pass (predict velocity for flow matching)
- x = torch.randn(4, 16, 32, 32).cuda()       # 256px latent
- t = torch.rand(4).cuda()                    # Timesteps
- labels = torch.randint(0, 27, (4,)).cuda()
- v = model(x, t, labels)                     # Predicted velocity
- ```
 
 ## 🔧 Training
 
- ### Default Configuration
 ```python
 from train import TrainConfig, train
 
 config = TrainConfig(
-     model_size="base",              # "small", "base", or "large"
-     image_size=256,                 # 256 or 512
-     dataset_name="huggan/wikiart",
-     label_column="style",           # 27 art styles
-     num_classes=27,
-     batch_size=8,
-     gradient_accumulation_steps=4,
      learning_rate=1e-4,
-     num_epochs=50,
 )
 train(config)
 ```
 
- ### Training Details
- - **VAE**: FLUX.1-schnell (frozen, 16-channel latent, 8x compression, Apache 2.0)
 - **Objective**: Flow matching (velocity prediction) — `v = noise - x_0`
 - **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
 - **Gradient clipping**: 2.0 (critical for stability, from ZigMa paper)
@@ -104,34 +98,12 @@ train(config)
 ## 📁 Files
 
 ```
- ├── model.py                        # Complete LiquidGen model architecture
- ├── train.py                        # Training pipeline with FlowMatching + EMA
 ├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
- └── README.md                       # This file
 ```
 
- ## 🔬 Research Background
-
- This architecture synthesizes ideas from multiple research lineages:
-
- ### Liquid Neural Networks
- - **Liquid Time-constant Networks** (Hasani et al., NeurIPS 2020) — ODE-based neurons with input-dependent τ
- - **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022) — Analytical solution eliminating ODE solvers
- - **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020) — Sparse wiring: sensory→inter→command→motor
-
- ### Attention-Free Image Generation
- - **ZigMa** (ECCV 2024) — Zigzag scanning for SSM-based diffusion (FID 14.27 CelebA-256)
- - **DiMSUM** (NeurIPS 2024) — Spatial-frequency Mamba (FID 2.11 ImageNet 256)
- - **DiffuSSM** (2023) — First attention-free diffusion model (FID 2.28 ImageNet 256)
- - **DiM** (2024) — Multi-directional Mamba with padding tokens
-
- ### Parallelization
- - **LiquidTAD** (2025) — Static decay α = exp(-softplus(ρ)) for fully parallel liquid dynamics (100× speedup vs ODE)
-
- ### Flow Matching
- - **Flow Matching for Generative Modeling** (Lipman et al., 2023)
- - **SiT** (2024) — Scalable Interpolant Transformers
-
 ## 📐 Architecture Diagram
 
 ```
@@ -144,34 +116,44 @@ Input Latent [B, 16, H/8, W/8]
 ├─── LiquidBlock × (depth/2)   ←── save skip connections
 │     ├── AdaGN (timestep conditioned)
 │     ├── GatedDepthwiseStimulusConv (local spatial)
- │     ├── + ZigzagScan1D (global context)
 │     ├── LiquidTimeConstant #1 (CfC blend)
- │     ├── AdaGN (timestep conditioned)
 │     ├── ChannelMixMLP (GELU)
 │     └── LiquidTimeConstant #2 (CfC blend)
 │
 ├─── LiquidBlock × (depth/2)   ←── add skip connections
- │     └── (same structure as above)
 │
 ├─── GroupNorm + Conv + GELU
 └─── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
 ```
 
- ## ⚡ Key Design Decisions
 
- 1. **No Attention** — O(n) vs O(n²). Enables training on longer sequences / higher-resolution latents.
- 2. **Liquid Dynamics over Residual** — Instead of `x + f(x)`, we use `α·x + (1-α)·f(x)` where α is learned per channel. This gives the model explicit control over how much old vs. new information to retain.
- 3. **Zigzag Scanning** — Preserves spatial continuity (adjacent pixels stay adjacent in sequence). A simple raster scan breaks this at row boundaries.
- 4. **Frozen Flux VAE** — 16-channel latent with best-in-class reconstruction quality. Only 160MB, ~1GB VRAM.
- 5. **Flow Matching** — Straighter ODE trajectories than DDPM → fewer sampling steps needed, better quality.
 
 ## 📜 License
 
 MIT
-
- ## 🙏 Acknowledgments
-
- - MIT CSAIL for Liquid Neural Networks research
- - Black Forest Labs for FLUX.1-schnell VAE (Apache 2.0)
- - WikiArt dataset contributors
- - ZigMa, DiMSUM, DiffuSSM, DiM authors for attention-free diffusion insights
 
 
 LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics — making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free tier T4).
 
+ ## 🚀 Quick Start (Colab)
+
+ 1. Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab
+ 2. Select a dataset preset (see table below)
+ 3. Run all cells — latents are pre-cached automatically, then training starts
+
+ **Training is optimized for the Colab free tier:**
+ - **Latent pre-caching**: encode all images with the VAE once → save to disk → train on pure tensors
+ - **No VAE during training** → saves ~1GB VRAM, enables larger batches (32+)
+ - **Small curated datasets** that download in seconds (not 5GB WikiArt!)
+
+ ### Dataset Presets
+
+ | Preset | Images | Download | Classes | Description |
+ |--------|--------|----------|---------|-------------|
+ | `paintings_mini` | ~200 | 1.7MB | 27 styles | Instant smoke test |
+ | `paintings` | ~8K | 204MB | 27 styles | **Recommended** — best quality/speed tradeoff |
+ | `cartoon` | ~2.5K | 181MB | unconditional | Cartoon/anime images |
+ | `flowers` | ~8K | 331MB | unconditional | Flower photography |
+ | `wikiart_stream` | ~80K | streaming | 27 styles | Full WikiArt via streaming (set `max_images`) |
+
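For the quickest end-to-end check, a config along these lines exercises the whole pipeline on the `paintings_mini` preset. This is a sketch reusing the `TrainConfig` fields shown in the Training section below; the exact defaults and accepted values are assumptions, so adjust to what the notebook exposes.

```python
from train import TrainConfig, train

# Smoke test: ~200 images, ~1.7MB download, a single short epoch.
config = TrainConfig(
    model_size="small",
    dataset_preset="paintings_mini",
    image_size=256,
    batch_size=32,
    num_epochs=1,
)
train(config)
```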
 ## 🏗️ Architecture
 
 ```
+ Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → VAE Decoder → Output
 ```
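The last two stages of this pipeline (Euler ODE → VAE Decoder) integrate the predicted velocity from pure noise back to a clean latent and then decode it. The loop below is a minimal sketch, assuming the `model(x, t, labels)` call signature used elsewhere in this README, the convention that t=1 is pure noise, and a diffusers-style `vae.decode()`; the step count is arbitrary and the Flux VAE's latent shift/scale factors are omitted.

```python
import torch

@torch.no_grad()
def euler_sample(model, vae, labels, steps=50, latent_size=32):
    """Integrate dx/dt = v from t=1 (pure noise) down to t=0 (clean latent), then decode."""
    b = labels.shape[0]
    x = torch.randn(b, 16, latent_size, latent_size, device=labels.device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((b,), 1.0 - i * dt, device=labels.device)
        v = model(x, t, labels)    # predicted velocity, v ≈ noise - x_0
        x = x - dt * v             # one Euler step toward the data end of the path
    return vae.decode(x).sample    # pixel-space output (VAE shift/scale omitted)
```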
 
 ### Key Components
 
 ### Core Innovation: Liquid Time Constants
 
 From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):
 ```
 x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
 ```
 
+ Our parallelizable version (inspired by LiquidTAD, 2025):
 ```python
 α = exp(-softplus(ρ))                    # Per-channel learnable retention
 output = α * state + (1 - α) * stimulus  # Exponential relaxation
 
 | **LiquidGen-B** | ~140M | ~8-10 GB | 256/512px, balanced |
 | **LiquidGen-L** | ~280M | ~12-14 GB | 512px, high quality |
 
+ All sizes fit in **16GB VRAM** (Colab free-tier T4). Training on cached latents adds no VAE overhead.
 
 ## 🔧 Training
 
 ```python
 from train import TrainConfig, train
 
 config = TrainConfig(
+     model_size="small",
+     dataset_preset="paintings",  # 8K paintings, 204MB, 27 styles
+     image_size=256,
+     batch_size=32,               # Large batches OK with cached latents!
+     num_epochs=100,
      learning_rate=1e-4,
 )
 train(config)
 ```
 
+ ### Training Pipeline
+ 1. **Pre-cache**: Load dataset → encode all images with the frozen Flux VAE → save latents to disk → unload the VAE (see the sketch below)
+ 2. **Train**: Load cached tensors → train the LiquidGen backbone with flow matching → fast iterations!
+ 3. **Sample**: Load the VAE only when generating sample images (lazy loading)
+
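A minimal sketch of step 1, assuming the Flux VAE is loaded through diffusers' `AutoencoderKL` and that `dataloader` yields `(images, labels)` batches normalized to [-1, 1]; the file name and the omitted latent shift/scale factors are illustrative, not the repo's exact code.

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.float16
).to("cuda").eval()

cached_latents, cached_labels = [], []
with torch.no_grad():
    for images, labels in dataloader:        # images: [B, 3, 256, 256] in [-1, 1]
        z = vae.encode(images.half().to("cuda")).latent_dist.sample()   # [B, 16, 32, 32]
        cached_latents.append(z.float().cpu())
        cached_labels.append(labels)

torch.save(
    {"latents": torch.cat(cached_latents), "labels": torch.cat(cached_labels)},
    "cached_latents.pt",
)
del vae
torch.cuda.empty_cache()                     # free ~1GB of VRAM before training starts
```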
+ ### Details
+ - **VAE**: FLUX.1-schnell (frozen, 16ch latent, 8x compression, Apache 2.0)
 - **Objective**: Flow matching (velocity prediction) — `v = noise - x_0`
 - **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
 - **Gradient clipping**: 2.0 (critical for stability, from ZigMa paper)
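Put together, one training step on cached latents looks roughly like the sketch below; variable names are illustrative, and the linear interpolation path is the one implied by `v = noise - x_0`.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x0, labels):
    """Regress the velocity of the straight path between data x0 and Gaussian noise."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)    # uniform timesteps in [0, 1]
    t_ = t.view(-1, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * noise                  # linear interpolant
    target = noise - x0                              # velocity target v
    pred = model(xt, t, labels)
    return F.mse_loss(pred, target)

# After loss.backward(), clip gradients at 2.0 as noted above:
# torch.nn.utils.clip_grad_norm_(model.parameters(), 2.0)
```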
 
 ## 📁 Files
 
 ```
+ ├── model.py                        # LiquidGen model architecture (~55-280M params)
+ ├── train.py                        # Training pipeline with latent pre-caching
 ├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
+ └── README.md
 ```
 
 ## 📐 Architecture Diagram
 
 ```
 ├─── LiquidBlock × (depth/2)   ←── save skip connections
 │     ├── AdaGN (timestep conditioned)
 │     ├── GatedDepthwiseStimulusConv (local spatial)
+ │     ├── + ZigzagScan1D (global context)
 │     ├── LiquidTimeConstant #1 (CfC blend)
+ │     ├── AdaGN
 │     ├── ChannelMixMLP (GELU)
 │     └── LiquidTimeConstant #2 (CfC blend)
 │
 ├─── LiquidBlock × (depth/2)   ←── add skip connections
 │
 ├─── GroupNorm + Conv + GELU
 └─── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
 ```
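The `ZigzagScan1D` step in the diagram flattens the 2D token grid into a 1D sequence so neighboring pixels stay adjacent across row boundaries. The sketch below shows the simplest (boustrophedon) ordering; the repo's actual scan pattern may differ.

```python
import torch

def zigzag_indices(h: int, w: int) -> torch.Tensor:
    """Row-major order with every other row reversed, so the scan never jumps across the image."""
    idx = torch.arange(h * w).view(h, w)
    idx[1::2] = idx[1::2].flip(-1)       # reverse odd rows
    return idx.flatten()

def zigzag_flatten(x: torch.Tensor) -> torch.Tensor:
    """Reorder a [B, C, H, W] feature map into a [B, H*W, C] zigzag sequence."""
    b, c, h, w = x.shape
    order = zigzag_indices(h, w).to(x.device)
    return x.flatten(2).transpose(1, 2)[:, order]
```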
 
+ ## 🔬 Research Background
+
+ ### Liquid Neural Networks
+ - **Liquid Time-constant Networks** (Hasani et al., NeurIPS 2020) — ODE-based neurons with input-dependent τ
+ - **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022) — Analytical solution eliminating ODE solvers
+ - **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020) — Sparse wiring: sensory→inter→command→motor
+ - **LiquidTAD** (2025) — Static decay α = exp(-softplus(ρ)) for fully parallel liquid dynamics (100× speedup)
+
+ ### Attention-Free Image Generation
+ - **ZigMa** (ECCV 2024) — Zigzag scanning for SSM-based diffusion
+ - **DiMSUM** (NeurIPS 2024) — Spatial-frequency Mamba (FID 2.11 ImageNet 256)
+ - **DiffuSSM** (2023) — First attention-free diffusion model
+ - **DiM** (2024) — Multi-directional Mamba with padding tokens
+
+ ### Flow Matching
+ - **Flow Matching for Generative Modeling** (Lipman et al., 2023)
+ - **SiT** (2024) — Scalable Interpolant Transformers
 
+ ## ⚡ Design Decisions
+
+ 1. **No Attention** — O(n) complexity. Liquid dynamics + zigzag convolution replace self-attention entirely.
+ 2. **Liquid over Residual** — `α·x + (1-α)·f(x)` instead of `x + f(x)`. Explicit per-channel control over retention (see the sketch after this list).
+ 3. **Zigzag Scanning** — Preserves spatial continuity at row boundaries (critical insight from ZigMa).
+ 4. **Latent Pre-caching** — Encode once, train forever. No VAE overhead during training.
+ 5. **Flow Matching** — Straighter ODE trajectories → fewer sampling steps, better quality.
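For concreteness, the per-channel blend from decision 2 can be written as a tiny module like the one below. This is a sketch only; the class name, shapes, and placement inside a block are assumptions, not the contents of `model.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidTimeConstant(nn.Module):
    """Blend old state and new stimulus with a learned per-channel retention α = exp(-softplus(ρ))."""

    def __init__(self, channels: int):
        super().__init__()
        self.rho = nn.Parameter(torch.zeros(channels))   # ρ: one learnable time constant per channel

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        # state, stimulus: [B, C, H, W]; α broadcasts over batch and spatial dims
        alpha = torch.exp(-F.softplus(self.rho)).view(1, -1, 1, 1)
        return alpha * state + (1 - alpha) * stimulus    # exponential relaxation toward the stimulus
```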
 
 ## 📜 License
 
 MIT