asdf98 committed
Commit fe0d9c3 · verified · 1 parent: 8df5847

Add README with architecture docs and usage guide

Files changed (1)
  1. README.md +177 -0
README.md ADDED
@@ -0,0 +1,177 @@
# 🧪 LiquidGen: Liquid Neural Network Image Generator

**A novel attention-free image generation model based on Liquid Neural Network dynamics from MIT CSAIL.**

LiquidGen replaces self-attention in diffusion models with **Closed-form Continuous-depth (CfC)** liquid dynamics, making it fully parallelizable, memory-efficient, and trainable on a single consumer GPU (Colab free-tier T4).

## 🏗️ Architecture

```
Input Image → Flux VAE Encoder → Noisy Latent → LiquidGen Backbone → Predicted Velocity → Euler ODE → Clean Latent → VAE Decoder → Output Image
```

### Key Components

| Component | What it does | Replaces |
|-----------|--------------|----------|
| **LiquidTimeConstant** | `α·x + (1-α)·stimulus` with learnable decay `α = exp(-softplus(ρ))` | Residual connections |
| **GatedDepthwiseStimulusConv** | Local spatial context via gated DW-conv | Self-attention (local) |
| **ZigzagScan1D** | Global context via zigzag-ordered 1D conv | Self-attention (global) |
| **AdaptiveGroupNorm** | Timestep conditioning via scale/shift | AdaLN in DiT |
| **U-Net Long Skips** | Skip connections from shallow to deep blocks | Standard residual |

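To make the table concrete, here is a minimal sketch of the AdaptiveGroupNorm idea: a GroupNorm whose scale and shift are predicted from the timestep embedding. The class name, shapes, and the `(1 + scale)` parameterization are illustrative assumptions; the actual implementation is in `model.py`.

```python
import torch
import torch.nn as nn


class AdaGNSketch(nn.Module):
    """GroupNorm modulated by a conditioning vector (illustrative sketch)."""

    def __init__(self, channels: int, cond_dim: int, groups: int = 32):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels, affine=False)  # affine params come from the condition
        self.to_scale_shift = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W]; t_emb: [B, cond_dim]
        scale, shift = self.to_scale_shift(t_emb).chunk(2, dim=-1)
        scale = scale[:, :, None, None]  # broadcast over the spatial dims
        shift = shift[:, :, None, None]
        return self.norm(x) * (1 + scale) + shift
```
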
### Core Innovation: Liquid Time Constants

From the CfC paper (Hasani et al., Nature Machine Intelligence 2022):

```
x_{t+1} = exp(-Δt/τ_t) · x_t + (1 - exp(-Δt/τ_t)) · h(x_t, u_t)
```

Our parallelizable version:
```python
alpha = torch.exp(-F.softplus(rho))              # per-channel learnable retention
output = alpha * state + (1 - alpha) * stimulus  # exponential relaxation
```

**No sequential ODE solving.** No attention. Fully parallelizable.

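As a self-contained sketch (the module and argument names here are ours for illustration; see `model.py` for the real implementation), the whole mechanism fits in a few lines of PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LiquidTimeConstantSketch(nn.Module):
    """CfC-style blend of old state and new stimulus (illustrative sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        # alpha = exp(-softplus(rho)) always lies in (0, 1); rho = 0 gives alpha = 0.5
        self.rho = nn.Parameter(torch.zeros(channels))

    def forward(self, state: torch.Tensor, stimulus: torch.Tensor) -> torch.Tensor:
        # state, stimulus: [B, C, H, W]; alpha broadcasts over batch and space
        alpha = torch.exp(-F.softplus(self.rho))[None, :, None, None]
        return alpha * state + (1 - alpha) * stimulus  # no sequential ODE solve
```

Because `alpha` depends only on learned parameters, not on sequence position, every spatial location is processed in parallel, unlike a recurrent ODE solver.
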
## 📊 Model Sizes

| Model | Params | VRAM (train) | Best For |
|-------|--------|--------------|----------|
| **LiquidGen-S** | ~55M | ~4-6 GB | 256px, fast experiments |
| **LiquidGen-B** | ~140M | ~8-10 GB | 256/512px, balanced |
| **LiquidGen-L** | ~280M | ~12-14 GB | 512px, high quality |

All models fit comfortably in **16 GB VRAM** (Colab free-tier T4 GPU).

## 🚀 Quick Start

### Using the Colab Notebook
Open `LiquidGen_Colab_Notebook.ipynb` in Google Colab and follow the steps. It includes:
- Complete model code (no external dependencies beyond PyTorch + diffusers)
- Configurable training on the WikiArt dataset (artistic paintings)
- Support for 256px and 512px generation
- Class-conditional generation (27 art styles)
- Loss plotting and sample visualization

### Using the Python Scripts

```python
import torch

from model import liquidgen_base

# Create the model
model = liquidgen_base(num_classes=27).cuda()
print(f"Parameters: {model.count_params()/1e6:.1f}M")

# Forward pass (predict velocity for flow matching)
x = torch.randn(4, 16, 32, 32).cuda()       # 256px image -> 32x32 latent
t = torch.rand(4).cuda()                    # timesteps in [0, 1]
labels = torch.randint(0, 27, (4,)).cuda()  # art-style class labels
v = model(x, t, labels)                     # predicted velocity
```

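Sampling integrates the learned velocity field from noise (t = 1) back to data (t = 0). The following is a minimal Euler sampler with classifier-free guidance, matching the 50-step setup listed under Training Details below; the `null_label` convention, the guidance scale, and the t = 1 → 0 interpolant direction are assumptions for illustration, not the exact API of `train.py`.

```python
import torch


@torch.no_grad()
def sample_euler_cfg(model, label, steps=50, guidance=4.0,
                     shape=(1, 16, 32, 32), null_label=27, device="cuda"):
    """Euler ODE sampling with classifier-free guidance (illustrative sketch)."""
    x = torch.randn(shape, device=device)                    # pure noise at t = 1
    labels = torch.full((shape[0],), label, device=device, dtype=torch.long)
    nulls = torch.full_like(labels, null_label)              # assumed unconditional class
    ts = torch.linspace(1.0, 0.0, steps + 1, device=device)  # integrate t: 1 -> 0
    for i in range(steps):
        t = ts[i].expand(shape[0])
        v_cond = model(x, t, labels)
        v_uncond = model(x, t, nulls)
        v = v_uncond + guidance * (v_cond - v_uncond)        # CFG-combined velocity
        x = x + (ts[i + 1] - ts[i]) * v                      # Euler step (dt < 0)
    return x  # clean latent; decode with the frozen Flux VAE
```
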
## 🔧 Training

### Default Configuration
```python
from train import TrainConfig, train

config = TrainConfig(
    model_size="base",               # "small", "base", or "large"
    image_size=256,                  # 256 or 512
    dataset_name="huggan/wikiart",
    label_column="style",            # 27 art styles
    num_classes=27,
    batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_epochs=50,
)
train(config)
```

### Training Details
- **VAE**: FLUX.1-schnell (frozen, 16-channel latent, 8× compression, Apache 2.0)
- **Objective**: flow matching (velocity prediction) with target `v = noise - x_0`
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Gradient clipping**: 2.0 (critical for stability, following the ZigMa paper)
- **EMA**: 0.9999 decay
- **Sampling**: Euler ODE, 50 steps, classifier-free guidance

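One training step of this objective takes only a few lines. The sketch below assumes the linear interpolant `x_t = (1 - t)·x_0 + t·noise`, which yields the stated target `v = noise - x_0`; the actual loop in `train.py` additionally handles gradient accumulation, clipping, and the EMA update.

```python
import torch
import torch.nn.functional as F


def flow_matching_loss(model, x0, labels):
    """One flow-matching training step (illustrative sketch)."""
    noise = torch.randn_like(x0)                   # x0: clean VAE latents [B, 16, H, W]
    t = torch.rand(x0.shape[0], device=x0.device)  # uniform timesteps in [0, 1]
    t_ = t[:, None, None, None]
    xt = (1 - t_) * x0 + t_ * noise                # point on the straight noise-data path
    v_target = noise - x0                          # constant velocity along that path
    v_pred = model(xt, t, labels)
    return F.mse_loss(v_pred, v_target)
```
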
## 📁 Files

```
├── model.py                        # Complete LiquidGen model architecture
├── train.py                        # Training pipeline with FlowMatching + EMA
├── LiquidGen_Colab_Notebook.ipynb  # Ready-to-run Colab notebook
└── README.md                       # This file
```

## 🔬 Research Background

This architecture synthesizes ideas from multiple research lineages:

### Liquid Neural Networks
- **Liquid Time-constant Networks** (Hasani et al., AAAI 2021): ODE-based neurons with input-dependent τ
- **Closed-form Continuous-depth Models** (Hasani et al., Nature Machine Intelligence 2022): analytical solution eliminating ODE solvers
- **Neural Circuit Policies** (Lechner et al., Nature Machine Intelligence 2020): sparse wiring, sensory → inter → command → motor

### Attention-Free Image Generation
- **ZigMa** (ECCV 2024): zigzag scanning for SSM-based diffusion (FID 14.27 on CelebA 256)
- **DiMSUM** (NeurIPS 2024): spatial-frequency Mamba (FID 2.11 on ImageNet 256)
- **DiffuSSM** (2023): first attention-free diffusion model (FID 2.28 on ImageNet 256)
- **DiM** (2024): multi-directional Mamba with padding tokens

### Parallelization
- **LiquidTAD** (2025): static decay `α = exp(-softplus(ρ))` for fully parallel liquid dynamics (100× speedup vs. ODE solving)

### Flow Matching
- **Flow Matching for Generative Modeling** (Lipman et al., 2023)
- **SiT** (2024): Scalable Interpolant Transformers

## 📐 Architecture Diagram

```
Input Latent [B, 16, H/8, W/8]
 │
 ├─── Patch Embed (Conv2d, stride=2) ──→ [B, D, H/16, W/16]
 ├─── + Learnable Position Embedding
 ├─── Input Projection (DW-Conv + PW-Conv + GELU)
 │
 ├─── LiquidBlock × (depth/2)   ←── save skip connections
 │      ├── AdaGN (timestep conditioned)
 │      ├── GatedDepthwiseStimulusConv (local spatial)
 │      ├── + ZigzagScan1D (global context)
 │      ├── LiquidTimeConstant #1 (CfC blend)
 │      ├── AdaGN (timestep conditioned)
 │      ├── ChannelMixMLP (GELU)
 │      └── LiquidTimeConstant #2 (CfC blend)
 │
 ├─── LiquidBlock × (depth/2)   ←── add skip connections
 │      └── (same structure as above)
 │
 ├─── GroupNorm + Conv + GELU
 └─── Unpatchify (ConvTranspose2d) ──→ [B, 16, H/8, W/8]
```

## ⚡ Key Design Decisions

1. **No attention**: O(n) instead of O(n²), which enables training on longer sequences and higher-resolution latents.
2. **Liquid dynamics over residuals**: instead of `x + f(x)`, we use `α·x + (1-α)·f(x)` with a learned per-channel α, giving the model explicit control over how much old vs. new information to retain.
3. **Zigzag scanning**: preserves spatial continuity (adjacent pixels stay adjacent in the sequence), whereas a simple raster scan breaks continuity at every row boundary; see the sketch after this list.
4. **Frozen Flux VAE**: 16-channel latent with best-in-class reconstruction quality, at only 160 MB and ~1 GB VRAM.
5. **Flow matching**: straighter ODE trajectories than DDPM, so fewer sampling steps are needed for better quality.

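A minimal way to build such an ordering (an illustrative helper of ours; the actual `ZigzagScan1D` may use ZigMa-style multi-directional variants):

```python
import torch


def zigzag_indices(h: int, w: int) -> torch.Tensor:
    """Boustrophedon (zigzag) ordering of an h*w grid: every other row is
    reversed, so consecutive sequence positions stay spatially adjacent."""
    idx = torch.arange(h * w).view(h, w)
    idx[1::2] = idx[1::2].flip(-1)  # reverse odd rows
    return idx.reshape(-1)


order = zigzag_indices(4, 4)
# tensor([ 0,  1,  2,  3,  7,  6,  5,  4,  8,  9, 10, 11, 15, 14, 13, 12])
# seq = x.flatten(2)[:, :, order]  # [B, C, H*W]: spatially contiguous 1D sequence
```
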
## 📜 License

MIT

## 🙏 Acknowledgments

- MIT CSAIL for Liquid Neural Networks research
- Black Forest Labs for the FLUX.1-schnell VAE (Apache 2.0)
- WikiArt dataset contributors
- The ZigMa, DiMSUM, DiffuSSM, and DiM authors for attention-free diffusion insights