	# 🌊 LiquidDiffusion: Attention-Free Image Generation with Liquid Neural Networks

	A novel image generation architecture that replaces all attention mechanisms with Parallel CfC (Closed-form Continuous-time) blocks from Liquid Neural Networks.

	To our knowledge, no existing paper uses CfC or LTC networks as a diffusion model backbone.

	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/krystv/liquid-diffusion/blob/main/LiquidDiffusion_Training.ipynb)

	## 🔬 Key Innovations

	| Feature | Description |
	|---------|-------------|
	| No Attention | All spatial mixing via multi-scale depthwise convolutions (3×3, 5×5, 7×7) + global average pooling |
	| Fully Parallelizable | No sequential ODE solving: the CfC closed-form solution eliminates the computational bottleneck of Neural ODEs |
	| CfC × Diffusion Bridge | The diffusion noise level `t` serves as CfC's time parameter, a natural mathematical correspondence |
	| Liquid Relaxation Residuals | Time-aware skip connections: `α·input + (1-α)·output`, where `α = exp(-λ·t)` adapts to the noise level |
	| Fits 16GB VRAM | Tiny model (8M params) fits in ~4GB; designed for the Colab free-tier T4 |

	## 📐 Architecture

	```
	Input: noisy image [B, 3, H, W] + timestep t ∈ [0, 1]

	Time Embedding: Sinusoidal PE → MLP → t_emb [B, dim]

	Conv Stem: 3×3 conv → SiLU → 3×3 conv

	Encoder:
	Stage 1: [LiquidDiffusionBlock × N₁] → DownSample (stride-2 conv)
	Stage 2: [LiquidDiffusionBlock × N₂] → DownSample
	Stage 3: [LiquidDiffusionBlock × N₃]

	Bottleneck: [LiquidDiffusionBlock × 2]

	Decoder (mirror of encoder):
	Stage 3: UpSample → SkipFusion → [LiquidDiffusionBlock × N₃]
	Stage 2: UpSample → SkipFusion → [LiquidDiffusionBlock × N₂]
	Stage 1: [LiquidDiffusionBlock × N₁]

	Output: GroupNorm → SiLU → 3×3 conv → velocity prediction [B, 3, H, W]
	```
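The `Sinusoidal PE` step in the diagram above can be illustrated with a minimal pure-Python sketch. The function name `sinusoidal_embedding`, the `max_period` constant, and the sine-then-cosine layout are assumptions borrowed from the common Transformer-style formulation, not details confirmed by this repository; the MLP that follows the embedding is omitted.

```python
import math

def sinusoidal_embedding(t: float, dim: int, max_period: float = 10000.0) -> list[float]:
    """Map a scalar timestep t to a dim-sized vector of sines and cosines.

    Hypothetical helper: the [sin..., cos...] layout and max_period are assumptions.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decaying toward 1/max_period.
    freqs = [math.exp(-math.log(max_period) * k / half) for k in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

emb = sinusoidal_embedding(0.5, dim=8)
print(len(emb))
```

Because `t` lies in [0, 1] here, an implementation may choose to rescale it before embedding so that the higher frequencies are actually exercised; whether this project does so is not stated.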

	### LiquidDiffusionBlock
	```
	x → AdaLN(t) → ParallelCfC(t) → +residual
	→ MultiScaleSpatialMix(t) → +residual
	→ AdaLN(t) → FeedForward → +residual
	```

	### ParallelCfC (Core Innovation)
	```python
	# CfC Eq.10 adapted for 2D spatial features:
	backbone = SiLU(Conv1x1(DWConv7x7(x))) # shared spatial context
	f = Conv1x1(backbone) # time-constant gate
	g = DWConv→SiLU→Conv1x1(backbone) # "from" state
	h = DWConv→SiLU→Conv1x1(backbone) # "to" state (attractor)
	gate = σ(time_a(t_emb) · f - time_b(t_emb)) # liquid time gate
	cfc_out = gate · g + (1-gate) · h # CfC interpolation

	# Liquid relaxation residual:
	α = exp(-softplus(ρ) · |t|) # time-aware weight
	output = α · input + (1-α) · cfc_out # noise-adaptive residual
	```

	## 📊 Model Configurations

	| Config | Channels | Blocks | Params | Resolution | VRAM (fp16) |
	|--------|----------|--------|--------|------------|-------------|
	| tiny | [64, 128, 256] | [2, 2, 4] | ~8M | 256×256 | ~4GB |
	| small | [96, 192, 384] | [2, 3, 6] | ~25M | 256×256 | ~8GB |
	| base | [128, 256, 512] | [2, 4, 8] | ~65M | 512×512 | ~14GB |
	| large | [128, 256, 512, 768] | [2, 4, 8, 4] | ~120M | 512×512 | ~24GB |
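The table can be mirrored in code as plain data. The sketch below is illustrative only: the `LiquidDiffusionConfig` dataclass, the `CONFIGS` dict, and the `bottleneck_resolution` helper are hypothetical names, and the helper assumes one stride-2 downsample between consecutive stages, as drawn in the architecture diagram.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LiquidDiffusionConfig:
    """Hypothetical container mirroring one row of the configuration table."""
    channels: tuple[int, ...]   # channel width per stage
    blocks: tuple[int, ...]     # LiquidDiffusionBlock count per stage
    resolution: int             # training image side length in pixels

    def bottleneck_resolution(self) -> int:
        # Assumes one stride-2 downsample between consecutive stages.
        return self.resolution // 2 ** (len(self.channels) - 1)

CONFIGS = {
    "tiny": LiquidDiffusionConfig((64, 128, 256), (2, 2, 4), 256),
    "small": LiquidDiffusionConfig((96, 192, 384), (2, 3, 6), 256),
    "base": LiquidDiffusionConfig((128, 256, 512), (2, 4, 8), 512),
    "large": LiquidDiffusionConfig((128, 256, 512, 768), (2, 4, 8, 4), 512),
}

for name, cfg in CONFIGS.items():
    # Every stage needs both a channel width and a block count.
    assert len(cfg.channels) == len(cfg.blocks)
    print(name, cfg.bottleneck_resolution())
```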

	## 🏋️ Training

	### Rectified Flow (simplest effective objective)
	```
	x_t = (1-t) · x_data + t · noise, t ~ U[0,1]
	Loss = ||model(x_t, t) - (noise - x_data)||²
	```
	No noise schedule. No variance. Just MSE on a straight-line velocity.
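To make the objective concrete, here is a dependency-free sketch of a single loss evaluation on a flattened toy image. The names `rectified_flow_loss` and `velocity_fn` are hypothetical stand-ins for the real trainer and model, which operate on batched tensors; the squared error is averaged over elements here.

```python
import random

def rectified_flow_loss(velocity_fn, x_data, noise, t):
    """Mean squared error between predicted and straight-line velocity.

    velocity_fn(x_t, t) is a hypothetical stand-in for model(x_t, t).
    """
    x_t = [(1 - t) * x + t * n for x, n in zip(x_data, noise)]   # interpolate
    target = [n - x for x, n in zip(x_data, noise)]              # d x_t / d t
    pred = velocity_fn(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)

rng = random.Random(0)
x_data = [0.5, -0.25, 1.0, 0.0]                    # toy "image" in [-1, 1]
noise = [rng.gauss(0.0, 1.0) for _ in x_data]      # Gaussian noise sample
t = rng.random()                                   # t ~ U[0, 1)

# A predictor that always outputs zero velocity is penalized...
loss_zero = rectified_flow_loss(lambda x_t, t: [0.0] * len(x_t), x_data, noise, t)

# ...while an oracle that knows the pair (x_data, noise) reaches zero loss.
oracle = lambda x_t, t: [n - x for x, n in zip(x_data, noise)]
loss_oracle = rectified_flow_loss(oracle, x_data, noise, t)
print(loss_zero, loss_oracle)
```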

	### Sampling (Euler ODE)
	```python
	import torch

	z = torch.randn(B, 3, H, W)  # start from pure noise (t = 1)
	for i in range(N, 0, -1):
	    t = i / N                # current time, from 1 down to 1/N
	    z = z - model(z, t) / N  # Euler step of size 1/N toward t = 0
	```
	Typically 25-50 steps.
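As a sanity check of the update rule, the sketch below runs the same Euler loop on plain Python lists with a hypothetical oracle velocity in place of a trained model. For a single (data, noise) pair the straight-line velocity `noise - x_data` is constant, so the loop should land on `x_data` for any step count. The names `euler_sample` and `oracle_velocity` are illustrative.

```python
import random

def euler_sample(velocity_fn, z, num_steps):
    """Integrate dz/dt = v(z, t) from t = 1 down to t = 0 with Euler steps."""
    for i in range(num_steps, 0, -1):
        t = i / num_steps
        v = velocity_fn(z, t)
        z = [zi - vi / num_steps for zi, vi in zip(z, v)]
    return z

rng = random.Random(0)
x_data = [0.5, -0.25, 1.0, 0.0]                   # toy "image"
noise = [rng.gauss(0.0, 1.0) for _ in x_data]     # starting point at t = 1

def oracle_velocity(z, t):
    # Hypothetical stand-in for a perfectly trained model on this one pair.
    return [n - x for x, n in zip(x_data, noise)]

sample = euler_sample(oracle_velocity, noise, num_steps=25)
max_error = max(abs(s - x) for s, x in zip(sample, x_data))
print(f"max error: {max_error:.2e}")
```

With a learned model the predicted velocity generally varies along the trajectory, so the step count becomes a quality/speed trade-off rather than a free choice.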

	### Quick Start
	```python
	from liquid_diffusion import liquid_diffusion_tiny, RectifiedFlowTrainer

	model = liquid_diffusion_tiny()
	trainer = RectifiedFlowTrainer(model, lr=1e-4, device='cuda')

	# Training step
	images = get_batch() # [B, 3, 256, 256] in [-1, 1]
	metrics = trainer.train_step(images)
	print(f"Loss: {metrics['loss']:.4f}")

	# Generate
	samples = trainer.sample(batch_size=4, image_size=256, num_steps=50)
	```

	### Recommended Datasets
	- CelebA-HQ (`huggan/CelebA-HQ`) — 30K face images, 256px
	- Flowers-102 (`huggan/flowers-102-categories`) — botanical images
	- AFHQ — 15K animal faces (cats, dogs, wildlife)
	- Any folder of images

	## 🧮 Mathematical Foundation

	### Liquid Time-Constant Networks (LTC)
	Hasani et al., AAAI 2021 — [arxiv:2006.04439](https://arxiv.org/abs/2006.04439)

	The fundamental ODE:
	```
	dx/dt = -[1/τ + f(x,I,θ)] · x + f(x,I,θ) · A
	```
	Key: system time constant `τ_sys = τ/(1 + τ·f)` is input-dependent — neurons adapt their response speed.
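A quick numerical check of the time-constant claim, under the simplifying assumption that `f` is held constant (in a real LTC it depends on the state and input). With constant `f` the ODE is linear, so starting from zero the state relaxes exponentially toward `τ·f·A / (1 + τ·f)` with time constant `τ_sys`. Function names and parameter values below are illustrative.

```python
import math

def ltc_step(x, f, tau, A, dt):
    """One explicit Euler step of dx/dt = -(1/tau + f) * x + f * A."""
    return x + dt * (-(1.0 / tau + f) * x + f * A)

def tau_sys(tau, f):
    """Effective time constant of the LTC neuron for a fixed gate value f."""
    return tau / (1.0 + tau * f)

tau, f, A = 1.0, 3.0, 2.0        # illustrative values; f held constant
x, dt, steps = 0.0, 1e-4, 2500   # integrate up to T = 0.25, i.e. one tau_sys

for _ in range(steps):
    x = ltc_step(x, f, tau, A, dt)

x_inf = tau * f * A / (1.0 + tau * f)        # steady state of the linear ODE
x_exact = x_inf * (1.0 - math.exp(-1.0))     # exact state after one tau_sys
print(f"tau_sys={tau_sys(tau, f):.3f}  euler={x:.4f}  exact={x_exact:.4f}")
```

A larger `f` shortens `τ_sys`, which is the sense in which the neuron's response speed adapts to its input.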

	### CfC: Closed-form Solution
	Hasani et al., Nature Machine Intelligence 2022 — [arxiv:2106.13898](https://arxiv.org/abs/2106.13898)

	Approximates the solution of the LTC ODE in closed form:
	```
	x(t) = σ(-f(x,I;θf)·t) ⊙ g(x,I;θg) + [1 - σ(-f(x,I;θf)·t)] ⊙ h(x,I;θh)
	```
	Eliminates the ODE solver, so the model is fully parallelizable; the paper reports speedups of one to five orders of magnitude over ODE-based counterparts.
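The closed-form expression can be evaluated directly, with no solver loop. The scalar sketch below treats `f`, `g`, and `h` as already-computed numbers (in the real model they are network outputs); `cfc_state` is a hypothetical helper name.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def cfc_state(f: float, g: float, h: float, t: float) -> float:
    """Evaluate x(t) = sigma(-f*t) * g + (1 - sigma(-f*t)) * h for scalars."""
    gate = sigmoid(-f * t)
    return gate * g + (1.0 - gate) * h

f, g, h = 2.0, -1.0, 3.0             # illustrative values
at_zero = cfc_state(f, g, h, 0.0)    # gate = 0.5: an even blend of g and h
at_late = cfc_state(f, g, h, 10.0)   # f > 0 and large t: gate near 0, so near h
print(at_zero, at_late)
```

Each evaluation is independent of every other time point, which is what allows all positions and timesteps to be computed in parallel.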

	### Our CfC-Diffusion Bridge
	We observe that CfC's time parameter `t` and diffusion's noise level `t` serve analogous roles:
	- CfC: `t` controls interpolation between "from" (g) and "to" (h) states
	- Diffusion: `t` controls the noise level the denoiser must handle

	By using the diffusion timestep directly as CfC's time parameter:
	- `t≈0` (clean): gate ≈ 0.5 → balanced g/h → flexible detail processing
	- `t≈1` (noisy): gate saturates → specialized denoising behavior
	- The gate function `f` is input-dependent → each image region gets adaptive time response

	### Parallel Liquid Relaxation (from LiquidTAD)
	[arxiv:2604.18274](https://arxiv.org/abs/2604.18274)

	```
	α = exp(-softplus(ρ) · t_diff)
	output = α · input + (1-α) · gated_transform(input)
	```
	When `t` is large (noisy): α ≈ 0 → rely on CfC output (needs strong processing).
	When `t` is small (clean): α ≈ 1 → preserve input (only minor refinement needed).
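The blend weight is easy to tabulate. In this sketch `rho` stands for the learnable parameter ρ, and the values are illustrative rather than taken from a trained model; `relaxation_alpha` and `liquid_residual` are hypothetical helper names.

```python
import math

def softplus(x: float) -> float:
    return math.log1p(math.exp(x))

def relaxation_alpha(rho: float, t: float) -> float:
    """alpha = exp(-softplus(rho) * |t|), always in (0, 1]."""
    return math.exp(-softplus(rho) * abs(t))

def liquid_residual(x_in: float, x_out: float, rho: float, t: float) -> float:
    """Blend the block input with its transformed output."""
    a = relaxation_alpha(rho, t)
    return a * x_in + (1.0 - a) * x_out

for rho in (0.0, 3.0):
    alphas = [round(relaxation_alpha(rho, t), 3) for t in (0.0, 0.5, 1.0)]
    print(f"rho={rho}: alpha at t=0, 0.5, 1 -> {alphas}")
```

Since `t` is limited to [0, 1], α only gets close to 0 at `t = 1` when `softplus(ρ)` is fairly large; for `ρ = 0` it is 0.5 there. How strongly a block relies on the CfC output at high noise therefore depends on the learned ρ.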

	## 📚 References

	1. Hasani et al., "Liquid Time-constant Networks", AAAI 2021 — [arxiv:2006.04439](https://arxiv.org/abs/2006.04439)
	2. Hasani et al., "Closed-form Continuous-time Neural Networks", Nature MI 2022 — [arxiv:2106.13898](https://arxiv.org/abs/2106.13898)
	3. Lechner et al., "Neural Circuit Policies", Nature MI 2020
	4. LiquidTAD: Parallel liquid relaxation — [arxiv:2604.18274](https://arxiv.org/abs/2604.18274)
	5. USM: U-Shape Mamba for diffusion — [arxiv:2504.13499](https://arxiv.org/abs/2504.13499)
	6. DiffuSSM: Diffusion without attention — [arxiv:2311.18257](https://arxiv.org/abs/2311.18257)
	7. Liu et al., "Flow Straight and Fast: Rectified Flow", ICLR 2023 — [arxiv:2209.03003](https://arxiv.org/abs/2209.03003)
	8. Lee et al., "Improving the Training of Rectified Flows" — [arxiv:2405.20320](https://arxiv.org/abs/2405.20320)

	## License

	MIT