# 🌊 LiquidDiffusion

**A novel attention-free image generation model based on Liquid Neural Networks**

## What is this?

LiquidDiffusion is a **first-of-its-kind** image generation model that replaces attention with **Parallel CfC (Closed-form Continuous-depth) blocks** from Liquid Neural Network research. To our knowledge, no published paper combines LNNs with image generation; this project fills that gap.

### Key Properties
- ✅ **Zero attention layers** – fully convolutional + liquid time-gating
- ✅ **Fully parallelizable** – no ODE solvers, no sequential scanning, no recurrence
- ✅ **Latent-space training** – uses a pretrained SD-VAE (`stabilityai/sd-vae-ft-mse`, 83.7M params, frozen)
- ✅ **Fits in 16 GB VRAM** – the tiny config trains at 256px with batch size 8 on a T4 GPU
- ✅ **Simple training** – Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ **6 verified datasets** – all tested and working, with streaming support

## Quick Start (Colab)

1. Open `LiquidDiffusion_Training.ipynb` in Colab
2. Select GPU runtime (T4)
3. Pick a dataset from the dropdown (default: `huggan/AFHQv2`, animal faces)
4. Run all cells → training starts, and samples are generated every 500 steps

## Architecture

```
Pixel Image (3×256×256)
    → [Frozen SD-VAE Encode] → Latent (4×32×32)
    → [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
    → [Frozen SD-VAE Decode] → Generated Image (3×256×256)
```

Each **LiquidDiffusionBlock** contains:
1. **AdaLN** – timestep conditioning via learned scale/shift
2. **ParallelCfCBlock** – the core liquid neural network layer (CfC Eq. 10)
3. **MultiScaleSpatialMix** – 3×3 + 5×5 + 7×7 depthwise convs + global pooling (replaces attention)
4. **FeedForward** – channel mixing via 1×1 conv
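
The AdaLN step can be sketched framework-free in NumPy. This is one plausible reading (normalize each spatial position over channels, then apply a timestep-conditioned scale and shift); the repo's exact normalization axis and the projections that produce `scale`/`shift` from the timestep embedding may differ:

```python
import numpy as np

def ada_ln(x, scale, shift, eps=1e-5):
    """Adaptive LayerNorm sketch: normalize over channels per spatial
    position, then apply a timestep-conditioned scale and shift."""
    mu = x.mean(axis=0, keepdims=True)         # per-position mean over C
    var = x.var(axis=0, keepdims=True)
    x_norm = (x - mu) / np.sqrt(var + eps)
    return x_norm * (1.0 + scale) + shift      # (C,1,1) broadcasts over H, W

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))            # (C, H, W) feature map
scale = 0.1 * rng.standard_normal((16, 1, 1))  # would come from t_emb
shift = 0.1 * rng.standard_normal((16, 1, 1))
y = ada_ln(x, scale, shift)
assert y.shape == x.shape
```

With `scale = shift = 0` the layer reduces to plain normalization, which is the usual zero-init starting point for AdaLN-style conditioning.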

### The ParallelCfC Block

```python
# CfC Eq. 10 adapted for images:
gate = σ(time_a(t_emb) · f(features) - time_b(t_emb))     # liquid time-gating
out = gate · g(features) + (1 - gate) · h(features)       # CfC interpolation
α = exp(-λ · |t_emb|)                                     # liquid relaxation
output = α · input + (1 - α) · out                        # time-aware residual
```
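
These equations run end-to-end in NumPy. The projections `f`, `g`, `h`, `time_a`, `time_b` below are fixed stand-ins (in the real block they are learned 1×1 convolutions), so this is a sketch of the math, not the trained layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))        # input features (C, H, W)
t_emb = rng.standard_normal((16, 1, 1))    # timestep embedding, broadcast over H, W

# Stand-ins for the learned projections (elementwise maps, purely to
# exercise the equations):
f = lambda z: z
g = np.tanh
h = lambda z: np.tanh(-z)
time_a = lambda z: z
time_b = lambda z: 0.5 * z
lam = 0.1                                  # relaxation rate λ (learned in practice)

gate = sigmoid(time_a(t_emb) * f(x) - time_b(t_emb))   # liquid time-gating
out = gate * g(x) + (1 - gate) * h(x)                  # CfC interpolation
alpha = np.exp(-lam * np.abs(t_emb))                   # liquid relaxation
y = alpha * x + (1 - alpha) * out                      # time-aware residual

assert y.shape == x.shape
```

Every step is an elementwise op or a broadcast, which is what makes the block fully parallel: there is no recurrence over time or space.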

## Verified Datasets

All tested and working (with streaming support):

| Dataset | Images | Description | Native Resolution |
|---------|--------|-------------|-------------------|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |

## VAE

Uses **stabilityai/sd-vae-ft-mse** (83.7M params, frozen during training):
- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160MB VRAM in fp16
- Scaling factor: 0.18215
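
A quick sanity check of the latent geometry in pure Python (`latent_shape` is a helper invented here for illustration):

```python
import numpy as np

# 8x spatial downscale with 4 latent channels:
def latent_shape(h, w, channels=4, downscale=8):
    assert h % downscale == 0 and w % downscale == 0
    return (channels, h // downscale, w // downscale)

print(latent_shape(256, 256))   # (4, 32, 32)

# Latents are multiplied by the scaling factor after encoding and divided
# by it before decoding, so the round trip is the identity up to float error.
scale = 0.18215
z = np.random.default_rng(0).standard_normal(latent_shape(256, 256))
assert np.allclose((z * scale) / scale, z)
```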

## Model Configs

| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|--------|--------|---------------------|------------|
| tiny | ~23M | ~6 GB | ~12 GB |
| small | ~69M | ~10 GB | ~20 GB |
| base | ~154M | ~16 GB | ~30 GB |

## Training

**Objective**: Rectified Flow – simple MSE on velocity
```python
x_t = (1 - t) * x0 + t * noise       # linear interpolation
v_target = noise - x0                # constant velocity
loss = MSE(model(x_t, t), v_target)  # that's it!
```

**Sampling**: Euler ODE integration, 25-50 steps
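
The constant-velocity property makes the sampler easy to verify: given an oracle that returns the true velocity, Euler integration from pure noise lands back on `x0` exactly. A NumPy sketch, with the oracle standing in for the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 32, 32))      # "data" latent
noise = rng.standard_normal((4, 32, 32))   # starting point at t = 1

def oracle_velocity(x, t):
    # Stands in for model(x, t); returns the true rectified-flow velocity,
    # which is constant along the straight path from noise to x0.
    return noise - x0

steps = 25
dt = 1.0 / steps
x = noise.copy()
for i in range(steps):
    t = 1.0 - i * dt                       # integrate from t = 1 down to t = 0
    x = x - dt * oracle_velocity(x, t)

assert np.allclose(x, x0)                  # perfect velocity => exact recovery
```

With a trained model the velocity is only approximately constant, which is why 25–50 steps (rather than one) are used in practice.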

## References

| Paper | Contribution |
|-------|-------------|
| [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq.10, parallelizable closed-form |
| [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE |
| [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
| [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
| [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM replaces attention in diffusion |
| [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |

## Files

```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py             # Full model architecture
│   └── trainer.py           # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb  # Complete Colab notebook
├── test_model.py
└── README.md
```

## License
MIT