{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 🌊 LiquidDiffusion: Attention-Free Image Generation with Liquid Neural Networks\n",
        "\n",
        "**A novel image generation architecture** that replaces attention with Parallel CfC (Closed-form Continuous-depth) blocks from Liquid Neural Networks.\n",
        "\n",
        "## Key Innovations\n",
        "- **No attention mechanism** — all spatial mixing via multi-scale depthwise convolutions\n",
        "- **Fully parallelizable** — no sequential ODE solving loops (unlike original LTC/Neural ODE)\n",
        "- **Diffusion timestep IS the liquid time constant** — natural CfC-diffusion bridge\n",
        "- **Liquid relaxation residuals** — time-aware skip connections that adapt to noise level\n",
        "- **Fits in 16GB VRAM** — designed for Colab free tier (T4 GPU)\n",
        "\n",
        "## Architecture Based On\n",
        "- [CfC Networks](https://arxiv.org/abs/2106.13898) (Hasani et al., Nature Machine Intelligence 2022)\n",
        "- [LiquidTAD](https://arxiv.org/abs/2604.18274) — parallel liquid relaxation\n",
        "- [USM](https://arxiv.org/abs/2504.13499) — U-Shape architecture for diffusion\n",
        "- [Rectified Flow](https://arxiv.org/abs/2209.03003) — simplest flow matching objective\n",
        "\n",
        "## Training: Rectified Flow\n",
        "```\n",
        "x_t = (1-t)*x0 + t*noise,  t ~ U[0,1]\n",
        "Loss = MSE(model(x_t, t), noise - x0)  # velocity prediction\n",
        "```\n",
        "That's it — no noise schedule, no variance, just MSE on a straight-line velocity."
      ]
    },
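    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a short worked step, here is where the regression target `noise - x0` comes from and how it turns into a sampler. This is a sketch under assumptions: $\\varepsilon$ stands for `noise`, $v_\\theta$ for the model, and $N$ for the number of sampling steps; none of these symbols are names from the repo.\n",
        "\n",
        "$$x_t = (1-t)\\,x_0 + t\\,\\varepsilon \\quad\\Rightarrow\\quad \\frac{dx_t}{dt} = \\varepsilon - x_0$$\n",
        "\n",
        "$$\\mathcal{L}(\\theta) = \\mathbb{E}_{x_0,\\,\\varepsilon,\\,t}\\,\\big\\| v_\\theta(x_t, t) - (\\varepsilon - x_0) \\big\\|^2$$\n",
        "\n",
        "$$x_{t-\\Delta t} = x_t - \\Delta t\\; v_\\theta(x_t, t), \\qquad \\Delta t = \\tfrac{1}{N}, \\qquad t: 1 \\to 0$$\n",
        "\n",
        "The velocity along the straight path is constant in $t$, which is why a plain MSE regression is enough (the squared norm above equals the MSE up to the averaging constant). The last line is an ordinary Euler discretization, assuming sampling starts from Gaussian noise at $t=1$ and integrates backward to $t=0$."
      ]
    },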
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 🔧 Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Install dependencies\n",
        "!pip install -q torch torchvision datasets Pillow matplotlib tqdm accelerate"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Clone the repo\n",
        "!git clone https://huggingface.co/krystv/liquid-diffusion\n",
        "%cd liquid-diffusion"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import torch\n",
        "print(f'PyTorch: {torch.__version__}')\n",
        "print(f'CUDA available: {torch.cuda.is_available()}')\n",
        "if torch.cuda.is_available():\n",
        "    print(f'GPU: {torch.cuda.get_device_name(0)}')\n",
        "    print(f'VRAM: {torch.cuda.get_device_properties(0).total_mem / 1e9:.1f} GB')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 📐 Architecture Overview\n",
        "\n",
        "The core innovation is the **ParallelCfCBlock** — a parallelized version of CfC (Closed-form Continuous-depth) networks adapted for 2D image features:\n",
        "\n",
        "```\n",
        "CfC Equation (Hasani et al. 2022, Eq. 10):\n",
        "    x(t) = σ(-f·t) ⊙ g + (1 - σ(-f·t)) ⊙ h\n",
        "\n",
        "Our adaptation for image generation:\n",
        "    backbone = SiLU(PointwiseConv(DepthwiseConv(features)))  # shared spatial context\n",
        "    f = Conv1x1(backbone)                                     # time-constant gate\n",
        "    g = DWConv→SiLU→Conv1x1(backbone)                        # \"from\" state\n",
        "    h = DWConv→SiLU→Conv1x1(backbone)                        # \"to\" state (attractor)\n",
        "    gate = σ(time_a(t_emb) · f - time_b(t_emb))             # liquid time gate\n",
        "    cfc_out = gate · g + (1 - gate) · h                      # CfC interpolation\n",
        "    \n",
        "    # Liquid relaxation (from LiquidTAD):\n",
        "    α = exp(-softplus(ρ) · |t|)                              # time-aware residual weight\n",
        "    output = α · input + (1 - α) · cfc_out                   # adapts to noise level\n",
        "```\n",
        "\n",
        "The **diffusion timestep t** serves double duty:\n",
        "1. Standard: conditions the denoiser via AdaLN scale/shift\n",
        "2. Novel: acts as the CfC time parameter — controls interpolation between g and h\n",
        "\n",
        "This means: at low noise (t≈0), the gate is balanced → flexible processing.\n",
        "At high noise (t≈1), the gate saturates → specialized denoising."
      ]
    },
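    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A quick check of the limiting cases, read directly off the formulas above. The shorthand is introduced here and is not from the repo: $s = \\mathrm{softplus}(\\rho) > 0$, $x$ for `input`, $c$ for `cfc_out`, and $y$ for `output`.\n",
        "\n",
        "$$\\alpha(t) = e^{-s\\,|t|}, \\qquad \\alpha(0) = 1, \\qquad \\alpha(1) = e^{-s}$$\n",
        "\n",
        "$$y\\,\\big|_{t=0} = x, \\qquad y\\,\\big|_{t=1} = e^{-s}\\,x + (1 - e^{-s})\\,c$$\n",
        "\n",
        "$$\\sigma(-f\\cdot 0) = \\tfrac{1}{2}, \\qquad \\sigma(-f\\cdot 1) = \\frac{1}{1 + e^{f}}$$\n",
        "\n",
        "Under these formulas the block reduces to the identity at $t=0$, and the CfC branch has its largest weight at $t=1$. The idealized gate $\\sigma(-f\\cdot t)$ is exactly balanced at $t=0$ and approaches 0 or 1 at $t=1$ only when $|f|$ is large. The learned gate in the block uses `time_a(t_emb)` and `time_b(t_emb)`, so its actual values depend on training."
      ]
    },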
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 🧪 Quick Test (verify model works)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Run the test suite\n",
        "!python test_model.py"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## ⚙️ Training Configuration\n",
        "\n",
        "Choose your config based on GPU and target resolution:\n",
        "\n",
        "| Config | Params | Resolution | Batch Size | VRAM | Training Time |\n",
        "|--------|--------|-----------|------------|------|---------------|\n",
        "| tiny | ~8M | 256×256 | 8 | ~6GB | ~3h (100K steps) |\n",
        "| small | ~25M | 256×256 | 4 | ~10GB | ~6h (100K steps) |\n",
        "| base | ~65M | 512×512 | 2 | ~14GB | ~12h (100K steps) |\n",
        "\n",
        "Recommended datasets:\n",
        "- `huggan/CelebA-HQ` — 30K high-quality face images (256px)\n",
        "- `huggan/flowers-102-categories` — flowers (various)\n",
        "- `lambdalabs/naruto-blip-captions` — anime style (~1K)\n",
        "- `Norod78/simpsons-blip-captions` — cartoon style\n",
        "- Any folder of images"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title Training Configuration {display-mode: \"form\"}\n",
        "\n",
        "#@markdown ### Model\n",
        "model_size = \"tiny\"  #@param [\"tiny\", \"small\", \"base\"]\n",
        "\n",
        "#@markdown ### Data\n",
        "dataset_name = \"huggan/CelebA-HQ\"  #@param {type:\"string\"}\n",
        "image_column = \"image\"  #@param {type:\"string\"}\n",
        "image_size = 256  #@param [64, 128, 256, 512] {type:\"integer\"}\n",
        "max_samples = 0  #@param {type:\"integer\"}\n",
        "\n",
        "#@markdown ### Training\n",
        "batch_size = 8  #@param {type:\"integer\"}\n",
        "learning_rate = 1e-4  #@param {type:\"number\"}\n",
        "weight_decay = 0.01  #@param {type:\"number\"}\n",
        "total_steps = 100000  #@param {type:\"integer\"}\n",
        "warmup_steps = 1000  #@param {type:\"integer\"}\n",
        "grad_clip = 1.0  #@param {type:\"number\"}\n",
        "ema_decay = 0.9999  #@param {type:\"number\"}\n",
        "time_sampling = \"logit_normal\"  #@param [\"uniform\", \"logit_normal\"]\n",
        "\n",
        "#@markdown ### Sampling & Logging\n",
        "sample_every = 2000  #@param {type:\"integer\"}\n",
        "save_every = 5000  #@param {type:\"integer\"}\n",
        "num_sample_steps = 50  #@param {type:\"integer\"}\n",
        "num_sample_images = 4  #@param {type:\"integer\"}\n",
        "\n",
        "#@markdown ### Hardware\n",
        "use_amp = True  #@param {type:\"boolean\"}\n",
        "amp_dtype = \"float16\"  #@param [\"float16\", \"bfloat16\"]\n",
        "num_workers = 2  #@param {type:\"integer\"}\n",
        "\n",
        "# Auto-adjust batch size for resolution\n",
        "if image_size >= 512 and batch_size > 4:\n",
        "    batch_size = min(batch_size, 2)\n",
        "    print(f\"Auto-reduced batch_size to {batch_size} for {image_size}px\")\n",
        "\n",
        "if max_samples == 0:\n",
        "    max_samples = None\n",
        "\n",
        "print(f\"\\nConfig: {model_size} model, {image_size}px, batch={batch_size}, lr={learning_rate}\")\n",
        "print(f\"Dataset: {dataset_name}, time_sampling={time_sampling}\")\n",
        "print(f\"Total steps: {total_steps:,}, AMP: {use_amp} ({amp_dtype})\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 📦 Load Dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from datasets import load_dataset\n",
        "from liquid_diffusion.trainer import ImageDataset\n",
        "from torch.utils.data import DataLoader\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "\n",
        "# Load dataset\n",
        "print(f\"Loading {dataset_name}...\")\n",
        "dataset = ImageDataset(\n",
        "    source=dataset_name,\n",
        "    image_size=image_size,\n",
        "    image_column=image_column,\n",
        "    max_samples=max_samples,\n",
        ")\n",
        "print(f\"Dataset size: {len(dataset)} images\")\n",
        "\n",
        "dataloader = DataLoader(\n",
        "    dataset, batch_size=batch_size, shuffle=True,\n",
        "    num_workers=num_workers, pin_memory=True, drop_last=True,\n",
        ")\n",
        "\n",
        "# Show some samples\n",
        "sample_batch = next(iter(dataloader))\n",
        "fig, axes = plt.subplots(1, min(4, batch_size), figsize=(16, 4))\n",
        "for i, ax in enumerate(axes):\n",
        "    img = sample_batch[i].permute(1, 2, 0).numpy() * 0.5 + 0.5  # [-1,1] -> [0,1]\n",
        "    ax.imshow(np.clip(img, 0, 1))\n",
        "    ax.axis('off')\n",
        "plt.suptitle(f'Training samples ({image_size}×{image_size})')\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 🏗️ Build Model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from liquid_diffusion.model import (\n",
        "    liquid_diffusion_tiny, liquid_diffusion_small, liquid_diffusion_base\n",
        ")\n",
        "\n",
        "# Build model\n",
        "model_factories = {\n",
        "    'tiny': liquid_diffusion_tiny,\n",
        "    'small': liquid_diffusion_small,\n",
        "    'base': liquid_diffusion_base,\n",
        "}\n",
        "\n",
        "model = model_factories[model_size]()\n",
        "total_params, trainable_params = model.count_params()\n",
        "print(f\"Model: liquid_diffusion_{model_size}\")\n",
        "print(f\"Parameters: {total_params:,} ({total_params/1e6:.1f}M)\")\n",
        "print(f\"Trainable: {trainable_params:,}\")\n",
        "\n",
        "# Quick forward pass test\n",
        "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
        "model = model.to(device)\n",
        "test_x = torch.randn(1, 3, image_size, image_size, device=device)\n",
        "test_t = torch.tensor([0.5], device=device)\n",
        "with torch.no_grad():\n",
        "    test_out = model(test_x, test_t)\n",
        "print(f\"Forward pass OK: {test_x.shape} → {test_out.shape}\")\n",
        "del test_x, test_out\n",
        "if device == 'cuda':\n",
        "    torch.cuda.empty_cache()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 🚀 Train!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import time\n",
        "import math\n",
        "from tqdm.auto import tqdm\n",
        "from torchvision.utils import save_image, make_grid\n",
        "from liquid_diffusion.trainer import RectifiedFlowTrainer, get_cosine_schedule_with_warmup\n",
        "\n",
        "# Create output directories\n",
        "os.makedirs('checkpoints', exist_ok=True)\n",
        "os.makedirs('samples', exist_ok=True)\n",
        "\n",
        "# Build trainer\n",
        "trainer = RectifiedFlowTrainer(\n",
        "    model=model,\n",
        "    lr=learning_rate,\n",
        "    weight_decay=weight_decay,\n",
        "    ema_decay=ema_decay,\n",
        "    grad_clip=grad_clip,\n",
        "    time_sampling=time_sampling,\n",
        "    device=device,\n",
        "    use_amp=use_amp,\n",
        "    amp_dtype=amp_dtype,\n",
        ")\n",
        "\n",
        "# Learning rate scheduler\n",
        "scheduler = get_cosine_schedule_with_warmup(\n",
        "    trainer.optimizer, warmup_steps, total_steps\n",
        ")\n",
        "\n",
        "# Optional: resume from checkpoint\n",
        "resume_path = 'checkpoints/latest.pt'\n",
        "if os.path.exists(resume_path):\n",
        "    trainer.load_checkpoint(resume_path)\n",
        "    print(f\"Resumed from step {trainer.step}\")\n",
        "\n",
        "print(f\"\\n{'='*60}\")\n",
        "print(f\"Starting training: {total_steps:,} steps\")\n",
        "print(f\"Model: liquid_diffusion_{model_size} ({total_params/1e6:.1f}M params)\")\n",
        "print(f\"Resolution: {image_size}×{image_size}, Batch: {batch_size}\")\n",
        "print(f\"LR: {learning_rate}, Warmup: {warmup_steps}, AMP: {use_amp}\")\n",
        "print(f\"{'='*60}\\n\")\n",
        "\n",
        "# Training loop\n",
        "start_time = time.time()\n",
        "data_iter = iter(dataloader)\n",
        "pbar = tqdm(range(trainer.step, total_steps), desc='Training', dynamic_ncols=True)\n",
        "loss_history = []\n",
        "\n",
        "for step in pbar:\n",
        "    # Get batch (cycle through dataset)\n",
        "    try:\n",
        "        batch = next(data_iter)\n",
        "    except StopIteration:\n",
        "        data_iter = iter(dataloader)\n",
        "        batch = next(data_iter)\n",
        "    \n",
        "    x0 = batch.to(device)\n",
        "    \n",
        "    # Train step\n",
        "    metrics = trainer.train_step(x0)\n",
        "    scheduler.step()\n",
        "    \n",
        "    # Logging\n",
        "    loss_history.append(metrics['loss'])\n",
        "    avg_loss = sum(loss_history[-100:]) / len(loss_history[-100:])\n",
        "    lr_current = scheduler.get_last_lr()[0]\n",
        "    \n",
        "    pbar.set_postfix({\n",
        "        'loss': f\"{metrics['loss']:.4f}\",\n",
        "        'avg': f\"{avg_loss:.4f}\",\n",
        "        'lr': f\"{lr_current:.6f}\",\n",
        "        'gn': f\"{metrics['grad_norm']:.2f}\",\n",
        "    })\n",
        "    \n",
        "    # Generate samples\n",
        "    if (step + 1) % sample_every == 0 or step == 0:\n",
        "        print(f\"\\nGenerating samples at step {step+1}...\")\n",
        "        samples = trainer.sample(\n",
        "            batch_size=num_sample_images, image_size=image_size,\n",
        "            num_steps=num_sample_steps, use_ema=True\n",
        "        )\n",
        "        # Save grid\n",
        "        grid = make_grid(samples * 0.5 + 0.5, nrow=int(math.sqrt(num_sample_images)), padding=2)\n",
        "        save_image(grid, f'samples/step_{step+1:06d}.png')\n",
        "        \n",
        "        # Display\n",
        "        fig, axes = plt.subplots(1, num_sample_images, figsize=(4*num_sample_images, 4))\n",
        "        if num_sample_images == 1:\n",
        "            axes = [axes]\n",
        "        for i, ax in enumerate(axes):\n",
        "            img = samples[i].cpu().permute(1, 2, 0).numpy() * 0.5 + 0.5\n",
        "            ax.imshow(np.clip(img, 0, 1))\n",
        "            ax.axis('off')\n",
        "        plt.suptitle(f'Step {step+1} (EMA samples, {num_sample_steps} Euler steps)')\n",
        "        plt.tight_layout()\n",
        "        plt.show()\n",
        "    \n",
        "    # Save checkpoint\n",
        "    if (step + 1) % save_every == 0:\n",
        "        trainer.save_checkpoint(f'checkpoints/step_{step+1:06d}.pt', extra={'config': {\n",
        "            'model_size': model_size, 'image_size': image_size,\n",
        "            'batch_size': batch_size, 'learning_rate': learning_rate,\n",
        "        }})\n",
        "        trainer.save_checkpoint('checkpoints/latest.pt')\n",
        "        print(f\"Saved checkpoint at step {step+1}\")\n",
        "    \n",
        "    # Safety: check for NaN\n",
        "    if math.isnan(metrics['loss']):\n",
        "        print(\"\\n⚠️ NaN loss detected! Stopping training.\")\n",
        "        print(\"Try: reduce learning_rate, increase grad_clip, or use smaller model\")\n",
        "        break\n",
        "\n",
        "elapsed = time.time() - start_time\n",
        "print(f\"\\nTraining complete! {trainer.step:,} steps in {elapsed/3600:.1f}h\")\n",
        "print(f\"Final avg loss: {sum(loss_history[-100:])/len(loss_history[-100:]):.4f}\")\n",
        "\n",
        "# Final save\n",
        "trainer.save_checkpoint('checkpoints/final.pt')\n",
        "print(\"Saved final checkpoint.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 📊 Training Loss Curve"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "\n",
        "if loss_history:\n",
        "    # Smooth the loss\n",
        "    window = min(100, len(loss_history) // 5 + 1)\n",
        "    smoothed = np.convolve(loss_history, np.ones(window)/window, mode='valid')\n",
        "    \n",
        "    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))\n",
        "    \n",
        "    ax1.plot(loss_history, alpha=0.3, label='Raw')\n",
        "    ax1.plot(range(window-1, len(loss_history)), smoothed, label=f'Smoothed (w={window})')\n",
        "    ax1.set_xlabel('Step')\n",
        "    ax1.set_ylabel('Loss')\n",
        "    ax1.set_title('Training Loss')\n",
        "    ax1.legend()\n",
        "    ax1.grid(True, alpha=0.3)\n",
        "    \n",
        "    ax2.plot(loss_history[-min(1000, len(loss_history)):], alpha=0.5)\n",
        "    ax2.set_xlabel('Recent Steps')\n",
        "    ax2.set_ylabel('Loss')\n",
        "    ax2.set_title('Recent Loss (last 1000 steps)')\n",
        "    ax2.grid(True, alpha=0.3)\n",
        "    \n",
        "    plt.tight_layout()\n",
        "    plt.show()\n",
        "else:\n",
        "    print(\"No training history yet.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 🎨 Generate Images"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#@title Generation Settings {display-mode: \"form\"}\n",
        "num_images = 8  #@param {type:\"integer\"}\n",
        "sampling_steps = 50  #@param [25, 50, 100, 200] {type:\"integer\"}\n",
        "use_ema_model = True  #@param {type:\"boolean\"}\n",
        "\n",
        "print(f\"Generating {num_images} images with {sampling_steps} Euler steps...\")\n",
        "samples = trainer.sample(\n",
        "    batch_size=num_images, image_size=image_size,\n",
        "    num_steps=sampling_steps, use_ema=use_ema_model,\n",
        ")\n",
        "\n",
        "# Display\n",
        "ncols = min(4, num_images)\n",
        "nrows = (num_images + ncols - 1) // ncols\n",
        "fig, axes = plt.subplots(nrows, ncols, figsize=(4*ncols, 4*nrows))\n",
        "if nrows == 1 and ncols == 1:\n",
        "    axes = [[axes]]\n",
        "elif nrows == 1:\n",
        "    axes = [axes]\n",
        "for i in range(num_images):\n",
        "    r, c = i // ncols, i % ncols\n",
        "    img = samples[i].cpu().permute(1, 2, 0).numpy() * 0.5 + 0.5\n",
        "    axes[r][c].imshow(np.clip(img, 0, 1))\n",
        "    axes[r][c].axis('off')\n",
        "# Hide unused axes\n",
        "for i in range(num_images, nrows * ncols):\n",
        "    r, c = i // ncols, i % ncols\n",
        "    axes[r][c].axis('off')\n",
        "plt.suptitle(f'LiquidDiffusion Samples ({sampling_steps} steps, {\"EMA\" if use_ema_model else \"online\"})')\n",
        "plt.tight_layout()\n",
        "plt.show()\n",
        "\n",
        "# Save\n",
        "grid = make_grid(samples * 0.5 + 0.5, nrow=ncols, padding=2)\n",
        "save_image(grid, 'samples/generated.png')\n",
        "print(\"Saved to samples/generated.png\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 🔬 Visualize the Denoising Process"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Show step-by-step denoising\n",
        "num_vis_steps = 10\n",
        "total_euler_steps = 50\n",
        "vis_interval = total_euler_steps // num_vis_steps\n",
        "\n",
        "model_vis = trainer.ema_model\n",
        "model_vis.eval()\n",
        "\n",
        "z = torch.randn(1, 3, image_size, image_size, device=device)\n",
        "dt = 1.0 / total_euler_steps\n",
        "intermediates = [z.clone()]\n",
        "\n",
        "with torch.no_grad():\n",
        "    for i in range(total_euler_steps, 0, -1):\n",
        "        t = torch.full((1,), i / total_euler_steps, device=device)\n",
        "        v = model_vis(z, t)\n",
        "        z = z - v * dt\n",
        "        if (total_euler_steps - i + 1) % vis_interval == 0:\n",
        "            intermediates.append(z.clone())\n",
        "\n",
        "intermediates.append(z.clamp(-1, 1))\n",
        "\n",
        "fig, axes = plt.subplots(1, len(intermediates), figsize=(3*len(intermediates), 3))\n",
        "for idx, (ax, img_t) in enumerate(zip(axes, intermediates)):\n",
        "    img = img_t[0].cpu().permute(1, 2, 0).numpy() * 0.5 + 0.5\n",
        "    ax.imshow(np.clip(img, 0, 1))\n",
        "    ax.axis('off')\n",
        "    if idx == 0:\n",
        "        ax.set_title('Noise (t=1)')\n",
        "    elif idx == len(intermediates) - 1:\n",
        "        ax.set_title('Output (t=0)')\n",
        "    else:\n",
        "        ax.set_title(f't={1-idx*vis_interval/total_euler_steps:.1f}')\n",
        "plt.suptitle('LiquidDiffusion Denoising Process')\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 💾 Save & Export Model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Save final checkpoint\n",
        "trainer.save_checkpoint('checkpoints/final.pt', extra={\n",
        "    'config': {\n",
        "        'model_size': model_size,\n",
        "        'image_size': image_size,\n",
        "        'total_params': total_params,\n",
        "        'training_steps': trainer.step,\n",
        "        'dataset': dataset_name,\n",
        "    }\n",
        "})\n",
        "print(f\"Saved checkpoint: checkpoints/final.pt\")\n",
        "print(f\"Model: liquid_diffusion_{model_size} ({total_params/1e6:.1f}M params)\")\n",
        "print(f\"Trained for {trainer.step:,} steps on {dataset_name}\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Optional: Push to Hugging Face Hub\n",
        "# Uncomment and fill in your details:\n",
        "\n",
        "# from huggingface_hub import HfApi, login\n",
        "# login()  # or use token\n",
        "# api = HfApi()\n",
        "# repo_id = \"your-username/liquid-diffusion-celebahq-256\"  # change this\n",
        "# api.create_repo(repo_id, exist_ok=True)\n",
        "# api.upload_file('checkpoints/final.pt', 'model.pt', repo_id)\n",
        "# api.upload_folder('liquid_diffusion/', 'liquid_diffusion/', repo_id)\n",
        "# print(f\"Uploaded to https://huggingface.co/{repo_id}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 📚 Architecture Details & Theory\n",
        "\n",
        "### Why Liquid Neural Networks for Image Generation?\n",
        "\n",
        "**Liquid Time-Constant (LTC) Networks** (Hasani et al., 2020) define neurons with input-dependent time constants:\n",
        "\n",
        "```\n",
        "dx/dt = -[1/τ + f(x,I,θ)] · x + f(x,I,θ) · A\n",
        "```\n",
        "\n",
        "The system time constant `τ_sys = τ/(1 + τ·f)` adapts dynamically based on input — the neuron speeds up or slows down its response depending on what it sees. This is the \"liquid\" property.\n",
        "\n",
        "**CfC (Closed-form Continuous-depth)** networks (Hasani et al., 2022) solve this ODE in closed form:\n",
        "\n",
        "```\n",
        "x(t) = σ(-f·t) ⊙ g + (1 - σ(-f·t)) ⊙ h\n",
        "```\n",
        "\n",
        "This eliminates the ODE solver — making CfC **fully parallelizable** while preserving the adaptive time constant behavior.\n",
        "\n",
        "### Our Innovation: CfC × Diffusion Timestep\n",
        "\n",
        "In diffusion models, the network must process images at different noise levels `t ∈ [0,1]`. We observe that:\n",
        "\n",
        "1. CfC's time parameter `t` controls interpolation between two learned states\n",
        "2. Diffusion's noise level `t` controls how the denoiser should behave\n",
        "3. **These are the same concept** — the CfC time parameter IS the diffusion timestep\n",
        "\n",
        "This gives us:\n",
        "- At `t≈0` (clean images): σ(-f·t)≈0.5, balanced processing for detail refinement\n",
        "- At `t≈1` (noisy images): σ(-f·t) saturates, specialized denoising\n",
        "- The gate `f` is **input-dependent** — different image content gets different time responses\n",
        "\n",
        "### References\n",
        "\n",
        "1. Hasani et al., \"Liquid Time-constant Networks\" (AAAI 2021) — arxiv:2006.04439\n",
        "2. Hasani et al., \"Closed-form Continuous-time Neural Networks\" (Nature MI 2022) — arxiv:2106.13898\n",
        "3. LiquidTAD: Parallel liquid relaxation — arxiv:2604.18274\n",
        "4. USM: U-Shape Mamba for diffusion — arxiv:2504.13499\n",
        "5. DiffuSSM: Diffusion without attention — arxiv:2311.18257\n",
        "6. Liu et al., \"Flow Straight and Fast: Rectified Flow\" (ICLR 2023) — arxiv:2209.03003"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.10.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}