Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model

This is the official model repository for M-GDM, introduced in the paper Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model, accepted at CVPR 2025.

M-GDM (Metadata-Guided Diffusion Model) is designed for the blind recovery of videos corrupted during bitstream transmission or storage. It leverages intrinsic video metadata—such as motion vectors and frame types—to identify corrupted regions and guide the diffusion-based restoration process.

Method Overview

The M-GDM pipeline consists of two main stages:

Stage 1 — Metadata-guided Diffusion UNet: This stage is conditioned on the corrupted frame, per-pixel motion vectors, and frame-type embeddings. It generates a restored video output and a predicted corruption mask.
Stage 2 — Post-Refinement Module (PRM): This module uses stacked residual Swin Transformer blocks to refine the diffusion output, ensuring consistency between the recovered regions and the original intact pixels.
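
As a rough illustration of the data flow through the two stages, the NumPy sketch below assembles a per-frame conditioning tensor from a corrupted frame, a two-channel motion-vector field, and a frame-type embedding, then blends a restored frame with the intact pixels using a corruption mask. The tensor shapes, the frame-type vocabulary, the embedding table, the function names (`build_condition`, `composite`), and the mask-blending step are all assumptions made for illustration. The actual UNet conditioning and the Swin-based PRM in the repository may work differently.

```python
import numpy as np

# Hypothetical frame-type vocabulary; the real model's encoding may differ.
FRAME_TYPES = {"I": 0, "P": 1, "B": 2}

def build_condition(frame, motion, frame_type, embed_table):
    """Stack a frame (H, W, 3), motion vectors (H, W, 2) and a frame-type
    embedding (D,) broadcast over all pixels into one (H, W, 3 + 2 + D) array."""
    h, w, _ = frame.shape
    emb = embed_table[FRAME_TYPES[frame_type]]            # shape (D,)
    emb_map = np.broadcast_to(emb, (h, w, emb.shape[0]))  # shape (H, W, D)
    return np.concatenate([frame, motion, emb_map], axis=-1)

def composite(restored, corrupted, mask):
    """Blend with a mask of shape (H, W, 1): 1 marks a corrupted pixel
    (take the restored value), 0 marks an intact pixel (keep the input)."""
    return mask * restored + (1.0 - mask) * corrupted

rng = np.random.default_rng(0)
h, w, d = 4, 6, 8
frame = rng.random((h, w, 3)).astype(np.float32)            # corrupted input frame
motion = rng.standard_normal((h, w, 2)).astype(np.float32)  # per-pixel (dx, dy)
embed_table = rng.standard_normal((len(FRAME_TYPES), d)).astype(np.float32)

cond = build_condition(frame, motion, "P", embed_table)
print(cond.shape)  # (4, 6, 13)

# Stand-ins for the model outputs: an all-ones "restored" frame and a mask
# that flags the top two rows as corrupted.
restored = np.ones_like(frame)
mask = np.zeros((h, w, 1), dtype=np.float32)
mask[:2] = 1.0
out = composite(restored, frame, mask)
```

In this toy setup the bottom rows of `out` are bit-identical to the input frame, which is the kind of consistency with intact pixels that Stage 2 is described as targeting.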

Usage

For full installation and data preparation instructions, see the official GitHub repository at https://github.com/Shuyun-Wang/M-GDM.

Installation

git clone https://github.com/Shuyun-Wang/M-GDM.git
cd M-GDM
pip install -r requirements.txt

Download Checkpoints

Download the model weights from this Hugging Face repository into a local checkpoints directory. The --exclude flag skips the DAVIS.tar.gz archive:

hf download Shuyun-Wang/M-GDM \
    --exclude "DAVIS.tar.gz" \
    --local-dir checkpoints
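
The same download can also be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `download_kwargs` and `download_checkpoints` are hypothetical and not part of the M-GDM codebase.

```python
def download_kwargs(local_dir="checkpoints"):
    """Arguments mirroring the CLI command above: fetch the model repo,
    skip the DAVIS archive, and write the files into local_dir."""
    return {
        "repo_id": "Shuyun-Wang/M-GDM",
        "ignore_patterns": ["DAVIS.tar.gz"],
        "local_dir": local_dir,
    }

def download_checkpoints(local_dir="checkpoints"):
    """Download the weights and return the local folder path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download
    return snapshot_download(**download_kwargs(local_dir))
```

Calling `download_checkpoints()` performs the network download; `download_kwargs()` alone can be used to inspect what would be requested.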

Inference

To run a quick test on corrupted frames:

python inference.py

Citation

@InProceedings{Wang_2025_CVPR,
    author    = {Wang, Shuyun and Zhang, Hu and Shen, Xin and Wang, Dadong and Yu, Xin},
    title     = {Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {22975-22984}
}

Acknowledgements

The authors acknowledge the use of code and datasets from E2FGVI, SwinIR, LGVI, and the BSCV dataset.

Paper for Shuyun-Wang/M-GDM: Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model (paper ID 2604.13906)