Video-to-Video
Diffusers
Safetensors
Commit f032d59 by W-Shuoyan · verified · 1 Parent(s): 82bcd1b

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +100 -0
  3. figures/framework.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ figures/framework.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,100 @@
+ # Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion (arXiv 2026)
+
+ **Authors**: [Shuoyan Wei](https://github.com/W-Shuoyan)<sup>1</sup>, [Feng Li](https://lifengcs.github.io/)<sup>2,\*</sup>, Chen Zhou<sup>1</sup>, [Runmin Cong](https://rmcong.github.io)<sup>3</sup>, [Yao Zhao](https://scholar.google.com/citations?user=474TbQYAAAAJ&hl=en&oi=ao)<sup>1</sup>, [Huihui Bai](https://scholar.google.com/citations?user=iXuCUcQAAAAJ&hl=en&oi=ao)<sup>1</sup>
+
+ <sup>1</sup>*Beijing Jiaotong University*, <sup>2</sup>*Hefei University of Technology*, <sup>3</sup>*Shandong University*
+
+ <small><sup>\*</sup>Corresponding Author</small>
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2601.20308-da282a)](https://arxiv.org/abs/2601.20308)
+ [![Hugging Face](https://img.shields.io/badge/🤗-%20Hugging%20Face-yellow)](https://huggingface.co/W-Shuoyan/OSDEnhancer)
+ [![GitHub Stars](https://img.shields.io/github/stars/W-Shuoyan/OSDEnhancer?style=social)](https://github.com/W-Shuoyan/OSDEnhancer)
+
+ This repository contains the reference code for the paper "[**Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion**](https://arxiv.org/pdf/2601.20308)".
+
+ ---
+
+
+ ![HEAD](figures/framework.png)
+
+ **In this paper, we propose OSDEnhancer, the first framework to achieve real-world space-time video super-resolution (STVSR) with one-step diffusion.** Given a low-resolution, low-frame-rate video as input, OSDEnhancer generates a high-resolution, high-frame-rate video.
+
+ OSDEnhancer begins with a linear initialization that establishes essential spatiotemporal structures and adapts the model for one-step reconstruction. It then applies a divide-and-conquer strategy, introducing temporal coherence (TC) and texture enrichment (TE) LoRAs that progressively specialize in inter-frame dynamics modeling and fine-grained texture recovery, respectively, and that work together during inference to improve overall performance. A bidirectional VAE decoder employs deformable recurrent blocks to leverage the multi-scale structure of the vanilla VAE, enhancing latent-to-pixel reconstruction through joint multi-scale deformable aggregation and inter-frame feature propagation.
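
The README does not say how the TC and TE LoRAs are combined at inference time. As a rough illustration only, the sketch below shows the generic way two LoRA adapters can be folded into one base weight matrix, `W' = W + s_tc * B_tc A_tc + s_te * B_te A_te`. The function names, the scale factors, and the additive merge itself are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# One adapter is (B, A, scale): B is (out x r), A is (r x in). Hypothetical layout.
Adapter = Tuple[Matrix, Matrix, float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_loras(base: Matrix, adapters: Sequence[Adapter]) -> Matrix:
    """Return base + sum(scale * B @ A) without modifying `base`."""
    merged = [row[:] for row in base]
    for b_mat, a_mat, scale in adapters:
        delta = matmul(b_mat, a_mat)  # low-rank update, same shape as base
        for i in range(len(merged)):
            for j in range(len(merged[0])):
                merged[i][j] += scale * delta[i][j]
    return merged
```

With a 2x2 identity base, a rank-1 "TC" adapter, and a rank-1 "TE" adapter, `merge_loras(base, [tc, te])` adds both low-rank updates to the same weights, which is one way two specialized adapters can contribute jointly in a single forward pass.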
+
+ ## 🔈 News
+
+ - ✅ **[May 2026]** The inference code and pretrained checkpoints are now available 👉 [![GitHub Stars](https://img.shields.io/github/stars/W-Shuoyan/OSDEnhancer?style=social)](https://github.com/W-Shuoyan/OSDEnhancer) [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-OSDEnhancer-yellow)](https://huggingface.co/W-Shuoyan/OSDEnhancer)
+ - ✅ **[Jan 2026]** The arXiv version of our paper has been released 👉 [![arXiv](https://img.shields.io/badge/arXiv-2601.20308-da282a)](https://arxiv.org/abs/2601.20308)
+
+
+ ## 📚 Installation
+
+ ```shell
+ git clone https://github.com/W-Shuoyan/OSDEnhancer.git
+ cd OSDEnhancer
+ conda create -n OSDEnhancer python=3.10
+ conda activate OSDEnhancer
+ pip install torch==2.8.0+cu128 torchvision==0.23.0+cu128 --index-url https://download.pytorch.org/whl/cu128
+ pip install -r requirements.txt
+ ```
+
+ ## 🚀 Usage
+
+ ### Pretrained Checkpoints
+ Download the pretrained checkpoint below.
+
+ | Model Name | Base Model | Download Link 🔗 |
+ |---|---|---|
+ | OSDEnhancer-v1.0 | [CogVideoX1.5-5B](https://huggingface.co/zai-org/CogVideoX1.5-5B) | [🤗 Hugging Face](https://huggingface.co/W-Shuoyan/OSDEnhancer) |
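
One possible way to fetch the checkpoint programmatically is `snapshot_download` from `huggingface_hub`. This is a minimal sketch: the helper name `download_checkpoint` is hypothetical, and it assumes that the files in the model repository can be placed directly in a local `ckpt/` directory, which the README does not confirm.

```python
from pathlib import Path

# Model repository linked in the table above.
REPO_ID = "W-Shuoyan/OSDEnhancer"


def download_checkpoint(local_dir: str = "ckpt") -> Path:
    """Download every file of the model repository into `local_dir`.

    Assumption: the repository root can serve as the checkpoint directory.
    """
    # Imported lazily so this module still loads without huggingface_hub.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `download_checkpoint()` requires network access and `pip install huggingface_hub`; compare the resulting directory with the expected checkpoint layout before running inference.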
+
+
+ The checkpoint directory should be organized as follows:
+ ```text
+ ckpt/
+ ├── transformer/
+ │   ├── config.json
+ │   ├── diffusion_pytorch_model-00001-of-00002.safetensors
+ │   ├── diffusion_pytorch_model-00002-of-00002.safetensors
+ │   └── diffusion_pytorch_model.safetensors.index.json
+ ├── vae/
+ │   ├── config.json
+ │   └── diffusion_pytorch_model.safetensors
+ ├── scheduler/
+ │   └── scheduler_config.json
+ └── prompt_embeddings/
+     └── empty.safetensors
+ ```
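
Before running inference, it can help to confirm that the directory matches the layout above. The following is a small sketch of such a check; the helper `missing_checkpoint_files` is hypothetical and not part of the repository, and it only tests that each listed file exists, not that its contents are valid.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "transformer/config.json",
    "transformer/diffusion_pytorch_model-00001-of-00002.safetensors",
    "transformer/diffusion_pytorch_model-00002-of-00002.safetensors",
    "transformer/diffusion_pytorch_model.safetensors.index.json",
    "vae/config.json",
    "vae/diffusion_pytorch_model.safetensors",
    "scheduler/scheduler_config.json",
    "prompt_embeddings/empty.safetensors",
]


def missing_checkpoint_files(ckpt_dir: str) -> List[str]:
    """Return the expected relative paths that are not regular files under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For example, `missing_checkpoint_files("ckpt")` returns an empty list when the layout is complete and otherwise lists the files that still need to be downloaded.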
+
+ ### Inference
+
+ Run OSDEnhancer on an input video:
+
+ ```bash
+ # --input: path to the input MP4 video
+ # --output: path to save the enhanced MP4 video
+ # --ckpt_path: path to the pretrained checkpoint directory
+ # --spatial_scale: spatial upsampling scale
+ # --temporal_scale: temporal upsampling scale
+ python inference.py \
+   --input demo/input.mp4 \
+   --output demo/output.mp4 \
+   --ckpt_path ckpt \
+   --spatial_scale 4 \
+   --temporal_scale 2
+ ```
+ We recommend `--spatial_scale 4` and `--temporal_scale 2`. For long videos or high-resolution inputs, enable chunk-based inference by additionally setting `--chunk_num` and `--overlap`; `--chunk_num` should be of the form `8N+1` (for example, 9, 17, or 25).
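
The `8N+1` constraint on `--chunk_num` can be checked before launching a run. The sketch below is illustrative only: it assumes `N >= 1`, and `plan_chunks` is a generic sliding-window scheme that may differ from how `inference.py` actually interprets `--chunk_num` and `--overlap`. All function names are hypothetical.

```python
from typing import List, Tuple


def is_valid_chunk_num(n: int) -> bool:
    """True if n == 8N + 1 for some integer N >= 1 (N >= 1 is an assumption)."""
    return n >= 9 and n % 8 == 1


def largest_valid_chunk_num(max_frames: int) -> int:
    """Largest value of the form 8N + 1 (N >= 1) that does not exceed max_frames."""
    n = ((max_frames - 1) // 8) * 8 + 1
    if n < 9:
        raise ValueError("max_frames must be at least 9")
    return n


def plan_chunks(total_frames: int, chunk_num: int, overlap: int) -> List[Tuple[int, int]]:
    """Generic sliding windows [start, end) over the frames, sharing `overlap` frames."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not is_valid_chunk_num(chunk_num):
        raise ValueError("chunk_num must be of the form 8N + 1")
    if not 0 <= overlap < chunk_num:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_num")
    stride = chunk_num - overlap
    windows = []
    start = 0
    while True:
        end = min(start + chunk_num, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return windows
```

Note that in this generic scheme the last window can be shorter than `chunk_num`; how the real script handles a trailing partial chunk is not stated in the README.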
+
+ ## 📧 Contact
+
+ If you run into any problems, please feel free to contact us by email: shuoyan.wei@bjtu.edu.cn
+
+ ## 💡 Cite
+
+ If you find this work useful for your research, please consider citing our paper 😊
+
+ ```bibtex
+ @article{wei2026osdenhancer,
+   title={Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion},
+   author={Wei, Shuoyan and Li, Feng and Zhou, Chen and Cong, Runmin and Zhao, Yao and Bai, Huihui},
+   journal={arXiv preprint arXiv:2601.20308},
+   year={2026}
+ }
+ ```
+
+ ## 📕 Acknowledgement
+
+ OSDEnhancer is built upon [CogVideoX](https://github.com/zai-org/CogVideo). We also sincerely thank the authors of [DOVE](https://github.com/zhengchen1999/DOVE), [EvEnhancer](https://github.com/W-Shuoyan/EvEnhancer), and [RealBasicVSR](https://github.com/ckkelvinchan/realbasicvsr) for their excellent open-source implementations, which provided valuable references for this project.
figures/framework.png ADDED

Git LFS Details

  • SHA256: ad0ebd00bb2a86422511c78a40fa45da000478d104c8926f22251d2101814e09
  • Pointer size: 132 Bytes
  • Size of remote file: 3.04 MB