	---
	library_name: diffusers
	pipeline_tag: image-to-image
	---

	# Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

	Stream-DiffVSR is a causally conditioned diffusion framework designed for efficient online Video Super-Resolution (VSR). It operates strictly on past frames to maintain low latency, making it suitable for real-time deployment.

	[[Paper](https://huggingface.co/papers/2512.23709)] [[Project Page](https://jamichss.github.io/stream-diffvsr-project-page/)] [[GitHub](https://github.com/jamichss/Stream-DiffVSR)]

	## Description
	Diffusion-based VSR methods often struggle with latency due to multi-step denoising and reliance on future frames. Stream-DiffVSR addresses this with:
	- Causal Conditioning: Operates only on past frames for online processing.
	- Four-step Distilled Denoiser: Enables fast inference without sacrificing quality.
	- Auto-regressive Temporal Guidance (ARTG): Injects motion-aligned cues during denoising.
	- Lightweight Temporal Decoder: Enhances temporal coherence and fine details.
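
The causal, auto-regressive control flow described above can be sketched in a few lines. This is only an illustration of the streaming pattern, not the authors' implementation: `stream_super_resolve` and `toy_enhance` are hypothetical names, frames are plain lists of floats standing in for image tensors, and the blend in `toy_enhance` is a placeholder for the distilled denoiser, ARTG, and temporal decoder.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Frame = List[float]  # toy stand-in for an image tensor


def stream_super_resolve(
    frames: Iterable[Frame],
    enhance: Callable[[Frame, Optional[Frame]], Frame],
) -> Iterator[Frame]:
    """Yield one output per input frame, conditioning only on the previous output."""
    previous_output: Optional[Frame] = None
    for frame in frames:
        # Only the current frame and the past output are visible here,
        # so a result can be emitted as soon as its frame arrives.
        output = enhance(frame, previous_output)
        yield output
        previous_output = output


def toy_enhance(frame: Frame, previous_output: Optional[Frame]) -> Frame:
    """Hypothetical placeholder: average the current frame with the previous output."""
    if previous_output is None:
        return list(frame)
    return [0.5 * cur + 0.5 * prev for cur, prev in zip(frame, previous_output)]


outputs = list(stream_super_resolve([[0.0], [1.0], [1.0]], toy_enhance))
print(outputs)  # [[0.0], [0.5], [0.75]]
```

Because the generator never looks ahead, changing a later input frame cannot alter any output that has already been emitted, which is the property that keeps per-frame latency bounded in an online setting.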

	Stream-DiffVSR processes a 720p frame in 0.328 seconds on an NVIDIA RTX 4090, a significant latency reduction compared with prior diffusion-based VSR methods.

	## Usage

	### Installation
	```bash
	git clone https://github.com/jamichss/Stream-DiffVSR.git
	cd Stream-DiffVSR
	conda env create -f requirements.yml
	conda activate stream-diffvsr
	```

	### Inference
	You can run inference using the following command. The script will automatically fetch the necessary weights from this repository.

	```bash
	python inference.py \
	    --model_id 'Jamichsu/Stream-DiffVSR' \
	    --out_path 'YOUR_OUTPUT_PATH' \
	    --in_path 'YOUR_INPUT_PATH' \
	    --num_inference_steps 4
	```

	The expected file structure for the inference input data is as follows:
	```
	YOUR_INPUT_PATH/
	├── seq1/
	│   ├── frame_0001.png
	│   ├── frame_0002.png
	│   └── ...
	├── seq2/
	│   ├── frame_0001.png
	│   ├── frame_0002.png
	│   └── ...
	```
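
Before launching a long run, it can help to check that the input folder matches this layout. The sketch below is a hypothetical helper (`list_sequences` is not part of the repository). It assumes that frames are PNG files with zero-padded names, so that lexicographic order equals temporal order; how `inference.py` itself discovers and orders frames is not specified here.

```python
from pathlib import Path
from typing import Dict, List


def list_sequences(in_path: str) -> Dict[str, List[Path]]:
    """Map each sequence folder name to its PNG frames in sorted order."""
    root = Path(in_path)
    sequences: Dict[str, List[Path]] = {}
    for seq_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        frames = sorted(seq_dir.glob("*.png"))
        if frames:  # skip folders that contain no PNG frames
            sequences[seq_dir.name] = frames
    return sequences


if __name__ == "__main__":
    import sys

    # Usage: python check_input.py YOUR_INPUT_PATH
    if len(sys.argv) > 1:
        for name, frames in list_sequences(sys.argv[1]).items():
            print(f"{name}: {len(frames)} frames, first = {frames[0].name}")
```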

	For NVIDIA TensorRT acceleration:
	```bash
	python inference.py \
	    --model_id 'Jamichsu/Stream-DiffVSR' \
	    --out_path 'YOUR_OUTPUT_PATH' \
	    --in_path 'YOUR_INPUT_PATH' \
	    --num_inference_steps 4 \
	    --enable_tensorrt \
	    --image_height <YOUR_TARGET_HEIGHT> \
	    --image_width <YOUR_TARGET_WIDTH>
	```
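
When the script is driven from Python (for example, to batch several folders), the two commands above can be assembled programmatically. The following is a minimal sketch: `build_inference_command` is a hypothetical helper, it uses only the flags shown in this README, and it assumes it is run from the root of the cloned repository so that `inference.py` resolves.

```python
import sys
from typing import List, Optional, Tuple


def build_inference_command(
    in_path: str,
    out_path: str,
    model_id: str = "Jamichsu/Stream-DiffVSR",
    num_inference_steps: int = 4,
    tensorrt_size: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Build the argument list for inference.py using the flags documented above."""
    cmd = [
        sys.executable, "inference.py",
        "--model_id", model_id,
        "--out_path", out_path,
        "--in_path", in_path,
        "--num_inference_steps", str(num_inference_steps),
    ]
    if tensorrt_size is not None:
        # TensorRT mode takes an explicit target (height, width).
        height, width = tensorrt_size
        cmd += [
            "--enable_tensorrt",
            "--image_height", str(height),
            "--image_width", str(width),
        ]
    return cmd


# Example values only; replace the paths and size with your own.
print(build_inference_command("YOUR_INPUT_PATH", "YOUR_OUTPUT_PATH"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing arguments as a list avoids shell quoting issues with paths that contain spaces.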

	## Note

	The provided checkpoint is a toy / proof-of-concept model trained on a limited amount of data. As a result, it does not yet cover the full diversity of real-world videos.

	This checkpoint is mainly intended to demonstrate the overall pipeline and low-latency feasibility, rather than to deliver production-level upscaling quality.

	Artifacts and inconsistent visual quality are therefore expected at this stage.


	## Citation
	If you find this work useful, please cite:
	```bibtex
	@article{shiu2025stream,
	title={Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion},
	author={Shiu, Hau-Shiang and Lin, Chin-Yang and Wang, Zhixiang and Hsiao, Chi-Wei and Yu, Po-Fan and Chen, Yu-Chih and Liu, Yu-Lun},
	journal={arXiv preprint arXiv:2512.23709},
	year={2025}
	}
	```