wangrongsheng committed
Commit 817b500 · verified · 1 Parent(s): c60e366
Files changed (1)
  1. README.md +50 -1
README.md CHANGED
@@ -1,4 +1,30 @@
- ## Quick Start

  #### Install

@@ -12,4 +38,27 @@ pip install -r requirements.txt
  python inference.py --prompt "A doctor examining a patient" --output exam.mp4
  python inference.py --batch prompts.json --output_dir results/
  python inference.py --prompt "..." --gpu 1 --output result.mp4
  ```
 
+ ---
+ license: apache-2.0
+ datasets:
+ - FreedomIntelligence/MedVideoCap-55K
+ language:
+ - en
+ base_model:
+ - Wan-AI/Wan2.1-T2V-1.3B
+ tags:
+ - video
+ - generation
+ ---
+ # Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
+
+ ![](https://i.imgur.com/waxVImv.png)
+
+ <p align="center">
+ [📃 <a href="https://arxiv.org/abs/2507.05675" target="_blank">Paper</a>] | [🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/MedVideoCap-55K" target="_blank">Dataset</a>] | [🤗 <a href="https://huggingface.co/FreedomIntelligence/MedGen-1.3B" target="_blank">MedGen-1.3B</a>] | [🤗 <a href="https://huggingface.co/FreedomIntelligence/MedGen-14B" target="_blank">MedGen-14B</a>] | [🚀 <a href="https://huggingface.co/blog/wangrongsheng/medvideocap-55k" target="_blank">Blog</a>]
+ </p>
+
+ ## ⚡ Introduction
+
+ Video generation has progressed rapidly in open-domain settings, yet medical video generation remains largely underexplored. Medical videos are critical for applications such as clinical training, education, and simulation, and they require not only high visual fidelity but also strict medical accuracy. However, current models often produce unrealistic or erroneous content when given medical prompts, largely because large-scale, high-quality datasets tailored to the medical domain are lacking. To address this gap, we introduce **MedVideoCap-55K**, the first large-scale, diverse, and caption-rich dataset for medical video generation. It comprises over 55,000 curated clips spanning real-world medical scenarios, providing a strong foundation for training generalist medical video generation models. Built upon this dataset, we develop **MedGen**, which achieves leading performance among open-source models and rivals commercial systems across multiple benchmarks in both visual quality and medical accuracy.
+ We hope our dataset and model can serve as a valuable resource and help catalyze further research in medical video generation.
+
+
+ ## 🚀 Quick Start

  #### Install

 
  python inference.py --prompt "A doctor examining a patient" --output exam.mp4
  python inference.py --batch prompts.json --output_dir results/
  python inference.py --prompt "..." --gpu 1 --output result.mp4
+ ```
+
+ ## 🤩 Acknowledgement
+
+ Our work is inspired by the following projects.
+
+ - [FastVideo](https://github.com/hao-ai-lab/FastVideo): a lightweight framework for accelerating large video diffusion models.
+ - [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio): an open-source diffusion model engine.
+ - [VBench](https://github.com/Vchitect/VBench): a comprehensive benchmark suite for video generative models.
+ - [VideoScore](https://github.com/TIGER-AI-Lab/VideoScore): an automatic metric that simulates fine-grained human feedback for video generation.
+
+ ## 📖 Citation
+
+ ```bibtex
+ @misc{wang2025medgenunlockingmedicalvideo,
+ title={MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos},
+ author={Rongsheng Wang and Junying Chen and Ke Ji and Zhenyang Cai and Shunian Chen and Yunjin Yang and Benyou Wang},
+ year={2025},
+ eprint={2507.05675},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2507.05675},
+ }
  ```
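
The three Quick Start invocations of `inference.py` can also be assembled from Python, for example when looping over many prompts. The sketch below is only an illustration under stated assumptions: the helper name `build_inference_command` is hypothetical and not part of this repository, it uses only the flags shown in the Quick Start (`--prompt`, `--output`, `--batch`, `--output_dir`, `--gpu`), and it assumes `inference.py` sits in the working directory with `python` on the `PATH`. Whether `--gpu` can be combined with `--batch` is not stated in the README, and the layout of `prompts.json` is not assumed here.

```python
import shlex
from typing import List, Optional


def build_inference_command(
    prompt: Optional[str] = None,
    output: Optional[str] = None,
    batch: Optional[str] = None,
    output_dir: Optional[str] = None,
    gpu: Optional[int] = None,
    python: str = "python",          # assumption: interpreter name on PATH
    script: str = "inference.py",    # assumption: script in the working directory
) -> List[str]:
    """Build the argument list for one inference.py call (hypothetical helper).

    Single-prompt mode pairs --prompt with --output; batch mode pairs
    --batch with --output_dir, mirroring the Quick Start commands.
    """
    # Exactly one of the two modes must be selected.
    if (prompt is None) == (batch is None):
        raise ValueError("pass exactly one of 'prompt' or 'batch'")

    cmd = [python, script]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    else:
        cmd += ["--batch", batch]

    # --gpu appears only alongside --prompt in the README; using it with
    # --batch is an untested assumption.
    if gpu is not None:
        cmd += ["--gpu", str(gpu)]

    if prompt is not None and output is not None:
        cmd += ["--output", output]
    if batch is not None and output_dir is not None:
        cmd += ["--output_dir", output_dir]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(cmd, check=True) to execute one.
    for cmd in (
        build_inference_command(prompt="A doctor examining a patient", output="exam.mp4"),
        build_inference_command(batch="prompts.json", output_dir="results/"),
        build_inference_command(prompt="...", gpu=1, output="result.mp4"),
    ):
        print(shlex.join(cmd))
```

Returning a list rather than a single string lets the result go straight to `subprocess.run` without shell quoting problems for prompts that contain spaces or quotes.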