How to use DarthZhu/VideoRLVR-Wan2.2 with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the weights in bfloat16 and move the pipeline to the GPU (switch to "mps" on Apple devices).
pipe = DiffusionPipeline.from_pretrained("DarthZhu/VideoRLVR-Wan2.2", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

# Generate frames conditioned on the input image and the text prompt, then write them to an MP4 file.
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
metadata
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-video
base_model:
- DarthZhu/VideoRLVR-Wan2.2-Base
datasets:
- DarthZhu/VideoRLVR-Data
VideoRLVR
VideoRLVR is a reinforcement learning (RL) recipe for training video reasoning models with verifiable rewards. This model is an RL-optimized version of Wan2.2-TI2V-5B, introduced in the paper Video Models Can Reason with Verifiable Rewards.
Training combines an SDE-GRPO optimization backbone with rule-based, verifiable feedback to improve visual reasoning on complex, procedurally generated tasks such as Maze, FlowFree, and Sokoban.
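To make "rule-based feedback" concrete, here is a hypothetical sketch of a dense, decomposed reward for a Maze rollout. It is not the paper's reward function. It assumes the generated video has already been parsed into the sequence of grid cells the agent visits (that parsing step is not shown), and the helper names (`Maze`, `maze_reward`), the component names, and the weights are all illustrative choices.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

Cell = tuple[int, int]  # (row, col)


@dataclass
class Maze:
    grid: list[str]  # '#' marks a wall; every other character is walkable
    start: Cell
    goal: Cell

    def is_free(self, cell: Cell) -> bool:
        r, c = cell
        return (0 <= r < len(self.grid) and 0 <= c < len(self.grid[0])
                and self.grid[r][c] != "#")

    def dist_to_goal(self, cell: Cell) -> float:
        """Shortest 4-neighbour path length from cell to goal (inf if none)."""
        if not self.is_free(cell):
            return float("inf")
        seen, queue = {cell}, deque([(cell, 0)])
        while queue:
            (r, c), d = queue.popleft()
            if (r, c) == self.goal:
                return float(d)
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nxt not in seen and self.is_free(nxt):
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
        return float("inf")


def maze_reward(maze: Maze, path: list[Cell],
                weights: Optional[dict[str, float]] = None) -> dict[str, float]:
    """Score a parsed rollout with separate, rule-checkable components."""
    # Hypothetical weights for illustration; they are not taken from the paper.
    weights = weights or {"start": 0.1, "valid": 0.3, "progress": 0.3, "goal": 0.3}
    if not path:
        parts = dict.fromkeys(weights, 0.0)
    else:
        moves = list(zip(path, path[1:]))
        # A move is legal if it goes to an adjacent, non-wall cell.
        legal = sum(1 for (r0, c0), (r1, c1) in moves
                    if abs(r0 - r1) + abs(c0 - c1) == 1 and maze.is_free((r1, c1)))
        d_start = maze.dist_to_goal(maze.start)
        d_end = maze.dist_to_goal(path[-1])
        parts = {
            "start": float(path[0] == maze.start),
            "valid": legal / len(moves) if moves else 1.0,
            # Dense partial credit: fraction of the start-to-goal distance covered.
            "progress": max(0.0, 1.0 - d_end / d_start) if d_start > 0 else 1.0,
            "goal": float(path[-1] == maze.goal),
        }
    parts["total"] = sum(w * parts[k] for k, w in weights.items())
    return parts
```

Every component is checked by a deterministic rule, so the score is verifiable. Splitting it into parts also gives partial credit: a rollout that stops halfway along the shortest path still earns `progress = 0.5`. How VideoRLVR actually defines and weights its components is not specified here.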
- Paper: Video Models Can Reason with Verifiable Rewards
- Project Page: https://darthzhu.github.io/VideoRLVR-page/
- Code: https://github.com/luka-group/VideoRLVR
Method Overview
VideoRLVR formulates video reasoning as the generation of verifiable visual trajectories. Key components include:
- SDE-GRPO: An optimization backbone for video diffusion models.
- Dense Decomposed Rewards: Verifiable, rule-based feedback to guide the model.
- Early-Step Focus: A strategy that restricts policy optimization to the early denoising phase, significantly reducing training latency while preserving performance.
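The components above can be sketched as a GRPO-style objective restricted to early denoising steps. This sketch assumes, as in Flow-GRPO-style methods, that sampling with an SDE instead of a deterministic ODE makes each denoising transition Gaussian, so every step has a tractable log-probability. `group_advantages`, `gaussian_logprob`, and `early_step_grpo_loss` are hypothetical helpers written in plain Python for readability. The implementation in the linked repository works on tensors and may differ in normalization, clipping, and the number of early steps it keeps.

```python
import math
from typing import Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group (all rollouts for one prompt)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def gaussian_logprob(x_next: Sequence[float], mean: Sequence[float], std: float) -> float:
    """Log-density of one SDE transition x_next ~ N(mean, std^2 * I)."""
    const = -math.log(std) - 0.5 * math.log(2 * math.pi)
    return sum(-((x - m) ** 2) / (2 * std**2) + const for x, m in zip(x_next, mean))


def early_step_grpo_loss(
    logp_new: Sequence[Sequence[float]],
    logp_old: Sequence[Sequence[float]],
    advantages: Sequence[float],
    num_early_steps: int,
    clip: float = 0.2,
) -> float:
    """Clipped GRPO surrogate over the first `num_early_steps` denoising steps.

    logp_new[i][t] / logp_old[i][t]: log-probability of rollout i's SDE
    transition at step t under the current / sampling policy. Steps
    t >= num_early_steps are skipped entirely (the early-step restriction).
    """
    if num_early_steps < 1:
        raise ValueError("num_early_steps must be >= 1")
    steps = range(min(num_early_steps, len(logp_new[0])))
    total, count = 0.0, 0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        for t in steps:
            ratio = math.exp(lp_new[t] - lp_old[t])
            clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
            # Pessimistic PPO-style bound, negated so that minimizing it
            # raises the likelihood of above-average rollouts.
            total -= min(ratio * adv, clipped * adv)
            count += 1
    return total / count
```

Because only steps `t < num_early_steps` enter the loss, later transitions never need their log-probabilities recomputed or backpropagated under the current policy. In this sketch, that is where the training-time savings of an early-step strategy would come from.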
Citation
@article{zhu2026video,
title={Video Models Can Reason with Verifiable Rewards},
author={Tinghui Zhu and Sheng Zhang and James Y. Huang and Selena Song and Xiaofei Wen and Yuankai Li and Hoifung Poon and Muhao Chen},
journal={arXiv preprint arXiv:2605.15458},
year={2026}
}