Instructions to use ddz16/Qwen3-VL-4B-CRPO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ddz16/Qwen3-VL-4B-CRPO with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForImageTextToText processor = AutoProcessor.from_pretrained("ddz16/Qwen3-VL-4B-CRPO") model = AutoModelForImageTextToText.from_pretrained("ddz16/Qwen3-VL-4B-CRPO") - Notebooks
- Google Colab
- Kaggle
Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning
This repository contains the model weights for CRPO, a dual-branch reinforcement learning framework designed to improve the spatiotemporal sensitivity of Video LLMs.
- Paper: Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning
- Project Page: https://ddz16.github.io/crpo.github.io/
- Code: https://github.com/ddz16/CRPO
Introduction
Video Large Language Models (Video LLMs) often rely on shortcuts, such as single-frame cues and language priors, rather than tracking spatiotemporal dynamics. Counterfactual Relational Policy Optimization (CRPO) addresses this by using a dual-branch RL framework.
CRPO constructs counterfactual videos (e.g., through horizontal flips and temporal reversals) and introduces a Counterfactual Relation Reward (CRR). This reward encourages the model's answers to change for dynamic questions when the visual world changes, and to remain unchanged for static questions, making it difficult for shortcut-based policies to be consistently rewarded.
Evaluation
The model was evaluated using DyBench, a paired counterfactual video benchmark with over 3,000 videos covering:
- Reversible dynamics
- Moving directions
- Event sequences
Experiments show that CRPO significantly outperforms prior RL methods on spatiotemporal-sensitive evaluations while maintaining competitive general video performance. On Qwen3-VL-8B, CRPO improves DyBench pair-accuracy, indicating improved sensitivity to video dynamics rather than reliance on static shortcuts.
- Downloads last month
- 143