---
license: mit
datasets:
- Kwai-Keye/VideoTemp-o3
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: video-text-to-text
---

<h2 align="center"> <b>VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos</b></h2>
  
<div align="center" style="font-size: 15pt">

<a href='https://liuwq-bit.github.io/VideoTemp-o3'><img src='https://img.shields.io/badge/Project-Page-green'></a>
<a href='https://arxiv.org/abs/2602.07801'><img src='https://img.shields.io/badge/Arxiv-2602.07801-red'></a>
<a href='https://github.com/Kwai-Keye/VideoTemp-o3'><img src='https://img.shields.io/badge/Code-Github-blue?logo=github'></a>
<br>
<a href='https://huggingface.co/Kwai-Keye/VideoTemp-o3'><img src='https://img.shields.io/badge/Model-VideoTemp o3-orange'></a>
<a href='https://huggingface.co/datasets/Kwai-Keye/VideoTemp-o3'><img src='https://img.shields.io/badge/Dataset-SFT & RL-yellow'></a>
<a href='https://huggingface.co/datasets/Kwai-Keye/VideoTemp-Bench'><img src='https://img.shields.io/badge/Benchmark-VideoTemp Bench-blue'></a>

</div>

![image](https://cdn-uploads.huggingface.co/production/uploads/65a6797f838b3acc5358f583/qAeCGmvo-IjTIhifmEt7x.png)

Illustration of the agentic pipeline in VideoTemp-o3. Given a video QA pair, the model performs on-demand temporal grounding to locate the most relevant segment, then refines it iteratively. Finally, it produces a reliable answer grounded in the pertinent visual evidence.
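
The ground, refine, then answer loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VideoTemp-o3 implementation. The `policy` callable stands in for the model. `Step`, `clip_segment`, `agentic_answer`, `toy_policy`, `EVENT_T`, and the `max_rounds` budget are hypothetical names introduced only for this sketch. In each round, the policy either proposes a narrower segment `(start, end)` in seconds, which is clamped to the video duration and recorded, or returns a final answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A temporal segment (start, end) in seconds.
Segment = Tuple[float, float]


@dataclass
class Step:
    """One (hypothetical) model turn: a segment to zoom into, or a final answer."""
    segment: Optional[Segment] = None
    answer: Optional[str] = None


# Hypothetical policy signature: (question, current_view, history) -> Step
Policy = Callable[[str, Segment, List[Segment]], Step]


def clip_segment(seg: Segment, duration: float) -> Segment:
    """Clamp a proposed segment to [0, duration] and keep start <= end."""
    start = max(0.0, min(seg[0], duration))
    end = max(start, min(seg[1], duration))
    return (start, end)


def agentic_answer(policy: Policy, question: str, duration: float,
                   max_rounds: int = 3) -> Tuple[Optional[str], List[Segment]]:
    """Ground, refine iteratively, then answer on the last grounded view."""
    view: Segment = (0.0, duration)  # start from the full video
    history: List[Segment] = []
    for _ in range(max_rounds):
        step = policy(question, view, history)
        if step.answer is not None or step.segment is None:
            return step.answer, history  # the policy is ready to answer
        view = clip_segment(step.segment, duration)  # zoom into the proposal
        history.append(view)
    # Refinement budget exhausted: request an answer on the final view.
    return policy(question, view, history).answer, history


EVENT_T = 42.0  # toy ground-truth event time, used only by toy_policy


def toy_policy(question: str, view: Segment, history: List[Segment]) -> Step:
    """Stand-in for the model: halve the window around EVENT_T until it is short."""
    width = view[1] - view[0]
    if width <= 20.0:
        return Step(answer=f"event near t={EVENT_T:g}s")
    quarter = width / 4  # new window spans half of the current width
    return Step(segment=(EVENT_T - quarter, EVENT_T + quarter))


if __name__ == "__main__":
    answer, history = agentic_answer(toy_policy, "When does the event happen?",
                                     duration=120.0, max_rounds=5)
    print(history)  # [(12.0, 72.0), (27.0, 57.0), (34.5, 49.5)]
    print(answer)   # event near t=42s
```

In practice, the model itself would play the policy role and decide from the visual evidence when a grounded segment is sufficient. The fixed `max_rounds` cap is one simple way to bound the number of refinement steps in this sketch.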

## Citation

If you find our work useful, please consider citing:

```bibtex
@article{liu2026videotemp,
  title={VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos},
  author={Liu, Wenqi and Wang, Yunxiao and Ma, Shijie and Liu, Meng and Su, Qile and Zhang, Tianke and Fan, Haonan and Liu, Changyi and Jiang, Kaiyu and Chen, Jiankang and Tang, Kaiyu and Wen, Bin and Yang, Fan and Gao, Tingting and Li, Han and Wei, Yinwei and Song, Xuemeng},
  journal={arXiv preprint arXiv:2602.07801},
  year={2026}
}
```