LongVideoAgent Qwen3-4B

This repository hosts the released LLM checkpoint for LongVideoAgent, a multi-agent framework for long-video question answering. The checkpoint is based on Qwen3-4B.

Overview

This model was trained using the official repository, longvideoagent/LongVideoAgent.

LongVideoAgent utilizes a multi-agent collaboration framework to decompose complex long-video reasoning into specialized roles. For detailed methodology and agent architecture, please refer to our paper on arXiv: https://arxiv.org/abs/2512.20618.

This checkpoint is intended for use with the official LongVideoAgent codebase and evaluation pipeline.

Performance

On the LongTVQA+ test set, this model achieves an accuracy of 72%, while gpt-4o-mini achieves 74% on the same benchmark.

This puts our model within two percentage points of a closed-source model on this benchmark while using only about 4B parameters.

Intended Use

Use this model for:

Research on long-video question answering
Reproducing LongVideoAgent experiments
Studying agentic reasoning over long videos

This checkpoint is not a general-purpose video model by itself. For inference and evaluation, please use the official repository:

https://github.com/longvideoagent/LongVideoAgent

Usage

Note on Context Length: This model natively supports a context length of 262,144 tokens. If you run into Out-Of-Memory (OOM) errors or have limited VRAM during inference, reduce the maximum context length in your vLLM parameters, for example max_model_len=120000.
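As a minimal sketch of the context-length note above, the snippet below shows one way to pass a reduced max_model_len to vLLM's Python API. The helper names (build_vllm_kwargs, load_llm) and the checkpoint path are hypothetical and used only for illustration; the official LongVideoAgent pipeline may launch vLLM differently, so treat this as an assumption rather than the project's own setup.

```python
NATIVE_CONTEXT_LEN = 262_144  # native context length stated in this README


def build_vllm_kwargs(model_path: str, max_model_len: int = 120_000) -> dict:
    """Return keyword arguments for vllm.LLM with a capped context length.

    Hypothetical helper: it only assembles arguments and does not need vLLM.
    """
    if max_model_len <= 0:
        raise ValueError("max_model_len must be positive")
    # Never request more context than the model natively supports.
    capped = min(max_model_len, NATIVE_CONTEXT_LEN)
    return {"model": model_path, "max_model_len": capped}


def load_llm(model_path: str, max_model_len: int = 120_000):
    """Create a vLLM engine with a reduced context window.

    vLLM is imported lazily so that build_vllm_kwargs works without it.
    """
    from vllm import LLM  # requires vLLM to be installed and a supported GPU

    return LLM(**build_vllm_kwargs(model_path, max_model_len))
```

Calling load_llm("path/to/checkpoint", max_model_len=120_000) would then start an engine with the smaller window, where the path is a placeholder for wherever the checkpoint is stored. A smaller max_model_len reduces the memory vLLM reserves for the KV cache, at the cost of rejecting prompts longer than the new limit.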

Please follow the setup and inference instructions in the official repository (https://github.com/longvideoagent/LongVideoAgent) and the project documentation.

If you use this checkpoint in your work, please also cite the LongVideoAgent paper below.

Citation

@misc{liu2025longvideoagentmultiagentreasoninglong,
  title={LongVideoAgent: Multi-Agent Reasoning with Long Videos},
  author={Runtao Liu and Ziyi Liu and Jiaqi Tang and Yue Ma and Renjie Pi and Jipeng Zhang and Qifeng Chen},
  year={2025},
  eprint={2512.20618},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.20618},
}