---
base_model: Qwen/Qwen3-8B
language:
  - en
library_name: transformers
license: apache-2.0
pipeline_tag: video-text-to-text
tags:
  - video-understanding
  - long-video-understanding
  - agentic-llm
  - video-question-answering
  - vision-language-model
  - grpo
  - reinforcement-learning
  - icml-2026
---

# 🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority


🤗 HuggingFace model: CewEhao/VideoSEAL_8B · 💻 Code: Echochef/VideoSEAL · 📄 Paper: 2605.12571

## 👉 Introduction

This is the official model card for VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority (ICML 2026).

VideoSEAL is an agentic framework for long-video question answering. It separates the planner role (deciding which evidence to gather) from the answerer role (judging the evidence), mitigating "evidence misalignment", where models produce correct answers that are not supported by the retrieved evidence.

VideoSEAL provides offline build utilities for long video indexing:

- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic/<video_id>/`
- (Optional) generate a global `full_story.txt` summary
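
Once an offline build finishes (see the run command below), the resulting index can be inspected directly on disk. A minimal sketch, assuming a placeholder video id of `demo_video`; only the `indexes/semantic/<video_id>/` directory and the optional `full_story.txt` file are named in this README, so other file names will depend on your build settings:

```bash
# Inspect the semantic index produced for one video ("demo_video" is a placeholder id).
ls -lh indexes/semantic/demo_video/

# If the optional global summary was generated, skim it:
head -n 20 indexes/semantic/demo_video/full_story.txt
```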

## 📦 Layout

- 🧰 Shell entrypoints: `scripts/`
- 🐍 Python package: `videoseal/`
- ✅ Tests: `test/`
- 🧩 OCR toolchain (vendored): `third_party/video-subtitle-extractor/`

βš™οΈ Configuration

- Defaults live in the scripts under `scripts/`.
- Provide real API keys and endpoints via your shell environment or job launcher.

πŸ—οΈ Run offline build

```bash
cd /path/to/VideoSEAL

export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"

VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```
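
To index several videos in one pass, the same entrypoint can be wrapped in a small shell loop. A minimal sketch, assuming your videos sit in a flat directory and that `run_offline_build.sh` reads the `VIDEO`/`BENCHMARK` variables exactly as shown above:

```bash
# Hypothetical batch driver: build one index per .mp4 found in a directory.
for video in /path/to/videos/*.mp4; do
  echo "Building index for ${video}"
  VIDEO="${video}" BENCHMARK=LVBench ./scripts/run_offline_build.sh
done
```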

## ✅ Run tests

```bash
/root/miniconda3/envs/rllm/bin/python -m unittest discover -s test -v
```

πŸ‹οΈ GRPO training (video tool workflow)

This repo vendors minimal copies of the `rllm/` and `verl/` Python packages (under the repo root) so the video tool-agent GRPO workflow can run without an extra repository checkout.

### 🧪 Training environment (conda)

```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal

pip install vllm==0.11.0

cd rllm
pip install -e .

cd ../verl
pip install -e .
```
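
As a quick sanity check that the editable installs landed in the environment, something like the following can be run; a minimal sketch, assuming the installed packages are importable as `vllm`, `rllm`, and `verl`:

```bash
# Confirm the pinned vLLM version and that the vendored packages import cleanly.
python -c "import vllm; print('vllm', vllm.__version__)"
python -c "import rllm, verl; print('rllm and verl import OK')"
```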

### 🚀 Launcher

- `scripts/train/run_video_workflow_grpo.sh`

### 🧩 Example

```bash
cd /path/to/VideoSEAL

# Export real API keys/endpoints in your environment before launching.

TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```
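
Because `TRAIN_PARQUET` is passed as a JSON-style list, multiple training shards can presumably be supplied in a single launch; a minimal sketch, assuming the launcher accepts more than one entry in that list (not confirmed in this README):

```bash
# Hypothetical multi-shard launch; all paths are placeholders.
TRAIN_PARQUET='["/path/to/train_part1.parquet", "/path/to/train_part2.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```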

### 🔎 Quick checks

```bash
./scripts/train/run_video_workflow_grpo.sh test-reward
pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```

## 📜 Citation

```bibtex
@inproceedings{videoseal2026,
  title={VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority},
  author={Dongyang Liu and others},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026},
  url={https://huggingface.co/papers/2605.12571}
}
```