---
base_model: Qwen/Qwen3-8B
language:
  - en
library_name: transformers
license: apache-2.0
pipeline_tag: video-text-to-text
tags:
  - video-understanding
  - long-video-understanding
  - agentic-llm
  - video-question-answering
  - vision-language-model
  - grpo
  - reinforcement-learning
  - icml-2026
---

# 🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority


🤗 HuggingFace model: CewEhao/VideoSEAL_8B · 💻 Code: Echochef/VideoSEAL · 📄 Paper: 2605.12571

## 👉 Introduction

This is the official model card for VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority (ICML 2026).

VideoSEAL is an agentic framework for long-video question answering. It separates the planner role (deciding which evidence to gather) from the answerer role (judging the evidence), mitigating "evidence misalignment", where models produce correct answers that are not supported by the retrieved evidence.

VideoSEAL provides offline build utilities for long video indexing:

- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic/<video_id>/`
- (Optional) generate a global `full_story.txt` summary
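
Once an offline build finishes (see the run command below), the resulting index can be inspected directly on disk. A minimal sketch, assuming a placeholder video id of `demo_video`; only the `indexes/semantic/<video_id>/` directory and the optional `full_story.txt` file are named in this README, so other file names will depend on your build settings:

```bash
# Inspect the semantic index produced for one video ("demo_video" is a placeholder id).
ls -lh indexes/semantic/demo_video/

# If the optional global summary was generated, skim it:
head -n 20 indexes/semantic/demo_video/full_story.txt
```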

## 📦 Layout

- 🧰 Shell entrypoints: `scripts/`
- 🐍 Python package: `videoseal/`
- ✅ Tests: `test/`
- 🧩 OCR toolchain (vendored): `third_party/video-subtitle-extractor/`

βš™οΈ Configuration

- Defaults live in the scripts under `scripts/`.
- Provide real API keys and endpoints via your shell environment or job launcher.

πŸ—οΈ Run offline build

```bash
cd /path/to/VideoSEAL

export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"

VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```
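
To index several videos in one pass, the same entrypoint can be wrapped in a small shell loop. A minimal sketch, assuming your videos sit in a flat directory and that `run_offline_build.sh` reads the `VIDEO`/`BENCHMARK` variables exactly as shown above:

```bash
# Hypothetical batch driver: build one index per .mp4 found in a directory.
for video in /path/to/videos/*.mp4; do
  echo "Building index for ${video}"
  VIDEO="${video}" BENCHMARK=LVBench ./scripts/run_offline_build.sh
done
```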

## ✅ Run tests

```bash
/root/miniconda3/envs/rllm/bin/python -m unittest discover -s test -v
```

πŸ‹οΈ GRPO training (video tool workflow)

This repo vendors minimal copies of the `rllm/` and `verl/` Python packages (under the repo root) so the video tool-agent GRPO workflow can run without an extra repository checkout.

### 🧪 Training environment (conda)

```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal

pip install vllm==0.11.0

cd rllm
pip install -e .

cd ../verl
pip install -e .
```
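
As a quick sanity check that the editable installs landed in the environment, something like the following can be run; a minimal sketch, assuming the installed packages are importable as `vllm`, `rllm`, and `verl`:

```bash
# Confirm the pinned vLLM version and that the vendored packages import cleanly.
python -c "import vllm; print('vllm', vllm.__version__)"
python -c "import rllm, verl; print('rllm and verl import OK')"
```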

### 🚀 Launcher

- `scripts/train/run_video_workflow_grpo.sh`

### 🧩 Example

```bash
cd /path/to/VideoSEAL

# Export real API keys/endpoints in your environment before launching.

TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```
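
Because `TRAIN_PARQUET` is passed as a JSON-style list, multiple training shards can presumably be supplied in a single launch; a minimal sketch, assuming the launcher accepts more than one entry in that list (not confirmed in this README):

```bash
# Hypothetical multi-shard launch; all paths are placeholders.
TRAIN_PARQUET='["/path/to/train_part1.parquet", "/path/to/train_part2.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```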

### 🔎 Quick checks

```bash
./scripts/train/run_video_workflow_grpo.sh test-reward
pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```

## 📜 Citation

```bibtex
@inproceedings{videoseal2026,
  title={VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority},
  author={Dongyang Liu and others},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026},
  url={https://huggingface.co/papers/2605.12571}
}
```