---
base_model: Qwen/Qwen3-8B
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: video-text-to-text
tags:
- video-understanding
- long-video-understanding
- agentic-llm
- video-question-answering
- vision-language-model
- grpo
- reinforcement-learning
- icml-2026
---

# 🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority


🤗 HuggingFace model: CewEhao/VideoSEAL_8B · 💻 Code: Echochef/VideoSEAL · 📄 Paper: 2605.12571

## 👉 Introduction

This is the official model card for **VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority** (ICML 2026).

VideoSEAL is an agentic framework for long-video question answering. It separates the *planner* role (deciding which evidence to gather) from the *answerer* role (judging the evidence), mitigating the "evidence misalignment" in which models produce correct answers that are not supported by the retrieved evidence.

VideoSEAL provides offline build utilities for long-video indexing:

- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic//`
- (Optional) generate a global `full_story.txt` summary

## 📦 Layout

- 🧰 Shell entrypoints: `scripts/`
- 🐍 Python package: `videoseal/`
- ✅ Tests: `test/`
- 🧩 OCR toolchain (vendored): `third_party/video-subtitle-extractor/`

## ⚙️ Configuration

- Defaults live in the scripts under `scripts/`.
- Put real API keys/endpoints in your shell environment / job launcher.

## 🏗️ Run offline build

```bash
cd /path/to/VideoSEAL

export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"

VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```

## ✅ Run tests

```bash
/root/miniconda3/envs/rllm/bin/python -m unittest discover -s test -v
```

## 🏋️ GRPO training (video tool workflow)

This repo vendors a minimal copy of the `rllm/` + `verl/` Python packages (under the repo root) to make the video tool-agent GRPO workflow runnable without an extra repo checkout.

### 🧪 Training environment (conda)

```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal

pip install vllm==0.11.0

cd rllm
pip install -e .
cd ../verl
pip install -e .
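# Optional sanity check (our suggestion, not part of the original steps):
# confirm pip resolves inside the activated environment before training.
python -m pip --version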
```

### 🚀 Launcher

- `scripts/train/run_video_workflow_grpo.sh`

### 🧩 Example

```bash
cd /path/to/VideoSEAL

# Export real API keys/endpoints in your environment before launching.
TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```

### 🔎 Quick checks

```bash
./scripts/train/run_video_workflow_grpo.sh test-reward

pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```

## 📜 Citation

```bibtex
@inproceedings{videoseal2026,
  title={VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority},
  author={Dongyang Liu and others},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026},
  url={https://huggingface.co/papers/2605.12571}
}
```
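For readers new to the idea, the planner/answerer decoupling described in the introduction can be sketched in a few lines of Python. This is a toy illustration under our own assumptions (keyword matching stands in for planning, a trivial selection rule stands in for answering); none of these names reflect the repository's actual API:

```python
# Illustrative sketch of "decoupling answer authority": the planner only
# chooses WHICH evidence to gather; the answerer alone judges that evidence
# and may abstain. All names here are hypothetical, not VideoSEAL's API.

def plan(question: str, index: dict) -> list:
    """Planner: pick index entries whose key shares a word with the question."""
    words = set(question.lower().replace("?", "").split())
    return [key for key in index if words & set(key.lower().split())]

def answer(question: str, evidence: list) -> str:
    """Answerer: respond strictly from retrieved evidence, else abstain."""
    if not evidence:
        return "insufficient evidence"
    return evidence[0]

# Toy stand-in for the unified semantic index (OCR + clip captions).
index = {
    "chef plates dessert": "the chef plates a chocolate dessert",
    "guests applaud": "guests applaud after the tasting",
}

question = "What does the chef do?"
evidence = [index[key] for key in plan(question, index)]
print(answer(question, evidence))  # -> the chef plates a chocolate dessert
```

The point of the sketch is only the separation of roles: the answerer never sees anything the planner did not retrieve, and it abstains rather than answering without evidence.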