---
base_model: Qwen/Qwen3-8B
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: video-text-to-text
tags:
- video-understanding
- long-video-understanding
- agentic-llm
- video-question-answering
- vision-language-model
- grpo
- reinforcement-learning
- icml-2026
---

# 🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority


🤗 HuggingFace model: CewEhao/VideoSEAL_8B · 💻 Code: Echochef/VideoSEAL · 📄 Paper: 2605.12571

## 👉 Introduction

This is the official model card for **VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority** (ICML 2026).

VideoSEAL is an agentic framework for long-video question answering. It separates the *planner* role (deciding which evidence to gather) from the *answerer* role (judging the evidence), mitigating the "evidence misalignment" in which models produce correct answers that are not supported by the retrieved evidence.

VideoSEAL provides offline build utilities for long-video indexing:

- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic//`
- (Optional) generate a global `full_story.txt` summary

## 📦 Layout

- 🧰 Shell entrypoints: `scripts/`
- 🐍 Python package: `videoseal/`
- ✅ Tests: `test/`
- 🧩 OCR toolchain (vendored): `third_party/video-subtitle-extractor/`

## ⚙️ Configuration

- Defaults live in the scripts under `scripts/`.
- Put real API keys/endpoints in your shell environment / job launcher.

## 🏗️ Run offline build

```bash
cd /path/to/VideoSEAL

export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"

VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```

## ✅ Run tests

```bash
/root/miniconda3/envs/rllm/bin/python -m unittest discover -s test -v
```

## 🏋️ GRPO training (video tool workflow)

This repo vendors a minimal copy of the `rllm/` + `verl/` Python packages (under the repo root) to make the video tool-agent GRPO workflow runnable without an extra repo checkout.

### 🧪 Training environment (conda)

```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal

pip install vllm==0.11.0

cd rllm
pip install -e .
cd ../verl
pip install -e .
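# Optional sanity check (our suggestion, not part of the original steps):
# confirm pip resolves inside the activated environment before training.
python -m pip --version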
```

### 🚀 Launcher

- `scripts/train/run_video_workflow_grpo.sh`

### 🧩 Example

```bash
cd /path/to/VideoSEAL

# Export real API keys/endpoints in your environment before launching.
TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```

### 🔎 Quick checks

```bash
./scripts/train/run_video_workflow_grpo.sh test-reward

pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```

## 📜 Citation

```bibtex
@inproceedings{videoseal2026,
  title={VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority},
  author={Dongyang Liu and others},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026},
  url={https://huggingface.co/papers/2605.12571}
}
```
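For readers new to the idea, the planner/answerer decoupling described in the introduction can be sketched in a few lines of Python. This is a toy illustration under our own assumptions (keyword matching stands in for planning, a trivial selection rule stands in for answering); none of these names reflect the repository's actual API:

```python
# Illustrative sketch of "decoupling answer authority": the planner only
# chooses WHICH evidence to gather; the answerer alone judges that evidence
# and may abstain. All names here are hypothetical, not VideoSEAL's API.

def plan(question: str, index: dict) -> list:
    """Planner: pick index entries whose key shares a word with the question."""
    words = set(question.lower().replace("?", "").split())
    return [key for key in index if words & set(key.lower().split())]

def answer(question: str, evidence: list) -> str:
    """Answerer: respond strictly from retrieved evidence, else abstain."""
    if not evidence:
        return "insufficient evidence"
    return evidence[0]

# Toy stand-in for the unified semantic index (OCR + clip captions).
index = {
    "chef plates dessert": "the chef plates a chocolate dessert",
    "guests applaud": "guests applaud after the tasting",
}

question = "What does the chef do?"
evidence = [index[key] for key in plan(question, index)]
print(answer(question, evidence))  # -> the chef plates a chocolate dessert
```

The point of the sketch is only the separation of roles: the answerer never sees anything the planner did not retrieve, and it abstains rather than answering without evidence.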