How to use CewEhao/VideoSEAL_8B with 🤗 Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CewEhao/VideoSEAL_8B")
model = AutoModelForCausalLM.from_pretrained("CewEhao/VideoSEAL_8B")
```
---
base_model: Qwen/Qwen3-8B
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: video-text-to-text
tags:
- video-understanding
- long-video-understanding
- agentic-llm
- video-question-answering
- vision-language-model
- grpo
- reinforcement-learning
- icml-2026
---
🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority

🤗 HuggingFace model: CewEhao/VideoSEAL_8B · 💻 Code: Echochef/VideoSEAL · 📄 Paper: 2605.12571

📖 Introduction
This is the official model card for VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority (ICML 2026).
VideoSEAL is an agentic framework for long-video question answering. It separates the planner role (deciding which evidence to gather) from the answerer role (judging the gathered evidence), mitigating "evidence misalignment": the failure mode in which a model produces a correct answer that is not actually supported by the retrieved evidence.
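The decoupling can be pictured with a minimal sketch. All names here are illustrative, not the actual VideoSEAL API: the planner only decides which clips to inspect, while the answerer's verdict is grounded exclusively in the evidence the tools actually returned.

```python
# Illustrative sketch of decoupled answer authority (not the real VideoSEAL API).
# The planner proposes evidence-gathering actions; the answerer judges only the
# evidence those actions retrieved, and abstains when the evidence is missing.

def planner(question: str, index: dict) -> list[str]:
    """Decide WHICH clips to retrieve; holds no answer authority."""
    keywords = set(question.lower().split())
    return [cid for cid, caption in index.items()
            if keywords & set(caption.lower().split())]

def answerer(question: str, evidence: list[str]) -> str:
    """Judge ONLY the retrieved evidence; abstain when it is insufficient."""
    if not evidence:
        return "insufficient evidence"
    return max(evidence, key=len)  # stand-in for an LLM judgment

index = {
    "clip_003": "a chef seasons the steak with pepper",
    "clip_017": "the chef plates the steak and adds sauce",
    "clip_042": "credits roll over music",
}
question = "What does the chef add to the steak?"
evidence = [index[cid] for cid in planner(question, index)]
print(answerer(question, evidence))
```

Because the answerer never bypasses the evidence channel, a correct-sounding answer with no supporting clips is impossible by construction, which is the misalignment the paper targets.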
VideoSEAL provides offline build utilities for long video indexing:
- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic/<video_id>/`
- (Optional) generate a global `full_story.txt` summary
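The merge step above can be sketched roughly as follows. The file name and entry schema are hypothetical (the real builder also attaches embeddings); only the output location mirrors the `indexes/semantic/<video_id>/` layout:

```python
import json
from pathlib import Path

def merge_semantic_index(video_id: str, ocr_captions: list[dict],
                         clip_captions: list[dict],
                         root: str = "indexes/semantic") -> Path:
    """Merge OCR and clip captions into one time-ordered index file.

    Hypothetical schema: each entry is {"t": seconds, "source": ..., "text": ...}.
    """
    entries = (
        [{"t": c["t"], "source": "ocr", "text": c["text"]} for c in ocr_captions] +
        [{"t": c["t"], "source": "clip", "text": c["text"]} for c in clip_captions]
    )
    entries.sort(key=lambda e: e["t"])  # unified timeline across both sources
    out_dir = Path(root) / video_id
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "index.json"
    out_path.write_text(json.dumps(entries, indent=2))
    return out_path

path = merge_semantic_index(
    "demo_video",
    ocr_captions=[{"t": 12.0, "text": "BREAKING NEWS"}],
    clip_captions=[{"t": 3.5, "text": "a reporter speaks to camera"}],
    root="/tmp/indexes/semantic",  # scratch root for the demo
)
print(path)
```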
📦 Layout

- 🧰 Shell entrypoints: `scripts/`
- 🐍 Python package: `videoseal/`
- ✅ Tests: `test/`
- 🧩 OCR toolchain (vendored): `third_party/video-subtitle-extractor/`
⚙️ Configuration

- Defaults live in the scripts under `scripts/`.
- Put real API keys/endpoints in your shell environment / job launcher.
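A minimal fail-fast check (an illustrative helper, not shipped with the repo) can verify that the keys the offline build exports are actually set before launching a long job:

```python
import os

# Variable names match those exported in the offline-build run example.
REQUIRED_KEYS = [
    "MLLM_API_KEY",
    "EMBEDDING_API_KEY",
    "AGENT_LLM_API_KEY",
    "VISUAL_INSPECT_API_KEY",
]

def check_env() -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [k for k in REQUIRED_KEYS if not os.environ.get(k)]

missing = check_env()
print("missing:", missing or "none")
```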
🏗️ Run offline build

```bash
cd /path/to/VideoSEAL
export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"
VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```
✅ Run tests

```bash
/root/miniconda3/envs/rllm/bin/python -m unittest discover -s test -v
```
🏋️ GRPO training (video tool workflow)

This repo vendors minimal copies of the `rllm/` and `verl/` Python packages (under the repo root) so the video tool-agent GRPO workflow is runnable without an extra repository checkout.
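The core of GRPO is that each rollout is scored against the other rollouts sampled for the same prompt rather than against a learned value baseline. A generic sketch of that group-relative advantage (an illustration of the algorithm, not the vendored verl implementation):

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each rollout's reward by the
    mean and population std of the G rollouts for the same prompt."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Four tool-use rollouts for one video question, scored by the reward fn.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = grpo_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Rollouts above the group mean get positive advantage and are reinforced; those below are penalized, which is what makes per-prompt reward groups (rather than a critic) sufficient for training.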
🧪 Training environment (conda)

```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal
pip install vllm==0.11.0
cd rllm
pip install -e .
cd ../verl
pip install -e .
```
🚀 Launcher

`scripts/train/run_video_workflow_grpo.sh`
🧩 Example

```bash
cd /path/to/VideoSEAL
# Export real API keys/endpoints in your environment before launching.
TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```
🔍 Quick checks

```bash
./scripts/train/run_video_workflow_grpo.sh test-reward
pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```
📚 Citation

```bibtex
@inproceedings{videoseal2026,
  title={VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority},
  author={Dongyang Liu and others},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026},
  url={https://huggingface.co/papers/2605.12571}
}
```