Tags: video-text-to-text · transformers · safetensors · English · qwen3 · text-generation · video-understanding · long-video-understanding · agentic-llm · video-question-answering · vision-language-model · grpo · reinforcement-learning · icml-2026 · text-generation-inference
How to use CewEhao/VideoSEAL_8B with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CewEhao/VideoSEAL_8B")
model = AutoModelForCausalLM.from_pretrained("CewEhao/VideoSEAL_8B")
```
# VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority

HuggingFace model: CewEhao/VideoSEAL_8B · Code: Echochef/VideoSEAL

## Introduction
This is the official model card for VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority (ICML 2026).
VideoSEAL provides offline build utilities for long video indexing:
- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic/<video_id>/`
- (Optional) generate a global `full_story.txt` summary
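As a rough sketch of the SRT step, the parser and index writer below are illustrative only: the record schema and the `ocr_captions.jsonl` file name under `indexes/semantic/<video_id>/` are assumptions, not the repo's actual on-disk format.

```python
import json
import re
from pathlib import Path

# Matches an SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500".
SRT_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def parse_srt(text: str) -> list[dict]:
    """Parse SRT subtitle text into caption records with start/end in seconds."""
    records = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        m = SRT_TIME.match(lines[1])
        if not m:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        records.append({
            "start": h1 * 3600 + m1 * 60 + s1 + ms1 / 1000,
            "end": h2 * 3600 + m2 * 60 + s2 + ms2 / 1000,
            "text": " ".join(lines[2:]),  # join wrapped subtitle lines
        })
    return records

def write_index(video_id: str, captions: list[dict], root: str = "indexes/semantic") -> Path:
    """Write caption records as JSON Lines under <root>/<video_id>/ (assumed layout)."""
    out_dir = Path(root) / video_id
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "ocr_captions.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for rec in captions:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return out_path
```

In the actual pipeline, each record would additionally carry an optional embedding vector before the merge step.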
## Layout

- Shell entrypoints: `scripts/`
- Python package: `videoseal/`
- Tests: `test/`
- OCR toolchain (vendored): `third_party/video-subtitle-extractor/`
## Configuration

- Defaults live in the scripts under `scripts/`.
- Put real API keys/endpoints in your shell environment / job launcher.
## Run offline build

```bash
cd /path/to/VideoSEAL
export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"
VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```
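The build script expects all four keys in the environment. A small helper like the one below (hypothetical, not part of the repo; assumes bash for the `${!name}` indirection) can fail fast before launching:

```bash
# Hypothetical helper: abort early when a required env var is unset or empty.
require_env() {
  local missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check all four keys before launching the offline build.
# require_env MLLM_API_KEY EMBEDDING_API_KEY AGENT_LLM_API_KEY VISUAL_INSPECT_API_KEY
```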
## Run tests

```bash
/root/miniconda3/envs/rllm/bin/python -m unittest discover -s test -v
```
## GRPO training (video tool workflow)
This repo vendors a minimal copy of the rllm/ + verl/ Python packages (under the repo root)
to make the video tool-agent GRPO workflow runnable without an extra repo checkout.
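GRPO's core step, group-relative advantage estimation, can be sketched in a few lines. This is a generic illustration of the algorithm (rewards from a group of rollouts per prompt normalized against the group's own mean and standard deviation, so no learned value function is needed), not the vendored rllm/verl implementation:

```python
import statistics

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each rollout's reward against its group's mean and std.

    GRPO samples several rollouts per prompt and uses the group itself
    as the baseline; rewards above the group mean get positive advantage.
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]
```

In the tool-agent workflow, the per-rollout reward would come from the video-QA reward function checked by `test-reward` below.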
## Training environment (conda)

```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal
pip install vllm==0.11.0
cd rllm
pip install -e .
cd ../verl
pip install -e .
```
## Launcher

`scripts/train/run_video_workflow_grpo.sh`
## Example

```bash
cd /path/to/VideoSEAL
# Export real API keys/endpoints in your environment before launching.
TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```
## Quick checks

```bash
./scripts/train/run_video_workflow_grpo.sh test-reward
pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```