---
license: apache-2.0
library_name: transformers
pipeline_tag: video-text-to-text
base_model: Qwen/Qwen3-8B
language:
- en
tags:
- video-understanding
- long-video-understanding
- agentic-llm
- video-question-answering
- vision-language-model
- grpo
- reinforcement-learning
- icml-2026
---

<h2 align="center">VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority</h2>

<p align="center">
  <a href="https://github.com/Echochef/VideoSEAL"><img alt="Code" src="https://img.shields.io/badge/Code-GitHub-black?logo=github"></a>
  <a href="https://huggingface.co/CewEhao/VideoSEAL_8B"><img alt="HF Model" src="https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-VideoSEAL__8B-yellow"></a>
  <img alt="ICML 2026" src="https://img.shields.io/badge/ICML-2026-blue">
</p>

<p align="center">
  HuggingFace model:
  <a href="https://huggingface.co/CewEhao/VideoSEAL_8B">CewEhao/VideoSEAL_8B</a>
  ·
  Code:
  <a href="https://github.com/Echochef/VideoSEAL">Echochef/VideoSEAL</a>
</p>

## Introduction

This is the official model card for **VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority** (ICML 2026).

VideoSEAL provides offline build utilities for long video indexing:

- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic/<video_id>/`
- (Optional) generate a global `full_story.txt` summary
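
To make the first step of the list above concrete, here is a minimal sketch of turning SRT subtitle text into timestamped caption records. It is not the repo's implementation: the `Caption` record and the `parse_srt` helper are hypothetical names chosen for illustration, and the real index schema may differ.

```python
import re
from dataclasses import dataclass


# Hypothetical record type for illustration; not the repo's actual schema.
@dataclass
class Caption:
    start: float  # cue start, in seconds
    end: float    # cue end, in seconds
    text: str


# SRT timestamps look like HH:MM:SS,mmm (some tools write a dot instead of a comma).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp: str) -> float:
    match = _TIME.match(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text: str) -> list[Caption]:
    """Parse SRT text into a list of Caption records."""
    captions = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        # The timing line is the one containing "-->"; skip malformed blocks.
        timing = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing is None:
            continue
        start, end = lines[timing].split("-->")
        text = " ".join(ln.strip() for ln in lines[timing + 1:])
        captions.append(Caption(_seconds(start), _seconds(end), text))
    return captions
```

Records of this shape carry the time span needed to align OCR captions with clip captions before merging, though how VideoSEAL performs that merge is not described here.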

## Layout

- Shell entrypoints: `scripts/`
- Python package: `videoseal/`
- Tests: `test/`
- OCR toolchain (vendored): `third_party/video-subtitle-extractor/`

## Configuration

- Defaults live in the scripts under `scripts/`.
- Put real API keys and endpoints in your shell environment or job launcher.

## Run offline build

```bash
cd /path/to/VideoSEAL
export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"
VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```
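
Before launching a long build, it can help to confirm that the keys exported above are actually set. The following is a minimal pre-flight sketch, not part of the repo: `missing_keys` is a hypothetical helper, and whether every script needs all four variables is an assumption.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_KEYS = (
    "MLLM_API_KEY",
    "EMBEDDING_API_KEY",
    "AGENT_LLM_API_KEY",
    "VISUAL_INSPECT_API_KEY",
)


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variables that are unset or blank, in order."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_keys()` with no arguments reads `os.environ`. An empty list means all four variables are present; passing a plain dict makes the check easy to exercise without touching the real environment.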

## Run tests

```bash
# Run from the repo root, using the Python interpreter of your active environment.
python -m unittest discover -s test -v
```

## GRPO training (video tool workflow)

This repo vendors a minimal copy of the `rllm/` and `verl/` Python packages under the repo root, so the video tool-agent GRPO workflow runs without an extra repo checkout.

### Training environment (conda)

```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal
pip install vllm==0.11.0
# Install the vendored packages in editable mode, starting from the repo root.
cd rllm
pip install -e .
cd ../verl
pip install -e .
```

### Launcher

- `scripts/train/run_video_workflow_grpo.sh`

### Example

```bash
cd /path/to/VideoSEAL
# Export real API keys/endpoints in your environment before launching.
TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```
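
In the example above, `TRAIN_PARQUET` is written as a JSON list while `VAL_PARQUET` is a bare path. How the launcher interprets these values is not documented here, so the sketch below only illustrates the two value shapes: `parquet_paths` is a hypothetical helper that normalizes either form into a list of paths.

```python
import json


def parquet_paths(value: str) -> list[str]:
    """Normalize a JSON list string or a bare path into a list of paths."""
    value = value.strip()
    if value.startswith("["):
        # JSON list form, e.g. '["/path/to/train.parquet"]'.
        paths = json.loads(value)
        if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
            raise ValueError("expected a JSON list of path strings")
        return paths
    # Bare path form, e.g. '/path/to/val.parquet'; blank input yields no paths.
    return [value] if value else []
```

A check like this can catch a mis-quoted list before a training job is submitted.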

### Quick checks

```bash
./scripts/train/run_video_workflow_grpo.sh test-reward
pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```