---
license: apache-2.0
library_name: transformers
pipeline_tag: video-text-to-text
base_model: Qwen/Qwen3-8B
language:
- en
tags:
- video-understanding
- long-video-understanding
- agentic-llm
- video-question-answering
- vision-language-model
- grpo
- reinforcement-learning
- icml-2026
---
<h2 align="center">🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority</h2>
<p align="center">
<a href="https://github.com/Echochef/VideoSEAL"><img alt="Code" src="https://img.shields.io/badge/Code-GitHub-black?logo=github"></a>
<a href="https://huggingface.co/CewEhao/VideoSEAL_8B"><img alt="HF Model" src="https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-VideoSEAL__8B-yellow"></a>
<img alt="ICML 2026" src="https://img.shields.io/badge/ICML-2026-blue">
</p>
<p align="center">
🤗 HuggingFace model:
<a href="https://huggingface.co/CewEhao/VideoSEAL_8B">CewEhao/VideoSEAL_8B</a>
·
💻 Code:
<a href="https://github.com/Echochef/VideoSEAL">Echochef/VideoSEAL</a>
</p>
## Introduction
This is the official model card for **VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority** (ICML 2026).
VideoSEAL provides offline build utilities for long video indexing:
- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic/<video_id>/`
- (Optional) generate a global `full_story.txt` summary
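The merge step in the list above can be pictured with a small sketch. It is not the repository's implementation: the `Caption` record, the `parse_srt`, `merge_captions`, and `write_index` helpers, and the `captions.json` file name are all hypothetical, chosen only to illustrate how time-stamped OCR and clip captions might be combined into one ordered index under `indexes/semantic/<video_id>/`. Embeddings and the `full_story.txt` summary are left out.

```python
import json
import re
from dataclasses import asdict, dataclass
from pathlib import Path


# Hypothetical record type; the real index schema may differ.
@dataclass
class Caption:
    start: float  # seconds
    end: float    # seconds
    source: str   # "ocr" or "clip"
    text: str


# SRT timestamps look like 00:01:02,345 (three-digit milliseconds assumed).
_TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def _seconds(stamp: str) -> float:
    h, m, s, ms = _TIME.match(stamp.strip()).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[Caption]:
    """Parse SRT cues into OCR captions; cue index lines are ignored."""
    captions = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        timing = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing is None:
            continue  # not a cue block
        start, end = lines[timing].split("-->")
        text = " ".join(ln.strip() for ln in lines[timing + 1:])
        if text:
            captions.append(Caption(_seconds(start), _seconds(end), "ocr", text))
    return captions


def merge_captions(ocr: list[Caption], clips: list[Caption]) -> list[Caption]:
    """Interleave both caption streams in time order."""
    return sorted([*ocr, *clips], key=lambda c: (c.start, c.end))


def write_index(captions: list[Caption], video_id: str,
                root: str = "indexes/semantic") -> Path:
    """Write the merged captions to <root>/<video_id>/captions.json (assumed name)."""
    out_dir = Path(root) / video_id
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "captions.json"
    payload = [asdict(c) for c in captions]
    out_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2),
                        encoding="utf-8")
    return out_path
```

Sorting by `(start, end)` keeps OCR text and clip descriptions for the same moment adjacent, which is one simple way to make a time-ordered index searchable.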
## 📦 Layout
- 🧰 Shell entrypoints: `scripts/`
- 🐍 Python package: `videoseal/`
- ✅ Tests: `test/`
- 🧩 OCR toolchain (vendored): `third_party/video-subtitle-extractor/`
## ⚙️ Configuration
- Defaults live in the scripts under `scripts/`.
- Export real API keys and endpoints from your shell environment or job launcher.
## Run offline build
```bash
cd /path/to/VideoSEAL
export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"
VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```
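Before launching the build, it can help to confirm that the keys are actually set. The sketch below is illustrative only: the variable names come from the exports above, but the `missing_keys` helper is hypothetical, and whether the build script requires all four keys is an assumption.

```python
import os

# Variable names taken from the exports above; the check itself is illustrative.
REQUIRED_KEYS = (
    "MLLM_API_KEY",
    "EMBEDDING_API_KEY",
    "AGENT_LLM_API_KEY",
    "VISUAL_INSPECT_API_KEY",
)
PLACEHOLDER = "sk_your_api_key"  # the example value, which is not a real key


def missing_keys(env=None):
    """Return required variables that are unset, empty, or still the placeholder."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if env.get(k, "").strip() in ("", PLACEHOLDER)]
```

Calling `missing_keys()` with no arguments inspects the current process environment; an empty list means every key has a non-placeholder value.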
## ✅ Run tests
```bash
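# Run from the repo root with your project environment's Python
# (the original command used /root/miniconda3/envs/rllm/bin/python).
python -m unittest discover -s test -v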
```
## GRPO training (video tool workflow)
This repo vendors minimal copies of the `rllm/` and `verl/` Python packages at the repo root,
so the video tool-agent GRPO workflow runs without an extra repo checkout.
### 🧪 Training environment (conda)
```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal
pip install vllm==0.11.0
cd rllm
pip install -e .
cd ../verl
pip install -e .
```
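After installation, a quick check that the pinned version is the one actually installed can save a failed launch. This is a minimal sketch, assuming only the `vllm==0.11.0` pin stated above; the `pin_mismatches` helper is hypothetical and not part of the repository.

```python
from importlib import metadata

PINS = {"vllm": "0.11.0"}  # the only pin stated in the setup commands


def pin_mismatches(pins=PINS, get_version=metadata.version):
    """Map package name -> installed version (None if missing) for each unmet pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore a local build suffix such as "+cu128" when comparing.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = found
    return problems
```

An empty result from `pin_mismatches()` means the active environment matches the pin.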
### 🚀 Launcher
- `scripts/train/run_video_workflow_grpo.sh`
### 🧩 Example
```bash
cd /path/to/VideoSEAL
# Export real API keys/endpoints in your environment before launching.
TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```
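In the example above, `TRAIN_PARQUET` is written as a JSON list while `VAL_PARQUET` is a plain path. When launching from Python, building those values programmatically avoids shell-quoting mistakes. The sketch below mirrors the example's shapes exactly; `launcher_env` is a hypothetical helper, and whether the launcher accepts more than one training path is an assumption.

```python
import json
import os


def launcher_env(train_paths, val_path, model_path, base_env=None):
    """Build the launcher environment, mirroring the example's variable shapes."""
    env = dict(os.environ if base_env is None else base_env)
    env["TRAIN_PARQUET"] = json.dumps(list(train_paths))  # JSON list, as in the example
    env["VAL_PARQUET"] = str(val_path)                    # plain path, as in the example
    env["MODEL_PATH"] = str(model_path)
    return env
```

The result can be passed to `subprocess.run(["./scripts/train/run_video_workflow_grpo.sh", "train"], env=env, check=True)` from the repo root.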
### Quick checks
```bash
./scripts/train/run_video_workflow_grpo.sh test-reward
pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```