	---
	license: apache-2.0
	library_name: transformers
	pipeline_tag: video-text-to-text
	base_model: Qwen/Qwen3-8B
	language:
	- en
	tags:
	- video-understanding
	- long-video-understanding
	- agentic-llm
	- video-question-answering
	- vision-language-model
	- grpo
	- reinforcement-learning
	- icml-2026
	---

	<h2 align="center">🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority</h2>

	<p align="center">
	<a href="https://github.com/Echochef/VideoSEAL"><img alt="Code" src="https://img.shields.io/badge/Code-GitHub-black?logo=github"></a>
	<a href="https://huggingface.co/CewEhao/VideoSEAL_8B"><img alt="HF Model" src="https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-VideoSEAL__8B-yellow"></a>
	<img alt="ICML 2026" src="https://img.shields.io/badge/ICML-2026-blue">
	</p>

	<p align="center">
	🤗 HuggingFace model:
	<a href="https://huggingface.co/CewEhao/VideoSEAL_8B">CewEhao/VideoSEAL_8B</a>
	·
	💻 Code:
	<a href="https://github.com/Echochef/VideoSEAL">Echochef/VideoSEAL</a>
	</p>

	## 👉 Introduction

	This is the official model card for VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority (ICML 2026).

	VideoSEAL provides offline build utilities for long video indexing:

	- OCR subtitles (SRT) → OCR captions, with optional embeddings
	- Clip captions (VLM) → clip captions, with optional embeddings
	- Merge both into a unified semantic index under `indexes/semantic/<video_id>/`
	- Optionally generate a global `full_story.txt` summary
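
The OCR and merge steps above can be pictured with a small sketch. It parses standard SRT text into timed caption records and merges them with clip captions into one time-ordered list. The `Caption` fields and the function names `parse_srt` and `merge_captions` are illustrative assumptions, not the actual `videoseal/` API or the on-disk index schema.

```python
import re
from dataclasses import dataclass


# Hypothetical record type; the real index schema in videoseal/ may differ.
@dataclass
class Caption:
    start: float  # seconds
    end: float    # seconds
    text: str
    source: str   # "ocr" or "clip"


# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
_TS = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(srt_text):
    """Parse SRT cues into Caption records tagged as OCR."""
    captions = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = _TS.search(line)
            if match:
                g = match.groups()
                text = " ".join(lines[i + 1:])
                if text:
                    captions.append(
                        Caption(_seconds(*g[:4]), _seconds(*g[4:]), text, "ocr")
                    )
                break
    return captions


def merge_captions(ocr, clips):
    """Merge OCR and clip captions into one list ordered by start time."""
    return sorted(ocr + clips, key=lambda c: (c.start, c.end))
```

Sorting by start time is only one plausible way to unify the two caption streams; the real build may also attach embeddings or group entries per clip.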

	## 📦 Layout

	- 🧰 Shell entrypoints: `scripts/`
	- 🐍 Python package: `videoseal/`
	- ✅ Tests: `test/`
	- 🧩 OCR toolchain (vendored): `third_party/video-subtitle-extractor/`

	## ⚙️ Configuration

	- Default settings are defined in the scripts under `scripts/`.
	- Set real API keys and endpoints in your shell environment or job launcher.

	## 🏗️ Run offline build

	```bash
	cd /path/to/VideoSEAL

	export MLLM_API_KEY="sk_your_api_key"
	export EMBEDDING_API_KEY="sk_your_api_key"
	export AGENT_LLM_API_KEY="sk_your_api_key"
	export VISUAL_INSPECT_API_KEY="sk_your_api_key"
	VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
	```
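
Before starting a long build, it can help to confirm that the keys are actually set. The sketch below checks the four variable names from the export lines above. Whether `run_offline_build.sh` requires all four is an assumption, and `missing_keys` is a hypothetical helper that is not part of the repo.

```python
import os

# Variable names are taken from the export lines above.
REQUIRED_KEYS = (
    "MLLM_API_KEY",
    "EMBEDDING_API_KEY",
    "AGENT_LLM_API_KEY",
    "VISUAL_INSPECT_API_KEY",
)


def missing_keys(env=None, placeholder="sk_your_api_key"):
    """Return required variables that are unset, blank, or still the placeholder."""
    env = os.environ if env is None else env
    return [
        name
        for name in REQUIRED_KEYS
        if not env.get(name, "").strip() or env[name] == placeholder
    ]
```

Calling `missing_keys()` with no arguments inspects the current process environment; an empty list means every key has a non-placeholder value.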

	## ✅ Run tests

	```bash
	# Run from the repo root with the project's Python environment active.
	python -m unittest discover -s test -v
	```

	## 🏋️ GRPO training (video tool workflow)

	This repo vendors minimal copies of the `rllm/` and `verl/` Python packages under the repo root,
	so the video tool-agent GRPO workflow runs without an extra repo checkout.

	### 🧪 Training environment (conda)

	```bash
	conda create -n videoseal python=3.12 -y
	conda activate videoseal

	pip install vllm==0.11.0

	cd rllm
	pip install -e .

	cd ../verl
	pip install -e .
	```
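
After the installs above, a quick sanity check can confirm that the packages resolve and that `vllm` matches the pinned version. This is a sketch only: `check_install` is a hypothetical helper, and it assumes each distribution name matches its import name (`vllm`, `rllm`, `verl`).

```python
from importlib import metadata, util


def check_install(packages=("vllm", "rllm", "verl"), pinned=None):
    """Report which packages are importable and whether pinned versions match."""
    # The vllm pin mirrors the `pip install vllm==0.11.0` line above.
    pinned = {"vllm": "0.11.0"} if pinned is None else pinned
    report = {}
    for name in packages:
        if util.find_spec(name) is None:
            report[name] = "not importable"
            continue
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no installed distribution metadata under this name.
            version = None
        want = pinned.get(name)
        if want is not None and version != want:
            report[name] = f"version {version}, expected {want}"
        else:
            report[name] = "ok"
    return report
```

Run `check_install()` inside the activated `videoseal` environment; any value other than `"ok"` points at the package to reinstall.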

	### 🚀 Launcher

	- `scripts/train/run_video_workflow_grpo.sh`

	### 🧩 Example

	```bash
	cd /path/to/VideoSEAL

	# Export real API keys/endpoints in your environment before launching.

	TRAIN_PARQUET='["/path/to/train.parquet"]' \
	VAL_PARQUET='/path/to/val.parquet' \
	MODEL_PATH='Qwen/Qwen3-8B' \
	./scripts/train/run_video_workflow_grpo.sh train
	```
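
When the launcher is driven from Python, for example from a job launcher, the same variables can be assembled programmatically. In the example above, `TRAIN_PARQUET` is a bracketed list of quoted paths, which is the text `json.dumps` produces for a list of strings, while `VAL_PARQUET` is a plain path. `build_train_env` is a hypothetical helper, and how the launcher parses lists with more than one entry is an assumption to verify against the script.

```python
import json
import os


def build_train_env(train_files, val_file, model_path, base_env=None):
    """Return an environment dict with the launcher variables from the example."""
    env = dict(os.environ if base_env is None else base_env)
    # Bracketed list of quoted paths, as in the shell example.
    env["TRAIN_PARQUET"] = json.dumps(list(train_files))
    env["VAL_PARQUET"] = str(val_file)
    env["MODEL_PATH"] = str(model_path)
    return env


# Example launch (not executed here), run from the repo root:
#   import subprocess
#   env = build_train_env(["/path/to/train.parquet"], "/path/to/val.parquet", "Qwen/Qwen3-8B")
#   subprocess.run(["./scripts/train/run_video_workflow_grpo.sh", "train"], env=env, check=True)
```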

	### 🔎 Quick checks

	```bash
	./scripts/train/run_video_workflow_grpo.sh test-reward
	pytest -q tests/rewards/test_video_reward_tool_env_integration.py
	```