Video-Text-to-Text
Transformers
Safetensors
English
qwen3
text-generation
video-understanding
long-video-understanding
agentic-llm
video-question-answering
vision-language-model
grpo
reinforcement-learning
icml-2026
text-generation-inference
Instructions to use CewEhao/VideoSEAL_8B with libraries (Transformers), inference providers, notebooks (Google Colab, Kaggle), and local apps.

How to use CewEhao/VideoSEAL_8B with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CewEhao/VideoSEAL_8B")
model = AutoModelForCausalLM.from_pretrained("CewEhao/VideoSEAL_8B")
```
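Beyond loading, a plain text-generation call might look like the sketch below. This assumes only the standard Transformers causal-LM chat interface; the wrapper function name is ours, and video frames are handled by the agentic tool pipeline described in the repository, not shown here:

```python
def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Load CewEhao/VideoSEAL_8B and answer a text-only question.

    Illustrative helper (not a VideoSEAL API); downloads the model weights
    on first call and needs a GPU with enough memory for an 8B model.
    """
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("CewEhao/VideoSEAL_8B")
    model = AutoModelForCausalLM.from_pretrained(
        "CewEhao/VideoSEAL_8B", torch_dtype="auto", device_map="auto"
    )

    # Build a chat prompt using the tokenizer's own chat template.
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

# Example (requires a suitable GPU):
# print(generate_answer("Summarize the key events in this video's index."))
```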
docs: align README with VideoSEAL (ICML 2026) — fix model link, paper subtitle, paths, conda env
README.md
CHANGED
````diff
@@ -1,15 +1,24 @@
-<h2 align="center">🎬 VideoSEAL:
+<h2 align="center">🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority</h2>
+
+<p align="center">
+<a href="https://github.com/Echochef/VideoSEAL"><img alt="Code" src="https://img.shields.io/badge/Code-GitHub-black?logo=github"></a>
+<a href="https://huggingface.co/CewEhao/VideoSEAL_8B"><img alt="HF Model" src="https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-VideoSEAL__8B-yellow"></a>
+<img alt="ICML 2026" src="https://img.shields.io/badge/ICML-2026-blue">
+</p>
 
 <p align="center">
 🤗 HuggingFace model:
-<a href="https://huggingface.co/
+<a href="https://huggingface.co/CewEhao/VideoSEAL_8B">CewEhao/VideoSEAL_8B</a>
+·
+💻 Code:
+<a href="https://github.com/Echochef/VideoSEAL">Echochef/VideoSEAL</a>
 </p>
 
 ## 👉 Introduction
 
-This is the official
+This is the official model card for **VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority** (ICML 2026).
 
-
+VideoSEAL provides offline build utilities for long video indexing:
 
 - OCR subtitles (SRT) → OCR captions + (optional) embeddings
 - Clip captions (VLM) → clip captions + (optional) embeddings
@@ -31,7 +40,7 @@ Videoseal provides offline build utilities for long video indexing:
 ## 🏗️ Run offline build
 
 ```bash
-cd /path/to/
+cd /path/to/VideoSEAL
 
 export MLLM_API_KEY="sk_your_api_key"
 export EMBEDDING_API_KEY="sk_your_api_key"
@@ -54,8 +63,8 @@ to make the video tool-agent GRPO workflow runnable without an extra repo checkout
 ### 🧪 Training environment (conda)
 
 ```bash
-conda create -n
-conda activate
+conda create -n videoseal python=3.12 -y
+conda activate videoseal
 
 pip install vllm==0.11.0
 
@@ -73,7 +82,7 @@ pip install -e .
 ### 🧩 Example
 
 ```bash
-cd /path/to/
+cd /path/to/VideoSEAL
 
 # Export real API keys/endpoints in your environment before launching.
 
````
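The README's first offline-build stage (OCR subtitles: SRT → caption records that can optionally be embedded) can be sketched as below. This is an illustrative sketch only; the function names and record layout are our assumptions, not the actual VideoSEAL utilities:

```python
import re

# Matches an SRT time span like "00:00:01,000 --> 00:00:03,500".
_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def _to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(srt_text):
    """Split an SRT file into {'start', 'end', 'text'} caption records.

    Hypothetical helper (not part of the VideoSEAL codebase): these
    time-stamped records are the kind of unit an offline index could
    feed to an embedding model, one vector per caption span.
    """
    records = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = _TIME.search(line)
            if m:
                g = m.groups()
                records.append({
                    "start": _to_seconds(*g[:4]),
                    "end": _to_seconds(*g[4:]),
                    # Everything after the time line is caption text.
                    "text": " ".join(lines[i + 1:]).strip(),
                })
                break
    return records
```

The second stage (clip captions via a VLM) would produce records of the same shape, with `text` coming from the captioning model instead of the subtitle file.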