Video-Text-to-Text
Transformers
Safetensors
English
qwen3_vl
image-text-to-text
video
long-video
reasoning
tool-calling
agentic-rl
grpo
multimodal
Instructions to use ParaVT/ParaVT-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ParaVT/ParaVT-8B with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForImageTextToText processor = AutoProcessor.from_pretrained("ParaVT/ParaVT-8B") model = AutoModelForImageTextToText.from_pretrained("ParaVT/ParaVT-8B") - Notebooks
- Google Colab
- Kaggle
mwxely commited on
Commit ·
68e7dc7
1
Parent(s): 58a9fe1
card: align H1 with the paper title; drop internal 'Plan B' jargon
Browse files
README.md
CHANGED
|
@@ -19,7 +19,7 @@ tags:
|
|
| 19 |
- multimodal
|
| 20 |
---
|
| 21 |
|
| 22 |
-
# ParaVT:
|
| 23 |
|
| 24 |
<div align="center">
|
| 25 |
|
|
@@ -45,7 +45,7 @@ This repository hosts the final post-RL checkpoint (`ParaVT-8B`), obtained by ru
|
|
| 45 |
| Architecture | `Qwen3VLForConditionalGeneration` |
|
| 46 |
| Parameters | 8 B |
|
| 47 |
| Base model | `Qwen/Qwen3-VL-8B-Instruct` |
|
| 48 |
-
| Training stages | SFT (
|
| 49 |
| Training data | [`ParaVT/ParaVT-Parquet`](https://huggingface.co/datasets/ParaVT/ParaVT-Parquet) (`sft` + `rl` configs) |
|
| 50 |
| Source videos | [`ParaVT/ParaVT-Source`](https://huggingface.co/datasets/ParaVT/ParaVT-Source) |
|
| 51 |
| Native tool | Temporal cropping (start time, end time, optional sub-frame count) |
|
|
|
|
| 19 |
- multimodal
|
| 20 |
---
|
| 21 |
|
| 22 |
+
# ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
|
| 23 |
|
| 24 |
<div align="center">
|
| 25 |
|
|
|
|
| 45 |
| Architecture | `Qwen3VLForConditionalGeneration` |
|
| 46 |
| Parameters | 8 B |
|
| 47 |
| Base model | `Qwen/Qwen3-VL-8B-Instruct` |
|
| 48 |
+
| Training stages | Cold-start SFT (500 steps) → PARA-GRPO RL (54 steps) |
|
| 49 |
| Training data | [`ParaVT/ParaVT-Parquet`](https://huggingface.co/datasets/ParaVT/ParaVT-Parquet) (`sft` + `rl` configs) |
|
| 50 |
| Source videos | [`ParaVT/ParaVT-Source`](https://huggingface.co/datasets/ParaVT/ParaVT-Source) |
|
| 51 |
| Native tool | Temporal cropping (start time, end time, optional sub-frame count) |
|