Video-Text-to-Text
Transformers
Safetensors
English
qwen3_vl
image-text-to-text
video
long-video
reasoning
tool-calling
agentic-rl
grpo
multimodal
Instructions to use ParaVT/ParaVT-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ParaVT/ParaVT-8B with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForImageTextToText processor = AutoProcessor.from_pretrained("ParaVT/ParaVT-8B") model = AutoModelForImageTextToText.from_pretrained("ParaVT/ParaVT-8B") - Notebooks
- Google Colab
- Kaggle
mwxely commited on
Commit ·
b3e2ae8
1
Parent(s): 07f786f
card: point GitHub links to EvolvingLMMs-Lab/ParaVT (main branch)
Browse files
README.md
CHANGED
|
@@ -24,7 +24,7 @@ tags:
|
|
| 24 |
<div align="center">
|
| 25 |
|
| 26 |
[](#citation)
|
| 27 |
-
[](https://github.com/
|
| 28 |
[](https://huggingface.co/datasets/ParaVT/ParaVT-Parquet)
|
| 29 |
[](https://huggingface.co/datasets/ParaVT/ParaVT-Source)
|
| 30 |
|
|
@@ -52,18 +52,18 @@ This repository hosts the final post-RL checkpoint (`ParaVT-8B`), obtained by ru
|
|
| 52 |
|
| 53 |
## Usage
|
| 54 |
|
| 55 |
-
`ParaVT-8B` is a drop-in `transformers` / `vllm` model for video-text-to-text. The full evaluation driver, prompt templates, and reproduction scripts live in the [ParaVT GitHub repository](https://github.com/
|
| 56 |
|
| 57 |
```bash
|
| 58 |
# Reproduce the headline numbers (after installing the eval venv)
|
| 59 |
-
git clone https://github.com/
|
| 60 |
cp .secrets.env.example .secrets.env && $EDITOR .secrets.env
|
| 61 |
bash scripts/setup_env.sh eval
|
| 62 |
PARAVT_EVAL_MODEL=ParaVT/ParaVT-8B \
|
| 63 |
bash paravt/eval/scripts/reproduce_paravt_8b.sh
|
| 64 |
```
|
| 65 |
|
| 66 |
-
For inference outside the eval driver, treat the model exactly like `Qwen/Qwen3-VL-8B-Instruct`: vLLM `--model ParaVT/ParaVT-8B`, the same tokenizer, the same chat template. The agentic system prompt and the tool schema used during PARA-GRPO are documented in [`paravt/eval/configs/withtool.yaml`](https://github.com/
|
| 67 |
|
| 68 |
## Citation
|
| 69 |
|
|
|
|
| 24 |
<div align="center">
|
| 25 |
|
| 26 |
[](#citation)
|
| 27 |
+
[](https://github.com/EvolvingLMMs-Lab/ParaVT)
|
| 28 |
[](https://huggingface.co/datasets/ParaVT/ParaVT-Parquet)
|
| 29 |
[](https://huggingface.co/datasets/ParaVT/ParaVT-Source)
|
| 30 |
|
|
|
|
| 52 |
|
| 53 |
## Usage
|
| 54 |
|
| 55 |
+
`ParaVT-8B` is a drop-in `transformers` / `vllm` model for video-text-to-text. The full evaluation driver, prompt templates, and reproduction scripts live in the [ParaVT GitHub repository](https://github.com/EvolvingLMMs-Lab/ParaVT); please refer to it for the exact environment that produced the reported numbers.
|
| 56 |
|
| 57 |
```bash
|
| 58 |
# Reproduce the headline numbers (after installing the eval venv)
|
| 59 |
+
git clone https://github.com/EvolvingLMMs-Lab/ParaVT.git && cd ParaVT
|
| 60 |
cp .secrets.env.example .secrets.env && $EDITOR .secrets.env
|
| 61 |
bash scripts/setup_env.sh eval
|
| 62 |
PARAVT_EVAL_MODEL=ParaVT/ParaVT-8B \
|
| 63 |
bash paravt/eval/scripts/reproduce_paravt_8b.sh
|
| 64 |
```
|
| 65 |
|
| 66 |
+
For inference outside the eval driver, treat the model exactly like `Qwen/Qwen3-VL-8B-Instruct`: vLLM `--model ParaVT/ParaVT-8B`, the same tokenizer, the same chat template. The agentic system prompt and the tool schema used during PARA-GRPO are documented in [`paravt/eval/configs/withtool.yaml`](https://github.com/EvolvingLMMs-Lab/ParaVT/blob/main/paravt/eval/configs/withtool.yaml) and [`paravt/eval/utils.py`](https://github.com/EvolvingLMMs-Lab/ParaVT/blob/main/paravt/eval/utils.py).
|
| 67 |
|
| 68 |
## Citation
|
| 69 |
|