Qwen3-VL-4B-Instruct-SFT-PRISM-DAPO
Overview
Large multimodal models (LMMs) are commonly post-trained with supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR). However, SFT can introduce distributional drift: the post-SFT policy neither fully preserves the base model's capabilities nor faithfully matches the supervision distribution. The issue is especially challenging for multimodal reasoning, where visual-perception errors and logical-reasoning errors can drift in different ways and further degrade downstream RL.
We introduce PRISM, a three-stage post-training pipeline that inserts an explicit pre-alignment stage between SFT and RLVR. PRISM performs black-box adversarial on-policy distillation with a Mixture-of-Experts discriminator, providing separate corrective signals for visual grounding and reasoning consistency. This model, Qwen3-VL-4B-Instruct-SFT-PRISM-DAPO, is based on Qwen3-VL-4B-Instruct and further optimized with the PRISM pipeline using DAPO as the final RL algorithm.
Usage & Evaluation
For detailed instructions on inference, training, and evaluation, please refer to our GitHub repository. We recommend using the scripts and environment provided there to reproduce our results.
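As a quick start outside those scripts, inference should follow the standard Qwen3-VL pattern in recent versions of Hugging Face transformers. This is a minimal sketch, not the repository's official script: the model ID is the one this card describes, while the sample image URL and question are illustrative placeholders, and a transformers version recent enough to support Qwen3-VL checkpoints is assumed.

```python
MODEL_ID = "prism-vlm/Qwen3-VL-4B-Instruct-SFT-PRISM-DAPO"

def build_messages(image_url: str, question: str) -> list:
    """Build a single-turn multimodal chat message in the structure
    expected by the processor's chat template."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

if __name__ == "__main__":
    # Heavy imports and model download happen only when run as a script.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = build_messages(
        "https://example.com/sample.jpg",  # placeholder image
        "Describe the figure and reason step by step.",
    )
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    answer = processor.batch_decode(
        output_ids[:, inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )[0]
    print(answer)
```

For evaluation-quality numbers, still use the scripts and environment in the GitHub repository, since decoding settings can shift benchmark results.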
Citation
If you find PRISM useful for your research or applications, please cite it with the following BibTeX entry:
@misc{wang2026sfttorlprealignmentblackboxonpolicy,
      title={Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL},
      author={Sudong Wang and Weiquan Huang and Xiaomin Yu and Zuhao Yang and Hehai Lin and Keming Wu and Chaojun Xiao and Chen Chen and Wenxuan Wang and Beier Zhu and Yunjian Zhang and Chengwei Qin},
      year={2026},
      eprint={2604.28123},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.28123},
}
Check out this paper: https://arxiv.org/abs/2604.28123
Acknowledgements
We gratefully acknowledge the following open-source projects that made this work possible:
- LLaMA-Factory for the supervised fine-tuning infrastructure and tools.
- verl for the reinforcement learning training framework.
- lmms-eval for the comprehensive evaluation framework for large multimodal models.
We thank the developers and contributors of these projects for their excellent work and for making their code publicly available.
Model tree for prism-vlm/Qwen3-VL-4B-Instruct-SFT-PRISM-DAPO
Base model: Qwen/Qwen3-VL-4B-Instruct