---
license: apache-2.0
base_model: Qwen/Qwen3-VL-2B-Instruct
tags:
- vision
- multimodal
- safety
- guard-model
- icml-2026
pipeline_tag: image-text-to-text
---
# OmniVL-Guard-2B
[](https://arxiv.org/abs/2602.10687)
[](https://github.com/shen8424/OmniVL-Guard)
[](https://huggingface.co/datasets/SJJ0854/FSFR)
[](https://icml.cc)
[](./)
A safety guard model for vision-language content moderation, accepted at **ICML 2026**. Fine-tuned from [Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct).
## Usage
```python
from transformers import Qwen3_VLForConditionalGeneration, AutoProcessor
model = Qwen3_VLForConditionalGeneration.from_pretrained("SJJ0854/OmniVL-Guard-2B")
processor = AutoProcessor.from_pretrained("SJJ0854/OmniVL-Guard-2B")
```
## Training Data
Refined-SFT and RL datasets available at [SJJ0854/FSFR](https://huggingface.co/datasets/SJJ0854/FSFR).
## Citation
```bibtex
@inproceedings{omnivlguard2026,
title={OmniVL-Guard: A Safety Guard for Vision-Language Models},
booktitle={International Conference on Machine Learning (ICML)},
year={2026}
}
```