| --- |
| license: apache-2.0 |
| base_model: Qwen/Qwen3-VL-2B-Instruct |
| tags: |
| - vision |
| - multimodal |
| - safety |
| - guard-model |
| - icml-2026 |
| pipeline_tag: image-text-to-text |
| --- |
| |
| # OmniVL-Guard-2B |
|
|
| <div align="center"> |
|
|
| <!-- Keep this line empty for spacing --> |
|
|
| [](https://arxiv.org/abs/2602.10687) |
| [](https://github.com/shen8424/OmniVL-Guard) |
| [](https://huggingface.co/datasets/SJJ0854/FSFR) |
| [](https://icml.cc) |
| [](./) |
|
|
| </div> |
|
|
| A safety guard model for vision-language content moderation, accepted at **ICML 2026**. Fine-tuned from [Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct). |
|
|
| ## Usage |
|
|
| ```python |
| from transformers import Qwen3_VLForConditionalGeneration, AutoProcessor |
| |
| model = Qwen3_VLForConditionalGeneration.from_pretrained("SJJ0854/OmniVL-Guard-2B") |
| processor = AutoProcessor.from_pretrained("SJJ0854/OmniVL-Guard-2B") |
| ``` |
|
|
| ## Training Data |
|
|
| Refined-SFT and RL datasets available at [SJJ0854/FSFR](https://huggingface.co/datasets/SJJ0854/FSFR). |
|
|
| ## Citation |
|
|
| ```bibtex |
| @inproceedings{omnivlguard2026, |
| title={OmniVL-Guard: A Safety Guard for Vision-Language Models}, |
| booktitle={International Conference on Machine Learning (ICML)}, |
| year={2026} |
| } |
| ``` |
|
|