VQR1-7B-YouTubeUGC

VQR1-7B-YouTubeUGC is the base video quality assessment (VQA) model used in MDS-VQA: Model-Informed Data Selection for Video Quality Assessment. It instantiates the base quality model f(·) in the MDS-VQA pipeline and is trained on the YouTube-UGC source dataset to predict perceptual video quality scores.

MDS-VQA augments this base model with a separate failure predictor g(·) and a diversity-aware selection module. This repository contains the base VQA model checkpoint, not the MDS-VQA failure predictor or an actively fine-tuned target-domain model.

Paper: arXiv:2603.11525
Project/code: Multimedia-Analytics-Laboratory/MDS-VQA

Model Details

  • Model type: no-reference video quality assessment vision-language model
  • Backbone family: Qwen2.5-VL / VisualQuality-R1-style VLM
  • Architecture in this repository: Qwen2_5_VLForConditionalGeneration
  • Parameters: approximately 8.29B BF16 parameters
  • Training data: YouTube-UGC, used as the labeled source-domain dataset in MDS-VQA
  • Input: a video plus a VQA prompt
  • Output: a quality score on a 1 to 5 scale, typically inside <answer>...</answer> tags
  • License: Apache 2.0

Intended Use

This model is intended for research on no-reference video quality assessment and data selection for VQA. Typical uses include:

  • predicting a baseline quality score for user-generated or streaming videos;
  • serving as the base quality model f(·) in the MDS-VQA pipeline;
  • generating baseline predictions for failure-prediction training and model-informed data selection;
  • comparing active data selection or fine-tuning methods for VQA.

It is not intended as a universal production QoE monitor without domain-specific validation. Quality scores can shift across datasets, display conditions, content types, and distortion families.

Prompt Format

The model follows the VisualQuality-R1-style scoring prompt used in MDS-VQA:

You are doing the video quality assessment task.
Here is the question: What is your overall rating on the quality of this video? The rating should be a float between 1 and 5, rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality.
First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags.

For automatic evaluation, parse the scalar value inside the final <answer> tag.

Example Usage

Please refer to the src/inference.py with the former prompt format.

MDS-VQA Context

MDS-VQA is a model-informed data selection mechanism for VQA. Given an unlabeled video pool, it selects videos that are both:

  1. Difficult for the base VQA model: estimated by a failure predictor trained to rank videos by the base model's prediction errors.
  2. Diverse in content: estimated from frame-level semantic video features, using a Chamfer-distance-based diversity term.

Citation

If you use this model, please cite MDS-VQA:

@article{zou2026mds,
  title={MDS-VQA: Model-Informed Data Selection for Video Quality Assessment},
  author={Zou, Jian and Xu, Xiaoyu and Wang, Zhihua and Wang, Yilin and Adsumilli, Balu and Ma, Kede},
  journal={arXiv preprint arXiv:2603.11525},
  year={2026}
}
Downloads last month
32
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for hollow404/VQR1-7B-YouTubeUGC

Finetuned
(1071)
this model
Quantizations
1 model

Paper for hollow404/VQR1-7B-YouTubeUGC