hollow404 commited on
Commit
f2bdbae
verified
1 Parent(s): 6ec9a37

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +81 -0
README.md CHANGED
@@ -1,3 +1,84 @@
1
  ---
2
  license: apache-2.0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: apache-2.0
3
+ language:
4
+ - en
5
+ library_name: transformers
6
+ base_model:
7
+ - Qwen/Qwen2.5-VL-7B-Instruct
8
+ datasets:
9
+ - YouTube-UGC
10
+ tags:
11
+ - video-quality-assessment
12
+ - no-reference-vqa
13
+ - visual-quality-r1
14
+ - qwen2.5-vl
15
+ - mds-vqa
16
+ - arxiv:2603.11525
17
  ---
18
+
19
+ # VQR1-7B-YouTubeUGC
20
+
21
+ `VQR1-7B-YouTubeUGC` is the base video quality assessment (VQA) model used in **MDS-VQA: Model-Informed Data Selection for Video Quality Assessment**. It instantiates the base quality model `f(路)` in the MDS-VQA pipeline and is trained on the YouTube-UGC source dataset to predict perceptual video quality scores.
22
+
23
+ MDS-VQA augments this base model with a separate failure predictor `g(路)` and a diversity-aware selection module. This repository contains the base VQA model checkpoint, not the MDS-VQA failure predictor or an actively fine-tuned target-domain model.
24
+
25
+ Paper: [arXiv:2603.11525](https://arxiv.org/abs/2603.11525)
26
+ Project/code: [Multimedia-Analytics-Laboratory/MDS-VQA](https://github.com/Multimedia-Analytics-Laboratory/MDS-VQA)
27
+
28
+ ## Model Details
29
+
30
+ - **Model type:** no-reference video quality assessment vision-language model
31
+ - **Backbone family:** Qwen2.5-VL / VisualQuality-R1-style VLM
32
+ - **Architecture in this repository:** `Qwen2_5_VLForConditionalGeneration`
33
+ - **Parameters:** approximately 8.29B BF16 parameters
34
+ - **Training data:** YouTube-UGC, used as the labeled source-domain dataset in MDS-VQA
35
+ - **Input:** a video plus a VQA prompt
36
+ - **Output:** a quality score on a 1 to 5 scale, typically inside `<answer>...</answer>` tags
37
+ - **License:** Apache 2.0
38
+
39
+ ## Intended Use
40
+
41
+ This model is intended for research on no-reference video quality assessment and data selection for VQA. Typical uses include:
42
+
43
+ - predicting a baseline quality score for user-generated or streaming videos;
44
+ - serving as the base quality model `f(路)` in the MDS-VQA pipeline;
45
+ - generating baseline predictions for failure-prediction training and model-informed data selection;
46
+ - comparing active data selection or fine-tuning methods for VQA.
47
+
48
+ It is not intended as a universal production QoE monitor without domain-specific validation. Quality scores can shift across datasets, display conditions, content types, and distortion families.
49
+
50
+ ## Prompt Format
51
+
52
+ The model follows the VisualQuality-R1-style scoring prompt used in MDS-VQA:
53
+
54
+ ```text
55
+ You are doing the video quality assessment task.
56
+ Here is the question: What is your overall rating on the quality of this video? The rating should be a float between 1 and 5, rounded to two decimal places, with 1 representing very poor quality and 5 representing excellent quality.
57
+ First output the thinking process in <think> </think> tags and then output the final answer with only one score in <answer> </answer> tags.
58
+ ```
59
+
60
+ For automatic evaluation, parse the scalar value inside the final `<answer>` tag.
61
+
62
+ ## Example Usage
63
+
64
+ Please refer to the `src/inference.py` with the former prompt format.
65
+
66
+ ## MDS-VQA Context
67
+
68
+ MDS-VQA is a model-informed data selection mechanism for VQA. Given an unlabeled video pool, it selects videos that are both:
69
+
70
+ 1. **Difficult for the base VQA model:** estimated by a failure predictor trained to rank videos by the base model's prediction errors.
71
+ 2. **Diverse in content:** estimated from frame-level semantic video features, using a Chamfer-distance-based diversity term.
72
+
73
+ ## Citation
74
+
75
+ If you use this model, please cite MDS-VQA:
76
+
77
+ ```bibtex
78
+ @article{zou2026mds,
79
+ title={MDS-VQA: Model-Informed Data Selection for Video Quality Assessment},
80
+ author={Zou, Jian and Xu, Xiaoyu and Wang, Zhihua and Wang, Yilin and Adsumilli, Balu and Ma, Kede},
81
+ journal={arXiv preprint arXiv:2603.11525},
82
+ year={2026}
83
+ }
84
+ ```