---
license: apache-2.0
language:
- en
- zh
library_name: transformers
tags:
- avbench
- audio-text
- video-text
- audio-video
base_model:
- Qwen/Qwen2-Audio-7B-Instruct
- Qwen/Qwen2.5-Omni-7B
---
AVBench Models
This repository hosts the evaluator models used in AVBench, a benchmark for text-to-audio-video generation quality and cross-modal consistency.
AVBench in brief
AVBench evaluates generated content on two splits:
- Normal split: common, easier samples.
- Hard split: challenging samples with stronger cross-modal requirements.
It covers cross-modal alignment (Audio-Text / Video-Text / Audio-Video) and generation quality dimensions.
Dataset link:
Model zoo used by AVBench
| Model | Use in AVBench | Trained / merged from |
|---|---|---|
| Qwen2-Audio-7B-AudioTextMatching-Merged | Audio-Text consistency scoring (AT) | Qwen/Qwen2-Audio-7B-Instruct |
| Qwen2.5-Omni-7B-VideoTextMatching-Merged | Video-Text consistency scoring (VT) | Qwen/Qwen2.5-Omni-7B |
| Qwen2.5-Omni-7B-AudioVideoMatching-Merged | Audio-Video consistency scoring (AV) | Qwen/Qwen2.5-Omni-7B |
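
The table can be read as a lookup from alignment dimension to evaluator checkpoint. The sketch below is a minimal illustration of that mapping, not part of AVBench itself: the `Evaluator` class and the `evaluator_for` helper are hypothetical names, and how the checkpoints are stored in this repository (for example, as subfolders) is not stated here, so the code only returns the names listed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluator:
    """One row of the model zoo table (hypothetical helper type)."""
    checkpoint: str   # evaluator model name as listed in the table
    base_model: str   # model it was trained / merged from


# Keys are the dimension abbreviations used in the table (AT, VT, AV).
EVALUATORS = {
    "AT": Evaluator("Qwen2-Audio-7B-AudioTextMatching-Merged",
                    "Qwen/Qwen2-Audio-7B-Instruct"),
    "VT": Evaluator("Qwen2.5-Omni-7B-VideoTextMatching-Merged",
                    "Qwen/Qwen2.5-Omni-7B"),
    "AV": Evaluator("Qwen2.5-Omni-7B-AudioVideoMatching-Merged",
                    "Qwen/Qwen2.5-Omni-7B"),
}


def evaluator_for(dimension: str) -> Evaluator:
    """Return the evaluator for a dimension code, ignoring case and spaces."""
    key = dimension.strip().upper()
    if key not in EVALUATORS:
        raise ValueError(
            f"unknown dimension {dimension!r}; expected one of {sorted(EVALUATORS)}"
        )
    return EVALUATORS[key]
```

For example, `evaluator_for("av").base_model` yields `Qwen/Qwen2.5-Omni-7B`, which indicates that the Audio-Video evaluator shares its base model with the Video-Text one.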
Notes
These models are released for AVBench evaluation and analysis.