---
license: apache-2.0
language:
- en
- zh
library_name: transformers
tags:
- avbench
- audio-text
- video-text
- audio-video
base_model:
- Qwen/Qwen2-Audio-7B-Instruct
- Qwen/Qwen2.5-Omni-7B
---

# AVBench Models

This repository hosts the evaluator models used in **AVBench**, a benchmark for text-to-audio-video generation quality and cross-modal consistency.

[![Hugging Face Dataset](https://img.shields.io/badge/HuggingFace-Dataset-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/datasets/iiiiii123/AVBench)
[![AVBench Models](https://img.shields.io/badge/HuggingFace-AVBench__model-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/iiiiii123/AVBench_model)

## AVBench in brief

AVBench evaluates generated content on two splits:

- **Normal split**: common, easier samples.
- **Hard split**: challenging samples with stronger cross-modal requirements.

It covers cross-modal alignment (Audio-Text / Video-Text / Audio-Video) and generation quality dimensions.

Dataset link:
- https://huggingface.co/datasets/iiiiii123/AVBench

## Model zoo used by AVBench

| Model | Use in AVBench | Trained / merged from |
|---|---|---|
| `Qwen2-Audio-7B-AudioTextMatching-Merged` | Audio-Text consistency scoring (AT) | `Qwen/Qwen2-Audio-7B-Instruct` |
| `Qwen2.5-Omni-7B-VideoTextMatching-Merged` | Video-Text consistency scoring (VT) | `Qwen/Qwen2.5-Omni-7B` |
| `Qwen2.5-Omni-7B-AudioVideoMatching-Merged` | Audio-Video consistency scoring (AV) | `Qwen/Qwen2.5-Omni-7B` |

## Notes

These models are released for AVBench evaluation and analysis.
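
For scripting an evaluation run, the model zoo table can be mirrored as a small lookup from consistency dimension (AT, VT, AV) to evaluator model. The following is a minimal sketch, not part of AVBench itself: the `Evaluator` class and the helpers `evaluator_for` and `local_model_dir` are hypothetical names introduced here, and the one-subdirectory-per-model layout is an assumption about how the checkpoints might be stored locally.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Evaluator:
    """One row of the model zoo table (hypothetical helper type)."""
    dimension: str   # "AT", "VT", or "AV"
    name: str        # evaluator model name as listed in the table
    base_model: str  # checkpoint it was trained / merged from


# Entries copied from the model zoo table.
EVALUATORS = {
    "AT": Evaluator("AT", "Qwen2-Audio-7B-AudioTextMatching-Merged",
                    "Qwen/Qwen2-Audio-7B-Instruct"),
    "VT": Evaluator("VT", "Qwen2.5-Omni-7B-VideoTextMatching-Merged",
                    "Qwen/Qwen2.5-Omni-7B"),
    "AV": Evaluator("AV", "Qwen2.5-Omni-7B-AudioVideoMatching-Merged",
                    "Qwen/Qwen2.5-Omni-7B"),
}


def evaluator_for(dimension: str) -> Evaluator:
    """Return the evaluator for a dimension code, ignoring case and padding."""
    key = dimension.strip().upper()
    if key not in EVALUATORS:
        raise KeyError(
            f"unknown dimension {dimension!r}; expected one of {sorted(EVALUATORS)}"
        )
    return EVALUATORS[key]


def local_model_dir(root: str, dimension: str) -> Path:
    """Build a local checkpoint path for a dimension.

    ASSUMPTION: each model sits in a subdirectory of `root` named after
    the model. Adjust this to match the actual download layout.
    """
    return Path(root) / evaluator_for(dimension).name
```

For example, `evaluator_for("av").base_model` gives `Qwen/Qwen2.5-Omni-7B`, which indicates which base architecture's loading code applies to that checkpoint.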