Collections
Collections including paper arxiv:2504.09081
-
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Paper • 2412.14475 • Published • 58 -
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 53 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
WavePulse: Real-time Content Analytics of Radio Livestreams
Paper • 2412.17998 • Published • 11
-
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Paper • 2405.07526 • Published • 21 -
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 17 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 16 -
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 31
-
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Paper • 2504.09081 • Published • 16 -
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Paper • 2504.13180 • Published • 20 -
Sekai: A Video Dataset towards World Exploration
Paper • 2506.15675 • Published • 66 -
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 40
-
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Paper • 2407.09413 • Published • 11 -
MAVIS: Mathematical Visual Instruction Tuning
Paper • 2407.08739 • Published • 32 -
Kvasir-VQA: A Text-Image Pair GI Tract Dataset
Paper • 2409.01437 • Published • 71 -
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
Paper • 2409.05840 • Published • 49
-
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
Paper • 2505.15957 • Published • 3 -
Roadmap towards Superhuman Speech Understanding using Large Language Models
Paper • 2410.13268 • Published • 33 -
StressTest: Can YOUR Speech LM Handle the Stress?
Paper • 2505.22765 • Published • 17 -
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Paper • 2411.05361 • Published • 5