Collections
Collections including paper arxiv:2504.10471
- MAEB: Massive Audio Embedding Benchmark
  Paper • 2602.16008 • Published • 22
- HUME: Measuring the Human-Model Performance Gap in Text Embedding Task
  Paper • 2510.10062 • Published • 10
- MMTEB: Massive Multilingual Text Embedding Benchmark
  Paper • 2502.13595 • Published • 48
- MIEB: Massive Image Embedding Benchmark
  Paper • 2504.10471 • Published • 21

- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 247
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 40
- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 26
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 40

- Boosting Generative Image Modeling via Joint Image-Feature Synthesis
  Paper • 2504.16064 • Published • 14
- LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
  Paper • 2504.14032 • Published • 7
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 157
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 124

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 29
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 15
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 33