- Nemotron ColEmbed V2: Top-Performing Late Interaction embedding models for Visual Document Retrieval
  Paper • 2602.03992 • Published • 1
- M3DR: Towards Universal Multilingual Multimodal Document Retrieval
  Paper • 2512.03514 • Published • 9
- ModernVBERT: Towards Smaller Visual Document Retrievers
  Paper • 2510.01149 • Published • 33
- Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model
  Paper • 2507.05513 • Published • 1
Collections
Collections including paper arxiv:2407.01449
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
  Paper • 2306.01116 • Published • 44
- HuggingFaceFW/fineweb
  Viewer • Updated • 52.5B • 634k • 2.76k
- tiiuae/falcon-refinedweb
  Viewer • Updated • 968M • 42.2k • 904
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 23

- NVLM: Open Frontier-Class Multimodal LLMs
  Paper • 2409.11402 • Published • 74
- BRAVE: Broadening the visual encoding of vision-language models
  Paper • 2404.07204 • Published • 20
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
  Paper • 2403.18814 • Published • 48
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
  Paper • 2409.17146 • Published • 121

- CompCap: Improving Multimodal Large Language Models with Composite Captions
  Paper • 2412.05243 • Published • 20
- GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis
  Paper • 2412.06089 • Published • 4
- SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
  Paper • 2412.05818 • Published • 1
- FLAIR: VLM with Fine-grained Language-informed Image Representations
  Paper • 2412.03561 • Published • 2