-
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
Paper • 2507.01352 • Published • 60 -
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
Paper • 2507.13563 • Published • 53 -
Scaling Laws for Optimal Data Mixtures
Paper • 2507.09404 • Published • 38 -
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper • 2511.14993 • Published • 233
Collections
Discover the best community collections!
Collections including paper arxiv:2511.12609
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 153 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 242 -
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Paper • 2511.12609 • Published • 106 -
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper • 2511.02779 • Published • 60
-
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Paper • 2412.09013 • Published • 13 -
Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published • 68 -
nablaNABLA: Neighborhood Adaptive Block-Level Attention
Paper • 2507.13546 • Published • 126 -
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 92
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 191 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 42
-
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
Paper • 2507.01352 • Published • 60 -
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
Paper • 2507.13563 • Published • 53 -
Scaling Laws for Optimal Data Mixtures
Paper • 2507.09404 • Published • 38 -
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
Paper • 2511.14993 • Published • 233
-
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Paper • 2511.04570 • Published • 242 -
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Paper • 2511.12609 • Published • 106 -
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper • 2511.02779 • Published • 60
-
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Paper • 2412.09013 • Published • 13 -
Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published • 68 -
nablaNABLA: Neighborhood Adaptive Block-Level Attention
Paper • 2507.13546 • Published • 126 -
Yume: An Interactive World Generation Model
Paper • 2507.17744 • Published • 92
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 153 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 191 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 42