- Attention Is All You Need
  Paper • 1706.03762 • Published • 120
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 10
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
  Paper • 2210.04186 • Published
Collections
Collections including paper arxiv:2511.04570
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
  Paper • 2511.02779 • Published • 60
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
  Paper • 2511.04962 • Published • 57
- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
  Paper • 2511.04570 • Published • 242

- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
  Paper • 2511.04570 • Published • 242
- V-Thinker: Interactive Thinking with Images
  Paper • 2511.04460 • Published • 98
- TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
  Paper • 2511.01833 • Published • 16
- ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
  Paper • 2510.27492 • Published • 87

- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
  Paper • 2511.04570 • Published • 242
- Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
  Paper • 2511.12609 • Published • 106
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
  Paper • 2511.02779 • Published • 60

- One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
  Paper • 2511.10629 • Published • 129
- PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
  Paper • 2511.09057 • Published • 82
- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
  Paper • 2511.04570 • Published • 242

- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
  Paper • 2508.20751 • Published • 90
- TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
  Paper • 2508.17445 • Published • 80
- VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
  Paper • 2508.19247 • Published • 43
- VibeVoice Technical Report
  Paper • 2508.19205 • Published • 165