-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 78
Collections
Discover the best community collections!
Collections including paper arxiv:2508.09983
-
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Paper • 2506.07491 • Published • 51 -
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Paper • 2508.09983 • Published • 70 -
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Paper • 2503.01710 • Published • 6 -
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
Paper • 2507.21809 • Published • 142
-
Snowflake/Arctic-Text2SQL-R1-7B
8B • Updated • 8.17k • 70 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 265 -
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 133
-
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
Paper • 2508.05988 • Published • 21 -
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper • 2508.07407 • Published • 99 -
Compressing Chain-of-Thought in LLMs via Step Entropy
Paper • 2508.03346 • Published • 8 -
Reinforcement Learning in Vision: A Survey
Paper • 2508.08189 • Published • 30
-
Efficient Agents: Building Effective Agents While Reducing Cost
Paper • 2508.02694 • Published • 86 -
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper • 2508.09736 • Published • 58 -
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Paper • 2508.09983 • Published • 70 -
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
Paper • 2508.14111 • Published • 33
-
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper • 2504.01990 • Published • 305 -
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper • 2504.10479 • Published • 308 -
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper • 2503.24235 • Published • 55 -
Seedream 3.0 Technical Report
Paper • 2504.11346 • Published • 70
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 78
-
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
Paper • 2508.05988 • Published • 21 -
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper • 2508.07407 • Published • 99 -
Compressing Chain-of-Thought in LLMs via Step Entropy
Paper • 2508.03346 • Published • 8 -
Reinforcement Learning in Vision: A Survey
Paper • 2508.08189 • Published • 30
-
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Paper • 2506.07491 • Published • 51 -
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Paper • 2508.09983 • Published • 70 -
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Paper • 2503.01710 • Published • 6 -
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
Paper • 2507.21809 • Published • 142
-
Efficient Agents: Building Effective Agents While Reducing Cost
Paper • 2508.02694 • Published • 86 -
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper • 2508.09736 • Published • 58 -
Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Paper • 2508.09983 • Published • 70 -
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
Paper • 2508.14111 • Published • 33
-
Snowflake/Arctic-Text2SQL-R1-7B
8B • Updated • 8.17k • 70 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 282 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 265 -
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 133
-
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper • 2504.01990 • Published • 305 -
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper • 2504.10479 • Published • 308 -
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper • 2503.24235 • Published • 55 -
Seedream 3.0 Technical Report
Paper • 2504.11346 • Published • 70