- ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
  Paper • 2603.25746 • Published • 155
- TAPS: Task Aware Proposal Distributions for Speculative Sampling
  Paper • 2603.27027 • Published • 142
- Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
  Paper • 2603.25716 • Published • 154
- LongCat-Next: Lexicalizing Modalities as Discrete Tokens
  Paper • 2603.27538 • Published • 143

Collections
Collections including paper arxiv:2604.02176
- mHC: Manifold-Constrained Hyper-Connections
  Paper • 2512.24880 • Published • 322
- Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
  Paper • 2512.23988 • Published • 19
- SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
  Paper • 2512.25075 • Published • 15
- Guiding a Diffusion Transformer with the Internal Dynamics of Itself
  Paper • 2512.24176 • Published • 8

- Canvas-to-Image: Compositional Image Generation with Multimodal Controls
  Paper • 2511.21691 • Published • 36
- Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
  Paper • 2511.21678 • Published • 12
- ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
  Paper • 2511.20937 • Published • 16
- Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation
  Paper • 2512.10949 • Published • 47

- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 107
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 80
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 43
- Matryoshka Diffusion Models
  Paper • 2310.15111 • Published • 45

- dLLM: Simple Diffusion Language Modeling
  Paper • 2602.22661 • Published • 152
- OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
  Paper • 2603.15594 • Published • 149
- Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
  Paper • 2603.13398 • Published • 153
- Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
  Paper • 2603.06569 • Published • 119

- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 550
- mHC: Manifold-Constrained Hyper-Connections
  Paper • 2512.24880 • Published • 322
- NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
  Paper • 2601.00393 • Published • 133
- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 176

- Visual Spatial Tuning
  Paper • 2511.05491 • Published • 53
- Adam's Law: Textual Frequency Law on Large Language Models
  Paper • 2604.02176 • Published • 480
- Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
  Paper • 2604.10098 • Published • 74
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
  Paper • 2604.13016 • Published • 77