- HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
  Paper • 2601.14724 • Published • 75
- LLM-in-Sandbox Elicits General Agentic Intelligence
  Paper • 2601.16206 • Published • 86
- Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
  Paper • 2601.21590 • Published • 14
Collections including paper arxiv:2601.14724

- NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
  Paper • 2601.02204 • Published • 63
- HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
  Paper • 2601.14724 • Published • 75
- VIOLA: Towards Video In-Context Learning with Minimal Annotations
  Paper • 2601.15549 • Published • 4

- AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
  Paper • 2503.04504 • Published • 5
- Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
  Paper • 2503.15851 • Published • 10
- NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
  Paper • 2504.11427 • Published • 19
- HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
  Paper • 2505.04512 • Published • 36

- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 52
- Towards Automated Kernel Generation in the Era of LLMs
  Paper • 2601.15727 • Published • 19
- HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
  Paper • 2601.14724 • Published • 75
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 27

- Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
  Paper • 2505.15966 • Published • 53
- GRIT: Teaching MLLMs to Think with Images
  Paper • 2505.15879 • Published • 13
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
  Paper • 2505.16854 • Published • 11
- VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
  Paper • 2505.16192 • Published • 12