Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2601.14724

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75
LLM-in-Sandbox Elicits General Agentic Intelligence

Paper • 2601.16206 • Published Jan 22 • 86
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

Paper • 2601.21590 • Published Jan 29 • 14

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Paper • 2601.02204 • Published Jan 5 • 63
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75
VIOLA: Towards Video In-Context Learning with Minimal Annotations

Paper • 2601.15549 • Published Jan 22 • 4

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

Paper • 2503.04504 • Published Mar 6, 2025 • 5
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion

Paper • 2503.15851 • Published Mar 20, 2025 • 10
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors

Paper • 2504.11427 • Published Apr 15, 2025 • 19
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Paper • 2505.04512 • Published May 7, 2025 • 36

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Towards Automated Kernel Generation in the Era of LLMs

Paper • 2601.15727 • Published Jan 22 • 19
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

Vision Reasoning

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21, 2025 • 53
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21, 2025 • 13
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22, 2025 • 11
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22, 2025 • 12

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75
LLM-in-Sandbox Elicits General Agentic Intelligence

Paper • 2601.16206 • Published Jan 22 • 86
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

Paper • 2601.21590 • Published Jan 29 • 14

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Towards Automated Kernel Generation in the Era of LLMs

Paper • 2601.15727 • Published Jan 22 • 19
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Paper • 2601.02204 • Published Jan 5 • 63
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75
VIOLA: Towards Video In-Context Learning with Minimal Annotations

Paper • 2601.15549 • Published Jan 22 • 4

Vision Reasoning

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21, 2025 • 53
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21, 2025 • 13
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Paper • 2505.16854 • Published May 22, 2025 • 11
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Paper • 2505.16192 • Published May 22, 2025 • 12

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM

Paper • 2503.04504 • Published Mar 6, 2025 • 5
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion

Paper • 2503.15851 • Published Mar 20, 2025 • 10
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors

Paper • 2504.11427 • Published Apr 15, 2025 • 19
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Paper • 2505.04512 • Published May 7, 2025 • 36

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs