Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation (arXiv 2604.10098, published 5 days ago)
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping (arXiv 2604.11297, published 3 days ago)
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models (arXiv 2503.16257, published Mar 20, 2025)
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache (arXiv 2402.02750, published Feb 5, 2024)
Token Warping Helps MLLMs Look from Nearby Viewpoints (arXiv 2604.02870, published 13 days ago)
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters (arXiv 2406.05955, published Jun 10, 2024)