Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation • Paper • 2604.10098 • Published 4 days ago • 64
BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models • Paper • 2602.04163 • Published Feb 4 • 10
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning • Paper • 2507.14111 • Published Jul 18, 2025 • 25
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving • Paper • 2601.01528 • Published Jan 4 • 19