Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 6 days ago • 72
BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models Paper • 2602.04163 • Published Feb 4 • 10