Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation • Paper • 2604.10098 • Published 7 days ago • 74
BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models • Paper • 2602.04163 • Published Feb 4 • 10