Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
Paper • 2604.10098 • Published • 74
Fundamental AI Methods; Perception & World Modeling; Reasoning & Generation; Action & Interaction
SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection