Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference Paper • 2604.07394 • Published 8 days ago
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 10 days ago