Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 7 days ago
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published Jan 29
Multimodal GGUFs Collection Vision and audio models compatible with llama-server and llama-mtmd-cli • 16 items • Updated Dec 18, 2025
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published Feb 26, 2025