Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published 9 days ago • 38
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published Jan 12 • 52