Theorem
Vision Transformer Finetuning Benefits from Non-Smooth Components Paper • 2602.06883 • Published Feb 6 • 4
Distillation
Memorization Dynamics in Knowledge Distillation for Language Models Paper • 2601.15394 • Published Jan 21 • 3
FASA: Frequency-aware Sparse Attention Paper • 2602.03152 • Published Feb 3 • 154