Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Paper • 2402.12030
The ULD loss, based on optimal transport, enables distillation across different LLM families without requiring shared tokenizers.
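The following sketch shows why a shared tokenizer is not needed. It assumes the closed-form simplification of the transport distance: each model's next-token probabilities are sorted in decreasing order, the smaller vocabulary is zero-padded, and the two vectors are compared with an L1 distance. Because only ranked probabilities are compared, no token-to-token mapping between vocabularies is required. This is an illustration under those assumptions, not the authors' code. The function names are hypothetical, and truncating both sequences to the shorter length is an assumed simplification.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uld_token_loss(student_logits: Sequence[float],
                   teacher_logits: Sequence[float]) -> float:
    """Distance between student and teacher distributions at one position.

    Assumption: the optimal-transport distance is approximated by sorting both
    distributions in decreasing order, zero-padding the smaller vocabulary, and
    summing absolute differences. Token identities are never compared, so the
    two vocabularies may differ in size and content.
    """
    p_s = sorted(softmax(student_logits), reverse=True)
    p_t = sorted(softmax(teacher_logits), reverse=True)
    size = max(len(p_s), len(p_t))
    p_s += [0.0] * (size - len(p_s))  # pad the smaller vocabulary with zero mass
    p_t += [0.0] * (size - len(p_t))
    return sum(abs(a - b) for a, b in zip(p_s, p_t))


def uld_sequence_loss(student_seq: Sequence[Sequence[float]],
                      teacher_seq: Sequence[Sequence[float]]) -> float:
    """Average the per-position distance over a sequence of logit vectors.

    Hypothetical simplification: the two tokenizations may yield different
    lengths, so both are truncated to the shorter one before averaging.
    """
    n = min(len(student_seq), len(teacher_seq))
    if n == 0:
        raise ValueError("need at least one position in each sequence")
    return sum(uld_token_loss(student_seq[i], teacher_seq[i]) for i in range(n)) / n


if __name__ == "__main__":
    # Student has a 2-token vocabulary and teacher has a 4-token one.
    print(uld_token_loss([0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
```

In training, a distillation term like this would typically be added to the student's ordinary next-token cross-entropy loss with a weighting factor. The sort makes the term invariant to how each tokenizer orders its vocabulary, which is what allows teacher and student to come from different model families.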