Lifting the Curse of Capacity Gap in Distilling Language Models
Paper: arXiv 2305.12129
minimoe-4L-384H (4 layers, hidden size 384) is distilled from bert-base-uncased on Wikipedia.
Repository: https://github.com/GeneZC/MiniMoE
arXiv: https://arxiv.org/abs/2305.12129