More information about kimi-k2.5-experts

#1
by Rebis - opened

Hi,
Would it be possible to get more information about this file and its purpose?
Thank you in advance.

Hi! Great question. I’m currently experimenting with knowledge distillation and expert-to-dense weight merging.
Specifically, I'm exploring the Kimi K2.5 architecture to see if specific expert weights can be isolated and 'collapsed' into a smaller, dense 1.5B/2B parameter model.
The goal is to see if we can retain the high-level reasoning and genre-specific expertise of a massive MoE model while significantly reducing the inference cost for edge devices. It’s an experiment in weight surgery and SVD-based dimensionality reduction, all within the permissions of the Kimi Modified MIT license.
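To make the idea concrete, here is a minimal sketch of the "collapse" step described above. It assumes a toy setup: each expert is just a single weight matrix, the experts are merged with a plain mean (a real pipeline would weight them by routing statistics), and the merged matrix is then truncated with SVD to a target rank. The function name `collapse_experts` and all shapes are illustrative, not part of the actual project code.

```python
import numpy as np

def collapse_experts(expert_weights, rank):
    """Average a set of MoE expert matrices into one dense matrix,
    then compress it via truncated SVD into two thin factors.

    Illustrative only: a real expert merge would weight experts by
    routing statistics rather than take a uniform mean.
    """
    merged = np.mean(expert_weights, axis=0)  # naive expert merge
    U, S, Vt = np.linalg.svd(merged, full_matrices=False)
    # Keep the top-`rank` singular directions; A @ B approximates `merged`.
    A = U[:, :rank] * S[:rank]  # shape (d_out, rank)
    B = Vt[:rank, :]            # shape (rank, d_in)
    return A, B

# Toy demo: four "experts" of shape (8, 16), collapsed to rank 4.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((8, 16)) for _ in range(4)]
A, B = collapse_experts(experts, rank=4)
print((A @ B).shape)  # (8, 16)
```

The point of the two-factor form is the parameter count: storing `A` and `B` costs `rank * (d_out + d_in)` values instead of `d_out * d_in`, which is where the dense-model size reduction would come from.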
The era of giant models has shown us what’s possible, but the next frontier is making that intelligence efficient enough for everyone to own. I’d love to hear your thoughts on which expert genres we should prioritize for the community—let’s see if we can turn these 'giants' into something that fits in every pocket.
Always happy to chat with others interested in MoE optimization!

That's a great idea. I hope you succeed because I'm interested too.
I'll be following this closely. Not that I want to put any pressure on you.
Thanks again for your reply and for taking on this initiative, which is likely to be quite a challenge.

Much appreciated! You’re right—it’s definitely a technical climb, but that’s exactly what makes the Kimi architecture so exciting to work with. We’ll be posting updates on the progress here as we hit the first milestones in the extraction and merging process. Feel free to follow our organization to stay connected with the project as it evolves. Glad to have you along for the journey!