Any plans for Xiaomi's MiMo V2 Flash?
Hi,
I tested the reuploaded quants and they work extremely well for multilingual data!
Are there any plans for an AWQ version of Xiaomi's MiMo V2 Flash? It's an extremely good model as well.
Thank you very much!
I've thought about making it a norm to contribute something each time I make a request. 😁
Here's one more useful calibration dataset, which contains 102 rows:
- 94 rows of multilingual multi-turn dialogues with reasoning, covering 24 EU languages
- 4 rows of coding agent trajectories (OpenHands-like scaffold) <- note that these include trajectories with up to 60 turns, so they might not fit everywhere
- 4 rows of terminal agent trajectories with reasoning (Terminus2-XML-like scaffold) <- also large trajectories, but at most ~26k tokens of context
Its aim is not only multilingualism, but also SOTA agentic use cases. From dev to dev.
- dataset: droussis/multilingual_multiturn_agentic_calibration_data
  subset: openhands_code_agent
  split: train
  columns: [messages]
  formatter: chat_completion
  num_samples: 4
- dataset: droussis/multilingual_multiturn_agentic_calibration_data
  subset: terminal_agent
  split: train
  columns: [messages]
  formatter: chat_completion
  num_samples: 4
- dataset: droussis/multilingual_multiturn_agentic_calibration_data
  subset: multilingual_multiturn
  split: train
  columns: [messages]
  formatter: chat_completion
  num_samples: 94
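In case it helps anyone wiring this up by hand, here's a rough sketch of how those three subsets could be loaded and rendered with a chat template before calibration. It assumes the subsets are exposed as dataset configs with a `messages` column (as in the config above); the tokenizer name is just a placeholder, swap in whatever model you're quantizing.

```python
# Minimal sketch: load the three calibration subsets and render each
# multi-turn conversation to text via the tokenizer's chat template.
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

REPO = "droussis/multilingual_multiturn_agentic_calibration_data"
# Subset name -> number of rows to take (matching the config above).
SUBSETS = {
    "multilingual_multiturn": 94,
    "openhands_code_agent": 4,
    "terminal_agent": 4,
}

# Placeholder tokenizer; use the tokenizer of the model being quantized.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

parts = []
for name, n in SUBSETS.items():
    ds = load_dataset(REPO, name, split="train")
    # Keep only the conversation column so the subsets concatenate cleanly.
    ds = ds.select_columns(["messages"]).select(range(min(n, len(ds))))
    parts.append(ds)
calib = concatenate_datasets(parts)

def to_text(example):
    # Render the message list into a single string for calibration.
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

calib = calib.map(to_text)
print(calib[0]["text"][:500])
```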
For now there's no plan, as I would need to add the architecture to llmcompressor.
Furthermore, it uses attention sinks like GPT-OSS, and the llmcompressor team hasn't even implemented all-experts calibration for GPT-OSS: https://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/modeling/gpt_oss.py
Edit: there is an issue about this: https://github.com/vllm-project/llm-compressor/issues/2159