Any plans for Xiaomi's MiMo V2 Flash?

#8
by droussis - opened

Hi,

I tested the reuploaded quants and they work extremely well for multilingual data!

Are there any plans for an AWQ version of Xiaomi's MiMo V2 Flash? It's an extremely good model as well.

Thank you very much!

@mratsim

I thought about making it a habit to contribute something each time I make a request. 😁

One more useful calibration dataset here which contains 102 rows:

  • 94 with multilingual multi-turn dialogues with reasoning, covering 24 EU languages
  • 4 with coding-agent trajectories (OpenHands-like scaffold) <- note that these contain trajectories with up to 60 turns, which might not fit everywhere
  • 4 with terminal-agent trajectories with reasoning (Terminus2-XML-like scaffold) <- large trajectories too, but max 26k tokens of context

Its aim is not only multilingualism, but also SOTA agentic use cases. From dev to dev.

    - dataset: droussis/multilingual_multiturn_agentic_calibration_data
      subset: openhands_code_agent
      split: train
      columns: [messages]
      formatter: chat_completion
      num_samples: 4

    - dataset: droussis/multilingual_multiturn_agentic_calibration_data
      subset: terminal_agent
      split: train
      columns: [messages]
      formatter: chat_completion
      num_samples: 4
      
    - dataset: droussis/multilingual_multiturn_agentic_calibration_data
      subset: multilingual_multiturn
      split: train
      columns: [messages]
      formatter: chat_completion
      num_samples: 94
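For anyone wiring these entries into a calibration recipe, the sample counts can be sanity-checked against the 102 rows stated above. A minimal sketch (the subset names and counts come from the config; everything else is illustrative, not part of any library):

```python
# Sketch: the three calibration subsets from the config above,
# expressed as plain dicts so the total row count can be verified.
DATASET = "droussis/multilingual_multiturn_agentic_calibration_data"

subsets = [
    {"subset": "openhands_code_agent", "num_samples": 4},
    {"subset": "terminal_agent", "num_samples": 4},
    {"subset": "multilingual_multiturn", "num_samples": 94},
]

total = sum(s["num_samples"] for s in subsets)
print(total)  # 102, matching the dataset's stated row count
```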

No plans for now, as I would need to add the architecture to llmcompressor.

Furthermore, it uses attention sinks like GPT-OSS, and the llmcompressor team hasn't even implemented all-experts calibration for GPT-OSS: https://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/modeling/gpt_oss.py

Edit: there is an issue about this: https://github.com/vllm-project/llm-compressor/issues/2159
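The all-experts calibration concern can be illustrated with a toy top-k MoE router: under normal routing, each token only reaches its top-k experts, so rarely selected experts collect few or no activation statistics during calibration. The usual workaround is to force every expert to process the calibration batch. A pure-Python sketch with toy numbers (not the llmcompressor implementation):

```python
import random

random.seed(0)

NUM_EXPERTS = 8
TOP_K = 2
NUM_TOKENS = 64

def route(scores):
    """Toy top-k router: pick the TOP_K experts with the highest scores."""
    ranked = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)
    return ranked[:TOP_K]

# Count how many calibration tokens each expert sees under normal routing.
tokens = [[random.random() for _ in range(NUM_EXPERTS)] for _ in range(NUM_TOKENS)]
seen = [0] * NUM_EXPERTS
for scores in tokens:
    for e in route(scores):
        seen[e] += 1

# Coverage is uneven: only TOP_K experts fire per token, so some experts
# may end up with too few samples to calibrate their quantization scales.
print(seen)

# "All-experts" calibration: run every token through every expert so each
# expert's weights see representative activation statistics.
seen_all = [NUM_TOKENS] * NUM_EXPERTS
print(seen_all)  # every expert sees all 64 tokens
```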
