MERaLiON-2-10B-MLX-4bit

4-bit quantized Apple MLX version of MERaLiON/MERaLiON-2-10B.

MERaLiON-2-10B is a multimodal speech-language model developed by I2R, A*STAR (Singapore). It combines a Whisper-large-v3 encoder with a Gemma-2-9B-IT decoder for speech understanding tasks.

Quantization Details

Component                    Format                                    Size
Decoder (Gemma-2-9B-IT)      4-bit quantized (group_size=64, affine)   4.96 GB
Encoder (Whisper-large-v3)   float16 (unquantized)                     1.22 GB
Adaptor                      float16 (unquantized)                     0.43 GB
Total                                                                  ~6.5 GB

Quantized from the original full-precision MERaLiON-2-10B weights (not re-quantized from 8-bit).
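The idea behind "4-bit affine with group_size=64" can be illustrated with a minimal NumPy sketch: weights are split into groups of 64, and each group stores 4-bit codes plus its own scale and bias (here the group minimum). This mirrors the general scheme, not MLX's exact implementation.

```python
import numpy as np

GROUP_SIZE = 64
BITS = 4
LEVELS = 2**BITS - 1  # 15 quantization levels

def quantize(w):
    """Group-wise affine quantization: returns 4-bit codes, scales, biases."""
    groups = w.reshape(-1, GROUP_SIZE)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / LEVELS
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    codes = np.clip(np.round((groups - w_min) / scale), 0, LEVELS).astype(np.uint8)
    return codes, scale, w_min  # bias = per-group minimum

def dequantize(codes, scale, bias):
    return codes * scale + bias

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
codes, scale, bias = quantize(w)
w_hat = dequantize(codes, scale, bias).reshape(-1)
# Reconstruction error is bounded by half a quantization step per group.
max_err = np.abs(w - w_hat).max()
```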

Size comparison:

  • Original PyTorch (bfloat16): ~20 GB
  • MLX 8-bit: ~11.6 GB
  • MLX 4-bit (this model): ~6.5 GB (44% smaller than 8-bit)

Model Structure

encoder.safetensors          # Whisper-large-v3 encoder
adaptor.safetensors          # Speech-text adaptor MLP
decoder-00000.safetensors    # 4-bit quantized Gemma-2-9B-IT
decoder/                     # Standalone decoder directory (symlinks)

Usage

The decoder can be used standalone with mlx_lm:

from mlx_lm import load, generate

# Text-only generation with the quantized Gemma-2 decoder; speech input
# additionally requires the encoder and adaptor.
model, tokenizer = load("majentik/MERaLiON-2-10B-MLX-4bit/decoder")
result = generate(model, tokenizer, prompt="Hello", max_tokens=100)
print(result)

For full multimodal (speech + text) usage, refer to the original model documentation.

License

This model is released under the MERaLiON Public Licence v3.
