# MERaLiON-2-10B-MLX-4bit
4-bit quantized Apple MLX version of MERaLiON/MERaLiON-2-10B.
MERaLiON-2-10B is a multimodal speech-language model developed by I2R, A*STAR (Singapore). It combines a Whisper-large-v3 encoder with a Gemma-2-9B-IT decoder for speech understanding tasks.
## Quantization Details
| Component | Format | Size |
|---|---|---|
| Decoder (Gemma-2-9B-IT) | 4-bit quantized (group_size=64, affine) | 4.96 GB |
| Encoder (Whisper-large-v3) | float16 (unquantized) | 1.22 GB |
| Adaptor | float16 (unquantized) | 0.43 GB |
| **Total** | | ~6.5 GB |
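The decoder's footprint can be sanity-checked from the quantization parameters. With `group_size=64` affine quantization, each group of 64 weights carries a float16 scale and bias, adding 2 × 16 / 64 = 0.5 bits per weight on top of the 4-bit codes. A back-of-envelope sketch (assuming roughly 9.24B decoder parameters for Gemma-2-9B; actual size also depends on which layers are quantized and GB vs GiB reporting):

```python
# Rough size estimate for the 4-bit decoder.
# Assumptions: ~9.24e9 parameters; MLX affine quantization stores a
# float16 scale and float16 bias per group of `group_size` weights.
group_size = 64
code_bits = 4
overhead_bits = 2 * 16 / group_size          # scale + bias -> 0.5 bits/weight
bits_per_weight = code_bits + overhead_bits  # 4.5 effective bits/weight

params = 9.24e9
size_gb = params * bits_per_weight / 8 / 1e9
print(f"{bits_per_weight} bits/weight -> {size_gb:.2f} GB")
```

This lands within a few percent of the 4.96 GB in the table above, which is consistent with a fully 4-bit-quantized decoder.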
Quantized from the original full-precision MERaLiON-2-10B weights (not re-quantized from 8-bit).
Size comparison:
- Original PyTorch (bfloat16): ~20 GB
- MLX 8-bit: ~11.6 GB
- MLX 4-bit (this model): ~6.5 GB (44% smaller than 8-bit)
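The quoted savings follow directly from the listed sizes:

```python
# Relative size reduction, computed from the sizes listed above.
size_bf16 = 20.0   # GB, original PyTorch bfloat16
size_8bit = 11.6   # GB, MLX 8-bit
size_4bit = 6.5    # GB, MLX 4-bit (this model)

vs_8bit = 1 - size_4bit / size_8bit   # ~44% smaller than 8-bit
vs_bf16 = 1 - size_4bit / size_bf16   # ~68% smaller than bfloat16
print(f"{vs_8bit:.0%} smaller than 8-bit, {vs_bf16:.0%} smaller than bf16")
```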
## Model Structure
```
encoder.safetensors       # Whisper-large-v3 encoder
adaptor.safetensors       # Speech-text adaptor MLP
decoder-00000.safetensors # 4-bit quantized Gemma-2-9B-IT
decoder/                  # Standalone decoder directory (symlinks)
```
## Usage
The decoder can be used standalone with `mlx_lm`:

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/MERaLiON-2-10B-MLX-4bit/decoder")
result = generate(model, tokenizer, prompt="Hello", max_tokens=100)
```
For full multimodal (speech + text) usage, refer to the original model documentation.
## License
This model is released under the MERaLiON Public Licence v3.