MERaLiON-2-3B-MLX-4bit

4-bit quantized MLX version of MERaLiON-2-3B for Apple Silicon.

Quantization Details

  • Method: MLX affine quantization
  • Bits: 4
  • Group size: 64
  • Components quantized: Decoder (Gemma2-2B) only
  • Components kept in full precision: Whisper-Large-V3 encoder, multi-modal adaptor
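As an illustration of what affine quantization at these settings means (a plain-Python sketch, not MLX's actual packed-kernel implementation), each group of 64 weights is mapped to 4-bit integers in [0, 15] via a per-group scale and offset:

```python
import random

def quantize_group(ws, bits=4, group_size=64):
    """Affine-quantize one group of weights to `bits`-bit integers.

    Illustrative only: MLX packs the integers and stores the per-group
    scale/bias in half precision.
    """
    lo, hi = min(ws), max(ws)
    levels = (1 << bits) - 1              # 15 for 4-bit
    scale = (hi - lo) / levels or 1.0     # avoid division by zero
    q = [round((w - lo) / scale) for w in ws]  # integers in [0, 15]
    return q, scale, lo

def dequantize_group(q, scale, bias):
    return [x * scale + bias for x in q]

# Quantize a toy 64-element group and measure the round-trip error.
random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(64)]
q, scale, bias = quantize_group(weights)
recovered = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
assert max_err <= scale / 2  # rounding error is at most half a step
```

The group size trades accuracy against overhead: smaller groups track the local weight range more closely but store more scale/bias metadata.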

Size Comparison

  Component                     Original   Quantized
  ----------------------------  ---------  ---------
  Decoder (Gemma2-2B)           4.9 GB     1.4 GB
  Encoder (Whisper-Large-V3)    1.2 GB     1.2 GB
  Adaptor                       419 MB     419 MB
  Total                         6.5 GB     3.0 GB
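The decoder figure is consistent with 4-bit storage plus per-group metadata: each 64-weight group carries a half-precision scale and bias (4 bytes) on top of 0.5 bytes per packed weight, versus 2 bytes per weight in the original. A back-of-the-envelope check (assuming the original decoder is fp16):

```python
fp16_bytes_per_weight = 2.0
# 4-bit packed weights: 0.5 bytes each, plus an fp16 scale and an fp16
# bias (2 + 2 bytes) shared by every group of 64 weights.
q4_bytes_per_weight = 4 / 8 + (2 + 2) / 64   # 0.5625

decoder_fp16_gb = 4.9
decoder_q4_gb = decoder_fp16_gb * q4_bytes_per_weight / fp16_bytes_per_weight
print(round(decoder_q4_gb, 1))  # → 1.4
```

The encoder and adaptor totals are unchanged because they are kept in full precision.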

Usage

Structure

  • Whisper-Large-V3 encoder (full precision)
  • Multi-modal adaptor (full precision)
  • Gemma2-2B decoder (4-bit quantized)
  • Decoder directory with config, tokenizer, and symlinks to the decoder shards