MERaLiON-2-3B-MLX-4bit
4-bit quantized MLX version of MERaLiON-2-3B for Apple Silicon.
Quantization Details
- Method: MLX affine quantization
- Bits: 4
- Group size: 64
- Components quantized: Decoder (Gemma2-2B) only
- Components kept in full precision: Whisper-Large-V3 encoder, multi-modal adaptor
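To make the settings above concrete, here is a minimal pure-Python sketch of affine quantization on a single group of 64 weights at 4 bits. It is a simplified illustration, not MLX's actual kernel: the helper names `quantize_group` and `dequantize_group` are hypothetical, the sample weights are random, and the storage estimate assumes one 16-bit scale and one 16-bit bias per group.

```python
import random

BITS = 4                # from the card
GROUP_SIZE = 64         # from the card
LEVELS = 2 ** BITS - 1  # 15 steps between a group's min and max


def quantize_group(weights):
    """Map one group of floats to 4-bit codes plus a scale and a bias."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Recover approximate floats: w ~ code * scale + bias."""
    return [c * scale + bias for c in codes]


random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]  # stand-in weights
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

# Storage cost per weight, ASSUMING a 16-bit scale and a 16-bit bias per group.
bits_per_weight = BITS + (16 + 16) / GROUP_SIZE

print(f"max round-trip error: {max_error:.6f} (half a step is {scale / 2:.6f})")
print(f"effective bits per weight: {bits_per_weight}")
```

Under that assumption, each weight costs slightly more than 4 bits, because every group of 64 also carries its own scale and bias.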
Size Comparison
| Component | Original | Quantized |
|---|---|---|
| Decoder (Gemma2-2B) | 4.9 GB | 1.4 GB |
| Encoder (Whisper-Large-V3) | 1.2 GB | 1.2 GB |
| Adaptor | 419 MB | 419 MB |
| Total | 6.5 GB | 3.0 GB |
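As a quick sanity check on the table, the short sketch below recomputes the totals and the compression ratios from the listed component sizes. The only assumption is that 419 MB is treated as 0.419 GB.

```python
# Sizes in GB, copied from the table (419 MB taken as 0.419 GB).
original = {"decoder": 4.9, "encoder": 1.2, "adaptor": 0.419}
quantized = {"decoder": 1.4, "encoder": 1.2, "adaptor": 0.419}

total_original = sum(original.values())
total_quantized = sum(quantized.values())
decoder_ratio = original["decoder"] / quantized["decoder"]
overall_ratio = total_original / total_quantized

print(f"total: {total_original:.1f} GB -> {total_quantized:.1f} GB")
print(f"decoder shrinks {decoder_ratio:.1f}x, whole model {overall_ratio:.1f}x")
```

Because the encoder and adaptor stay in full precision, the overall reduction is smaller than the decoder's own reduction.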
Usage
Structure
- Whisper-Large-V3 encoder (full precision)
- Multi-modal adaptor (full precision)
- Gemma2-2B decoder (4-bit quantized)
- Decoder directory with config, tokenizer, and symlinks to decoder shards