# Gemopus-4-E4B-it-MLX-4bit

This is a 4-bit quantization of [Jackrong/Gemopus-4-E4B-it](https://huggingface.co/Jackrong/Gemopus-4-E4B-it), converted to the MLX format for running on Apple Silicon.
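
For reference, a quantization like this can be produced with `mlx_lm`'s `convert` utility. The snippet below is a minimal sketch rather than the exact command used for this repo: it assumes the upstream weights are reachable on the Hugging Face Hub, and the output path is illustrative.

```python
from mlx_lm import convert

# Quantize the upstream weights to 4-bit MLX weights.
# The output directory name is illustrative.
convert(
    "Jackrong/Gemopus-4-E4B-it",
    mlx_path="Gemopus-4-E4B-it-MLX-4bit",
    quantize=True,
    q_bits=4,
)
```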

## Optimization Details

- **Quantization:** 4-bit
- **Framework:** MLX
- **Hardware used for conversion:** MacBook Air (M3/M4)

## Performance on MacBook Air

- **Generation speed:** ~35 tokens/sec
- **Memory usage:** ~4.3 GB
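
Actual numbers vary with prompt length and memory pressure on the machine. As a rough self-check, you can time a run with the `mlx_lm` Python API (see Usage below); the prompt and token budget here are illustrative.

```python
import time

from mlx_lm import load, generate

model, tokenizer = load("Nicoesp/Gemopus-4-E4B-it-MLX-4bit")

start = time.perf_counter()
text = generate(model, tokenizer, prompt="Write a haiku about the sea.", max_tokens=128)
elapsed = time.perf_counter() - start

# Rough throughput: re-encode the output to estimate the generated token count.
n_generated = len(tokenizer.encode(text))
print(f"~{n_generated / elapsed:.1f} tokens/sec")
```

Since this re-encodes the output to estimate the token count, treat the result as approximate.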

## Usage

```bash
pip install mlx-lm
python -m mlx_lm.generate --model Nicoesp/Gemopus-4-E4B-it-MLX-4bit --prompt "Ciao!"
```
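
The model can also be driven from Python using the standard `mlx_lm` API; the prompt below is just an example.

```python
from mlx_lm import load, generate

model, tokenizer = load("Nicoesp/Gemopus-4-E4B-it-MLX-4bit")

prompt = "Ciao!"
# Wrap the prompt in the model's chat template when one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True streams the response to stdout as it is generated.
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```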