# Gemopus-4-E4B-it-MLX-4bit
This is a 4-bit quantization of Jackrong/Gemopus-4-E4B-it, converted to the MLX format for use on Apple silicon.
## Optimization Details
- Quantization: 4-bit (a conversion sketch follows this list)
- Framework: MLX
- Hardware used for conversion: MacBook Air (M3/M4)
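For reference, quantized MLX conversions like this one are typically produced with mlx-lm's `convert` utility. A minimal sketch, assuming mlx-lm's Python `convert` helper with its `quantize`/`q_bits` parameters; the output path is illustrative:

```python
from mlx_lm import convert

# Quantize the base model to 4-bit and write the result in MLX format.
# The repo ID matches the base model named above; the output directory
# is an illustrative choice.
convert(
    "Jackrong/Gemopus-4-E4B-it",
    mlx_path="Gemopus-4-E4B-it-MLX-4bit",
    quantize=True,
    q_bits=4,
)
```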
## Performance on MacBook Air
- Generation Speed: ~35 tokens/sec
- Memory Usage: ~4.3 GB
## Usage

```bash
pip install mlx-lm
python -m mlx_lm.generate --model Nicoesp/Gemopus-4-E4B-it-MLX-4bit --prompt "Ciao!"
```
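Beyond the CLI, the model can be used from Python with mlx-lm's `load`/`generate` API. A minimal sketch: the chat-template step assumes the tokenizer ships one, as instruction-tuned models usually do, and `max_tokens=256` is an illustrative setting:

```python
from mlx_lm import load, generate

# Download (or load from the local cache) the 4-bit model and tokenizer.
model, tokenizer = load("Nicoesp/Gemopus-4-E4B-it-MLX-4bit")

# Instruction-tuned models expect chat-formatted prompts; apply the
# tokenizer's chat template before generating.
messages = [{"role": "user", "content": "Ciao!"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```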