# Qwen2.5-1.5B-Instruct (MLX, 4-bit)

This repository contains an MLX-converted, 4-bit quantized version of Qwen/Qwen2.5-1.5B-Instruct.
- No fine-tuning or training was performed
- Format conversion + post-training quantization only
- Optimized for Apple Silicon and on-device usage
## Recommended default for iPhone

This 4-bit variant is recommended as the default for on-device usage (e.g., iPhone 13 Pro and newer) due to its low memory footprint and high throughput.
## Usage

```shell
pip install -U mlx-lm

mlx_lm.generate \
  --model Irfanuruchi/Qwen2.5-1.5B-Instruct-MLX-4bit \
  --prompt "Write a helpful onboarding message for an iOS app in 3 bullet points."
```
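The model can also be driven from Python through mlx-lm's `load`/`generate` API. A minimal sketch (assumes a recent mlx-lm release; `max_tokens=128` is an illustrative choice, not a repository setting):

```python
from mlx_lm import load, generate

# Downloads the quantized weights from the Hub on first use (Apple Silicon required).
model, tokenizer = load("Irfanuruchi/Qwen2.5-1.5B-Instruct-MLX-4bit")

# Apply the Qwen chat template so the instruct model sees a properly framed turn.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a helpful onboarding message for an iOS app in 3 bullet points."}],
    tokenize=False,
    add_generation_prompt=True,
)

text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```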
## Bench notes (MacBook Pro M3 Pro)
- Prompt tokens: 45
- Generation tokens: 31
- Generation speed: ~134 tokens/sec
- Peak memory: ~0.945 GB
## Related models

- 8-bit variant (higher quality): https://huggingface.co/Irfanuruchi/Qwen2.5-1.5B-Instruct-MLX-8bit