Qwen2.5-1.5B-Instruct (MLX, 4-bit)

This repository contains an MLX-converted and 4-bit quantized version of
Qwen/Qwen2.5-1.5B-Instruct.

  • No fine-tuning or training was performed
  • Format conversion + post-training quantization only (see the conversion sketch after this list)
  • Optimized for Apple Silicon and on-device usage
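
For reference, conversions like this are typically produced with mlx-lm's convert utility. A minimal sketch of that step (the output path and quantization settings here are illustrative assumptions, not a record of the exact command used):

from mlx_lm import convert

# Download the base weights, convert them to MLX format, and apply
# 4-bit post-training quantization (no training or fine-tuning involved).
convert(
    "Qwen/Qwen2.5-1.5B-Instruct",
    mlx_path="Qwen2.5-1.5B-Instruct-MLX-4bit",
    quantize=True,
    q_bits=4,
)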

✅ Recommended default for iPhone

This 4-bit variant is the recommended default for on-device use (e.g., iPhone 13 Pro and newer) thanks to its low memory footprint and high generation throughput (see the bench notes below).


Usage

pip install -U mlx-lm
mlx_lm.generate \
  --model Irfanuruchi/Qwen2.5-1.5B-Instruct-MLX-4bit \
  --prompt "Write a helpful onboarding message for an iOS app in 3 bullet points."

Bench notes (MacBook Pro M3 Pro)

  • Prompt tokens: 45
  • Generation tokens: 31
  • Generation speed: ~134 tokens/sec
  • Peak memory: ~0.945 GB
