Qwen2.5-1.5B-Instruct Q4_K_M GGUF

This is a Q4_K_M quantized GGUF conversion of Qwen/Qwen2.5-1.5B-Instruct, optimized for on-device inference with llama.cpp.

Model Details

Property         Value
Original Model   Qwen2.5-1.5B-Instruct
Parameters       1.54 billion (1.31B non-embedding)
Quantization     Q4_K_M (4-bit, medium quality)
File Size        ~1.0 GB
Context Window   32,768 tokens
Architecture     Qwen2 (RoPE, SwiGLU, RMSNorm)
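The ~1.0 GB file size can be sanity-checked with back-of-envelope arithmetic. Q4_K_M is a mixed quantization scheme whose effective average is somewhat above 4 bits per weight; the exact average varies by tensor, so the 4.85 bits/weight used below is an assumption, not a measured value.

```python
# Rough size estimate for a Q4_K_M quantization of a 1.54B-parameter model.
params = 1.54e9
bits_per_weight = 4.85  # assumed effective average for Q4_K_M (varies by tensor)

size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.2f} GB")  # close to the ~1.0 GB listed above
```

The small remaining gap to the listed size is plausibly metadata and higher-precision tensors (e.g. embeddings and output layers) kept at larger bit widths.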

Intended Use

This model is optimized for:

  • Mobile/Edge Deployment: Runs efficiently on most iOS devices
  • llama.cpp Integration: Compatible with llama.cpp and its bindings
  • On-Device AI: Private, offline inference without cloud dependencies

Capabilities

  • Math & Coding: Strong mathematical and programming abilities
  • Reasoning: Excellent for a 1.5B-class model
  • Multilingual: Supports 29+ languages
  • Long Context: 32K token context window
  • Structured Output: Great at JSON and formatted responses
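When relying on the structured-output capability, it is prudent to validate the model's text before using it downstream, since even strong models occasionally emit malformed JSON. A minimal sketch (the `response` string here is a hypothetical model output, not actual model behavior):

```python
import json

# Hypothetical response text from the model (actual output will vary).
response = '{"name": "Qwen2.5", "params_b": 1.5, "languages": 29}'

def parse_json_response(text):
    """Parse a model response expected to be JSON; return None if malformed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

data = parse_json_response(response)
print(data["params_b"])  # → 1.5
```

A `None` result can trigger a retry or a re-prompt asking the model to emit valid JSON only.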

Usage with llama.cpp

./llama-cli -m Qwen2.5-1.5B-Instruct-Q4_K_M.gguf -p "Your prompt here" -n 512
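When passing a raw prompt with `-p` rather than using llama.cpp's built-in chat handling, the instruct model expects ChatML formatting (the `<|im_start|>`/`<|im_end|>` template used by Qwen2.5-Instruct). A minimal sketch of assembling such a prompt; the helper name and system message are illustrative:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-formatted prompt as used by Qwen2.5-Instruct."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Explain GGUF in one sentence.",
)
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the turn open so the model generates the assistant's reply; `<|im_end|>` should also be used as a stop sequence.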

License

This model inherits the Apache 2.0 license from the original Qwen2.5 model.

Attribution

Original model: Qwen/Qwen2.5-1.5B-Instruct by the Qwen team.
