Qwen2.5-0.5B-Instruct Q4_K_M GGUF

This is a Q4_K_M quantized GGUF conversion of Qwen/Qwen2.5-0.5B-Instruct optimized for on-device inference with llama.cpp.

Model Details

Property         Value
Original Model   Qwen/Qwen2.5-0.5B-Instruct
Parameters       490 million (360M non-embedding)
Quantization     Q4_K_M (4-bit, medium quality)
File Size        ~379 MB
Context Window   32,768 tokens
Architecture     Qwen2 (RoPE, SwiGLU, RMSNorm)

Intended Use

This model is optimized for:

  • Mobile/Edge Deployment: Small enough to run on resource-constrained hardware, including older mobile devices
  • llama.cpp Integration: Compatible with llama.cpp and its bindings
  • On-Device AI: Private, offline inference without cloud dependencies

Capabilities

  • Strong for its size class: Competitive performance among sub-1B instruction-tuned models
  • Multilingual: Supports 29+ languages
  • Long Context: 32K token context window
  • Structured Output: Good at JSON and formatted responses
  • Fast Inference: Quick responses with minimal resources
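Because small models sometimes wrap structured output in extra prose, it is common to extract the JSON portion of a response before parsing it. A minimal sketch of that pattern (the helper name and sample reply are illustrative, not part of this model's API):

```python
import json
import re

def extract_json(text: str) -> dict:
    """Pull the first JSON object out of a model response.

    Small instruct models often surround JSON with commentary;
    this grabs the outermost {...} span and parses it strictly.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))

# Hypothetical model reply with surrounding prose:
reply = 'Sure, here is the result:\n{"city": "Paris", "country": "France"}'
data = extract_json(reply)
```

A production version would also handle truncated or invalid JSON (e.g. retry with a stricter prompt), but the regex-then-parse step above is the core idea.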

Usage with llama.cpp

./llama-cli -m Qwen2.5-0.5B-Instruct-Q4_K_M.gguf -p "Your prompt here" -n 512
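When prompting with a raw `-p` string rather than llama.cpp's chat mode, Qwen2.5-Instruct expects its ChatML template, where each turn is wrapped in `<|im_start|>ROLE ... <|im_end|>` markers. A minimal sketch of building such a prompt (the helper function is hypothetical; the template markers are Qwen2.5's):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Format a single-turn Qwen2.5 ChatML prompt.

    The trailing '<|im_start|>assistant\n' leaves the model
    positioned to generate the assistant's reply.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "What is the GGUF file format?",
)
```

The resulting string can be passed directly as the `-p` argument above; in chat/conversation mode, llama.cpp applies the template embedded in the GGUF metadata automatically.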

License

This model inherits the Apache 2.0 license from the original Qwen2.5 model.

Attribution

Original model: Qwen/Qwen2.5-0.5B-Instruct by the Qwen Team (Alibaba Cloud). This repository provides a GGUF quantization of that model.
