SmolLM2-1.7B-Instruct Q4_K_M GGUF

This is a Q4_K_M quantized GGUF conversion of HuggingFaceTB/SmolLM2-1.7B-Instruct, optimized for on-device inference with llama.cpp.

Model Details

Property          Value
----------------  ------------------------------
Original Model    SmolLM2-1.7B-Instruct
Parameters        1.7 billion
Quantization      Q4_K_M (4-bit, medium quality)
File Size         ~1.0 GB
Context Window    8,192 tokens
Architecture      LLaMA-style transformer
Training Data     11 trillion tokens

Intended Use

This model is optimized for:

  • Mobile/Edge Deployment: Runs efficiently on most iOS devices
  • llama.cpp Integration: Compatible with llama.cpp and its bindings
  • On-Device AI: Private, offline inference without cloud dependencies
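For on-device deployment planning, the file size in the table above can be sanity-checked from the parameter count. This is a rough sketch: the ~4.85 bits-per-weight figure for Q4_K_M is an approximation (the actual rate depends on which tensors stay at higher precision), not an official number.

```python
def estimate_model_bytes(n_params: float, bits_per_weight: float = 4.85) -> float:
    # Approximate on-disk size of a quantized model:
    # parameters x effective bits per weight, converted to bytes.
    # 4.85 bits/weight for Q4_K_M is an assumed average, not an exact spec.
    return n_params * bits_per_weight / 8

size_gb = estimate_model_bytes(1.7e9) / 1e9
print(f"~{size_gb:.2f} GB")  # close to the ~1.0 GB listed above
```

Actual runtime memory use will be higher than the file size, since the KV cache and compute buffers scale with context length.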

Capabilities

  • Best in SmolLM2 Family: The largest and highest-quality model in the SmolLM2 series
  • Function Calling: Supports tool use and function calling
  • Instruction Following: Excellent at following complex instructions
  • Code Generation: Good coding capabilities
  • Efficient: Pretrained on 11 trillion tokens, giving strong quality for its small size
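To use the instruction-following and function-calling capabilities, prompts must match the model's chat template. SmolLM2-Instruct ships a ChatML-style template; the exact template should be verified against the tokenizer.chat_template key in the GGUF metadata, so treat this as an illustrative sketch rather than the authoritative format.

```python
def format_chatml(messages: list[dict]) -> str:
    # Build a ChatML-style prompt from role/content message dicts.
    # Assumes a ChatML template (<|im_start|>/<|im_end|> markers);
    # confirm against the chat template stored in the GGUF metadata.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about autumn."},
])
```

Note that llama.cpp can apply the embedded chat template automatically in conversation mode, so manual formatting like this is mainly useful when driving the model through lower-level completion APIs.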

Usage with llama.cpp

./llama-cli -m SmolLM2-1.7B-Instruct.Q4_K_M.gguf -p "Your prompt here" -n 512
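Before loading the file, it can be worth confirming that the download is an intact GGUF container. A minimal check reads the header: every GGUF file begins with the 4-byte magic "GGUF" followed by a little-endian uint32 format version (3 for current files). This sketch validates only the header, not the quantization contents.

```python
import struct

def check_gguf_header(path: str):
    # Return the GGUF format version if the file starts with a valid
    # GGUF header, or None if the magic bytes do not match.
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            return None
        (version,) = struct.unpack("<I", f.read(4))
        return version
```

A truncated or interrupted download will typically fail this check (or fail later when llama.cpp reads the tensor table), so it is a cheap first diagnostic.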

License

This model inherits the Apache 2.0 license from the original SmolLM2 model.

Attribution

Original model: HuggingFaceTB/SmolLM2-1.7B-Instruct
GGUF conversion: jc-builds/SmolLM2-1.7B-Instruct-Q4_K_M-GGUF