Llama-3.2-3B Q4_K_M GGUF

This is a Q4_K_M quantized GGUF conversion of meta-llama/Llama-3.2-3B-Instruct, optimized for on-device inference with llama.cpp.

Model Details

Property          Value
Original Model    Llama-3.2-3B-Instruct
Parameters        3 billion
Quantization      Q4_K_M (4-bit, medium quality)
File Size         ~2.0 GB
Context Window    128,000 tokens
Architecture      LLaMA 3.2
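
As a rough size check: Q4_K_M stores weights at an effective ~4.5-5 bits each, so 3 billion parameters come to roughly 1.7-1.9 GB of weight data, which together with GGUF metadata and embedding tables lands near the ~2.0 GB file size above.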

Intended Use

This model is optimized for:

  • Mobile/Edge Deployment: Runs efficiently on iOS devices with 6 GB+ RAM
  • llama.cpp Integration: Compatible with llama.cpp and its bindings
  • On-Device AI: Private, offline inference without cloud dependencies

Capabilities

  • General conversation and Q&A
  • Summarization and text generation
  • Code assistance
  • Reasoning and analysis
  • Multilingual support

Usage with llama.cpp

# -m: path to the GGUF model, -p: prompt text, -n: max tokens to generate
./llama-cli -m Llama-3.2-3B-Q4_K_M.gguf -p "Your prompt here" -n 512
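
Because this is an instruct-tuned model, recent llama.cpp builds also offer a conversation mode (the -cnv flag), which applies the chat template stored in the GGUF metadata instead of sending the raw prompt string.

The same file can be used from Python through the llama.cpp bindings mentioned above. A minimal sketch, assuming llama-cpp-python and huggingface_hub are installed; the repository id and file name match this card, while the context size and thread count are illustrative placeholders to tune per device:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch the GGUF file from the Hub (cached locally after the first call).
model_path = hf_hub_download(
    repo_id="jc-builds/Llama-3.2-3B-Q4_K_M-GGUF",
    filename="Llama-3.2-3B-Q4_K_M.gguf",
)

# Load the model. n_ctx caps the usable context window: the architecture
# supports 128k tokens, but the KV cache for a window that large needs far
# more memory than the ~2 GB of weights, so a smaller window is used here.
llm = Llama(model_path=model_path, n_ctx=8192, n_threads=4)

# create_chat_completion applies the chat template from the GGUF metadata,
# which is the prompt format an instruct-tuned model expects.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in two sentences."}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])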

License

This model is subject to Meta's Llama 3.2 Community License Agreement. Review the license terms before use; commercial use in particular requires accepting Meta's license.

Attribution

Original model: meta-llama/Llama-3.2-3B-Instruct, developed and released by Meta.
