# sieve-llama-3.2-1b-GGUF

GGUF quantizations of azatvaliev/sieve-llama-3.2-1b, a fine-tune of meta-llama/Llama-3.2-1B for SQL WHERE clause generation.

See the base model card (azatvaliev/sieve-llama-3.2-1b) for usage details and the expected input/output format.

## Quantizations

| File | Quant | Size | Recommendation |
|------|-------|------|----------------|
| model-q4_k_m.gguf | Q4_K_M | 0.8 GB | Best speed, still retains high quality |
| model-q6_k.gguf | Q6_K | 1.0 GB | Middle ground; not specifically recommended over the others |
| model-q8_0.gguf | Q8_0 | 1.2 GB | Best quality, no perceptible loss from F16 |

## Usage

```bash
# Download the quantized model into the current directory
huggingface-cli download azatvaliev/sieve-llama-3.2-1b-GGUF model-q4_k_m.gguf --local-dir .

# Serve it with llama.cpp (4096-token context, all layers offloaded to GPU)
llama-server --model model-q4_k_m.gguf -c 4096 -ngl 99 --port 8080
```
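
Once the server is up, you can query it over HTTP. Below is a minimal sketch using llama.cpp's native `/completion` endpoint; the prompt is a hypothetical placeholder, since the actual input/output format this fine-tune expects is defined in the base model card:

```bash
# Query the running server (placeholder prompt; see the base model card
# for the exact input/output format the fine-tune was trained on)
curl http://localhost:8080/completion -d '{
  "prompt": "Generate a SQL WHERE clause for: users older than 30",
  "n_predict": 64,
  "temperature": 0
}'
```

llama-server also exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so standard OpenAI client libraries work as well.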