# sieve-llama-3.2-1b-GGUF

GGUF quantizations of azatvaliev/sieve-llama-3.2-1b, a fine-tune of meta-llama/Llama-3.2-1B for SQL WHERE clause generation.

See the base model card (azatvaliev/sieve-llama-3.2-1b) for usage details and the expected input/output format.

## Quantizations

| File | Quant | Size | Recommendation |
|------|-------|------|----------------|
| model-q4_k_m.gguf | Q4_K_M | 0.8 GB | Best speed, still retains high quality |
| model-q6_k.gguf | Q6_K | 1.0 GB | Middle ground; not specifically recommended over the others |
| model-q8_0.gguf | Q8_0 | 1.2 GB | Best quality, no perceptible loss from F16 |

## Usage

```bash
# Download the quantized model into the current directory
huggingface-cli download azatvaliev/sieve-llama-3.2-1b-GGUF model-q4_k_m.gguf --local-dir .

# Serve it with llama.cpp (4096-token context, all layers offloaded to GPU)
llama-server --model model-q4_k_m.gguf -c 4096 -ngl 99 --port 8080
```
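
Once the server is up, you can query it over HTTP. Below is a minimal sketch using llama.cpp's native `/completion` endpoint; the prompt is a hypothetical placeholder, since the actual input/output format this fine-tune expects is defined in the base model card:

```bash
# Query the running server (placeholder prompt; see the base model card
# for the exact input/output format the fine-tune was trained on)
curl http://localhost:8080/completion -d '{
  "prompt": "Generate a SQL WHERE clause for: users older than 30",
  "n_predict": 64,
  "temperature": 0
}'
```

llama-server also exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so standard OpenAI client libraries work as well.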