# Qwen3-4B-Product-Extractor - GGUF Q4_K_M
This is a GGUF quantized version of the fine-tuned Qwen3-4B model for product data extraction, optimized for CPU inference.
## Model Details
- Base Model: unsloth/Qwen3-4B-Base
- Fine-tuning: GRPO (Group Relative Policy Optimization)
- Quantization: Q4_K_M
- Estimated Size: ~2.5GB
- Optimization: CPU inference, memory efficient
## Performance
- Speed: up to ~3x faster than full-precision (FP16) inference on CPU; actual speedup is hardware dependent
- Memory: ~4x lower memory usage than FP16 (~2.5GB vs. ~8GB on disk)
- Quality: minor quality loss relative to full precision; Q4_K_M is generally a good size/quality trade-off
## Usage with llama.cpp

```bash
# Download model
huggingface-cli download pragnesh002/Qwen3-4B-Product-Extractor-GGUF-Q4-K-M --local-dir ./model

# Run inference (recent llama.cpp builds name the binary `llama-cli`; older builds use `main`)
./llama-cli -m ./model/*.gguf -p "Your prompt here"
```
## Usage with Transformers (GGUF loading)

Recent versions of `transformers` can load GGUF checkpoints directly by passing a `gguf_file` argument. The filename below is a placeholder; replace it with the actual `.gguf` file name in the repo:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "pragnesh002/Qwen3-4B-Product-Extractor-GGUF-Q4-K-M"
gguf_file = "model-q4_k_m.gguf"  # placeholder: use the real .gguf filename from the repo

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    gguf_file=gguf_file,
    device_map="cpu",
)
```
## Recommended Use Cases
- Q4_K_M: Best for deployment with size constraints
- Q5_K_M: Balanced quality and size
- Q8_0: High quality applications
- F16: Maximum quality, research use
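As a rough sanity check on the sizes involved, the on-disk footprint of each format can be estimated from the parameter count and the effective bits per weight. The bits-per-weight figures below are typical approximate averages for GGUF quantization types, not exact values for this checkpoint:

```python
# Rough GGUF size estimate: params * bits_per_weight / 8 bytes.
# Bits-per-weight values are approximate averages for each format.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,  # effective average; K-quants mix block sizes
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def estimated_size_gb(n_params: float, fmt: str) -> float:
    """Approximate on-disk size in GB for a given quantization format."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9

n_params = 4e9  # Qwen3-4B
for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{estimated_size_gb(n_params, fmt):.1f} GB")
```

For a 4B-parameter model this gives ~2.4 GB at Q4_K_M, consistent with the ~2.5GB figure above.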
## Product Data Extraction

This model excels at extracting structured data from product catalogs:

```python
prompt = '''Extract product data from:
Item: GR-AA10
Description: Wall Art
Manufacturer: Harper & Wilde
Output JSON:'''

# Expected output: structured JSON with product information
```
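Since the model's completion is expected to be JSON, a small parsing step makes the output usable downstream. The sketch below is a minimal example, not part of the model's tooling: the field names (`item`, `description`, `manufacturer`) are assumptions based on the prompt fields above, and the sample string is hand-written rather than actual model output.

```python
import json

def parse_extraction(raw: str) -> dict:
    """Parse a model completion as JSON, tolerating surrounding text.

    The required field names here are an assumption based on the prompt
    fields; adjust them to the schema your fine-tuning data actually uses.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    missing = {"item", "description", "manufacturer"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data

# Hand-written sample completion (not actual model output):
sample = ('Here is the JSON: {"item": "GR-AA10", "description": "Wall Art", '
          '"manufacturer": "Harper & Wilde"}')
print(parse_extraction(sample)["item"])
```

Extracting the first `{` to the last `}` is a common guard against models adding prose around the JSON object; a stricter deployment might instead constrain decoding to a JSON grammar (llama.cpp supports GBNF grammars for this).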