GPU/VRAM requirements for Qwen3-14B in production batch workload – and is this the right model for our use case?

#17

by Poijs - opened 12 days ago

Hello,

I am from the Centre for Quality Culture in Education at Wrocław Medical University (Poland). We are deploying a local AI workstation for educational quality analytics under GDPR constraints – all inference must remain on-premises.

Our hardware:
HP Z2 Tower G9 | Intel Core i7-14700 | 32 GB DDR5
GPU candidates under evaluation:

Option A: GeForce RTX 5050 – 8 GB VRAM
Option B: GeForce RTX 5070 – 12 GB VRAM
Option C: NVIDIA RTX 4000 Ada Generation – 20 GB VRAM (our recommendation)

Planned workload:

Batch classification of student survey open-text comments (~1,000 records/run)
RAG pipeline over institutional documents: study programs, syllabi, Ministry of Education regulations
Simultaneous LLM + embedding model (nomic-embed-text-v1.5) in VRAM
Automated generation of quality reports
Usage: continuous multi-hour batch processing

Questions for the Qwen team and community:

1. VRAM requirements
What is the minimum VRAM to run Qwen3-14B (Q4_K_M quantization) fully in GPU memory?
Does Option A (8 GB) or Option B (12 GB) allow full GPU inference, or does the model fall back to CPU offloading – and if so, is that still practical for batch workloads of 1,000+ records?

2. Is Qwen3-14B the right model for this use case?
Our tasks are primarily: Polish-language text classification, semantic categorization of short survey responses (1–5 sentences), and document comparison (program text vs. regulatory standard).
Would Qwen3-14B Q4_K_M be appropriate, or would you recommend a different size or variant from the Qwen3 family for this specific workload? Is the "thinking mode" useful here, or does non-thinking mode suffice for classification tasks?

3. Multilingual / Polish language quality
Qwen3 was trained on 119 languages including Polish. Can you comment on the quality of Polish-language instruction following and classification in Qwen3-14B compared to Qwen2.5-14B?

Context:
Our IT department has proposed Option A (8 GB) citing cost. We are seeking factual technical input to support our procurement documentation. Any written confirmation of minimum VRAM requirements from the Qwen team would be very valuable.

Thank you for your time.

Poijs changed discussion status to closed 1 day ago

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment