Qwen2.5-7B Saudi Dialect β€” GPTQ

Fine-tuned from Qwen/Qwen2.5-7B-Instruct on Saudi Najdi dialect conversations.

Capabilities

  • πŸ‡ΈπŸ‡¦ Saudi dialect conversation (Najdi style)
  • πŸ”§ Hermes-style tool calling (CRM APIs: create_ticket, update_crm, query_database)
  • πŸ“š RAG document injection
  • ⚑ vLLM deployment ready
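The RAG document-injection workflow can be sketched as a simple prompt builder. This is a minimal illustration, not the model's own API: the function name and the prompt template are assumptions, and the sample documents are placeholders.

```python
def build_rag_prompt(question: str, documents: list[str]) -> list[dict]:
    """Inject retrieved documents into the system prompt (illustrative template)."""
    context = "\n\n".join(f"[doc {i + 1}]\n{doc}" for i, doc in enumerate(documents))
    system = (
        # "Answer in Saudi Najdi dialect based on the following documents:"
        "أجب باللهجة السعودية النجدية اعتماداً على المستندات التالية:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# Example: one retrieved document about a return policy (placeholder text)
messages = build_rag_prompt(
    "وش سياسة الاسترجاع؟",
    ["سياسة الاسترجاع: 14 يوم من تاريخ الشراء."],
)
```

The resulting `messages` list can be passed directly to the `/v1/chat/completions` endpoint of the vLLM server described below.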

vLLM Deployment

python -m vllm.entrypoints.openai.api_server \
    --model mohameddalii/qwen25-7b-saudi-gptq \
    --served-model-name qwen-saudi \
    --tool-call-parser hermes \
    --enable-auto-tool-choice \
    --quantization gptq \
    --max-model-len 4096 \
    --dtype float16 \
    --gpu-memory-utilization 0.88 \
    --host 0.0.0.0 --port 8000
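With `--tool-call-parser hermes` and `--enable-auto-tool-choice` set, the server accepts OpenAI-style tool schemas. A minimal request payload might look like the sketch below; note that the `create_ticket` parameter schema is an assumption (the card does not document the exact tool signatures), and the server is assumed to be running at `localhost:8000`.

```python
import json

# OpenAI-compatible tool schema; the create_ticket parameters are
# illustrative -- the card does not document the exact signatures.
tools = [{
    "type": "function",
    "function": {
        "name": "create_ticket",
        "description": "Open a CRM support ticket",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "summary": {"type": "string"},
            },
            "required": ["customer_id", "summary"],
        },
    },
}]

payload = {
    "model": "qwen-saudi",  # matches --served-model-name above
    "messages": [
        # "Open a ticket for customer 123: the phone is not working"
        {"role": "user", "content": "افتح تذكرة للعميل 123: الجوال ما يشتغل"}
    ],
    "tools": tools,
    "tool_choice": "auto",
}

# POST this to http://localhost:8000/v1/chat/completions, e.g. with the
# openai client or: requests.post(".../v1/chat/completions", json=payload)
body = json.dumps(payload, ensure_ascii=False)
```

If the model decides to call a tool, the response will contain a `tool_calls` entry with the function name and JSON arguments, which your application then executes.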

Training Details

| Parameter  | Value                                        |
|------------|----------------------------------------------|
| Base model | Qwen/Qwen2.5-7B-Instruct                     |
| Method     | QLoRA 4-bit NF4                              |
| LoRA r     | 64                                           |
| LoRA alpha | 128                                          |
| Context    | 4096 tokens                                  |
| Dataset    | HeshamHaroon/saudi-dialect-conversations     |
| GPUs       | 2× RTX 4090 (DDP)                            |

V100 Compatibility

This model uses GPTQ quantization (ExLlamaV2 kernels), which runs on V100 GPUs (compute capability sm_70). AWQ is not used because its kernels require sm_80 or newer (Ampere and later).
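The compatibility rule above can be expressed as a small check on CUDA compute capability (which you can obtain at runtime from `torch.cuda.get_device_capability()`); the helper names below are illustrative.

```python
def supports_gptq_exllama(cc: tuple) -> bool:
    # Per the note above, the GPTQ ExLlamaV2 kernels run on Volta (sm_70) and newer
    return cc >= (7, 0)

def supports_awq(cc: tuple) -> bool:
    # AWQ kernels need Ampere (sm_80) or newer
    return cc >= (8, 0)

v100 = (7, 0)  # Tesla V100 compute capability
print(supports_gptq_exllama(v100), supports_awq(v100))  # True False
```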

Model Format

Safetensors · 8B params · tensor types I32, F16