Qwen2.5-7B Saudi Dialect β€” GPTQ

Fine-tuned from Qwen/Qwen2.5-7B-Instruct on Saudi Najdi dialect conversations.

Capabilities

  • πŸ‡ΈπŸ‡¦ Saudi dialect conversation (Najdi style)
  • πŸ”§ Hermes-style tool calling (CRM APIs: create_ticket, update_crm, query_database)
  • πŸ“š RAG document injection
  • ⚑ vLLM deployment ready
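The RAG document-injection workflow can be sketched as a simple prompt builder. This is a minimal illustration, not the model's own API: the function name and the prompt template are assumptions, and the sample documents are placeholders.

```python
def build_rag_prompt(question: str, documents: list[str]) -> list[dict]:
    """Inject retrieved documents into the system prompt (illustrative template)."""
    context = "\n\n".join(f"[doc {i + 1}]\n{doc}" for i, doc in enumerate(documents))
    system = (
        # "Answer in Saudi Najdi dialect based on the following documents:"
        "أجب باللهجة السعودية النجدية اعتماداً على المستندات التالية:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# Example: one retrieved document about a return policy (placeholder text)
messages = build_rag_prompt(
    "وش سياسة الاسترجاع؟",
    ["سياسة الاسترجاع: 14 يوم من تاريخ الشراء."],
)
```

The resulting `messages` list can be passed directly to the `/v1/chat/completions` endpoint of the vLLM server described below.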

vLLM Deployment

python -m vllm.entrypoints.openai.api_server \
    --model mohameddalii/qwen25-7b-saudi-gptq \
    --served-model-name qwen-saudi \
    --tool-call-parser hermes \
    --enable-auto-tool-choice \
    --quantization gptq \
    --max-model-len 4096 \
    --dtype float16 \
    --gpu-memory-utilization 0.88 \
    --host 0.0.0.0 --port 8000
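With `--tool-call-parser hermes` and `--enable-auto-tool-choice` set, the server accepts OpenAI-style tool schemas. A minimal request payload might look like the sketch below; note that the `create_ticket` parameter schema is an assumption (the card does not document the exact tool signatures), and the server is assumed to be running at `localhost:8000`.

```python
import json

# OpenAI-compatible tool schema; the create_ticket parameters are
# illustrative -- the card does not document the exact signatures.
tools = [{
    "type": "function",
    "function": {
        "name": "create_ticket",
        "description": "Open a CRM support ticket",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "summary": {"type": "string"},
            },
            "required": ["customer_id", "summary"],
        },
    },
}]

payload = {
    "model": "qwen-saudi",  # matches --served-model-name above
    "messages": [
        # "Open a ticket for customer 123: the phone is not working"
        {"role": "user", "content": "افتح تذكرة للعميل 123: الجوال ما يشتغل"}
    ],
    "tools": tools,
    "tool_choice": "auto",
}

# POST this to http://localhost:8000/v1/chat/completions, e.g. with the
# openai client or: requests.post(".../v1/chat/completions", json=payload)
body = json.dumps(payload, ensure_ascii=False)
```

If the model decides to call a tool, the response will contain a `tool_calls` entry with the function name and JSON arguments, which your application then executes.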

Training Details

| Parameter  | Value                                        |
|------------|----------------------------------------------|
| Base model | Qwen/Qwen2.5-7B-Instruct                     |
| Method     | QLoRA 4-bit NF4                              |
| LoRA r     | 64                                           |
| LoRA alpha | 128                                          |
| Context    | 4096 tokens                                  |
| Dataset    | HeshamHaroon/saudi-dialect-conversations     |
| GPUs       | 2× RTX 4090 (DDP)                            |

V100 Compatibility

This model uses GPTQ quantization (ExLlamaV2 kernels), which runs on V100 GPUs (compute capability sm_70). AWQ is not used because its kernels require sm_80 or newer (Ampere and later).
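The compatibility rule above can be expressed as a small check on CUDA compute capability (which you can obtain at runtime from `torch.cuda.get_device_capability()`); the helper names below are illustrative.

```python
def supports_gptq_exllama(cc: tuple) -> bool:
    # Per the note above, the GPTQ ExLlamaV2 kernels run on Volta (sm_70) and newer
    return cc >= (7, 0)

def supports_awq(cc: tuple) -> bool:
    # AWQ kernels need Ampere (sm_80) or newer
    return cc >= (8, 0)

v100 = (7, 0)  # Tesla V100 compute capability
print(supports_gptq_exllama(v100), supports_awq(v100))  # True False
```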

Model Format

Safetensors · 8B params · tensor types I32, F16