# Qwen2.5-7B Saudi Dialect – GPTQ
Fine-tuned from Qwen/Qwen2.5-7B-Instruct on Saudi Najdi dialect conversations.
## Capabilities
- 🇸🇦 Saudi dialect conversation (Najdi style)
- 🔧 Hermes-style tool calling (CRM APIs: create_ticket, update_crm, query_database)
- 📄 RAG document injection
- ⚡ vLLM deployment ready
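The CRM tools above are exposed to the model as OpenAI-style function schemas. A minimal sketch of one such definition for `create_ticket` (the parameter names here are illustrative assumptions, not the exact schema shipped with the model):

```python
# Illustrative OpenAI-style tool schema for the create_ticket CRM tool.
# Field names (customer_name, issue, priority) are assumptions for the sketch.
create_ticket_tool = {
    "type": "function",
    "function": {
        "name": "create_ticket",
        "description": "Open a new support ticket in the CRM.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_name": {"type": "string", "description": "Customer's full name"},
                "issue": {"type": "string", "description": "Short description of the problem"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},
            },
            "required": ["customer_name", "issue"],
        },
    },
}

print(create_ticket_tool["function"]["name"])  # create_ticket
```

The same shape applies to `update_crm` and `query_database`; the schemas are passed in the `tools` field of each chat completion request.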
## vLLM Deployment
```bash
python -m vllm.entrypoints.openai.api_server \
  --model mohameddalii/qwen25-7b-saudi-gptq \
  --served-model-name qwen-saudi \
  --tool-call-parser hermes \
  --enable-auto-tool-choice \
  --quantization gptq \
  --max-model-len 4096 \
  --dtype float16 \
  --gpu-memory-utilization 0.88 \
  --host 0.0.0.0 --port 8000
```
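Once the server is up, requests go through the standard OpenAI-compatible `/v1/chat/completions` endpoint. A stdlib-only sketch of a tool-calling request payload (the POST itself is left commented out because it needs the running server; the endpoint URL assumes the host/port flags above):

```python
import json
# import urllib.request  # uncomment to actually send the request

payload = {
    "model": "qwen-saudi",  # must match --served-model-name
    "messages": [{"role": "user", "content": "افتح تذكرة لمشكلة في الفاتورة"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "create_ticket",
            "description": "Open a support ticket",
            "parameters": {"type": "object",
                           "properties": {"issue": {"type": "string"}}},
        },
    }],
    "tool_choice": "auto",
}

body = json.dumps(payload).encode("utf-8")
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body, headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```

With `--tool-call-parser hermes` and `--enable-auto-tool-choice`, tool invocations come back in the response's `tool_calls` field rather than as raw text.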
## Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | QLoRA 4-bit NF4 |
| LoRA r | 64 |
| LoRA alpha | 128 |
| Context | 4096 tokens |
| Dataset | HeshamHaroon/saudi-dialect-conversations |
| GPUs | 2Γ RTX 4090 (DDP) |
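The r=64 / alpha=128 pair in the table fixes the LoRA scaling factor at alpha/r = 2. A pure-Python sketch of how the low-rank update is merged into a weight (toy 2×2 shapes for readability; real factors are (d, r) and (r, k) with r=64):

```python
# Toy LoRA merge: W' = W + (alpha / r) * (B @ A), with rank-1 factors
# so the arithmetic is easy to follow by hand.
r, alpha = 64, 128
scaling = alpha / r  # 2.0, as implied by the r=64 / alpha=128 config above

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2x2)
B = [[0.5], [0.25]]           # (2, 1) low-rank factor
A = [[0.1, 0.2]]              # (1, 2) low-rank factor

# delta = B @ A, then scaled and added onto the frozen weight
delta = [[B[i][0] * A[0][j] for j in range(2)] for i in range(2)]
W_new = [[W[i][j] + scaling * delta[i][j] for j in range(2)] for i in range(2)]

print(W_new)
```

During QLoRA training only B and A receive gradients while W stays frozen in 4-bit NF4; the merge above happens once at export time.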
## V100 Compatibility
This model uses GPTQ quantization, whose exllamav2 kernels run on V100 (compute capability sm_70). AWQ was deliberately not used, as its kernels require sm_80 or newer.
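The sm_70 vs sm_80 distinction can be checked up front. A stdlib-only sketch (the GPU-to-capability table is illustrative; on a live machine you would read the value from `torch.cuda.get_device_capability()` instead):

```python
# Minimum compute capabilities per the note above.
GPTQ_MIN_CC = (7, 0)  # sm_70: Volta and newer, via exllamav2 kernels
AWQ_MIN_CC = (8, 0)   # sm_80: Ampere and newer

# Illustrative lookup; on real hardware query torch.cuda.get_device_capability().
GPU_CC = {"V100": (7, 0), "A100": (8, 0), "RTX 4090": (8, 9)}

def supported_quant(cc):
    """Return the quantization backends usable at a given compute capability."""
    backends = []
    if cc >= GPTQ_MIN_CC:
        backends.append("gptq")
    if cc >= AWQ_MIN_CC:
        backends.append("awq")
    return backends

print(supported_quant(GPU_CC["V100"]))     # ['gptq']
print(supported_quant(GPU_CC["RTX 4090"])) # ['gptq', 'awq']
```

Tuple comparison handles minor versions correctly (e.g. sm_89 ≥ sm_80), which is why capabilities are kept as `(major, minor)` pairs rather than strings.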