Uploaded model
- Developed by: Matinmollapur01
- License: apache-2.0
- Finetuned from model: unsloth/qwen2.5-3b-instruct-bnb-4bit
🇮🇷 Persian Qwen 2.5 3B Chat (LoRA)
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on Persian conversational data using QLoRA and Unsloth.
It can generate fluent and creative responses in Persian, making it suitable for chatbots, virtual assistants, and Persian NLP applications.
🧠 Model Details
- Base model: Qwen/Qwen2.5-3B-Instruct
- Fine-tuning method: QLoRA (4-bit quantized LoRA)
- Framework: Unsloth 🦥 (optimized for T4/V100 GPUs)
- Language: Persian (fa)
- License: Apache 2.0
- Repository: Matinmollapur01/persian-qwen-3b-chat
Fine-tuning Details
- Dataset: xmanii/Persian_QA_Chat_Format (~5.4k QA pairs in chat format)
- LoRA rank (r): 16
- LoRA alpha: 16
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training duration: ~4 hours on a Tesla T4 (Google Colab)
- Final validation loss: 0.9266
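For intuition on the r = 16, alpha = 16 setting above: LoRA reparameterizes each target weight as W + (alpha/r)·B·A, training only the low-rank factors A and B. A quick sketch with a hypothetical layer size (numpy only, not the actual model dimensions) shows how small the trainable fraction is:

```python
import numpy as np

# Hypothetical square layer for illustration (not the real model dims)
d_out, d_in, r, alpha = 2048, 2048, 16, 16

W = np.zeros((d_out, d_in))          # frozen base weight
A = np.random.randn(r, d_in) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))             # B starts at zero, so the update starts at zero

W_adapted = W + (alpha / r) * (B @ A)  # effective weight at inference

full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(f"trainable fraction: {lora_params / full_params:.2%}")  # ~1.56% per layer
```

With alpha = r, the scaling factor is 1, so the adapter's contribution is controlled entirely by the learned magnitudes of A and B.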
🚀 How to Use
You can load and run the model directly using Unsloth (recommended for speed and memory efficiency):
```python
# Install Unsloth if you haven't
# pip install unsloth
from unsloth import FastLanguageModel
import torch

model_name = "Matinmollapur01/persian-qwen-3b-chat"
max_seq_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

def chat(prompt):
    messages = [
        # System prompt: "You are a helpful and creative assistant."
        {"role": "system", "content": "شما یک دستیار مفید و خلاق هستید."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep special tokens so we can locate the assistant turn markers below
    response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    # Extract the assistant's reply between the chat-template markers
    if "<|im_start|>assistant" in response:
        assistant_part = response.split("<|im_start|>assistant")[-1]
        if "<|im_end|>" in assistant_part:
            assistant_part = assistant_part.split("<|im_end|>")[0]
    else:
        assistant_part = response
    return assistant_part.strip()

# Example: "Where is the highest peak in Iran?"
print(chat("بلندترین قله ایران کجاست؟"))
```
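The reply-extraction step inside chat() can also be factored into a standalone helper (a hypothetical convenience function, shown here so the parsing can be tested without loading the model):

```python
def extract_assistant_reply(response: str) -> str:
    """Pull the assistant's text out of a decoded Qwen chat-template string."""
    if "<|im_start|>assistant" in response:
        part = response.split("<|im_start|>assistant")[-1]
        if "<|im_end|>" in part:
            part = part.split("<|im_end|>")[0]
        return part.strip()
    return response.strip()

# Works on raw decoded output that still contains special tokens:
raw = "<|im_start|>user\nسلام<|im_end|>\n<|im_start|>assistant\nسلام! چطور می‌تونم کمکت کنم؟<|im_end|>"
print(extract_assistant_reply(raw))
```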
Using with Transformers (without Unsloth)
If you prefer plain transformers, you need to merge the LoRA weights first (or load with peft). Here’s a minimal example:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    load_in_4bit=True,
    torch_dtype=torch.float16,
)
# Attach the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(model, "Matinmollapur01/persian-qwen-3b-chat")
# Now you can generate as usual
```
💬 Example Conversations
Here are a few interactions with the model:
| User | Assistant |
|---|---|
| سلام خوبی؟ (Hi, how are you?) | سلام😊 چطور میتونم کمکت کنم؟ (Hi 😊, how can I help you?) |
| اومدم فقط یکمی حرف بزنیم! (I just came to chat a little!) | با کمال میل، یک حرف از زبان انگلیسی را بنویسم: "A" (ای). (With pleasure, let me write a letter of the English alphabet: "A".) |
| پایتخت ایران کجاست؟ (What is the capital of Iran?) | پایتخت ایران، تهران است. این شهر به عنوان مرکز سیاسی، اقتصادی و فرهنگی کشور شناخته میشود. (The capital of Iran is Tehran. The city is known as the country's political, economic, and cultural center.) |
| پایتخت ایالات متحده کجاست؟ (What is the capital of the United States?) | پایتخت ایالات متحده آمریکا واشینگتن دیسی است. (The capital of the United States is Washington, D.C.) *(Note: the model occasionally gets such facts wrong.)* |
| اینو به فارسی ترجمه کن: How to make people laugh? (Translate this into Persian: ...) | چطور میتوان افراد را خنداند؟ (How can one make people laugh?) |
⚠️ Note: The model may occasionally produce incorrect facts (e.g., it once said Istanbul is the capital of Turkey). This is expected due to the limited training data. For factual questions, consider using it in combination with a retrieval system.
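One lightweight way to pair the model with retrieval, as suggested above, is to prepend retrieved passages to the user turn before applying the chat template. A minimal sketch (build_rag_prompt and the example passages are hypothetical, not part of this repository):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Prepend retrieved context so the model answers from the passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "با توجه به متن زیر پاسخ بده:\n"   # "Answer based on the text below:"
        f"{context}\n\n"
        f"پرسش: {question}"                # "Question: ..."
    )

prompt = build_rag_prompt(
    "پایتخت ترکیه کجاست؟",                 # "What is the capital of Turkey?"
    ["آنکارا پایتخت ترکیه است."],          # "Ankara is the capital of Turkey."
)
# `prompt` then goes into the messages list as the user turn, exactly as in chat().
```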
📊 Training Logs
- Loss curve: Validation loss decreased steadily from ~1.1 to ~0.93 over 3 epochs.
- Batch size per device: 2
- Gradient accumulation steps: 4 (effective batch size = 8)
- Optimizer: AdamW 8-bit
- Learning rate: 2e-4 (linear decay)
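The effective batch size and a rough step count follow directly from the numbers above (assuming all ~5.4k examples were in the training split, which is an assumption):

```python
per_device_batch = 2
grad_accum_steps = 4
num_examples = 5400   # ~5.4k QA pairs (assumed all used for training)
num_epochs = 3

effective_batch = per_device_batch * grad_accum_steps  # = 8
steps_per_epoch = num_examples // effective_batch      # = 675
total_steps = steps_per_epoch * num_epochs             # = 2025

print(effective_batch, steps_per_epoch, total_steps)
```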
🛠️ Limitations & Future Work
- Factual accuracy: The model has not been fine-tuned on a large-scale factual dataset, so it may hallucinate or provide outdated information.
- Dialogue history: The current version does not maintain conversation history across turns; it treats each prompt independently. A future version could include multi-turn fine-tuning.
- Larger dataset: To improve robustness, the model should be fine-tuned on a much larger Persian corpus (e.g., combining ParsiNLU, PersianQuAD, and other datasets).
- Longer context: The base model supports a much longer context window than used here; training was limited to 2048 tokens, and extending this could improve long-form generation.
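Until multi-turn fine-tuning lands, conversation history can be kept client-side by accumulating the messages list across calls. A minimal sketch (ChatSession and the injected generate_fn are hypothetical, not part of this model card's API):

```python
class ChatSession:
    """Accumulates the message list so each call sees all prior turns."""

    def __init__(self, generate_fn, system="شما یک دستیار مفید و خلاق هستید."):
        self.generate_fn = generate_fn  # e.g. a wrapper around model.generate
        self.messages = [{"role": "system", "content": system}]

    def ask(self, prompt: str) -> str:
        self.messages.append({"role": "user", "content": prompt})
        reply = self.generate_fn(self.messages)  # full history goes to the model
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# Usage with a stub generator (swap in a real model call):
session = ChatSession(generate_fn=lambda msgs: f"echo: {msgs[-1]['content']}")
session.ask("سلام")
session.ask("حالت چطوره؟")
print(len(session.messages))  # system + 2 user + 2 assistant = 5
```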
📝 License
This model is licensed under the Apache 2.0 license. Feel free to use, modify, and distribute it for any purpose, commercial or non-commercial.
🙏 Acknowledgements
- Qwen for the excellent base model.
- Unsloth for the efficient fine-tuning framework.
- Hugging Face for hosting datasets and models.
- xmanii for the Persian QA Chat dataset.
📬 Contact
If you have questions or suggestions, feel free to open an issue or reach out via Hugging Face.
Enjoy chatting in Persian! 🇮🇷✨