Uploaded model

  • Developed by: Matinmollapur01
  • License: apache-2.0
  • Finetuned from model: unsloth/qwen2.5-3b-instruct-bnb-4bit

🇮🇷 Persian Qwen 2.5 3B Chat (LoRA)

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on Persian conversational data using QLoRA and Unsloth.
It can generate fluent and creative responses in Persian, making it suitable for chatbots, virtual assistants, and Persian NLP applications.


🧠 Model Details

Fine-tuning Details

  • Dataset: xmanii/Persian_QA_Chat_Format (~5.4k QA pairs in chat format)
  • LoRA rank (r): 16
  • LoRA alpha: 16
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training duration: ~4 hours on a Tesla T4 (Google Colab)
  • Final validation loss: 0.9266
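As a rough sanity check, the trainable-parameter count implied by these LoRA settings can be estimated from the layer shapes. The dimensions below (36 layers, hidden size 2048, MLP size 11008, 128-dim heads with 16 query and 2 key-value heads) are our assumption based on the published Qwen2.5-3B config, not something stated in this card:

```python
# Rough estimate of trainable LoRA parameters for r=16 on the listed target modules.
# Model dimensions are assumed from the Qwen2.5-3B config (not from this card).
r = 16
hidden, mlp = 2048, 11008
head_dim, n_q_heads, n_kv_heads = 128, 16, 2
layers = 36

# A LoRA adapter on a (d_in -> d_out) linear layer adds r * (d_in + d_out) parameters.
shapes = {
    "q_proj":    (hidden, head_dim * n_q_heads),   # 2048 -> 2048
    "k_proj":    (hidden, head_dim * n_kv_heads),  # 2048 -> 256 (grouped-query attention)
    "v_proj":    (hidden, head_dim * n_kv_heads),  # 2048 -> 256
    "o_proj":    (head_dim * n_q_heads, hidden),   # 2048 -> 2048
    "gate_proj": (hidden, mlp),                    # 2048 -> 11008
    "up_proj":   (hidden, mlp),                    # 2048 -> 11008
    "down_proj": (mlp, hidden),                    # 11008 -> 2048
}
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * layers
print(f"~{total / 1e6:.1f}M trainable parameters")  # -> ~29.9M trainable parameters
```

That is about 1% of the 3B base parameters, which is typical for a rank-16 adapter over all attention and MLP projections.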

🚀 How to Use

You can load and run the model directly using Unsloth (recommended for speed and memory efficiency):

# Install Unsloth if you haven't
# pip install unsloth

from unsloth import FastLanguageModel
import torch

model_name = "Matinmollapur01/persian-qwen-3b-chat"
max_seq_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

def chat(prompt):
    # System prompt (Persian): "You are a helpful and creative assistant."
    messages = [
        {"role": "system", "content": "شما یک دستیار مفید و خلاق هستید."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=False)
    # Extract assistant's reply
    if "<|im_start|>assistant" in response:
        assistant_part = response.split("<|im_start|>assistant")[-1]
        if "<|im_end|>" in assistant_part:
            assistant_part = assistant_part.split("<|im_end|>")[0]
    else:
        assistant_part = response
    return assistant_part.strip()

# Example prompt: "Where is Iran's highest peak?"
print(chat("بلندترین قله ایران کجاست؟"))
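The assistant-reply extraction in `chat()` relies only on Qwen's ChatML markers, so it can be factored into a standalone helper and tested without loading the model. A minimal sketch (the helper name is ours, not part of any library):

```python
def extract_assistant_reply(decoded: str) -> str:
    """Pull the assistant's turn out of a decoded ChatML string.

    Qwen wraps each turn as <|im_start|>role ... <|im_end|>; we take the
    text after the last assistant marker, truncated at the next <|im_end|>.
    """
    if "<|im_start|>assistant" in decoded:
        reply = decoded.split("<|im_start|>assistant")[-1]
        reply = reply.split("<|im_end|>")[0]
    else:
        # No markers found: return the raw text unchanged.
        reply = decoded
    return reply.strip()

sample = (
    "<|im_start|>user\nسلام<|im_end|>\n"
    "<|im_start|>assistant\nسلام! چطور می‌تونم کمک کنم؟<|im_end|>"
)
print(extract_assistant_reply(sample))  # -> سلام! چطور می‌تونم کمک کنم؟
```

Parsing this way is more robust than `skip_special_tokens=True`, which would strip the markers and make it harder to separate the prompt echo from the generated reply.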

Using with Transformers (without Unsloth)

If you prefer plain transformers, you need to merge the LoRA weights first (or load with peft). Here’s a minimal example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(model, "Matinmollapur01/persian-qwen-3b-chat")
# Now you can generate as usual; call model.merge_and_unload() first if you
# want a single merged model without the PEFT wrapper.

💬 Example Conversations

Here are a few interactions with the model:

| User | Assistant |
| --- | --- |
| سلام خوبی؟ ("Hi, how are you?") | سلام😊 چطور می‌تونم کمکت کنم؟ ("Hi 😊, how can I help you?") |
| اومدم فقط یکمی حرف بزنیم! ("I just came to chat a bit!") | با کمال میل، یک حرف از زبان انگلیسی را بنویسم: "A" (ای). ("With pleasure, let me write a letter of the English alphabet: 'A'.") |
| پایتخت ایران کجاست؟ ("Where is the capital of Iran?") | پایتخت ایران، تهران است. این شهر به عنوان مرکز سیاسی، اقتصادی و فرهنگی کشور شناخته می‌شود. ("The capital of Iran is Tehran. The city is known as the country's political, economic, and cultural center.") |
| پایتخت ایالات متحده کجاست؟ ("Where is the capital of the United States?") | پایتخت ایالات متحده آمریکا واشینگتن دی‌سی است. ("The capital of the United States is Washington, D.C.") Note that the model sometimes makes mistakes. |
| اینو به فارسی ترجمه کن: How to make people laugh? ("Translate this into Persian: How to make people laugh?") | چطور می‌توان افراد را خنداند؟ ("How can one make people laugh?") |

⚠️ Note: The model may occasionally produce incorrect facts (e.g., it once said Istanbul is the capital of Turkey). This is expected due to the limited training data. For factual questions, consider using it in combination with a retrieval system.


📊 Training Logs

  • Loss curve: Validation loss decreased steadily from ~1.1 to ~0.93 over 3 epochs.
  • Batch size per device: 2
  • Gradient accumulation steps: 4 (effective batch size = 8)
  • Optimizer: AdamW 8-bit
  • Learning rate: 2e-4 (linear decay)
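From these hyperparameters, the training length works out roughly as follows. The ~5.4k example count comes from the dataset description above; the exact train/validation split is not stated, so this is an approximation:

```python
examples = 5400            # ~5.4k QA pairs (approximate; split not stated in the card)
per_device_batch = 2
grad_accum = 4
epochs = 3

effective_batch = per_device_batch * grad_accum   # gradient accumulation: 2 * 4 = 8
steps_per_epoch = examples // effective_batch     # optimizer steps per epoch
total_steps = steps_per_epoch * epochs            # steps across all 3 epochs
print(effective_batch, steps_per_epoch, total_steps)  # -> 8 675 2025
```

Roughly 2,000 optimizer steps over ~4 hours on a T4 is consistent with 4-bit QLoRA throughput for a 3B model at this sequence length.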

🛠️ Limitations & Future Work

  • Factual accuracy: The model has not been fine-tuned on a large-scale factual dataset, so it may hallucinate or provide outdated information.
  • Dialogue history: The current version does not maintain conversation history across turns; it treats each prompt independently. A future version could include multi-turn fine-tuning.
  • Larger dataset: To improve robustness, the model should be fine-tuned on a much larger Persian corpus (e.g., combining ParsiNLU, PersianQuAD, and other datasets).
  • Longer context: The base model supports a much longer context window (32k tokens for Qwen2.5-3B), but we limited sequences to 2,048 tokens during training; extending this could improve long-form generation.
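On the dialogue-history point: at inference time, multi-turn behavior only requires carrying the message list between calls and appending each exchange; how well the model uses earlier turns is a separate question, since it was not fine-tuned on multi-turn data. A minimal history-keeping sketch (the `ChatSession` class and the stand-in `generate_fn` are ours, not part of the model's API):

```python
class ChatSession:
    """Accumulates ChatML-style messages so each generate call sees the full history."""

    def __init__(self, generate_fn, system="You are a helpful and creative assistant."):
        # generate_fn: callable taking list[dict] messages and returning a reply string
        # (in practice it would wrap tokenizer.apply_chat_template + model.generate).
        self.generate_fn = generate_fn
        self.messages = [{"role": "system", "content": system}]

    def ask(self, prompt: str) -> str:
        self.messages.append({"role": "user", "content": prompt})
        reply = self.generate_fn(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# Usage with a dummy backend that just reports the number of user turns seen:
session = ChatSession(lambda msgs: f"reply #{sum(m['role'] == 'user' for m in msgs)}")
print(session.ask("سلام"))        # -> reply #1
print(session.ask("ادامه بده"))   # -> reply #2
print(len(session.messages))      # system + 2 user + 2 assistant = 5
```

Note that with a 2,048-token training limit, long histories should be truncated from the oldest turns before generation.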

📝 License

This model is licensed under the Apache 2.0 license. Feel free to use, modify, and distribute it for any purpose, commercial or non-commercial.


🙏 Acknowledgements

  • Qwen for the excellent base model.
  • Unsloth for the efficient fine-tuning framework.
  • Hugging Face for hosting datasets and models.
  • xmanii for the Persian QA Chat dataset.

📬 Contact

If you have questions or suggestions, feel free to open an issue or reach out via Hugging Face.


Enjoy chatting in Persian! 🇮🇷✨
