# Ibn-Al-Nafs
Ibn-Al-Nafs is a retrieval-augmented fine-tuned model for Arabic cultural multiple-choice question answering (MCQs), developed as part of the PalmX 2025 Shared Task. It adapts NileChat-3B using parameter-efficient fine-tuning (PEFT) and integrates Gemini-based retrieval to provide culturally grounded evidence during training.
**Leaderboard Result:** Ranked 6th on PalmX 2025 Subtask 1 with 67.6% accuracy, outperforming the NileChat-3B baseline by +3%.

**Associated Paper:** *ISL-NLP at PalmX 2025: Retrieval-Augmented Fine-Tuning for Arabic Cultural Question Answering* (accepted, 2025).
## Model Details
- **Base Model:** NileChat-3B
- **Techniques:**
  - Retrieval-augmented training with the Gemini API
  - Parameter-efficient fine-tuning (PEFT): updating only the `q_proj`, `v_proj`, and `gate_proj` layers (a 68.2% reduction in trainable parameters)
  - Instruction-style fine-tuning in Modern Standard Arabic
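The retrieval-augmented setup above combines retrieved evidence with each MCQ before fine-tuning. A minimal sketch of how such a training prompt might be assembled is shown below; the function name, field labels, and exact formatting are illustrative assumptions, not the project's actual training code:

```python
def build_training_prompt(question: str, options: list[str], evidence: str) -> str:
    """Illustrative only: prepend retrieved evidence to an Arabic MCQ prompt.

    Labels translate to: "Retrieved information", "Question",
    "Options", "Answer".
    """
    joined = "، ".join(options)  # join options with the Arabic comma
    return (
        f"المعلومات المسترجعة: {evidence}\n"
        f"السؤال: {question}\n"
        f"الاختيارات: {joined}\n"
        f"الإجابة:"
    )
```

In this sketch the evidence passage (e.g. returned by a Gemini API call) is simply placed before the question so the model learns to condition its answer on culturally grounded context.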
## Results
On the PalmX 2025 Subtask 1 (Arabic Cultural MCQs) development set, Ibn-Al-Nafs outperforms both the NileChat-3B baseline and general-purpose Arabic LLMs:
| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Qwen2.5-1.5B | 64.73 | 63.89 | 63.59 | 63.60 |
| Qwen1.5-1.8B | 63.24 | 60.88 | 59.15 | 59.80 |
| NileChat-3B | 71.74 | 70.00 | 69.92 | 70.00 |
| Ibn-Al-Nafs | 73.81 | 73.88 | 73.54 | 73.60 |
Ranked **6th** on the official PalmX 2025 leaderboard, achieving 67.6% accuracy on the blind test set.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MohamedGomaa30/Ibn-Al-Nafs"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example Arabic cultural MCQ:
# "Who is the famous physician who wrote 'The Canon of Medicine'?"
question = "من هو الطبيب الشهير الذي كتب كتاب القانون في الطب؟"
# Options: Ibn Rushd, Ibn Sina, al-Farabi, al-Zahrawi
options = ["ابن رشد", "ابن سينا", "الفارابي", "الزهراوي"]

# Prompt labels: "Question", "Options", "Answer"
prompt = f"السؤال: {question}\nالاختيارات: {', '.join(options)}\nالإجابة:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
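Because the model generates free text, you may want to map its output back to one of the answer options. A simple post-processing helper (hypothetical, not part of the released code) could look like this:

```python
def pick_option(generated: str, options: list[str]) -> str:
    """Return the first option whose text appears in the generated answer.

    Falls back to the first option if no match is found; real evaluation
    code may use stricter matching (e.g. option letters or logits).
    """
    for opt in options:
        if opt in generated:
            return opt
    return options[0]
```

For example, if the model's decoded output ends with "الإجابة: ابن سينا", the helper returns the matching option "ابن سينا" (Ibn Sina).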
## Acknowledgments
- Developed at Intelligent Systems Lab (ISL-NLP), Arab Academy for Science & Technology.
- Thanks to the PalmX 2025 organizers and Kaggle for resources.