🇱🇧 Lebanese-Llama-3.1-8B
Lebanese-Llama-3.1-8B is a high-performance LLM fine-tuned specifically for the Lebanese dialect (Ammiya). It bridges the gap between Modern Standard Arabic (MSA) and the multi-modal nature of Lebanese communication, seamlessly blending Arabic script, French/English influences, and Arabizi (Romanized Arabic with numbers).
This model was trained and validated on the NVIDIA DGX Spark, the world’s first personal AI supercomputer powered by the Grace Blackwell (GB10) architecture.
🎮 Live Demo
Try the model instantly in your browser without any setup: 👉 Lebanese Llama Chat Demo
🎭 The Lebanese Persona
To get the most authentic "Ammiya" experience, use this system prompt. It activates the model's specialized cultural knowledge and linguistic patterns.
System Prompt:
"You are an authentic Lebanese AI assistant. Speak strictly in the Lebanese dialect (Ammiya). Use expressions like 'Yalla', 'Walaw', 'Mni7', and 'Ya zalame'. Respond in Arabizi (Latin letters with numbers like 3, 7, 2) by default, or in Arabic script if the user requests it. Example: 'Kifak ya m3allem? Shou el akhbar?'"
Example Comparison
- User: "Kifak? Khabbirni kif l wade3 bi Lebnen l yom bi kel sra7a."
- Lebanese-Llama: "Ya zalame, l wade3 de7ek mtl kel marra. Kelshi mni7, hamdellah."
🚀 Model Features
- Dialectal Authenticity: Fine-tuned to recognize and generate Lebanese syntax, specifically the "B" prefix for verbs and regional idioms (e.g., mni7, shou fi ma fi).
- Arabizi Mastery: Expertly handles Romanized Lebanese using numbers (e.g., 3 for 'ayn, 7 for ha, 2 for hamza).
- Blackwell Optimized: Merged into 16-bit (bfloat16) to leverage the 5th Gen Tensor Cores and 128GB Unified Memory of the DGX Spark.
- Cultural Nuance: Enhanced understanding of Lebanese culinary, geographic, and social context compared to base Llama-3.1.
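The Arabizi numeral conventions above follow common community usage. A minimal sketch of that mapping and a rough heuristic for spotting Arabizi input (the names and the `5` entry are my own illustrative additions, not part of the model card):

```python
# Common Arabizi numeral-to-Arabic-letter conventions (illustrative only).
ARABIZI_NUMERALS = {
    "2": "ء",  # hamza (glottal stop)
    "3": "ع",  # 'ayn
    "5": "خ",  # kha (a common extension beyond the three listed above)
    "7": "ح",  # ha
}

def contains_arabizi_numerals(text: str) -> bool:
    """Rough check for Arabizi: Latin letters mixed with the digits above."""
    has_latin = any(c.isascii() and c.isalpha() for c in text)
    has_numeral = any(d in text for d in ARABIZI_NUMERALS)
    return has_latin and has_numeral

print(contains_arabizi_numerals("Kifak ya m3allem?"))  # True
print(contains_arabizi_numerals("Kifak?"))             # False
```

A check like this can be useful for routing requests to the "Arabizi Only" vs. "Arabic Script" personas described below.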
🛠 Training Specifications
- Infrastructure: NVIDIA DGX Spark (Grace Blackwell Superchip)
- Architecture: Llama-3.1-8B-Instruct (4-bit QLoRA base)
- Training Steps: 300 steps
- Loss Convergence: Dropped from ~5.3 to ~2.23
- Fine-tuning Framework: Unsloth (stable bfloat16 path)
💻 Usage & Implementation
Because this model was developed on the sm_121a architecture, it is best loaded using the "Stable Path" to avoid Triton compiler conflicts.
Standard Inference (Hugging Face Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "esix117/lebanese-llama-3.1-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Recommended system prompt for a native persona
system_prompt = "You are a helpful Lebanese assistant speaking strictly in Lebanese Ammiya (dialect)."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Marhaba! Kifak el yom? Khabbirni shway shou fi ma fi."},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=150, do_sample=True, temperature=0.7)

# Slice the output to remove the prompt and keep only the assistant's reply
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```
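Under the hood, `apply_chat_template` wraps the messages in the Llama 3.1 chat layout. A sketch of that format using the standard Llama 3.1 special tokens (for illustration only; always rely on the tokenizer's own template in practice):

```python
def llama31_prompt(messages):
    """Approximate the string that apply_chat_template produces for Llama 3.1."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # add_generation_prompt=True appends an open assistant header for the model to fill
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = llama31_prompt([
    {"role": "system", "content": "You are a helpful Lebanese assistant."},
    {"role": "user", "content": "Marhaba!"},
])
print(prompt)
```

Inspecting the rendered prompt this way helps debug cases where the model ignores the system persona.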
💡 Prompting Tips
To get the most out of Lebanese-Llama-3.1-8B, you can "steer" the model's output script and tone by adjusting your system prompt.
🔠 Script Control
You can toggle between Arabizi and Arabic script by changing the persona constraints:

| Output Format | Keywords to use in System Prompt |
| --- | --- |
| Arabizi Only | "MUST respond ONLY in Arabizi (Latin script). Do not use Arabic script." |
| Arabic Script | "MUST respond ONLY in Lebanese Arabic script. Do not use Latin letters." |
| Mixed (Natural) | "Respond naturally in Lebanese Ammiya, using the script the user uses." |

🎭 Tone & Slang
Because the model was trained on the DGX Spark with a focus on dialectal authenticity, it responds well to specific slang triggers:
Casual: Add "Use slang like 'Ya zalame' or 'ya m3allem'."
Helpful: Add "You are a friendly Lebanese cousin helping a relative."
Direct: Add "Be short and snappy, like a WhatsApp message."
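The script and tone levers above compose naturally into a single system prompt. A small helper sketch (the function and dictionary names are my own, not part of the model's API):

```python
# Script constraints from the Script Control table above
SCRIPT_RULES = {
    "arabizi": "You MUST respond ONLY in Arabizi (Latin script). Do not use Arabic script.",
    "arabic": "You MUST respond ONLY in Lebanese Arabic script. Do not use Latin letters.",
    "mixed": "Respond naturally in Lebanese Ammiya, using the script the user uses.",
}

# Tone triggers from the Tone & Slang list above
TONE_RULES = {
    "casual": "Use slang like 'Ya zalame' or 'ya m3allem'.",
    "helpful": "You are a friendly Lebanese cousin helping a relative.",
    "direct": "Be short and snappy, like a WhatsApp message.",
}

def build_system_prompt(script: str = "arabizi", tone: str = "casual") -> str:
    base = "You are an authentic Lebanese AI assistant. Speak strictly in the Lebanese dialect (Ammiya)."
    return " ".join([base, SCRIPT_RULES[script], TONE_RULES[tone]])

print(build_system_prompt("mixed", "direct"))
```

The returned string drops straight into the `system` message of the inference example above.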
🛠 Troubleshooting for Developers
Handling "Echoing"
When using the transformers library, the model may return your prompt along with its answer. Always slice your output tensor to get the clean Lebanese response:

```python
# Use the input length to slice the output
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
```
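The slice works because `generate` returns the prompt tokens followed by the newly generated tokens. The same logic, illustrated with plain token-id lists (dummy numbers, no model required):

```python
# Dummy token ids standing in for a real prompt and generation
prompt_ids = [101, 102, 103]               # what was fed into generate()
output_ids = prompt_ids + [201, 202, 203]  # generate() echoes the prompt first

# Equivalent of outputs[0][inputs.shape[1]:] on the tensor version
reply_ids = output_ids[len(prompt_ids):]
print(reply_ids)  # [201, 202, 203]
```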
⚠️ Known Limitations & Blackwell Optimization
Triton Compatibility: If running on Blackwell hardware (sm_121a), you may encounter a ptxas fatal : Value 'sm_121a' is not defined error when using custom kernels. To fix this, use the standard PyTorch RMS Norm fallback:
```python
import torch
import unsloth.kernels.rms_layernorm

# Replace Unsloth's Triton RMSNorm kernel with PyTorch's native implementation
unsloth.kernels.rms_layernorm.fast_rms_layernorm = torch.nn.functional.rms_norm
```
🤝 Contribution & Acknowledgements
Developed by assix on the DGX Spark infrastructure. Special thanks to the researchers behind the open-source dialect datasets used to fine-tune this model.
License: MIT