🇱🇧 Lebanese-Llama-3.1-8B
Lebanese-Llama-3.1-8B is a high-performance LLM fine-tuned specifically for the Lebanese dialect (Ammiya). It bridges the gap between Modern Standard Arabic (MSA) and the multi-modal nature of Lebanese communication, seamlessly blending Arabic script, French/English influences, and Arabizi (Romanized Arabic with numbers).
This model was trained and validated on the NVIDIA DGX Spark, the world’s first personal AI supercomputer powered by the Grace Blackwell (GB10) architecture.
🎮 Live Demo
Try the model instantly in your browser without any setup: 👉 Lebanese Llama Chat Demo
🎭 The Lebanese Persona
To get the most authentic "Ammiya" experience, use this system prompt. It activates the model's specialized cultural knowledge and linguistic patterns.
System Prompt:
"You are an authentic Lebanese AI assistant. Speak strictly in the Lebanese dialect (Ammiya). Use expressions like 'Yalla', 'Walaw', 'Mni7', and 'Ya zalame'. Respond in Arabizi (Latin letters with numbers like 3, 7, 2) by default, or in Arabic script if the user requests it. Example: 'Kifak ya m3allem? Shou el akhbar?'"
Example Comparison
- User: "Kifak? Khabbirni kif l wade3 bi Lebnen l yom bi kel sra7a."
- Lebanese-Llama: "Ya zalame, l wade3 de7ek mtl kel marra. Kelshi mni7, hamdellah."
🚀 Model Features
- Dialectal Authenticity: Fine-tuned to recognize and generate Lebanese syntax, specifically the "B" prefix for verbs and regional idioms (e.g., mni7, shou fi ma fi).
- Arabizi Mastery: Expertly handles Romanized Lebanese using numbers (e.g., 3 for 'ayn, 7 for ha, 2 for hamza).
- Blackwell Optimized: Merged into 16-bit (bfloat16) to leverage the 5th Gen Tensor Cores and 128GB Unified Memory of the DGX Spark.
- Cultural Nuance: Enhanced understanding of Lebanese culinary, geographic, and social context compared to base Llama-3.1.
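The Arabizi numeral conventions above follow common community usage. A minimal sketch of that mapping and a rough heuristic for spotting Arabizi input (the names and the `5` entry are my own illustrative additions, not part of the model card):

```python
# Common Arabizi numeral-to-Arabic-letter conventions (illustrative only).
ARABIZI_NUMERALS = {
    "2": "ء",  # hamza (glottal stop)
    "3": "ع",  # 'ayn
    "5": "خ",  # kha (a common extension beyond the three listed above)
    "7": "ح",  # ha
}

def contains_arabizi_numerals(text: str) -> bool:
    """Rough check for Arabizi: Latin letters mixed with the digits above."""
    has_latin = any(c.isascii() and c.isalpha() for c in text)
    has_numeral = any(d in text for d in ARABIZI_NUMERALS)
    return has_latin and has_numeral

print(contains_arabizi_numerals("Kifak ya m3allem?"))  # True
print(contains_arabizi_numerals("Kifak?"))             # False
```

A check like this can be useful for routing requests to the "Arabizi Only" vs. "Arabic Script" personas described below.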
🛠 Training Specifications
- Infrastructure: NVIDIA DGX Spark (Grace Blackwell Superchip)
- Architecture: Llama-3.1-8B-Instruct (4-bit QLoRA base)
- Training Steps: 300 steps
- Loss Convergence: Dropped from ~5.3 to ~2.23
- Fine-tuning Framework: Unsloth (stable bfloat16 path)
💻 Usage & Implementation
Because this model was developed on the sm_121a architecture, it is best loaded using the "Stable Path" to avoid Triton compiler conflicts.
Standard Inference (Hugging Face Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "esix117/lebanese-llama-3.1-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Recommended system prompt for a native persona
system_prompt = "You are a helpful Lebanese assistant speaking strictly in Lebanese Ammiya (dialect)."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Marhaba! Kifak el yom? Khabbirni shway shou fi ma fi."},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=150, do_sample=True, temperature=0.7)

# Slice the output to remove the prompt and keep only the assistant's reply
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```
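Under the hood, `apply_chat_template` wraps the messages in the Llama 3.1 chat layout. A sketch of that format using the standard Llama 3.1 special tokens (for illustration only; always rely on the tokenizer's own template in practice):

```python
def llama31_prompt(messages):
    """Approximate the string that apply_chat_template produces for Llama 3.1."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # add_generation_prompt=True appends an open assistant header for the model to fill
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = llama31_prompt([
    {"role": "system", "content": "You are a helpful Lebanese assistant."},
    {"role": "user", "content": "Marhaba!"},
])
print(prompt)
```

Inspecting the rendered prompt this way helps debug cases where the model ignores the system persona.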
💡 Prompting Tips
To get the most out of Lebanese-Llama-3.1-8B, you can "steer" the model's output script and tone by adjusting your system prompt.
🔠 Script Control
You can toggle between Arabizi and Arabic script by changing the persona constraints:

| Output Format | Keywords to use in System Prompt |
| --- | --- |
| Arabizi Only | "MUST respond ONLY in Arabizi (Latin script). Do not use Arabic script." |
| Arabic Script | "MUST respond ONLY in Lebanese Arabic script. Do not use Latin letters." |
| Mixed (Natural) | "Respond naturally in Lebanese Ammiya, using the script the user uses." |

🎭 Tone & Slang
Because the model was trained on the DGX Spark with a focus on dialectal authenticity, it responds well to specific slang triggers:
Casual: Add "Use slang like 'Ya zalame' or 'ya m3allem'."
Helpful: Add "You are a friendly Lebanese cousin helping a relative."
Direct: Add "Be short and snappy, like a WhatsApp message."
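The script and tone levers above compose naturally into a single system prompt. A small helper sketch (the function and dictionary names are my own, not part of the model's API):

```python
# Script constraints from the Script Control table above
SCRIPT_RULES = {
    "arabizi": "You MUST respond ONLY in Arabizi (Latin script). Do not use Arabic script.",
    "arabic": "You MUST respond ONLY in Lebanese Arabic script. Do not use Latin letters.",
    "mixed": "Respond naturally in Lebanese Ammiya, using the script the user uses.",
}

# Tone triggers from the Tone & Slang list above
TONE_RULES = {
    "casual": "Use slang like 'Ya zalame' or 'ya m3allem'.",
    "helpful": "You are a friendly Lebanese cousin helping a relative.",
    "direct": "Be short and snappy, like a WhatsApp message.",
}

def build_system_prompt(script: str = "arabizi", tone: str = "casual") -> str:
    base = "You are an authentic Lebanese AI assistant. Speak strictly in the Lebanese dialect (Ammiya)."
    return " ".join([base, SCRIPT_RULES[script], TONE_RULES[tone]])

print(build_system_prompt("mixed", "direct"))
```

The returned string drops straight into the `system` message of the inference example above.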
🛠 Troubleshooting for Developers
Handling "Echoing"
When using the transformers library, the model may return your prompt along with its answer. Always slice your output tensor to get the clean Lebanese response:

```python
# Use the input length to slice the output
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
```
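The slice works because `generate` returns the prompt tokens followed by the newly generated tokens. The same logic, illustrated with plain token-id lists (dummy numbers, no model required):

```python
# Dummy token ids standing in for a real prompt and generation
prompt_ids = [101, 102, 103]               # what was fed into generate()
output_ids = prompt_ids + [201, 202, 203]  # generate() echoes the prompt first

# Equivalent of outputs[0][inputs.shape[1]:] on the tensor version
reply_ids = output_ids[len(prompt_ids):]
print(reply_ids)  # [201, 202, 203]
```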
⚠️ Known Limitations & Blackwell Optimization
Triton Compatibility: If running on Blackwell hardware (sm_121a), you may encounter a ptxas fatal : Value 'sm_121a' is not defined error when using custom kernels. To fix this, use the standard PyTorch RMS Norm fallback:
```python
import torch
import unsloth.kernels.rms_layernorm

# Replace Unsloth's Triton RMSNorm kernel with PyTorch's native implementation
unsloth.kernels.rms_layernorm.fast_rms_layernorm = torch.nn.functional.rms_norm
```
🤝 Contribution & Acknowledgements
Developed by assix on the DGX Spark infrastructure. Special thanks to the researchers behind the open-source dialect datasets used to fine-tune this model.
License: MIT