

GGUF Weights for Jais-2-70B-Chat

This repository contains GGUF quantized versions of Jais-2-70B-Chat for use with llama.cpp and compatible inference engines.

Available Formats

| Format | Size | Description |
|---|---|---|
| BF16.gguf | 135 GiB | BFloat16 (full precision) |
| Q8_0.gguf | 72 GiB | 8-bit quantization |
| Q6_K.gguf | 56 GiB | 6-bit K-quant |
| Q5_K_M.gguf | 49 GiB | 5-bit K-quant (medium) |
| Q5_0.gguf | 47 GiB | 5-bit quantization |
| Q4_K_M.gguf | 42 GiB | 4-bit K-quant (medium) |
| Q4_0.gguf | 39 GiB | 4-bit quantization |
| Q3_K_M.gguf | 34 GiB | 3-bit K-quant (medium) |
| Q2_K.gguf | 26 GiB | 2-bit K-quant |
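As a rough sanity check on these sizes, the effective bits per weight of each file can be estimated from its size and the nominal 70B parameter count. This is a back-of-the-envelope sketch: real GGUF files also carry metadata and keep some tensors at higher precision, so the values are approximate.

```python
# Estimate effective bits per weight from GGUF file size.
# Assumes the nominal 70e9 parameter count; real files also store
# metadata and a few higher-precision tensors, so values are approximate.
GIB = 2**30
PARAMS = 70e9

sizes_gib = {"BF16": 135, "Q8_0": 72, "Q4_K_M": 42, "Q2_K": 26}

for name, gib in sizes_gib.items():
    bits_per_weight = gib * GIB * 8 / PARAMS
    print(f"{name}: ~{bits_per_weight:.1f} bits/weight")
```

The estimates land close to each format's nominal bit width, which is a quick way to spot a truncated download.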

Usage with llama.cpp

# Basic inference
cd ~/Jais-2-70B-Chat-GGUF/
llama-cli -m Q4_K_M.gguf -p "ما هي عاصمة الإمارات؟"  # "What is the capital of the UAE?"

Hardware Requirements

| Format | Minimum VRAM | Recommended Setup |
|---|---|---|
| BF16 | ~140 GB | 2x H100 80GB |
| Q8_0 | ~75 GB | 1x H100 80GB |
| Q4_K_M | ~45 GB | 1x A100 80GB or 2x A100 40GB |
| Q2_K | ~28 GB | 1x A100 40GB |

Notes

  • Recommended format: Q4_K_M offers the best quality/size tradeoff
  • K-quant variants (Q3_K_M, Q4_K_M, Q5_K_M) require the `-b 8` flag when using GPU offloading

Original Model Card

Jais-2: The Next Generation of Arabic Frontier LLMs

Model Overview

Jais-2-70B-Chat is a high-capacity bilingual Arabic–English language model developed by MBZUAI, Inception, and Cerebras. Trained from scratch on Arabic and English data and powered by a custom Arabic-centric vocabulary, it efficiently captures Modern Standard Arabic, regional dialects, and mixed Arabic–English code-switching. The model is openly available under an Apache 2.0 license and is also deployed as a fast, production-ready chat experience running on Cerebras hardware. Visit the Jais-2 Web App.

Key Technical Specifications

  • Model Developers: MBZUAI, Inception, Cerebras.
  • Languages: Arabic (MSA & dialects) and English
  • Architecture: Transformer-based, Decoder-only architecture with multi-head self-attention.
  • Parameters: 70 Billion
  • Context Length: 8,192
  • Vocabulary Size: 150,272
  • Training Infrastructure: Optimized for Cerebras CS-2 and Condor Galaxy clusters
  • Key Design Choices: Rotary Position Embeddings (RoPE), Squared-ReLU activation, custom μP parameterization, and 8:1 filter-to-hidden size ratio.
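Two of these design choices are easy to illustrate in isolation. The sketch below shows the squared-ReLU activation and a rotary position embedding applied to a single 2-D feature pair. It is illustrative only; the model's actual implementation details (e.g. the RoPE base frequency) are not specified here.

```python
import math

def squared_relu(x: float) -> float:
    # Squared-ReLU: max(x, 0) ** 2 -- zero for negatives, quadratic for positives.
    return max(x, 0.0) ** 2

def rope_pair(x0: float, x1: float, pos: int, theta: float) -> tuple[float, float]:
    # Rotary position embedding: rotate the (x0, x1) feature pair by pos * theta.
    # The rotation encodes absolute position while preserving the vector norm.
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return x0 * c - x1 * s, x0 * s + x1 * c

print(squared_relu(3.0))   # 9.0
print(squared_relu(-2.0))  # 0.0
x0, x1 = rope_pair(1.0, 0.0, pos=5, theta=0.1)
print(round(math.hypot(x0, x1), 6))  # norm is preserved: 1.0
```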

How to Use the Model

Using Transformers

1. Install Jais 2–compatible Transformers from source

# Installing from the main branch gives you the most up-to-date features, models, and bug fixes.
# Note: it may be less stable than an official PyPI release.
uv pip install git+https://github.com/huggingface/transformers.git

2. Load the Model and Run Inference

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_name = "inceptionai/Jais-2-70B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Example Arabic prompt
system_prompt = "أجب باللغة العربية بطريقة رسمية وواضحة."  # "Answer in Arabic, formally and clearly."
user_input = "ما هي عاصمة الإمارات؟"  # "What is the capital of the UAE?"

# Apply chat template (always)
chat_text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input}
    ],
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
inputs = tokenizer(chat_text, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# عاصمة الإمارات العربية المتحدة هي أبوظبي. ("The capital of the United Arab Emirates is Abu Dhabi.")

Using vLLM

1. Install Jais 2–compatible vLLM from source

# Installing from the main branch gives you the most up-to-date features, models, and bug fixes.
# Note: it may be less stable than an official PyPI release.
uv pip install git+https://github.com/vllm-project/vllm.git

2. Load the Model and Run Inference

from vllm import LLM, SamplingParams

# Load model and tokenizer
model_name = "inceptionai/Jais-2-70B-Chat"
llm = LLM(model=model_name, tensor_parallel_size=1)
tokenizer = llm.get_tokenizer()

# Example Arabic prompt
system_prompt = "أجب باللغة العربية بطريقة رسمية وواضحة."  # "Answer in Arabic, formally and clearly."
user_input = "ما هي عاصمة الإمارات؟"  # "What is the capital of the UAE?"

# Apply chat template (always)
chat_text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input}
    ],
    tokenize=False,
    add_generation_prompt=True
)

# Run generation
sampling_params = SamplingParams(max_tokens=8192, temperature=0)
outputs = llm.generate([chat_text], sampling_params)

# Print output
print(outputs[0].outputs[0].text)
# عاصمة الإمارات العربية المتحدة هي أبوظبي. ("The capital of the United Arab Emirates is Abu Dhabi.")

Or serve via the command line (CLI)

vllm serve inceptionai/Jais-2-70B-Chat \
    --served-model-name inceptionai/Jais-2-70B-Chat-Local --dtype bfloat16 \
    --tensor-parallel-size 2 --max-model-len 8192 --max-num-seqs 256 \
    --host 0.0.0.0 --port 8042 --api-key "Optional"
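Once running, the server exposes an OpenAI-compatible API. The sketch below only builds the JSON payload a client would POST to `/v1/chat/completions`; no request is sent. The port (8042) and the served model name come from the `vllm serve` command above.

```python
import json

# Build a chat-completions request for the vLLM OpenAI-compatible server.
# Port 8042 and the served model name mirror the `vllm serve` command above.
payload = {
    "model": "inceptionai/Jais-2-70B-Chat-Local",
    "messages": [
        {"role": "system", "content": "أجب باللغة العربية بطريقة رسمية وواضحة."},
        {"role": "user", "content": "ما هي عاصمة الإمارات؟"},
    ],
    "max_tokens": 256,
    "temperature": 0,
}
url = "http://localhost:8042/v1/chat/completions"
print(url)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at this base URL) can send the same request; if `--api-key` was set, pass it as the bearer token.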

Evaluation

Performance Overview

We evaluate Jais-2-70B across two key benchmarks that capture both instruction following and generative Arabic ability: IFEval (English and Arabic) and AraGen-12-24 (3C3H).

IFEval Results (Strict 0-shot)

| Model Name | En Strict (Prompt-lvl) | En Strict (Instruction-lvl) | Ar Strict (Prompt-lvl) | Ar Strict (Instruction-lvl) |
|---|---|---|---|---|
| Qwen2.5-72B-Instruct | 83.53 | 88.51 | 67.33 | 74.05 |
| Llama-3.3-70B-Instruct | 88.20 | 92.10 | 58.17 | 63.13 |
| Jais-2-70B (ours) | 70.78 | 78.93 | 66.58 | 74.53 |

AraGen-12-24 (3C3H) Results

| Model Name | 3C3H Score (%) | Correctness | Completeness | Conciseness | Helpfulness | Honesty | Harmlessness |
|---|---|---|---|---|---|---|---|
| Qwen2.5-72B-Instruct | 62.58 | 71.92 | 71.80 | 19.06 | 69.86 | 70.94 | 71.92 |
| Llama-3.3-70B-Instruct | 61.29 | 68.58 | 65.11 | 34.50 | 63.50 | 67.47 | 68.58 |
| Jais-2-70B (ours) | 70.71 | 80.53 | 79.09 | 25.48 | 78.43 | 80.23 | 80.53 |
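The overall 3C3H score appears to be the unweighted mean of the six dimension scores, to within rounding, which is easy to verify from the figures above:

```python
# Check that each reported 3C3H score matches the mean of its six dimensions
# (Correctness, Completeness, Conciseness, Helpfulness, Honesty, Harmlessness).
scores = {
    "Qwen2.5-72B-Instruct":   (62.58, [71.92, 71.80, 19.06, 69.86, 70.94, 71.92]),
    "Llama-3.3-70B-Instruct": (61.29, [68.58, 65.11, 34.50, 63.50, 67.47, 68.58]),
    "Jais-2-70B":             (70.71, [80.53, 79.09, 25.48, 78.43, 80.23, 80.53]),
}

for model, (reported, dims) in scores.items():
    mean = sum(dims) / len(dims)
    print(f"{model}: reported {reported}, mean of dimensions {mean:.2f}")
```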

Overall, our results show that:

  • Jais-2-70B delivers competitive Arabic and English instruction-following performance across IFEval metrics.
  • Jais-2-70B achieves the highest scores across nearly all AraGen metrics, outperforming Qwen2.5-72B and Llama-3.3-70B on Arabic generative tasks.

Intended Use

Target Audiences

  • Academics: Researchers focusing on Arabic NLP, multilingual modeling, or cultural alignment
  • Businesses: Companies targeting Arabic-speaking markets
  • Developers and ML Engineers: Integrating Arabic language capabilities into applications and workflows

Appropriate Use Cases

  • Research:

    • Natural language understanding and generation tasks
    • Conducting interpretability or cross-lingual alignment analyses
    • Investigating Arabic linguistic or cultural patterns
  • Commercial Use:

    • Building chat assistants for Arabic-speaking audiences
    • Performing sentiment and market analysis in regional contexts
    • Summarizing or processing bilingual Arabic–English documents
    • Creating culturally resonant Arabic marketing and entertainment content for regional audiences

Inappropriate Use Cases

  • Harmful or Malicious Use:

    • Producing hate speech, extremist content, or discriminatory language
    • Creating or spreading misinformation or deceptive content
    • Engaging in or promoting illegal activities
  • Sensitive Information:

    • Handling or generating personal, confidential, or sensitive information
    • Attempting to infer, reconstruct, or guess sensitive information about individuals or organizations
  • Language Limitations:

    • Applications requiring strong performance outside Arabic or English languages
  • High-Stakes Decisions:

    • Making medical, legal, financial, or safety-critical decisions without human oversight

Citation

If you find our work helpful, please cite it as follows.

@techreport{jais2_2025,
  title        = {Jais 2: {A} Family of {A}rabic-Centric Open Large Language Models},
  author       = {
    Anwar, Mohamed and
    Freihat, Abdelhakim and
    Ibrahim, George and
    Awad, Mostafa and
    Sadallah, Abdelrahman Atef Mohamed Ali and
    Gosal, Gurpreet and
    Ramakrishnan, Gokul and
    Chandran, Sarath and
    Mishra, Biswajit and
    Joshi, Rituraj and
    Frikha, Ahmed and
    Goffinet, Etienne and
    Maiti, Abhishek and
    El Filali, Ali and
    Al Barri, Sarah and
    Ghosh, Samujjwal and
    Pal, Rahul and
    Mullah, Parvez and
    Shukla, Awantika and
    Siddiki, Sajid and
    Kamboj, Samta and
    Pandit, Onkar and
    Sahu, Sunil and
    El Badawy, Abelrahman and
    Mohamed, Amr and
    Chamma, Ahmad and
    Dufraisse, Evan and
    Bounhar, Abdelaziz and
    Bouch, Dani and
    Abdine, Hadi and
    Shang, Guokan and
    Koto, Fajri and
    Wang, Yuxia and
    Xie, Zhuohan and
    Mekky, Ali and
    Elbadry, Rania Hossam Elmohamady and
    Ahmad, Sarfraz and
    Ahsan, Momina and
    El-Herraoui, Omar Emad Mohamed and
    Orel, Daniil and
    Iqbal, Hasan and
    Elzeky, Kareem Mohamed Naguib Abdelmohsen Fahmy and
    Abassy, Mervat and
    Ali, Kareem and
    Eletter, Saadeldine and
    Atif, Farah and
    Mukhituly, Nurdaulet and
    Li, Haonan and
    Han, Xudong and
    Singh, Aaryamonvikram and
    Quraishi, Zain and
    Sengupta, Neha and
    Murray, Larry and
    Sheinin, Avraham and
    Hestness, Joel and
    Vassilieva, Natalia and
    Ren, Hector and
    Liu, Zhengzhong and
    Vazirgiannis, Michalis and
    Nakov, Preslav
  },
  institution  = {IFM},
  type         = {Technical Report},
  year         = {2025},
  month        = dec,
  day          = {09},
}