# RightNow-Arabic-0.5B-Turbo

The smallest open Arabic-specialized decoder LLM.

518M parameters | 398 MB on disk (q4_k_m) | 635 tok/s on H100

Built by RightNow AI.

## What is this?
RightNow-Arabic-0.5B-Turbo is a 518M-parameter Arabic-specialized language model built on top of Qwen2.5-0.5B via vocabulary injection, continued pretraining, supervised fine-tuning, and direct preference optimization. It is the smallest open Arabic-specialized decoder LLM with publicly available weights.
The model targets edge deployment: phones, laptops, embedded devices, and browsers. Quantized to q4_k_m it fits in 398 MB and generates 635 tokens/s at batch size 1.
## Key Features
- 27,032 new Arabic tokens added via mean-subtoken initialization, cutting Arabic tokenizer fertility by 17.3% (2.18 to 1.80 tokens/word)
- 504M Arabic pretraining tokens (Arabic Wikipedia) on 8xH100 SXM5 with FSDP + FlashAttention varlen + Liger fused kernels
- 129,116 Arabic instruction pairs for SFT with response-only loss masking
- 6,750 Arabic preference pairs for DPO
- Weight soup merging (DPO 50%, SFT 25%, Pretrain 25%) for optimal accuracy
- 4 GGUF quantizations for instant edge deployment
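Mean-subtoken initialization gives each new Arabic token a starting embedding equal to the mean of the embeddings its surface text receives under the old tokenizer. A minimal sketch of the idea, with plain Python lists standing in for the real embedding matrix and `old_tokenize` as a hypothetical stand-in for the base Qwen2.5 tokenizer:

```python
def mean_subtoken_init(new_token_text, old_tokenize, old_embeddings):
    """Initialize a new token's embedding as the mean of the embeddings
    of the subtokens the OLD tokenizer splits its text into."""
    sub_ids = old_tokenize(new_token_text)                # e.g. [17, 42]
    dim = len(next(iter(old_embeddings.values())))
    vec = [0.0] * dim
    for tid in sub_ids:
        for i, x in enumerate(old_embeddings[tid]):
            vec[i] += x
    return [x / len(sub_ids) for x in vec]

# Toy example: the new token splits into old token ids 17 and 42.
old_emb = {17: [1.0, 3.0], 42: [3.0, 5.0]}
print(mean_subtoken_init("مثال", lambda s: [17, 42], old_emb))  # -> [2.0, 4.0]
```

This keeps the new rows inside the distribution the model already knows, which is why continued pretraining can converge quickly from there.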
## Benchmarks
Evaluated with lm-evaluation-harness v0.4.11 (`apply_chat_template=True`, `limit=200`; `acc_norm` is reported where available).
### Head-to-head comparison
| Model | Params | COPA-ar | HellaSwag-ar | ArabicMMLU | Mean |
|---|---|---|---|---|---|
| RightNow-Arabic-0.5B-Turbo (ours) | 518M | 58.4% | 26.0% | 23.2% | 35.9% |
| Qwen2.5-0.5B-Instruct | 494M | 53.9% | 22.5% | 26.0% | 34.1% |
| Falcon-H1-0.5B-Instruct | 524M | 44.9% | 23.0% | 24.2% | 30.7% |
| Falcon-H1-1.5B-Instruct | 1.5B | 58.4% | 27.5% | 32.7% | 39.5% |
| AceGPT-7B-chat | 7B | 69.7% | 27.0% | 35.0% | 43.9% |
| ALLaM-7B-Instruct | 7B | 68.5% | 29.0% | 52.2% | 49.9% |
| SILMA-9B-Instruct | 9B | 69.7% | 38.0% | 52.9% | 53.5% |
Among 0.5B models: best on COPA-ar (+4.5 vs Qwen), best on HellaSwag-ar (+3.5 vs Qwen), best mean (+1.8 vs Qwen, +5.2 vs Falcon).
Ties Falcon-H1-1.5B on COPA-ar (both 58.4%) at one-third the parameters.
Recovers 67% of SILMA-9B mean accuracy at 5.8% of the parameters.
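The summary figures follow directly from the table; a quick arithmetic check (assuming SILMA-9B means roughly 9B parameters):

```python
# Mean over the three benchmarks for RightNow-Arabic-0.5B-Turbo.
ours_mean = round((58.4 + 26.0 + 23.2) / 3, 1)
print(ours_mean)                       # -> 35.9

# Fraction of SILMA-9B's mean accuracy recovered.
recovery = ours_mean / 53.5
print(round(recovery * 100))           # -> 67

# Parameter ratio: 518M vs ~9B.
param_ratio = 0.518 / 9.0
print(round(param_ratio * 100, 1))     # -> 5.8
```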
## Available Formats
| Format | Size | Speed (tok/s, bs=1, H100) | Use case |
|---|---|---|---|
| bf16 | 1.04 GB | 82 (HF generate) | Fine-tuning, research |
| int8 | 664 MB | -- | Reduced memory inference |
| GGUF f16 | 988 MB | 582 | Maximum quality |
| GGUF q8_0 | 525 MB | 646 | Best speed |
| GGUF q5_k_m | 419 MB | 634 | Balanced |
| GGUF q4_k_m | 398 MB | 635 | Smallest footprint |
## Quick Start

### With Transformers (bf16)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RightNowAI/RightNow-Arabic-0.5B-Turbo"
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="bf16")
model = AutoModelForCausalLM.from_pretrained(
    model_id, subfolder="bf16",
    torch_dtype="bfloat16", device_map="auto"
)

messages = [
    # "You are a smart assistant that answers in Modern Standard Arabic"
    {"role": "system", "content": "أنت مساعد ذكي يجيب باللغة العربية الفصحى"},
    # "What is the capital of the Kingdom of Saudi Arabia?"
    {"role": "user", "content": "ما هي عاصمة المملكة العربية السعودية؟"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
### With llama.cpp (GGUF)

```bash
# Download the q8_0 quantization (best speed)
huggingface-cli download RightNowAI/RightNow-Arabic-0.5B-Turbo \
  gguf/RightNow-Arabic-0.5B-Turbo-q8_0.gguf --local-dir .

# Run inference ("What is the largest city in Egypt?")
./llama-cli -m RightNow-Arabic-0.5B-Turbo-q8_0.gguf \
  -p "ما هي أكبر مدينة في مصر؟" \
  -n 128 --temp 0.7
```
## Training Pipeline

```
Qwen2.5-0.5B (494M params, 151,665-token vocab)
        |
        v
Tokenizer Surgery (+27,032 Arabic tokens -> 178,697 vocab)
  - SentencePiece unigram 32k on 12.5 GB Arabic corpus
  - Mean-subtoken embedding initialization
  - Fertility: 2.18 -> 1.80 tok/word (-17.3%)
        |
        v
Continued Pretraining (504M arwiki tokens)
  - 2,500 steps, 8xH100 SXM5
  - FSDP _HYBRID_SHARD_ZERO2 + FlashAttention varlen + Liger
  - Loss: 14.21 -> 1.69 | Wall time: 6h 57m
        |
        v
Supervised Fine-Tuning (129,116 instructions)
  - 5 datasets: evol-instruct-arabic, alpaca-gpt4-arabic,
    sharegpt-arabic, CIDAR, aya_dataset
  - Response-only loss masking (72.1% of tokens carry loss)
  - 5 epochs, 418 steps | Wall time: 12m
        |
        v
Direct Preference Optimization (6,750 pairs)
  - argilla-dpo-mix-7k-arabic
  - 2 epochs, 844 steps | Wall time: 34m
        |
        v
Weight Soup Merging
  - Linear(DPO 0.5, SFT 0.25, Pretrain 0.25)
  - +0.44 points mean accuracy over DPO alone
        |
        v
Export: bf16, int8, GGUF {f16, q8_0, q5_k_m, q4_k_m}
```
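Weight soup merging is a plain per-tensor linear combination of the three checkpoints' parameters. A minimal sketch, with small toy state dicts standing in for the real DPO, SFT, and pretrained checkpoints:

```python
def weight_soup(checkpoints, weights):
    """Linearly combine parameter dicts: theta = sum_i w_i * theta_i, per entry."""
    assert abs(sum(weights) - 1.0) < 1e-9, "soup weights should sum to 1"
    return {
        k: [sum(w * ckpt[k][i] for w, ckpt in zip(weights, checkpoints))
            for i in range(len(checkpoints[0][k]))]
        for k in checkpoints[0]
    }

# Toy 2-parameter "checkpoints" for DPO, SFT, and the pretrained model.
dpo      = {"w": [1.0, 2.0]}
sft      = {"w": [3.0, 4.0]}
pretrain = {"w": [5.0, 6.0]}
merged = weight_soup([dpo, sft, pretrain], [0.5, 0.25, 0.25])
print(merged)  # -> {'w': [2.5, 3.5]}
```

With real models the same combination is applied tensor-by-tensor over the checkpoints' state dicts; since all three share the base architecture, the shapes line up by key.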
## Training Data
| Dataset | Examples/Tokens | Use |
|---|---|---|
| Arabic Wikipedia (wikimedia/wikipedia 20231101.ar) | 504M tokens | Continued pretraining |
| FreedomIntelligence/evol-instruct-arabic | 59,022 | SFT |
| FreedomIntelligence/alpaca-gpt4-arabic | 49,969 | SFT |
| FreedomIntelligence/sharegpt-arabic | 5,231 | SFT |
| arbml/CIDAR | 10,000 | SFT |
| CohereForAI/aya_dataset (Arabic) | 4,947 | SFT |
| 2A2I/argilla-dpo-mix-7k-arabic | 6,750 | DPO |
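The SFT stage over these datasets uses response-only loss masking: prompt tokens are excluded from the loss (label `-100` in the usual Hugging Face convention) so gradients come only from the assistant's response. A minimal sketch over plain token-id lists:

```python
IGNORE_INDEX = -100  # HF convention: positions with this label carry no loss

def mask_prompt(input_ids, prompt_len):
    """Copy input_ids to labels, masking the prompt so only the response trains."""
    labels = list(input_ids)
    labels[:prompt_len] = [IGNORE_INDEX] * prompt_len
    return labels

# Toy sequence: 3 prompt tokens followed by a 2-token response.
ids = [101, 205, 7, 930, 42]
print(mask_prompt(ids, prompt_len=3))  # -> [-100, -100, -100, 930, 42]
```

The card's 72.1% figure is the fraction of tokens left unmasked, i.e. response tokens that actually contribute to the loss.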
## Limitations
- Knowledge ceiling: At 518M parameters, ArabicMMLU-style knowledge tasks lag 7B+ models by 12-30 points. This is a parameter-count limit, not a training limit.
- MSA only: Trained on Wikipedia (Modern Standard Arabic). Dialects (Egyptian, Gulf, Levantine) get MSA responses.
- 504M pretraining tokens: Below Chinchilla-optimal ratio. More Arabic data would improve knowledge tasks.
- DPO was weak: 6,750 machine-translated preference pairs provided minimal signal at 0.5B scale. The weight soup merge was more impactful.
- GGUF tile alignment: q4_k_m and q5_k_m fall back to higher-bit quantization for 144/290 tensors due to the expanded vocabulary not aligning with k-quant tile sizes.
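The effect of those fallback tensors shows up in the effective bits per weight: 398 MB for 518M parameters is well above 4 bits, since the expanded embedding and the fallback tensors stay at higher precision. A rough check, treating MB as 10^6 bytes:

```python
size_bytes = 398e6            # q4_k_m file size (approximate)
params = 518e6                # parameter count
bpw = size_bytes * 8 / params # effective bits per weight
print(round(bpw, 2))          # -> 6.15
```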
## Hardware
All training ran on a single Nebius gpu-h100-sxm node:
- 8x NVIDIA H100 80 GB SXM5 HBM3, NVLink4
- 128 vCPUs, 1.5 TiB RAM
- CUDA 13.0, PyTorch 2.11, flash-attn 2.8.3, transformers 5.5.0
## Citation

```bibtex
@article{jaber2025rightnow,
  title={RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment},
  author={Jaber, Jaber and Jaber, Osama},
  year={2025},
  url={https://huggingface.co/RightNowAI/RightNow-Arabic-0.5B-Turbo}
}
```
## License
Apache 2.0 (same as the base Qwen2.5-0.5B model).