# Nanochat Moroccan Instruct 702M
A 702M-parameter nanochat model for Moroccan Darija, instruction-tuned from the Nanochat Moroccan Base 0.7B checkpoint.
## Model
- Parameters: 701,893,188
- Depth: 18
- Sequence length: 2048
- Embedding dim: 1152
- Attention heads: 9
- KV heads: 9
- Window pattern: SSSL
- Tokenizer vocab size: 32,768
## Training Data
Instruction-tuned on:
- Lyte/Moroccan-Darija-Instruct-573K
- GemMaroc/TULU-3-50k-darija-english
This model was tuned for Moroccan Darija chat and instruction following.
## Checkpoint Format
This repository stores the original nanochat checkpoint format.
Files:
- model_000225.pt
- meta_000225.json
- tokenizer/tokenizer.pkl
- tokenizer/token_bytes.pt
## SFT Training Metrics
- Total SFT training time: 2.26 minutes
- Final validation BPB: 0.3743
- Best validation BPB: 0.3743
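For context, bits-per-byte (BPB) normalizes the per-token cross-entropy loss by the byte length of the underlying text, which makes losses comparable across tokenizers. A minimal sketch of the conversion (the loss and token/byte counts below are hypothetical, not taken from this run):

```python
import math

def bits_per_byte(mean_loss_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to bits per byte."""
    total_bits = mean_loss_nats * total_tokens / math.log(2)  # nats -> bits
    return total_bits / total_bytes

# Hypothetical numbers: mean loss of 2.0 nats over 1,000 tokens covering 5,000 bytes.
print(round(bits_per_byte(2.0, 1_000, 5_000), 4))
```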
## Chat Eval
| Metric | Score |
|---|---|
| ARC Easy | 26.56 |
| ARC Challenge | 25.34 |
| MMLU | 24.37 |
| GSM8K | 0.08 |
| HumanEval | 0.61 |
| SpellingBee | 0.00 |
## Benchmark Context
For rough context only, here is a comparison with a few small English-oriented instruction models. Those models were tuned for broad English/general-assistant benchmarks, while Nanochat Moroccan Instruct 702M was tuned for Moroccan Darija chat, so these numbers are for reference only.
| Metric | Nanochat-Moroccan-Instruct-0.7B | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct |
|---|---|---|---|---|
| ARC (Average) | 25.95 | 43.7 | 37.3 | 38.8 |
| HellaSwag | - | 52.1 | 48.0 | 47.9 |
| PIQA | - | 70.8 | 67.2 | 69.4 |
| MMLU (cloze) | 24.4 | 32.8 | 31.7 | 30.6 |
| GSM8K (5-shot) | 0.08 | 7.43 | 26.8 | 1.36 |
| IFEval | - | 41.0 | 31.6 | 19.8 |
| MT-Bench | - | 3.66 | 4.16 | 3.37 |
| BBH (3-shot) | - | 27.3 | 30.7 | 24.4 |
Small note: these generic benchmark numbers are not the point of this model. The real target is Moroccan Darija instruction following.
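As a sanity check on the table above, this model's ARC (Average) entry is simply the mean of the ARC Easy and ARC Challenge scores reported in the Chat Eval section:

```python
arc_easy, arc_challenge = 26.56, 25.34  # from the Chat Eval table
arc_average = (arc_easy + arc_challenge) / 2
print(round(arc_average, 2))  # 25.95, matching the ARC (Average) row
```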
## Limitations
- This is still a small model and can drift, repeat, or hallucinate.
- It may produce unsafe, weak, or inconsistent answers.
- Generic English benchmark scores are secondary and should not be read as the goal of this model.
- Real quality should be judged with Darija prompting and Darija-focused evaluation.
## Disclaimer
This release is for research and experimental use. Please evaluate carefully before using it in any real product or user-facing setting.
## Credits
Built on top of karpathy/nanochat.
Training adaptation, dataset work, and release by Lyte.
Pretraining data:

- Lyte/darija-pretraining-corpus

Instruction tuning data:

- Lyte/Moroccan-Darija-Instruct-573K
- GemMaroc/TULU-3-50k-darija-english
## Citation
If you use this model, please cite:
@misc{nanochat-moroccan-instruct-0.7B,
  author = {Lyte},
  title = {Nanochat Moroccan Instruct 702M},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Lyte/Nanochat-Moroccan-Instruct-0.7B}}
}