# Nanochat Moroccan Base 702M
A 702M-parameter nanochat base model pretrained for Moroccan Darija.
This is a base model, not an instruction-tuned assistant.
## Model
- Parameters: 701,893,188
- Depth: 18
- Sequence length: 2048
- Embedding dim: 1152
- Attention heads: 9
- KV heads: 9
- Window pattern: SSSL
- Tokenizer vocab size: 32,768
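For orientation, here is a minimal sketch of how these hyperparameters could map onto a nanochat model config. The `GPTConfig` field names are assumptions based on `nanochat/gpt.py` in karpathy/nanochat and may differ between versions; the SSSL window pattern is noted only as a comment, since how it is configured is version-dependent.

```python
# Hypothetical sketch: field names assume nanochat's GPTConfig dataclass
# (see nanochat/gpt.py in karpathy/nanochat); verify against your checkout.
from nanochat.gpt import GPT, GPTConfig

config = GPTConfig(
    sequence_len=2048,  # context length
    vocab_size=32768,   # tokenizer vocab size
    n_layer=18,         # depth
    n_head=9,           # query heads (head_dim = 1152 / 9 = 128)
    n_kv_head=9,        # KV heads; 9 == 9 means no grouped-query sharing
    n_embd=1152,        # embedding dim (nanochat scales this as 64 * depth)
)
# The SSSL attention window pattern (short, short, short, long) is configured
# separately in recent nanochat versions; the exact field name varies.
model = GPT(config)
```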
## Training Data
Pretrained on Lyte/darija-pretraining-corpus with these subsets:

- arabic_raw
- bilingual
- pure
The goal was Moroccan Darija pretraining, not English benchmark chasing.
## Checkpoint Format
This repository stores the original nanochat checkpoint format.
Files:

- model_003248.pt
- meta_003248.json
- tokenizer/tokenizer.pkl
- tokenizer/token_bytes.pt
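If you want to inspect these files without going through the nanochat loader, a hedged sketch of reading them directly with torch and pickle follows. The paths match this repo; the variable names and comments are assumptions, not an official nanochat API.

```python
# Illustrative sketch of reading the raw checkpoint files listed above.
import json
import pickle

import torch

state_dict = torch.load("model_003248.pt", map_location="cpu")  # model weights at step 3248
with open("meta_003248.json") as f:
    meta = json.load(f)  # run/config metadata (assumed contents)
with open("tokenizer/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)  # pickled tokenizer object
token_bytes = torch.load("tokenizer/token_bytes.pt")  # per-token byte lengths (used for BPB)

print(sorted(meta.keys()))
print(type(tokenizer), token_bytes.shape)
```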
## Final Training Metrics
- Total training time: 51.83 minutes
- Final validation BPB (bits per byte): 0.744182
- Best validation BPB: 0.743422
- Base eval train BPB: 0.598625
- Base eval val BPB: 0.742597
- CORE metric: 0.0593
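BPB (bits per byte) normalizes cross-entropy loss by the raw byte length of the text, which makes runs with different tokenizers comparable. Below is a small sketch of the standard conversion; it is generic and not tied to nanochat's exact evaluation code.

```python
import math

def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) into bits per byte."""
    total_bits = mean_loss_nats * num_tokens / math.log(2)  # nats -> bits
    return total_bits / num_bytes

# Example with made-up numbers: 2.0 nats/token at ~4 bytes/token
print(bits_per_byte(2.0, num_tokens=1_000, num_bytes=4_000))  # ~0.72
```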
## Base Eval
| Metric | Score (%) |
|---|---|
| HellaSwag | 29.56 |
| ARC Easy | 29.17 |
| ARC Challenge | 21.33 |
| ARC Average | 25.25 |
| PIQA | 53.65 |
| CommonsenseQA | 33.25 |
| Winogrande | 48.70 |
| OpenBookQA | 22.40 |
| BoolQ | 56.02 |
| COPA | 59.00 |
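For readers unfamiliar with base-model evals: tasks like HellaSwag and ARC are usually scored by comparing the model's log-likelihood of each candidate completion, not by free-form generation. The sketch below shows generic length-normalized multiple-choice scoring; it is illustrative only and not necessarily the exact harness behind the numbers above.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def choice_score(model, prompt_ids: list[int], choice_ids: list[int]) -> float:
    """Mean log-probability of the choice tokens given the prompt.

    Assumes `model(ids)` returns logits of shape (1, T, vocab_size).
    """
    ids = torch.tensor([prompt_ids + choice_ids])
    logits = model(ids)
    # Positions len(prompt)-1 .. T-2 predict exactly the choice tokens.
    logp = F.log_softmax(logits[0, len(prompt_ids) - 1 : -1], dim=-1)
    targets = torch.tensor(choice_ids).unsqueeze(1)
    return logp.gather(1, targets).mean().item()

def predict(model, prompt_ids, choices):
    """Pick the candidate completion with the highest normalized score."""
    return max(range(len(choices)),
               key=lambda i: choice_score(model, prompt_ids, choices[i]))
```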
## Benchmark Context
For rough context only, here is a comparison with a few small English-oriented base models. Those models were trained for broad general benchmarks, whereas Nanochat Moroccan Base 0.7B was trained for Moroccan Darija, so read this comparison as reference, not as a direct leaderboard claim.
| Metric | Nanochat-Moroccan-Base-0.7B | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M |
|---|---|---|---|---|
| HellaSwag | 29.6 | 54.5 | 51.2 | 51.8 |
| ARC (Average) | 25.3 | 53.0 | 45.4 | 50.1 |
| PIQA | 53.7 | 71.7 | 69.9 | 71.6 |
| MMLU (cloze) | - | 35.8 | 33.7 | 34.4 |
| CommonsenseQA | 33.3 | 38.0 | 31.6 | 35.3 |
| TriviaQA | - | 16.9 | 4.3 | 9.1 |
| Winogrande | 48.7 | 52.5 | 54.1 | 52.8 |
| OpenBookQA | 22.4 | 37.4 | 37.4 | 37.2 |
| GSM8K (5-shot) | - | 3.2 | 33.4 | 1.6 |
Note: this model was not built to score well on English benchmarks; its target was Moroccan Darija base pretraining.
## Limitations
- This is a base pretrained model, not an instruction-tuned assistant.
- It can generate inaccurate, repetitive, biased, or unsafe text.
- English benchmark scores are secondary and should not be read as the goal of this model.
- Real conversational quality should be judged after supervised fine-tuning (SFT).
## Disclaimer
This release is for research and experimental use. Please evaluate carefully before using it in any real product or user-facing setting.
## Credits
- Built on top of karpathy/nanochat.
- Training adaptation, dataset work, and release by Lyte.
- Pretraining data: Lyte/darija-pretraining-corpus
## Citation
If you use this model, please cite:
```bibtex
@misc{nanochat-moroccan-base-0.7B,
  author       = {Lyte},
  title        = {Nanochat Moroccan Base 0.7B},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/KandirResearch/Nanochat-Moroccan-Base-0.7B}}
}
```