# Nanochat Moroccan Instruct 702M
A 702M-parameter nanochat model for Moroccan Darija, instruction-tuned from the Nanochat Moroccan Base 0.7B checkpoint.
## Model
- Parameters: 701,893,188
- Depth: 18
- Sequence length: 2048
- Embedding dim: 1152
- Attention heads: 9
- KV heads: 9
- Window pattern: SSSL
- Tokenizer vocab size: 32,768
## Training Data
Instruction-tuned on:
- Lyte/Moroccan-Darija-Instruct-573K
- GemMaroc/TULU-3-50k-darija-english
This model was tuned for Moroccan Darija chat and instruction following.
## Checkpoint Format
This repository stores the original nanochat checkpoint format.
Files:
- model_000225.pt
- meta_000225.json
- tokenizer/tokenizer.pkl
- tokenizer/token_bytes.pt
## SFT Training Metrics
- Total SFT training time: 2.26 minutes
- Final validation BPB: 0.3743
- Best validation BPB: 0.3743
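For context, bits-per-byte (BPB) normalizes the per-token cross-entropy loss by the byte length of the underlying text, which makes losses comparable across tokenizers. A minimal sketch of the conversion (the loss and token/byte counts below are hypothetical, not taken from this run):

```python
import math

def bits_per_byte(mean_loss_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to bits per byte."""
    total_bits = mean_loss_nats * total_tokens / math.log(2)  # nats -> bits
    return total_bits / total_bytes

# Hypothetical numbers: mean loss of 2.0 nats over 1,000 tokens covering 5,000 bytes.
print(round(bits_per_byte(2.0, 1_000, 5_000), 4))
```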
## Chat Eval
| Metric | Score |
|---|---|
| ARC Easy | 26.56 |
| ARC Challenge | 25.34 |
| MMLU | 24.37 |
| GSM8K | 0.08 |
| HumanEval | 0.61 |
| SpellingBee | 0.00 |
## Benchmark Context
For rough context only, here is a comparison with a few small English-oriented instruction models. Those models were tuned for broad English/general-assistant benchmarks, while Nanochat Moroccan Instruct 702M was tuned for Moroccan Darija chat, so these numbers are for reference only.
| Metric | Nanochat-Moroccan-Instruct-0.7B | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct |
|---|---|---|---|---|
| ARC (Average) | 25.95 | 43.7 | 37.3 | 38.8 |
| HellaSwag | - | 52.1 | 48.0 | 47.9 |
| PIQA | - | 70.8 | 67.2 | 69.4 |
| MMLU (cloze) | 24.4 | 32.8 | 31.7 | 30.6 |
| GSM8K (5-shot) | 0.08 | 7.43 | 26.8 | 1.36 |
| IFEval | - | 41.0 | 31.6 | 19.8 |
| MT-Bench | - | 3.66 | 4.16 | 3.37 |
| BBH (3-shot) | - | 27.3 | 30.7 | 24.4 |
Small note: these generic benchmark numbers are not the point of this model. The real target is Moroccan Darija instruction following.
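As a sanity check on the table above, this model's ARC (Average) entry is simply the mean of the ARC Easy and ARC Challenge scores reported in the Chat Eval section:

```python
arc_easy, arc_challenge = 26.56, 25.34  # from the Chat Eval table
arc_average = (arc_easy + arc_challenge) / 2
print(round(arc_average, 2))  # 25.95, matching the ARC (Average) row
```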
## Limitations
- This is still a small model and can drift, repeat, or hallucinate.
- It may produce unsafe, weak, or inconsistent answers.
- Generic English benchmark scores are secondary and should not be read as the goal of this model.
- Real quality should be judged with Darija prompting and Darija-focused evaluation.
## Disclaimer
This release is for research and experimental use. Please evaluate carefully before using it in any real product or user-facing setting.
## Credits
Built on top of karpathy/nanochat.
Training adaptation, dataset work, and release by Lyte.
Pretraining data:

- Lyte/darija-pretraining-corpus

Instruction tuning data:

- Lyte/Moroccan-Darija-Instruct-573K
- GemMaroc/TULU-3-50k-darija-english
## Citation
If you use this model, please cite:
@misc{nanochat-moroccan-instruct-0.7B,
  author = {Lyte},
  title = {Nanochat Moroccan Instruct 702M},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Lyte/Nanochat-Moroccan-Instruct-0.7B}}
}