Nanochat Moroccan Instruct 702M

A 702M-parameter nanochat model for Moroccan Darija, instruction-tuned from the Nanochat Moroccan Base 0.7B checkpoint.

Model

  • Parameters: 701,893,188
  • Depth: 18
  • Sequence length: 2048
  • Embedding dim: 1152
  • Attention heads: 9
  • KV heads: 9
  • Window pattern: SSSL
  • Tokenizer vocab size: 32,768
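
The listed dimensions imply a per-head width of 1152 / 9 = 128, and equal attention and KV head counts mean each query head has its own key/value head. The sketch below records the values above and derives those quantities. The class and field names are hypothetical, not nanochat's own configuration object, and repeating the window pattern across layers is an assumption about how `SSSL` is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for the values listed in this card."""
    depth: int = 18
    sequence_len: int = 2048
    embed_dim: int = 1152
    n_heads: int = 9
    n_kv_heads: int = 9
    window_pattern: str = "SSSL"
    vocab_size: int = 32_768

    @property
    def head_dim(self) -> int:
        # The embedding dim must split evenly across attention heads.
        assert self.embed_dim % self.n_heads == 0
        return self.embed_dim // self.n_heads

    @property
    def queries_per_kv_head(self) -> int:
        # 1 means every query head has its own key/value head.
        assert self.n_heads % self.n_kv_heads == 0
        return self.n_heads // self.n_kv_heads

    def tiled_window_pattern(self) -> str:
        # Assumption: the pattern string repeats across layers,
        # truncated to the model depth.
        reps = -(-self.depth // len(self.window_pattern))  # ceiling division
        return (self.window_pattern * reps)[: self.depth]


spec = ModelSpec()
print(spec.head_dim)                # 128
print(spec.queries_per_kv_head)     # 1
print(spec.tiled_window_pattern())  # SSSLSSSLSSSLSSSLSS
```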

Training Data

Instruction-tuned on:

  • Lyte/Moroccan-Darija-Instruct-573K
  • GemMaroc/TULU-3-50k-darija-english

This model was tuned for Moroccan Darija chat and instruction following.

Checkpoint Format

This repository stores the original nanochat checkpoint format.

Files:

  • model_000225.pt
  • meta_000225.json
  • tokenizer/tokenizer.pkl
  • tokenizer/token_bytes.pt
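
The model and metadata file names share a zero-padded step number (000225). Below is a minimal sketch for locating these files in a local copy of the repository and reading the metadata. The helper names and the `root` directory are hypothetical, and nothing is assumed about the keys inside the JSON file.

```python
import json
from pathlib import Path


def checkpoint_files(root: str, step: int = 225) -> dict:
    """Build the expected paths for a checkpoint step (hypothetical helper)."""
    base = Path(root)
    tag = f"{step:06d}"  # 225 -> "000225"
    return {
        "model": base / f"model_{tag}.pt",
        "meta": base / f"meta_{tag}.json",
        "tokenizer": base / "tokenizer" / "tokenizer.pkl",
        "token_bytes": base / "tokenizer" / "token_bytes.pt",
    }


def missing_files(files: dict) -> list:
    """Return the names of expected files that are not present on disk."""
    return [name for name, path in files.items() if not path.is_file()]


def load_meta(files: dict) -> dict:
    """Read the JSON metadata; its schema is not documented in this card."""
    with open(files["meta"], encoding="utf-8") as fh:
        return json.load(fh)
```

With PyTorch installed, the weights file can then be opened with `torch.load(files["model"], map_location="cpu")`.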

SFT Training Metrics

  • Total SFT training time: 2.26 minutes
  • Final validation BPB: 0.3743
  • Best validation BPB: 0.3743
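
Bits per byte (BPB) normalizes the language-modeling loss by the number of bytes of text rather than the number of tokens, so the figure does not depend on tokenizer granularity. A minimal sketch of that conversion follows, assuming the summed loss is a cross-entropy in nats. The function name and the example numbers are illustrative and are not values from this training run.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss in nats to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Divide by ln(2) to convert nats to bits, then normalize by byte count.
    return total_loss_nats / (math.log(2) * total_bytes)


# Illustrative only: 1000 target bytes with a summed loss of 260 nats.
example = bits_per_byte(260.0, 1000)
print(round(example, 4))
```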

Chat Eval

| Metric | Score |
| --- | --- |
| ARC Easy | 26.56 |
| ARC Challenge | 25.34 |
| MMLU | 24.37 |
| GSM8K | 0.08 |
| HumanEval | 0.61 |
| SpellingBee | 0.00 |

Benchmark Context

For rough context only, here is a comparison with a few small English-oriented instruction models. Those models were tuned for broad English and general-assistant benchmarks, whereas Nanochat Moroccan Instruct 702M was tuned for Moroccan Darija chat, so the comparison is for reference only. A dash (-) marks a benchmark with no score reported for this model.

| Metric | Nanochat-Moroccan-Instruct-0.7B | SmolLM2-360M-Instruct | Qwen2.5-0.5B-Instruct | SmolLM-360M-Instruct |
| --- | --- | --- | --- | --- |
| ARC (Average) | 25.95 | 43.7 | 37.3 | 38.8 |
| HellaSwag | - | 52.1 | 48.0 | 47.9 |
| PIQA | - | 70.8 | 67.2 | 69.4 |
| MMLU (cloze) | 24.4 | 32.8 | 31.7 | 30.6 |
| GSM8K (5-shot) | 0.08 | 7.43 | 26.8 | 1.36 |
| IFEval | - | 41.0 | 31.6 | 19.8 |
| MT-Bench | - | 3.66 | 4.16 | 3.37 |
| BBH (3-shot) | - | 27.3 | 30.7 | 24.4 |
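
The ARC (Average) entry for this model is consistent with the unweighted mean of the ARC Easy and ARC Challenge scores from the Chat Eval section. A small check, assuming that is how the figure was derived:

```python
# Scores copied from the Chat Eval section of this card.
arc_easy = 26.56
arc_challenge = 25.34

# Assumption: the table entry is a simple unweighted mean of the two splits.
arc_average = (arc_easy + arc_challenge) / 2
print(f"{arc_average:.2f}")  # prints 25.95
```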

Small note: these generic benchmark numbers are not the point of this model. The real target is Moroccan Darija instruction following.

Limitations

  • This is still a small model and can drift, repeat, or hallucinate.
  • It may produce unsafe, weak, or inconsistent answers.
  • Generic English benchmark scores are secondary and should not be read as the goal of this model.
  • Real quality should be judged with Darija prompting and Darija-focused evaluation.

Disclaimer

This release is for research and experimental use. Please evaluate carefully before using it in any real product or user-facing setting.

Credits

Built on top of karpathy/nanochat.

Training adaptation, dataset work, and release by Lyte.

Pretraining data:

  • Lyte/darija-pretraining-corpus

Instruction tuning data:

  • Lyte/Moroccan-Darija-Instruct-573K
  • GemMaroc/TULU-3-50k-darija-english

Citation

If you use this model, please cite:

@misc{nanochat-moroccan-instruct-0.7B,
  author = {Lyte},
  title = {Nanochat Moroccan Instruct 702M},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Lyte/Nanochat-Moroccan-Instruct-0.7B}}
}