# Nanochat Moroccan Base 702M
A 702M-parameter nanochat base model pretrained for Moroccan Darija.
This is a base model, not an instruction-tuned assistant.
## Model
- Parameters: 701,893,188
- Depth: 18
- Sequence length: 2048
- Embedding dim: 1152
- Attention heads: 9
- KV heads: 9
- Window pattern: SSSL
- Tokenizer vocab size: 32,768
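For orientation, here is a minimal sketch of how these hyperparameters could map onto a nanochat model config. The `GPTConfig` field names are assumptions based on `nanochat/gpt.py` in karpathy/nanochat and may differ between versions; the SSSL window pattern is noted only as a comment, since how it is configured is version-dependent.

```python
# Hypothetical sketch: field names assume nanochat's GPTConfig dataclass
# (see nanochat/gpt.py in karpathy/nanochat); verify against your checkout.
from nanochat.gpt import GPT, GPTConfig

config = GPTConfig(
    sequence_len=2048,  # context length
    vocab_size=32768,   # tokenizer vocab size
    n_layer=18,         # depth
    n_head=9,           # query heads (head_dim = 1152 / 9 = 128)
    n_kv_head=9,        # KV heads; 9 == 9 means no grouped-query sharing
    n_embd=1152,        # embedding dim (nanochat scales this as 64 * depth)
)
# The SSSL attention window pattern (short, short, short, long) is configured
# separately in recent nanochat versions; the exact field name varies.
model = GPT(config)
```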
## Training Data
Pretrained on Lyte/darija-pretraining-corpus with these subsets:

- arabic_raw
- bilingual
- pure
The goal was Moroccan Darija pretraining, not English benchmark chasing.
## Checkpoint Format
This repository stores the original nanochat checkpoint format.
Files:

- model_003248.pt
- meta_003248.json
- tokenizer/tokenizer.pkl
- tokenizer/token_bytes.pt
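If you want to inspect these files without going through the nanochat loader, a hedged sketch of reading them directly with torch and pickle follows. The paths match this repo; the variable names and comments are assumptions, not an official nanochat API.

```python
# Illustrative sketch of reading the raw checkpoint files listed above.
import json
import pickle

import torch

state_dict = torch.load("model_003248.pt", map_location="cpu")  # model weights at step 3248
with open("meta_003248.json") as f:
    meta = json.load(f)  # run/config metadata (assumed contents)
with open("tokenizer/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)  # pickled tokenizer object
token_bytes = torch.load("tokenizer/token_bytes.pt")  # per-token byte lengths (used for BPB)

print(sorted(meta.keys()))
print(type(tokenizer), token_bytes.shape)
```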
## Final Training Metrics
- Total training time: 51.83 minutes
- Final validation BPB (bits per byte): 0.744182
- Best validation BPB: 0.743422
- Base eval train BPB: 0.598625
- Base eval val BPB: 0.742597
- CORE metric: 0.0593
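BPB (bits per byte) normalizes cross-entropy loss by the raw byte length of the text, which makes runs with different tokenizers comparable. Below is a small sketch of the standard conversion; it is generic and not tied to nanochat's exact evaluation code.

```python
import math

def bits_per_byte(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) into bits per byte."""
    total_bits = mean_loss_nats * num_tokens / math.log(2)  # nats -> bits
    return total_bits / num_bytes

# Example with made-up numbers: 2.0 nats/token at ~4 bytes/token
print(bits_per_byte(2.0, num_tokens=1_000, num_bytes=4_000))  # ~0.72
```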
## Base Eval
| Metric | Score (%) |
|---|---|
| HellaSwag | 29.56 |
| ARC Easy | 29.17 |
| ARC Challenge | 21.33 |
| ARC Average | 25.25 |
| PIQA | 53.65 |
| CommonsenseQA | 33.25 |
| Winogrande | 48.70 |
| OpenBookQA | 22.40 |
| BoolQ | 56.02 |
| COPA | 59.00 |
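For readers unfamiliar with base-model evals: tasks like HellaSwag and ARC are usually scored by comparing the model's log-likelihood of each candidate completion, not by free-form generation. The sketch below shows generic length-normalized multiple-choice scoring; it is illustrative only and not necessarily the exact harness behind the numbers above.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def choice_score(model, prompt_ids: list[int], choice_ids: list[int]) -> float:
    """Mean log-probability of the choice tokens given the prompt.

    Assumes `model(ids)` returns logits of shape (1, T, vocab_size).
    """
    ids = torch.tensor([prompt_ids + choice_ids])
    logits = model(ids)
    # Positions len(prompt)-1 .. T-2 predict exactly the choice tokens.
    logp = F.log_softmax(logits[0, len(prompt_ids) - 1 : -1], dim=-1)
    targets = torch.tensor(choice_ids).unsqueeze(1)
    return logp.gather(1, targets).mean().item()

def predict(model, prompt_ids, choices):
    """Pick the candidate completion with the highest normalized score."""
    return max(range(len(choices)),
               key=lambda i: choice_score(model, prompt_ids, choices[i]))
```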
## Benchmark Context
For rough context only, here is a comparison with a few small English-oriented base models. Those models were trained for broad general benchmarks, whereas Nanochat Moroccan Base 0.7B was trained for Moroccan Darija, so read this comparison as reference, not as a direct leaderboard claim.
| Metric | Nanochat-Moroccan-Base-0.7B | SmolLM2-360M | Qwen2.5-0.5B | SmolLM-360M |
|---|---|---|---|---|
| HellaSwag | 29.6 | 54.5 | 51.2 | 51.8 |
| ARC (Average) | 25.3 | 53.0 | 45.4 | 50.1 |
| PIQA | 53.7 | 71.7 | 69.9 | 71.6 |
| MMLU (cloze) | - | 35.8 | 33.7 | 34.4 |
| CommonsenseQA | 33.3 | 38.0 | 31.6 | 35.3 |
| TriviaQA | - | 16.9 | 4.3 | 9.1 |
| Winogrande | 48.7 | 52.5 | 54.1 | 52.8 |
| OpenBookQA | 22.4 | 37.4 | 37.4 | 37.2 |
| GSM8K (5-shot) | - | 3.2 | 33.4 | 1.6 |
Note: this model was not built to score well on English benchmarks; its target was Moroccan Darija base pretraining.
## Limitations
- This is a base pretrained model, not an instruction-tuned assistant.
- It can generate inaccurate, repetitive, biased, or unsafe text.
- English benchmark scores are secondary and should not be read as the goal of this model.
- Real conversational quality should be judged after supervised fine-tuning (SFT).
## Disclaimer
This release is for research and experimental use. Please evaluate carefully before using it in any real product or user-facing setting.
## Credits
- Built on top of karpathy/nanochat.
- Training adaptation, dataset work, and release by Lyte.
- Pretraining data: Lyte/darija-pretraining-corpus
## Citation
If you use this model, please cite:
```bibtex
@misc{nanochat-moroccan-base-0.7B,
  author       = {Lyte},
  title        = {Nanochat Moroccan Base 0.7B},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/KandirResearch/Nanochat-Moroccan-Base-0.7B}}
}
```