How to use from Docker Model Runner

```
docker model run hf.co/AksaraLLM/Kiel-Mini-59M-DPO
```

Kiel-Mini-59M-DPO

โš ๏ธ Status: early experiment. This 85M-parameter decoder-only transformer was trained from scratch as part of the early AksaraLLM line. It uses the GPT-2 BPE tokenizer (50257 vocab) which is not optimal for Indonesian, and the training corpus was limited. By standard perplexity it is not a usable Indonesian language model today.

Architecture

| Property | Value |
|---|---|
| Parameters | 85.0M |
| Layers | 8 |
| Attention heads | 8 |
| Hidden size | 512 |
| FFN size | 2048 |
| Vocabulary | 50257 (GPT-2 BPE) |
| Context length | 128 |
| RMSNorm + RoPE + SwiGLU | yes |
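
The RMSNorm + RoPE + SwiGLU combination corresponds to a Llama-style decoder. As a sketch only, the table maps onto transformers' `LlamaConfig` roughly as follows (the checkpoint's actual config class is not shown in this card):

```python
# Sketch of an equivalent Llama-style config (RMSNorm + RoPE + SwiGLU).
# LlamaConfig is used as a stand-in; the checkpoint's real config class
# may differ.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=50257,           # GPT-2 BPE vocabulary
    hidden_size=512,
    intermediate_size=2048,     # SwiGLU FFN
    num_hidden_layers=8,
    num_attention_heads=8,
    max_position_embeddings=128,
)
```

For reference, with these dimensions the eight transformer blocks hold roughly 33.6M parameters and the 50257 × 512 embedding another 25.7M, so the total comes to about 59M with tied input/output embeddings or about 85M with a separate LM head. That arithmetic (an inference, not stated in the card) would reconcile the 59M in the model name with the 85.0M reported above.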

Measured baseline (Devin audit, CPU eval)

  • Perplexity (50 ID sentences, GPT-2 tokenizer): 56525 (very high; the model has not converged)
  • English-stopword ratio in ID-prompted output: 0.6%
  • Indonesian-stopword ratio in ID-prompted output: 0.0%

For comparison, the working Indonesian models in this org reach perplexity ≈ 8–15 on the same 50-sentence eval set.
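
A minimal sketch of how the perplexity number can be reproduced, assuming the checkpoint loads via `AutoModelForCausalLM` and that the 50 eval sentences sit one per line in a text file; the file name `id_eval_50.txt` is an illustrative assumption, and the audit's exact script is not published here:

```python
# Sketch of the perplexity measurement: token-weighted average NLL over
# a 50-sentence Indonesian eval file, exponentiated. The file name is an
# illustrative assumption.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AksaraLLM/Kiel-Mini-59M-DPO"
tok = AutoTokenizer.from_pretrained(repo)          # GPT-2 BPE
model = AutoModelForCausalLM.from_pretrained(repo).eval()

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for line in open("id_eval_50.txt", encoding="utf-8"):
        ids = tok(line.strip(), return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss         # mean NLL per predicted token
        n = ids.shape[1] - 1                       # number of predicted positions
        total_nll += loss.item() * n
        total_tokens += n

print(f"perplexity: {math.exp(total_nll / total_tokens):.0f}")
```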

Sample for "Indonesia adalah negara":

Indonesia adalah negara coal covetedutterstock Citizensindependencealky mac motive <!-- Megan port Ruff togetDefinitionagamemarkets scars Contribut sort finances SharmaJoe [' quarterbacks698 admiredar
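
A sketch of how a sample like this, and the stopword ratios above, can be produced. The sampling settings and the tiny stopword lists here are illustrative assumptions, not the audit's exact ones:

```python
# Sketch: sample a completion from an Indonesian prompt, then measure
# English vs. Indonesian stopword ratios. Sampling settings and the tiny
# stopword lists are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AksaraLLM/Kiel-Mini-59M-DPO"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo).eval()

ids = tok("Indonesia adalah negara", return_tensors="pt").input_ids
with torch.no_grad():
    gen = model.generate(ids, max_new_tokens=64, do_sample=True, top_k=50)
text = tok.decode(gen[0], skip_special_tokens=True)
print(text)

EN = {"the", "and", "of", "to", "in", "is", "that"}         # tiny illustrative lists
ID = {"yang", "dan", "di", "ini", "adalah", "untuk", "dengan"}
words = text.lower().split()
print("EN-stopword ratio:", sum(w in EN for w in words) / len(words))
print("ID-stopword ratio:", sum(w in ID for w in words) / len(words))
```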

Why the previous "Skor 10/11 Grade S" is misleading

That figure is from a custom 11-question in-house scorecard, not from a standard LM evaluation. Perplexity on plain Indonesian text reveals that this checkpoint cannot model the distribution.

Limitations

  • Wrong tokenizer for the language: GPT-2 BPE is optimised for English.
  • Severely under-trained for this parameter count and corpus size.
  • No chat template in tokenizer config; treat as a base LM only.

What to use instead

The working Indonesian models in this org (the ones reaching perplexity ≈ 8–15 on the eval set above) are the practical alternatives.

License

Apache 2.0
