# Kiel-59M-Matured

⚠️ **Status: early experiment.** This 85M-parameter decoder-only transformer was trained from scratch as part of the early AksaraLLM line. It uses the GPT-2 BPE tokenizer (50257 vocab), which is not optimal for Indonesian, and the training corpus was limited. By standard perplexity it is not a usable Indonesian language model today.
## Architecture
| Property | Value |
|---|---|
| Parameters | 85.0M |
| Layers | 8 |
| Heads | 8 |
| Hidden size | 512 |
| FFN size | 2048 |
| Vocabulary | 50257 (GPT-2 BPE) |
| Context length | 256 |
| RMSNorm + RoPE + SwiGLU | yes |
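The "59M" in the model name and the 85.0M in the table are both consistent with the architecture above, depending on whether the output head is tied to the token embedding. A back-of-envelope count, assuming bias-free standard Q/K/V/O attention projections and a three-matrix SwiGLU FFN (assumptions, not confirmed from the checkpoint):

```python
# Back-of-envelope parameter count for the architecture in the table above.
# Assumes no biases, standard multi-head attention (Q, K, V, O), and a
# SwiGLU FFN with three weight matrices (gate, up, down).
vocab, d, ffn, layers = 50257, 512, 2048, 8

embedding = vocab * d                      # token embedding matrix
attn = 4 * d * d                           # Q, K, V, O projections
swiglu = 3 * d * ffn                       # gate, up, down matrices
norms = 2 * d                              # two RMSNorm weight vectors per layer
per_layer = attn + swiglu + norms

tied = embedding + layers * per_layer + d  # lm_head shares the embedding
untied = tied + vocab * d                  # separate lm_head matrix

print(f"tied:   {tied / 1e6:.1f}M")        # ~59.3M, matching the model name
print(f"untied: {untied / 1e6:.1f}M")      # ~85.0M, matching the table
```

With tied embeddings the total lands at about 59.3M; adding an untied output projection brings it to about 85.0M, which is the figure reported in the table.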
## Measured baseline (Devin audit, CPU eval)
- Perplexity (50 ID sentences, GPT-2 tokenizer): 23154 (very high; the model has not converged)
- English-stopword ratio in ID-prompted output: 0.0%
- Indonesian-stopword ratio in ID-prompted output: 0.0%
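The two stopword ratios are simple frequency metrics over the generated continuation: the fraction of tokens that appear in a language's stopword list. A minimal sketch (the stopword sets here are tiny illustrative samples, not the audit's actual lists):

```python
# Fraction of whitespace-split tokens in a continuation that are stopwords.
# These stopword sets are small illustrative samples, NOT the lists used
# in the audit.
EN_STOP = {"the", "is", "of", "and", "to", "in", "a"}
ID_STOP = {"yang", "dan", "di", "adalah", "untuk", "dengan", "ini"}

def stopword_ratio(text: str, stopwords: set[str]) -> float:
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in stopwords)
    return hits / len(tokens)

# The continuation from the sample below: no stopwords in either language,
# hence 0.0% for both ratios.
continuation = "alum questionich4 " + "!" * 60
print(stopword_ratio(continuation, EN_STOP))  # 0.0
print(stopword_ratio(continuation, ID_STOP))  # 0.0
```

A 0.0% ratio in both languages means the output contains no common function words at all, i.e. it is not fluent text in either language.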
For comparison, the working Indonesian models in this org reach perplexity ≈ 8–15 on the same 50-sentence eval set.
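Corpus perplexity of this kind is the exponential of the mean per-token negative log-likelihood, pooled over all sentences. A sketch of the reduction step, assuming per-token log-probabilities have already been gathered from the model's output logits:

```python
import math

def corpus_perplexity(token_logprobs: list[list[float]]) -> float:
    """exp of the mean negative log-likelihood, pooled over all tokens.

    token_logprobs: one list of natural-log token probabilities per
    sentence, e.g. gathered from a causal LM's output logits.
    """
    total_nll = -sum(lp for sent in token_logprobs for lp in sent)
    n_tokens = sum(len(sent) for sent in token_logprobs)
    return math.exp(total_nll / n_tokens)

# Toy example: a model that assigns every token probability 1/8
# has perplexity ~8, regardless of sentence lengths.
toy = [[math.log(1 / 8)] * 5, [math.log(1 / 8)] * 7]
print(corpus_perplexity(toy))  # ~8.0
```

On this scale a perplexity of 23154 means the model is, on average, barely more confident about the next token than a uniform guess over a large slice of the vocabulary, which is why the audit calls the checkpoint unconverged.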
Sample completion for the prompt "Indonesia adalah negara" ("Indonesia is a country"):

```
Indonesia adalah negaraalum questionich4!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
```
## Why the previous "Skor 10/11 Grade S" is misleading
That figure comes from a custom 11-question in-house scorecard, not from a standard LM evaluation. Perplexity on plain Indonesian text shows that this checkpoint cannot model the language's distribution.
## Limitations
- Wrong tokenizer for the language: GPT-2 BPE is optimised for English.
- Severely under-trained at this size + corpus.
- No chat template in tokenizer config; treat as a base LM only.
## What to use instead
- `AksaraLLM/Kiel-Pro-0.5B-v3`: 494M Qwen2-based, PPL ≈ 15.
- `AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public`: 1.78B Qwen2-based, PPL ≈ 8.4.
## License
Apache 2.0