---
language:
- id
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- indonesian
- aksarallm
- archived
- research
---

# Kiel-59M-Matured

> ⚠️ **Status: early experiment.**
> This 85M-parameter decoder-only transformer was trained from scratch
> as part of the early AksaraLLM line. It uses the **GPT-2 BPE** tokenizer
> (50257 vocab), which is not optimal for Indonesian, and the
> training corpus was limited. By standard perplexity it is **not** a usable
> Indonesian language model today.

## Architecture

| Property | Value |
|----------|-------|
| Parameters | 85.0M |
| Layers | 8 |
| Heads | 8 |
| Hidden size | 512 |
| FFN size | 2048 |
| Vocabulary | 50257 (GPT-2 BPE) |
| Context length | 256 |
| RMSNorm + RoPE + SwiGLU | yes |

## Measured baseline (Devin audit, CPU eval)

- **Perplexity** (50 ID sentences, GPT-2 tokenizer): 23154 (very high — model not converged)
- **English-stopword ratio in ID-prompted output**: 0.0%
- **Indonesian-stopword ratio in ID-prompted output**: 0.0%

For comparison, the working Indonesian models in this org reach perplexity ≈ 8–15 on the same 50-sentence eval set.

Sample output for the prompt "Indonesia adalah negara":

```
Indonesia adalah negaraalum questionich4!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
```

## Why the previous "Skor 10/11 Grade S" is misleading

That figure ("Skor 10/11 Grade S" is Indonesian for "Score 10/11, Grade S") comes from a custom 11-question in-house scorecard, not from a standard LM evaluation. Perplexity on plain Indonesian text reveals that this checkpoint cannot model the distribution.

## Limitations

- **Wrong tokenizer for the language**: GPT-2 BPE is optimised for English.
- **Severely under-trained** at this size + corpus.
- **No chat template** in tokenizer config; treat as a base LM only.

## What to use instead

- [`AksaraLLM/Kiel-Pro-0.5B-v3`](https://huggingface.co/AksaraLLM/Kiel-Pro-0.5B-v3) — 494M Qwen2-based, PPL ≈ 15.
- [`AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public`](https://huggingface.co/AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public) — 1.78B Qwen2-based, PPL ≈ 8.4.

## License

Apache 2.0
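## Appendix: 59M vs. 85M parameter count

The "59M" in the model name and the 85.0M in the architecture table can both be reproduced from a back-of-the-envelope parameter count: 59M corresponds to tied input/output embeddings, 85M to a separate (untied) LM head. A minimal sketch, assuming a standard bias-free decoder layout with two RMSNorms per layer (RoPE adds no parameters); the exact total depends on the actual implementation:

```python
# Rough parameter-count estimate from the architecture table above.
# Assumes no biases, two RMSNorm weight vectors per layer, and a
# SwiGLU FFN with gate/up/down projections. Illustrative only.

VOCAB = 50257
D_MODEL = 512
N_LAYERS = 8
D_FFN = 2048

def estimate_params(tied_embeddings: bool) -> int:
    tok_emb = VOCAB * D_MODEL           # token embedding matrix
    attn = 4 * D_MODEL * D_MODEL        # Q, K, V, O projections
    ffn = 3 * D_MODEL * D_FFN           # SwiGLU: gate, up, down
    norms = 2 * D_MODEL                 # two RMSNorm weights per layer
    per_layer = attn + ffn + norms
    total = tok_emb + N_LAYERS * per_layer + D_MODEL  # + final RMSNorm
    if not tied_embeddings:
        total += VOCAB * D_MODEL        # separate LM head matrix
    return total

print(f"tied:   {estimate_params(True) / 1e6:.1f}M")   # ≈ 59.3M
print(f"untied: {estimate_params(False) / 1e6:.1f}M")  # ≈ 85.0M
```

Under these assumptions the GPT-2-sized vocabulary alone accounts for roughly 25.7M parameters per embedding matrix, which is why the tokenizer choice dominates the size of a model this small.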