How to use from Docker Model Runner

```
docker model run hf.co/AksaraLLM/Kiel-Mini-59M-DPO
```

Kiel-Mini-59M-DPO

โš ๏ธ Status: early experiment. This 85M-parameter decoder-only transformer was trained from scratch as part of the early AksaraLLM line. It uses the GPT-2 BPE tokenizer (50257 vocab) which is not optimal for Indonesian, and the training corpus was limited. By standard perplexity it is not a usable Indonesian language model today.

Architecture

| Property | Value |
|---|---|
| Parameters | 85.0M |
| Layers | 8 |
| Attention heads | 8 |
| Hidden size | 512 |
| FFN size | 2048 |
| Vocabulary | 50257 (GPT-2 BPE) |
| Context length | 128 |
| RMSNorm + RoPE + SwiGLU | yes |
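
The RMSNorm + RoPE + SwiGLU combination corresponds to a Llama-style decoder. As a sketch only, the table maps onto transformers' `LlamaConfig` roughly as follows (the checkpoint's actual config class is not shown in this card):

```python
# Sketch of an equivalent Llama-style config (RMSNorm + RoPE + SwiGLU).
# LlamaConfig is used as a stand-in; the checkpoint's real config class
# may differ.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=50257,           # GPT-2 BPE vocabulary
    hidden_size=512,
    intermediate_size=2048,     # SwiGLU FFN
    num_hidden_layers=8,
    num_attention_heads=8,
    max_position_embeddings=128,
)
```

For reference, with these dimensions the eight transformer blocks hold roughly 33.6M parameters and the 50257 × 512 embedding another 25.7M, so the total comes to about 59M with tied input/output embeddings or about 85M with a separate LM head. That arithmetic (an inference, not stated in the card) would reconcile the 59M in the model name with the 85.0M reported above.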

Measured baseline (Devin audit, CPU eval)

  • Perplexity (50 ID sentences, GPT-2 tokenizer): 56525 (very high; the model has not converged)
  • English-stopword ratio in ID-prompted output: 0.6%
  • Indonesian-stopword ratio in ID-prompted output: 0.0%

For comparison, the working Indonesian models in this org reach perplexity ≈ 8–15 on the same 50-sentence eval set.
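
A minimal sketch of how the perplexity number can be reproduced, assuming the checkpoint loads via `AutoModelForCausalLM` and that the 50 eval sentences sit one per line in a text file; the file name `id_eval_50.txt` is an illustrative assumption, and the audit's exact script is not published here:

```python
# Sketch of the perplexity measurement: token-weighted average NLL over
# a 50-sentence Indonesian eval file, exponentiated. The file name is an
# illustrative assumption.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AksaraLLM/Kiel-Mini-59M-DPO"
tok = AutoTokenizer.from_pretrained(repo)          # GPT-2 BPE
model = AutoModelForCausalLM.from_pretrained(repo).eval()

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for line in open("id_eval_50.txt", encoding="utf-8"):
        ids = tok(line.strip(), return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss         # mean NLL per predicted token
        n = ids.shape[1] - 1                       # number of predicted positions
        total_nll += loss.item() * n
        total_tokens += n

print(f"perplexity: {math.exp(total_nll / total_tokens):.0f}")
```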

Sample for "Indonesia adalah negara":

Indonesia adalah negara coal covetedutterstock Citizensindependencealky mac motive <!-- Megan port Ruff togetDefinitionagamemarkets scars Contribut sort finances SharmaJoe [' quarterbacks698 admiredar
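
A sketch of how a sample like this, and the stopword ratios above, can be produced. The sampling settings and the tiny stopword lists here are illustrative assumptions, not the audit's exact ones:

```python
# Sketch: sample a completion from an Indonesian prompt, then measure
# English vs. Indonesian stopword ratios. Sampling settings and the tiny
# stopword lists are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AksaraLLM/Kiel-Mini-59M-DPO"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo).eval()

ids = tok("Indonesia adalah negara", return_tensors="pt").input_ids
with torch.no_grad():
    gen = model.generate(ids, max_new_tokens=64, do_sample=True, top_k=50)
text = tok.decode(gen[0], skip_special_tokens=True)
print(text)

EN = {"the", "and", "of", "to", "in", "is", "that"}         # tiny illustrative lists
ID = {"yang", "dan", "di", "ini", "adalah", "untuk", "dengan"}
words = text.lower().split()
print("EN-stopword ratio:", sum(w in EN for w in words) / len(words))
print("ID-stopword ratio:", sum(w in ID for w in words) / len(words))
```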

Why the previous "Skor 10/11 Grade S" is misleading

That figure is from a custom 11-question in-house scorecard, not from a standard LM evaluation. Perplexity on plain Indonesian text reveals that this checkpoint cannot model the distribution.

Limitations

  • Wrong tokenizer for the language: GPT-2 BPE is optimised for English.
  • Severely under-trained for this parameter count and corpus size.
  • No chat template in tokenizer config; treat as a base LM only.

What to use instead

The working Indonesian models in this org (the ones reaching perplexity ≈ 8–15 on the eval set above) are the practical alternatives.

License

Apache 2.0
