---
language:
- id
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- indonesian
- aksarallm
- archived
- research
---
# Kiel-Mini-59M-DPO
> ⚠️ **Status: early experiment.**
> This 85M-parameter decoder-only transformer was trained from scratch
> as part of the early AksaraLLM line. It uses the **GPT-2 BPE** tokenizer
> (50257-token vocabulary), which is poorly suited to Indonesian, and the
> training corpus was limited. By standard perplexity it is **not** a usable
> Indonesian language model today.
## Architecture
| Property | Value |
|----------|-------|
| Parameters | 85.0M |
| Layers | 8 |
| Heads | 8 |
| Hidden size | 512 |
| FFN size | 2048 |
| Vocabulary | 50257 (GPT-2 BPE) |
| Context length | 128 |
| RMSNorm + RoPE + SwiGLU | yes |
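As a sanity check, the 85.0M figure can be reproduced from the table above. The sketch below assumes untied input/output embeddings (an assumption, not confirmed by the config); note that the non-embedding total lands near the "59M" in the model name.

```python
# Back-of-the-envelope parameter count from the architecture table.
# Assumption: untied input and output embeddings (not confirmed).
vocab, hidden, ffn, layers = 50257, 512, 2048, 8

embed = vocab * hidden                     # token embedding matrix
attn = 4 * hidden * hidden                 # Q, K, V, O projections
swiglu = 3 * hidden * ffn                  # gate, up, down projections
norms = 2 * hidden                         # two RMSNorm weights per block
block = attn + swiglu + norms

lm_head = vocab * hidden                   # untied output projection
total = embed + layers * block + hidden + lm_head  # + final RMSNorm

print(f"{total / 1e6:.1f}M parameters")                              # -> 85.0M
print(f"{(total - embed) / 1e6:.1f}M excluding the input embedding")  # -> 59.3M
```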
## Measured baseline (Devin audit, CPU eval)
- **Perplexity** (50 ID sentences, GPT-2 tokenizer): 56525 (very high; model not converged)
- **English-stopword ratio in ID-prompted output**: 0.6%
- **Indonesian-stopword ratio in ID-prompted output**: 0.0%
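The stopword ratios above are a simple lexical check. A minimal sketch of the idea (the stopword lists here are small illustrative samples, not the audit's actual lists):

```python
# Illustrative stopword lists -- NOT the exact lists used in the audit.
ID_STOPWORDS = {"yang", "dan", "di", "ini", "itu", "dengan", "untuk", "pada", "adalah", "dari"}
EN_STOPWORDS = {"the", "and", "of", "to", "in", "is", "that", "for", "it", "on"}

def stopword_ratio(text: str, stopwords: set[str]) -> float:
    """Fraction of whitespace-split words that appear in the stopword set."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in stopwords for w in words) / len(words)

sample = "Indonesia adalah negara coal covetedutterstock Citizens"
print(stopword_ratio(sample, ID_STOPWORDS))  # "adalah" is the only ID stopword -> 1/6
print(stopword_ratio(sample, EN_STOPWORDS))  # no English stopwords -> 0.0
```

A near-zero ratio for both languages, as measured here, indicates the output is mostly non-word token salad rather than text in either language.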
For comparison, the working Indonesian models in this org reach perplexity
≈ 8–15 on the same 50-sentence eval set.
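For scale: a model that assigned uniform probability over the 50257-token GPT-2 vocabulary would score a perplexity of exactly 50257, so the measured 56525 is worse than a uniform distribution on this eval set. A minimal sketch of the corpus-level perplexity formula used in such evals:

```python
import math

def corpus_perplexity(nll_sums: list[float], token_counts: list[int]) -> float:
    """Corpus-level perplexity: exp(total negative log-likelihood / total tokens)."""
    return math.exp(sum(nll_sums) / sum(token_counts))

# A uniform model assigns every token NLL = ln(vocab_size), so its
# perplexity equals the vocabulary size.
n_tokens = 1000
uniform_ppl = corpus_perplexity([math.log(50257) * n_tokens], [n_tokens])
print(uniform_ppl)  # ~= 50257 (the vocabulary size)
```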
Sample for "Indonesia adalah negara":
```
Indonesia adalah negara coal covetedutterstock Citizensindependencealky mac motive <!-- Megan port Ruff togetDefinitionagamemarkets scars Contribut sort finances SharmaJoe [' quarterbacks698 admiredar
```
## Why the previous "Skor 10/11 Grade S" is misleading
That figure comes from a custom 11-question in-house scorecard, not from a
standard LM evaluation. Perplexity on plain Indonesian text shows that
this checkpoint does not model the language's token distribution.
## Limitations
- **Wrong tokenizer for the language**: GPT-2 BPE is optimised for English, so Indonesian words fragment into many subword pieces.
- **Severely under-trained** for this parameter count, given the limited corpus.
- **No chat template** in tokenizer config; treat as a base LM only.
## What to use instead
- [`AksaraLLM/Kiel-Pro-0.5B-v3`](https://huggingface.co/AksaraLLM/Kiel-Pro-0.5B-v3) – 494M Qwen2-based, PPL ≈ 15.
- [`AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public`](https://huggingface.co/AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public) – 1.78B Qwen2-based, PPL ≈ 8.4.
## License
Apache 2.0