---
language:
- id
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- indonesian
- aksarallm
- archived
- research
---
# Kiel-90M-Matured

> ⚠️ **Status: research artifact / early experiment, not a usable language model today.**
> The tokenizer used at training time was not preserved in this repository,
> so the checkpoint cannot be loaded end-to-end with HF `AutoModel` /
> `AutoTokenizer`. Output quality on standard Indonesian benchmarks is far
> below the org's working models (`Kiel-Pro-0.5B-v3`, `AksaraLLM-Qwen-1.5B-v5-public`).

## What this is

A **107M-parameter** decoder-only transformer trained from scratch as
part of the early AksaraLLM experiments. Architecture (inferred from weight
shapes; a config sketch follows the table):

| Property | Value |
|----------|-------|
| Parameters | 106.5M |
| Layers | 10 |
| Heads | 10 |
| Hidden size | 640 |
| FFN size (SwiGLU) | 2560 |
| Vocabulary | 32000 |
| Context length | 256 |
| RMSNorm + RoPE + SwiGLU | yes |
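
For readers who want to sanity-check the parameter count, the table above translates into the Llama-style configuration below. This is a sketch, not a config shipped with the checkpoint; the field names assume `transformers`' `LlamaConfig`, while the original training code used the `aksarallm` package.

```python
# Illustrative only: values transcribed from the table above into a
# Llama-style config (RMSNorm + RoPE + SwiGLU matches the inferred
# architecture). The repo ships no config.json, and these field names
# follow transformers' LlamaConfig, not the original aksarallm code.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=640,
    intermediate_size=2560,       # SwiGLU FFN width
    num_hidden_layers=10,
    num_attention_heads=10,       # head_dim = 640 / 10 = 64
    max_position_embeddings=256,
    tie_word_embeddings=False,    # untied embeddings reproduce ~106.5M params
)

# Parameter count implied by these shapes:
#   2 * 32000 * 640                      (embeddings + LM head)
# + 10 * (4 * 640**2 + 3 * 640 * 2560)   (attention + SwiGLU per layer)
# ≈ 106.5M
```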

## Measured baseline (Devin audit, CPU eval)

We loaded this checkpoint with `aksarallm.model.aksaraLLMModel`, tested several
candidate tokenizers (`AksaraLLM/aksara-tokenizer-v1/v2/v3`, Llama-2 SentencePiece,
GPT-2 BPE), and ran perplexity on 50 short Indonesian Wikipedia-style sentences
plus 5 free-form prompts. Best-case results (the metric computation is sketched below):

- **Perplexity**: BROKEN under every candidate tokenizer (tokenizer mismatch; see Limitations), meaning the model is **not** modelling the Indonesian distribution.
- **English-stopword ratio in ID-prompted output**: 0.0%
- **Indonesian-stopword ratio in ID-prompted output**: 0.0%

Sample output for prompt "Indonesia adalah negara":

```
Indonesia adalah negaraivery3ies month4!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
```
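
A minimal sketch of how the metrics above were computed, for transparency. It assumes an already-loaded causal LM that returns HF-style `logits` and one of the candidate tokenizers; neither ships in this repository, and the stopword sets below are illustrative, not the exact lists used in the audit.

```python
import math

import torch
import torch.nn.functional as F

def perplexity(model, tokenizer, sentences):
    """Corpus perplexity from summed next-token negative log-likelihood.

    `model` and `tokenizer` are hypothetical handles: any causal LM whose
    forward pass returns .logits of shape (batch, seq, vocab), plus one of
    the candidate tokenizers tried in the audit.
    """
    nll, n_tokens = 0.0, 0
    for text in sentences:
        ids = torch.tensor([tokenizer.encode(text)])
        with torch.no_grad():
            logits = model(ids).logits
        # Predict token t+1 from position t.
        nll += F.cross_entropy(logits[0, :-1], ids[0, 1:], reduction="sum").item()
        n_tokens += ids.shape[1] - 1
    return math.exp(nll / n_tokens)

# Illustrative stopword sets (not the audit's exact lists).
ID_STOPWORDS = {"yang", "dan", "di", "ke", "dari", "adalah", "untuk", "dengan"}
EN_STOPWORDS = {"the", "and", "of", "to", "in", "is", "for", "with"}

def stopword_ratio(text, stopwords):
    """Fraction of whitespace-separated tokens found in `stopwords`."""
    words = text.lower().split()
    return sum(w in stopwords for w in words) / max(len(words), 1)
```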

## Why the original "Skor 10/11 (Grade S)" claim is misleading

The score in earlier versions of this README is from a custom 11-question
in-house scorecard graded on a tiny SFT set, not from a standard language
modelling metric. By any standard LM evaluation (perplexity, response
coherence on out-of-distribution prompts, identity calibration), this model
does not function.

## Limitations

- **Tokenizer not preserved.** Without it, downstream usage is impossible.
- **No HF-compatible config.json** in the original training pipeline; the
  `aksarallm` package is required for loading.
- **Vocab size 32,000** does not match any published AksaraLLM tokenizer
  (32,768 or 64,000). It coincides with Llama-2's 32,000-token vocabulary,
  but that tokenizer was among the candidates tried in the audit above and
  still produced gibberish, so the original token mapping evidently differs
  (a shape-check sketch follows this list).
- **Trained on a small mixed corpus** (Wikipedia / SFT pairs), insufficient
  for general Indonesian generation at this scale.
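
The shape-based claims above can be verified directly from the checkpoint. A sketch, assuming the weights are a plain PyTorch state dict; the file name below is an assumption, not a documented fact about this repo.

```python
import torch

# Hypothetical file name; substitute whatever weight file the repo contains.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Print every tensor shape: the token embedding should be (32000, 640) and
# each SwiGLU projection (2560, 640) or (640, 2560).
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")

total = sum(t.numel() for t in state_dict.values())
print(f"total parameters: {total / 1e6:.1f}M")  # expected ≈ 106.5M
```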

## What to use instead

If you want a small Indonesian LM in the AksaraLLM family, use one of the following (a loading sketch follows the list):

- [`AksaraLLM/Kiel-Pro-0.5B-v3`](https://huggingface.co/AksaraLLM/Kiel-Pro-0.5B-v3) — 494M Qwen2-based, perplexity ≈ 15 on the same eval set.
- [`AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public`](https://huggingface.co/AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public) — 1.78B Qwen2-based, perplexity ≈ 8.4.
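
Both are described above as standard Qwen2-based checkpoints, so the usual `transformers` recipe should apply (a sketch; adjust device and dtype to your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Either of the recommended repos; both are assumed to ship standard
# HF configs and tokenizers, unlike this checkpoint.
repo = "AksaraLLM/Kiel-Pro-0.5B-v3"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("Indonesia adalah negara", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```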

## Citation

```
@misc{aksarallm-kiel-90m-matured,
  author       = {Cahyok Putra and AksaraLLM Community},
  title        = {Kiel-90M-Matured: early-experiment Indonesian transformer (107M params)},
  year         = 2025,
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AksaraLLM/Kiel-90M-Matured}},
}
```

## License

Apache 2.0