Commit acc09bd by Ezekiel999 (verified) · 1 parent: 81004ac

[Devin Audit] update model card with measured baseline metrics + honest framing

See https://huggingface.co/datasets/AksaraLLM/audit-2026-04 (or the AUDIT_REPORT.md attached) for methodology and the full per-model eval results.

Files changed (1)
  1. README.md +83 -4
README.md CHANGED
@@ -4,13 +4,92 @@ language:
  license: apache-2.0
  tags:
  - aksarallm
- - matured
  pipeline_tag: text-generation
  ---
 
  # Kiel-200M-Matured
 
- 238M params, matured dengan 50 SFT pairs.
- Skor: 9/11 (82%) Grade A
 
- **AksaraLLM Community**
  license: apache-2.0
  tags:
  - aksarallm
+ - from-scratch
+ - indonesian
+ - early-experiment
+ - research-artifact
  pipeline_tag: text-generation
  ---
 
  # Kiel-200M-Matured
 
+ > ⚠️ **Status: research artifact / early experiment, not a usable language model today.**
+ > The tokenizer used at training time was not preserved in this repository,
+ > so the checkpoint cannot be loaded end-to-end with HF `AutoModel` /
+ > `AutoTokenizer`. Output quality on standard Indonesian benchmarks is far
+ > below the org's working models (`Kiel-Pro-0.5B-v3`, `AksaraLLM-Qwen-1.5B-v5-public`).
 
+ ## What this is
+
+ A **271M-parameter** decoder-only transformer trained from scratch as
+ part of the early AksaraLLM experiments. Architecture (inferred from weight
+ shapes):
+
+ | Property | Value |
+ |----------|-------|
+ | Parameters | 271.1M |
+ | Layers | 16 |
+ | Heads | 16 |
+ | Hidden size | 1024 |
+ | FFN size (SwiGLU) | 2816 |
+ | Vocabulary | 32000 |
+ | Context length | 256 |
+ | RMSNorm + RoPE + SwiGLU | yes |
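
As a sanity check on the table, the parameter count can be recomputed from the listed shapes. The sketch below is not taken from the `aksarallm` package; it assumes bias-free linear layers, full multi-head attention (no grouped-query sharing), two RMSNorm gains per block plus one final norm, and no learned parameters for RoPE. The `DecoderConfig` and `count_params` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Shape values copied from the architecture table."""
    n_layers: int = 16
    hidden: int = 1024
    ffn: int = 2816
    vocab: int = 32000


def count_params(cfg: DecoderConfig, tied_embeddings: bool) -> int:
    """Count weights, assuming bias-free linears and full multi-head attention."""
    attention = 4 * cfg.hidden * cfg.hidden   # Q, K, V and output projections
    swiglu = 3 * cfg.hidden * cfg.ffn         # gate, up and down projections
    norms = 2 * cfg.hidden                    # two RMSNorm gain vectors per block
    per_layer = attention + swiglu + norms
    embeddings = cfg.vocab * cfg.hidden       # input token embedding table
    lm_head = 0 if tied_embeddings else cfg.vocab * cfg.hidden
    final_norm = cfg.hidden
    return cfg.n_layers * per_layer + embeddings + lm_head + final_norm


cfg = DecoderConfig()
untied = count_params(cfg, tied_embeddings=False)
tied = count_params(cfg, tied_embeddings=True)
print(f"untied: {untied / 1e6:.1f}M, tied: {tied / 1e6:.1f}M")
# prints "untied: 271.1M, tied: 238.3M"
```

Under these assumptions the untied count (271,090,688) agrees with the 271.1M in the table, and the tied count lands near the 238M quoted in the earlier README. That would be one possible explanation for the two figures, but the source does not confirm it.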
+
+ ## Measured baseline (Devin audit, CPU eval)
+
+ We loaded this checkpoint with `aksarallm.model.aksaraLLMModel`, tested several
+ candidate tokenizers (`AksaraLLM/aksara-tokenizer-v1/v2/v3`, Llama-2 SentencePiece,
+ GPT-2 BPE), and ran perplexity on 50 short Indonesian Wikipedia-style sentences
+ plus 5 free-form prompts. Best-case results:
+
+ - **Perplexity**: BROKEN (tokenizer mismatch; see Limitations). With every tokenizer tried, the model is **not** modelling the distribution of Indonesian text.
+ - **English-stopword ratio in ID-prompted output**: 0.0%
+ - **Indonesian-stopword ratio in ID-prompted output**: 0.0%
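
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability per token. The following is a minimal sketch of that definition, not the audit's actual script; the `perplexity` helper and its input format (one list of natural-log token probabilities per sentence) are assumptions made for illustration.

```python
import math
from typing import Iterable, Sequence


def perplexity(sentence_logprobs: Iterable[Sequence[float]]) -> float:
    """Corpus-level perplexity from per-token natural-log probabilities."""
    total_logprob = 0.0
    total_tokens = 0
    for logprobs in sentence_logprobs:
        total_logprob += sum(logprobs)
        total_tokens += len(logprobs)
    if total_tokens == 0:
        raise ValueError("no tokens to score")
    return math.exp(-total_logprob / total_tokens)


# Reference point: a model guessing uniformly over a 32000-token vocabulary
# assigns log(1/32000) to every token, whatever the sentence lengths are.
uniform = [[math.log(1 / 32000)] * 12, [math.log(1 / 32000)] * 7]
print(perplexity(uniform))
```

The uniform case evaluates to roughly 32000, the vocabulary size. That gives a rough scale for reading any perplexity figure: values near the vocabulary size mean the model is doing no better than guessing.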
+
+ Sample output for prompt "Indonesia adalah negara":
+
+ ```
+ Indonesia adalah negaraate companate compan bet cop4 config G betentent somet compan4ident L coll2 Y great75 from less4 Gil differe�ident L fun speech2 Yost immishalhaps4 eas ind we Qu immis
+ ```
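
A stopword ratio of the kind reported in the baseline can be computed as the fraction of word tokens that appear in a stopword list. The sketch below assumes this definition; the audit's exact word splitting, stopword lists, and whether the prompt itself was counted are not stated in the source. The two lists here are hypothetical and deliberately tiny, and the sample string is made up.

```python
import re

# Hypothetical, deliberately small stopword lists; a real audit would use fuller ones.
ID_STOPWORDS = frozenset({"yang", "dan", "di", "adalah", "dengan", "untuk", "ini", "itu", "dari", "ke"})
EN_STOPWORDS = frozenset({"the", "and", "of", "is", "to", "in", "that", "it", "for", "with"})


def stopword_ratio(text: str, stopwords: frozenset) -> float:
    """Fraction of alphabetic word tokens in `text` that appear in `stopwords`."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return sum(word in stopwords for word in words) / len(words)


sample = "Jakarta adalah ibu kota dan pusat ekonomi"  # made-up text, not model output
print(f"ID: {stopword_ratio(sample, ID_STOPWORDS):.1%}")
print(f"EN: {stopword_ratio(sample, EN_STOPWORDS):.1%}")
```

Fluent Indonesian text would be expected to show a clearly non-zero Indonesian ratio, so a value near zero on Indonesian-prompted output is a cheap signal that the generation is not Indonesian prose.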
+
+ ## Why the original "Skor: 9/11 (82%) Grade A" claim is misleading
+
+ The score in earlier versions of this README comes from a custom 11-question
+ in-house scorecard graded on a tiny SFT set, not from a standard language
+ modelling metric. By any standard LM evaluation (perplexity, response
+ coherence on out-of-distribution prompts, identity calibration), this model
+ does not function.
+
+ ## Limitations
+
+ - **Tokenizer not preserved.** Without it, downstream usage is impossible.
+ - **No HF-compatible config.json** in the original training pipeline; the
+ `aksarallm` package is required for loading.
+ - **Vocab size 32000** does not match any published AksaraLLM tokenizer (32 768, 64 000);
+ it equals Llama-2's (32 000), but that tokenizer was among the candidates that failed in the audit.
+ - **Trained on a small mixed corpus** (Wikipedia / SFT pairs), insufficient
+ for general Indonesian generation at this scale.
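
Comparing vocabulary sizes is a cheap first filter when hunting for a lost tokenizer. The sketch below uses only the sizes quoted in this README plus GPT-2's 50257-entry BPE vocabulary; the dictionary labels and the `size_compatible` helper are hypothetical, and the README does not say which AksaraLLM tokenizer version has which size.

```python
CHECKPOINT_VOCAB = 32000  # embedding rows, from the architecture table

# Sizes quoted in this README, plus GPT-2's 50257-entry BPE vocabulary.
CANDIDATE_VOCAB_SIZES = {
    "AksaraLLM published (32768)": 32768,
    "AksaraLLM published (64000)": 64000,
    "Llama-2 SentencePiece": 32000,
    "GPT-2 BPE": 50257,
}


def size_compatible(candidates: dict, n_embeddings: int) -> list:
    """Names of tokenizers whose vocabulary size equals the embedding row count."""
    return [name for name, size in candidates.items() if size == n_embeddings]


print(size_compatible(CANDIDATE_VOCAB_SIZES, CHECKPOINT_VOCAB))
# prints "['Llama-2 SentencePiece']"
```

A size match is necessary but not sufficient: a tokenizer with the right number of entries can still assign ids to different strings than the tokenizer used in training, in which case the model's output decodes to noise.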
+
+ ## What to use instead
+
+ If you want a small Indonesian LM in the AksaraLLM family, use:
+
+ - [`AksaraLLM/Kiel-Pro-0.5B-v3`](https://huggingface.co/AksaraLLM/Kiel-Pro-0.5B-v3): 494M, Qwen2-based, perplexity ≈ 15 on the same eval set.
+ - [`AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public`](https://huggingface.co/AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public): 1.78B, Qwen2-based, perplexity ≈ 8.4.
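
If the recommended repositories ship a standard Transformers config and tokenizer (an assumption; the README only says they are Qwen2-based), loading one could look roughly like the sketch below. The `generate` helper and its defaults are hypothetical; it needs the `transformers` and `torch` packages and downloads weights on first call.

```python
MODEL_ID = "AksaraLLM/Kiel-Pro-0.5B-v3"


def generate(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 40) -> str:
    """Greedy-decode a continuation of `prompt` with a Hub-hosted causal LM."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Indonesia adalah negara")` would reuse the prompt from the sample output above, which makes for a direct side-by-side comparison with this checkpoint.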
+
+ ## Citation
+
+ ```
+ @misc{aksarallm-kiel-200m-matured,
+   author = {Cahyok Putra and AksaraLLM Community},
+   title = {Kiel-200M-Matured: early-experiment Indonesian transformer (271M params)},
+   year = 2025,
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/AksaraLLM/Kiel-200M-Matured}},
+ }
+ ```
+
+ ## License
+
+ Apache 2.0