language:
  - id
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
  - indonesian
  - aksarallm
  - archived
  - research

Kiel-90M-Matured

⚠️ Status: research artifact / early experiment, not a usable language model today. The tokenizer used at training time was not preserved in this repository, so the checkpoint cannot be loaded end-to-end with HF AutoModel / AutoTokenizer. Output quality on standard Indonesian benchmarks is far below the org's working models (Kiel-Pro-0.5B-v3, AksaraLLM-Qwen-1.5B-v5-public).

What this is

A 107M-parameter decoder-only transformer trained from scratch as part of the early AksaraLLM experiments. Architecture (inferred from weight shapes):

Property	Value
Parameters	106.5M
Layers	10
Heads	10
Hidden size	640
FFN size (SwiGLU)	2560
Vocabulary	32000
Context length	256
RMSNorm + RoPE + SwiGLU	yes
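
The parameter count in the table can be cross-checked from the other rows. The sketch below is a back-of-the-envelope calculation, not the audit's actual script. It assumes untied input and output embeddings, bias-free linear layers, and full multi-head attention with four hidden-by-hidden projections per block. None of these details are stated in this card. The function name `count_params` is illustrative.

```python
# Values taken from the architecture table above.
HIDDEN = 640
LAYERS = 10
FFN = 2560
VOCAB = 32000


def count_params(hidden, layers, ffn, vocab, tied_embeddings=False):
    """Rough parameter count for a decoder-only transformer.

    Assumptions (not confirmed by the model card): no bias terms,
    full multi-head attention, two RMSNorm weight vectors per block
    plus one final RMSNorm.
    """
    embed = vocab * hidden                # token embedding matrix
    attn = 4 * hidden * hidden            # Q, K, V and output projections
    mlp = 3 * hidden * ffn                # SwiGLU: gate, up and down projections
    norms = 2 * hidden                    # two RMSNorm weights per block
    per_layer = attn + mlp + norms
    final_norm = hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    return embed + layers * per_layer + final_norm + lm_head


untied = count_params(HIDDEN, LAYERS, FFN, VOCAB)
tied = count_params(HIDDEN, LAYERS, FFN, VOCAB, tied_embeddings=True)
print(f"untied: {untied / 1e6:.1f}M, tied: {tied / 1e6:.1f}M")
```

Under these assumptions the untied variant comes to about 106.5M, which agrees with the table, while a tied variant would come to about 86.0M.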

Measured baseline (Devin audit, CPU eval)

We loaded this checkpoint with aksarallm.model.aksaraLLMModel, tried several candidate tokenizers (AksaraLLM/aksara-tokenizer-v1/v2/v3, Llama-2 SentencePiece, GPT-2 BPE), and measured perplexity on 50 short Indonesian Wikipedia-style sentences plus generation on 5 free-form prompts. Best-case results:

Perplexity: BROKEN (tokenizer mismatch, see Limitations). With every candidate tokenizer, the checkpoint fails to model the distribution of Indonesian text.
English-stopword ratio in ID-prompted output: 0.0%
Indonesian-stopword ratio in ID-prompted output: 0.0%
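
The two kinds of metric reported above can be sketched as follows. This is a minimal illustration under stated assumptions, not the audit's code. Perplexity is taken as the exponential of the mean negative log-likelihood per token, and the stopword ratio as the share of whitespace-separated words found in a stopword list. The function names and the short stopword lists are hypothetical, and scoring only the generated continuation (without the prompt) is also an assumption.

```python
import math

# Tiny illustrative stopword lists (assumed; the audit's lists are not given).
ID_STOPWORDS = {"yang", "dan", "di", "adalah", "dari", "ini", "itu", "untuk", "dengan"}
EN_STOPWORDS = {"the", "and", "of", "is", "to", "in", "a", "that"}


def perplexity(token_logprobs):
    """exp(mean negative log-likelihood), with natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def stopword_ratio(text, stopwords):
    """Fraction of whitespace-separated words that appear in `stopwords`."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in stopwords for w in words) / len(words)


# Continuation from the sample in this card, prompt removed and the run of
# exclamation marks shortened.
continuation = "ivery3ies month4!!!!"
print(stopword_ratio(continuation, ID_STOPWORDS))
print(stopword_ratio(continuation, EN_STOPWORDS))

# Reference point: a model that spreads probability uniformly over a
# 32000-entry vocabulary has perplexity equal to the vocabulary size.
uniform = [math.log(1 / 32000)] * 5
print(perplexity(uniform))
```

A uniform guess over the vocabulary is a useful yardstick: perplexity near or above the vocabulary size means the model is doing no better than chance on the evaluated text.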

Sample output for prompt "Indonesia adalah negara":

Indonesia adalah negaraivery3ies month4!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Why the original "Skor 10/11 (Grade S)" claim is misleading

The score in earlier versions of this README is from a custom 11-question in-house scorecard graded on a tiny SFT set, not from a standard language modelling metric. By any standard LM evaluation (perplexity, response coherence on out-of-distribution prompts, identity calibration), this model does not function.

Limitations

Tokenizer not preserved. Without it, downstream usage is impossible.
No HF-compatible config.json in the original training pipeline; the aksarallm package is required for loading.
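Vocab size 32000 does not match any published AksaraLLM tokenizer (32 768, 64 000). It equals the Llama-2 vocabulary size (32 000), but Llama-2 SentencePiece was among the candidate tokenizers tested and did not produce usable output either.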
Trained on a small mixed corpus (Wikipedia / SFT pairs), insufficient for general Indonesian generation at this scale.
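
A first screening step for candidate tokenizers is to compare each vocabulary size with the number of rows in the embedding matrix. The sketch below is illustrative only. The candidate labels are descriptive placeholders, since the card does not say which AksaraLLM tokenizer version has which size, and the GPT-2 size (50 257) is a commonly documented value rather than something stated here. The helper name `size_compatible` is hypothetical.

```python
EMBEDDING_ROWS = 32000  # vocabulary size from the architecture table

# Candidate vocabulary sizes; labels are placeholders, not repository names.
CANDIDATE_VOCAB_SIZES = {
    "AksaraLLM tokenizer (32 768 variant)": 32768,
    "AksaraLLM tokenizer (64 000 variant)": 64000,
    "Llama-2 SentencePiece": 32000,
    "GPT-2 BPE": 50257,
}


def size_compatible(candidates, embedding_rows):
    """Return candidates whose vocabulary size equals the embedding row count."""
    return [name for name, size in candidates.items() if size == embedding_rows]


survivors = size_compatible(CANDIDATE_VOCAB_SIZES, EMBEDDING_ROWS)
print(survivors)
```

Passing this check is necessary but not sufficient: two tokenizers of the same size can assign entirely different ids to the same text, so a size match alone does not recover the training-time tokenizer.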

What to use instead

If you want a small Indonesian LM in the AksaraLLM family, use:

AksaraLLM/Kiel-Pro-0.5B-v3 — 494M Qwen2-based, perplexity ≈ 15 on the same eval set.
AksaraLLM/AksaraLLM-Qwen-1.5B-v5-public — 1.78B Qwen2-based, perplexity ≈ 8.4.

Citation

@misc{aksarallm-kiel-90m-matured,
  author       = {Cahyok Putra and AksaraLLM Community},
  title        = {Kiel-90M-Matured: early-experiment Indonesian transformer (107M params)},
  year         = 2025,
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AksaraLLM/Kiel-90M-Matured}},
}

License

Apache 2.0