# Arkadiko V5 Experiments: 150M Architecture Validation
Research artifacts from the 150M production-prep sprint (2026-04-12). All runs on a single RTX PRO 4000 Blackwell (24 GB).
## Contents
| Directory | What | Key result |
|---|---|---|
| tokenizer/ | 60K-vocab SentencePiece BPE (6 languages) | ar 4.97, en 5.72 chars/tok |
| aeq_v10_adamw/ | 150M calibration run (MoDA + LASER2 cross-attn) | val loss 2.81, 37.8K tok/s |
| aeq_v10b_sophia/ | Sophia-G matched-step head-to-head | val loss 2.57, but SFT WORSE |
| sft_adamw/ | AdamW SFT (22K Aya examples) | delta -0.53 overall |
| sft_sophia/ | Sophia SFT comparison | delta -0.37 (REJECTED) |
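The chars/tok column is a standard compression metric: total characters in a held-out corpus divided by total tokens produced. A minimal sketch (the helper name, the model path in the comment, and the corpus variable are illustrative, not the repo's actual API):

```python
from typing import Callable, List

def chars_per_token(texts: List[str], encode: Callable[[str], List[int]]) -> float:
    """Compression ratio: total characters divided by total tokens.
    Higher is better (fewer tokens needed per character)."""
    total_chars = sum(len(t) for t in texts)
    total_tokens = sum(len(encode(t)) for t in texts)
    return total_chars / total_tokens

# With the real tokenizer this would look roughly like (hypothetical path):
#   import sentencepiece as spm
#   sp = spm.SentencePieceProcessor(model_file="tokenizer/model.model")
#   chars_per_token(corpus_lines, sp.encode)

# Illustration with a whitespace "tokenizer" stand-in:
ws = lambda s: s.split()
print(round(chars_per_token(["hello world", "a b c"], ws), 2))  # 16 chars / 5 tokens -> 3.2
```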
## Architecture (locked for production)
14 layers x 640d, GQA 5:1, MoDA (block_size=2), with per-layer cross-attention to a frozen LASER2 BiLSTM encoder. 173.5M params. AdamW optimizer.
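The locked configuration can be captured as a plain config object. This is a sketch only: the field names are illustrative, not the repo's actual config schema, and the head counts below are assumptions (only the 5:1 GQA ratio is stated above).

```python
from dataclasses import dataclass

@dataclass
class ArkadikoConfig:
    """Hyperparameters as stated in the card; names are illustrative."""
    n_layers: int = 14
    d_model: int = 640
    gqa_ratio: int = 5            # query heads per KV head (GQA 5:1)
    moda_block_size: int = 2      # MoDA attention block size
    cross_attn_every_layer: bool = True  # to the frozen LASER2 BiLSTM
    vocab_size: int = 60_000      # 60K SentencePiece BPE

cfg = ArkadikoConfig()

# Example head layout (ASSUMED, not from the card): 10 query heads at
# d_head = 64 gives 640d; GQA 5:1 then shares each KV head across 5 queries.
n_q_heads = 10
assert n_q_heads % cfg.gqa_ratio == 0
n_kv_heads = n_q_heads // cfg.gqa_ratio
print(n_kv_heads)  # 2
```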
## Sophia verdict (ADR-192): REJECTED
Pre-training val loss was 8.6% better, but SFT transfer was 30% WORSE: Russian regressed and generations were degenerate. This is the third confirmation that pre-training loss != downstream quality (L-296).
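The headline percentages can be reproduced from the table values, up to rounding of the reported losses:

```python
# Relative pre-training improvement of Sophia-G over AdamW (val loss).
# With the table's two-decimal values this comes out to ~8.5%; the
# card's 8.6% presumably derives from unrounded losses.
pretrain_gain = (2.81 - 2.57) / 2.81
print(f"{pretrain_gain:.1%}")  # 8.5%

# Relative SFT-transfer shortfall: Sophia's overall SFT delta (-0.37)
# is ~30% smaller in magnitude than AdamW's (-0.53).
sft_shortfall = (0.53 - 0.37) / 0.53
print(f"{sft_shortfall:.1%}")  # 30.2%
```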
## License
CC BY-NC 4.0