# EkiTil-600M Translate: Kazakh-Russian Translation Model
Bilingual Kazakh-Russian translation model fine-tuned from EkiTil-600M base on 5.1M parallel sentence pairs.
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen3ForCausalLM (decoder-only) |
| Parameters | 673.8M |
| Base model | ekitil-core-qwen3-600m-kkru-base-v1 |
| Training data | ekitil-parallel-kkru-v2 (5.1M kk-ru pairs) |
| Training examples | 10.2M (both directions: kk->ru + ru->kk) |
| Training steps | 19,921 (~0.5 epochs) |
| Final loss | 2.62 |
| LR | 2e-5 (cosine decay, 500 warmup) |
| Effective batch | 256 (32 x 8 grad accum) |
| Hardware | 1x NVIDIA H100 80GB |
| Training time | ~4.5h |
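As a quick sanity check, the step count in the table follows from the dataset size and effective batch (a sketch using only the numbers above; the variable names are ours):

```python
# Derive epochs covered from the training-table numbers above.
examples = 10_200_000            # 10.2M training examples (both directions)
per_device_batch = 32
grad_accum_steps = 8
effective_batch = per_device_batch * grad_accum_steps   # 256
steps_per_epoch = examples / effective_batch            # ~39,844
epochs = 19_921 / steps_per_epoch                       # training ran 19,921 steps
print(effective_batch, round(epochs, 2))                # 256 0.5
```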
## Translation Format

```
<|kk|> Kazakh text <|translate|> <|ru|>    -> generates Russian translation
<|ru|> Russian text <|translate|> <|kk|>   -> generates Kazakh translation
```
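The format lends itself to a small helper (a sketch: the `build_prompt` name is ours, only the special-token template comes from the model):

```python
# Build a prompt in the <|src|> text <|translate|> <|tgt|> format above.
# The helper name is ours; only the special-token template is the model's.
def build_prompt(text: str, src: str, tgt: str) -> str:
    return f"<|{src}|> {text} <|translate|> <|{tgt}|>"

print(build_prompt("Мен университетте оқимын.", "kk", "ru"))
# <|kk|> Мен университетте оқимын. <|translate|> <|ru|>
```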
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "stukenov/ekitil-core-qwen3-600m-kkru-translate-v1",
    dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("stukenov/ekitil-core-qwen3-600m-kkru-translate-v1")

# Kazakh -> Russian
prompt = "<|kk|> Қазақстан — Орталық Азиядағы ең үлкен мемлекет. <|translate|> <|ru|>"
ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=100, repetition_penalty=1.3, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True))

# Russian -> Kazakh
prompt = "<|ru|> Образование является ключом к успеху. <|translate|> <|kk|>"
ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=100, repetition_penalty=1.3, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```
## Training Data
5.1M deduplicated kk-ru parallel pairs from 13 sources:
| Source | Pairs |
|---|---|
| WMT19 crawl | 4,512,841 |
| KazParC (human-translated) | 362,208 |
| OPUS (12 corpora) | 225,924 |
| Total | 5,100,973 |
See ekitil-parallel-kkru-v2 for full details.
## Generation Examples

Generated with `repetition_penalty=1.3`, `num_beams=4`:
**Kazakh -> Russian:**

```
SRC: Қазақстан — Орталық Азиядағы ең үлкен мемлекет.
OUT: Казахстан является одним из крупнейших государств мира.

SRC: Бүгін ауа райы өте жақсы.
OUT: У нас очень хорошая погода.

SRC: Мен университетте оқимын.
OUT: У нас в университете.
```

**Russian -> Kazakh:**

```
SRC: Казахстан — красивая страна с богатой историей.
OUT: Қазақстан — Қазақстанның ең бай тарихы. Қазақстан — өте бай ел.

SRC: Здравствуйте, как у вас дела?
OUT: Сіздер туралы айтып беріңізші.

SRC: Образование является ключом к успеху.
OUT: Өнеркәсiптiк iс-әрекеттiң ерекшелiктерi бiр-бiрiмен байланысты.
```
Note: translations capture the general meaning, but quality varies. The model was trained for only 0.5 epochs; more training would improve fluency. Use `repetition_penalty >= 1.3` to avoid repetition.
## Limitations

- Decoder-only architecture may repeat output; use `repetition_penalty >= 1.2`
- Trained for only 0.5 epochs; more training could improve quality
- Best for sentence-level translation, not full documents
- Translation quality varies by domain (strongest on legal/government text from WMT19)
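Given the sentence-level limitation above, longer text can be segmented and translated one sentence at a time. A naive sketch (the regex split and the `translate_document` helper are our assumptions, not part of the release; a dedicated segmenter such as razdel would handle Russian punctuation more robustly):

```python
import re

# Split text into sentences and translate each separately, since the model
# works best at sentence level. `translate_sentence` is any callable that
# translates one sentence, e.g. a wrapper around model.generate.
def translate_document(text: str, translate_sentence) -> str:
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(translate_sentence(s) for s in sentences)

# Example with an identity stub standing in for the model call:
print(translate_document("Бірінші сөйлем. Екінші сөйлем!", lambda s: s))
```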
## EkiTil Model Family
| Model | Type | Params | HF |
|---|---|---|---|
| EkiTil-123M | Base LM | 124.7M | base-v1 |
| EkiTil-300M | Base LM | 245.9M | base-v1 |
| EkiTil-600M | Base LM | 673.8M | base-v1 |
| EkiTil-600M Translate | Translation | 673.8M | this model |
## License
MIT