SozKZ mGPT 1.3B Translate KK-RU v1

A bidirectional Kazakh-Russian translation model based on mGPT-1.3B-kazakh. It was trained in two stages on 500K parallel sentence pairs from the EkiTil Parallel Corpus.

Model Details

Base model: ai-forever/mGPT-1.3B-kazakh
Architecture: GPT-2 (24 layers, 2048 hidden, 16 heads)
Parameters: 1.42B
Languages: Kazakh (kk) <-> Russian (ru)
License: MIT
Training data: stukenov/ekitil-parallel-kkru-v2 (500K pairs)
Hardware: 1x NVIDIA H100 80GB SXM
Total training time: ~6.5 hours

Training Pipeline

Stage 1: Continual Pretraining (full fine-tune)

  • Format: [KK>RU] source [SEP] target</s> and [RU>KK] source [SEP] target</s>
  • 1M examples (500K pairs x 2 directions), 1 epoch
  • BS=32, grad_accum=2, lr=2e-5, cosine schedule
  • Eval loss: 1.054
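As a rough illustration of the Stage 1 format, the sketch below expands one parallel pair into the two directional training strings. The function name, the exact whitespace around the tags, and the sample pair are assumptions made for illustration; only the `[KK>RU]`, `[RU>KK]`, `[SEP]` and `</s>` markers come from the description above.

```python
def make_stage1_examples(kk: str, ru: str, eos: str = "</s>") -> list[str]:
    """Turn one parallel pair into two training strings, one per direction.

    Hypothetical helper: the spacing around the tags is an assumption.
    """
    return [
        f"[KK>RU] {kk} [SEP] {ru}{eos}",
        f"[RU>KK] {ru} [SEP] {kk}{eos}",
    ]


# Illustrative pair, not taken from the training corpus.
pairs = [("Сәлем!", "Привет!")]

# 500K pairs x 2 directions gives the 1M examples mentioned above.
examples = [s for kk, ru in pairs for s in make_stage1_examples(kk, ru)]
for s in examples:
    print(s)
```

Emitting both directions from every pair is what lets a single model serve as a bidirectional translator.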

Stage 2: SFT with LoRA (instruction format)

  • Format: ### Аудар [KK>RU]:\nsource\n### Аударма:\ntarget</s>
  • LoRA r=32, alpha=64, targets: c_attn, c_proj, c_fc (25M trainable params)
  • 1M examples, 1 epoch
  • Eval loss: 0.896
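The 25M trainable-parameter figure can be sanity-checked with a little arithmetic. The sketch below assumes the standard GPT-2 block layout (a fused QKV projection `c_attn`, a 4x feed-forward width) and assumes that the target name `c_proj` matches both the attention and the MLP output projections; neither detail is stated in the card.

```python
hidden = 2048            # hidden size, from Model Details
layers = 24              # number of layers, from Model Details
rank = 32                # LoRA r
mlp_inner = 4 * hidden   # assumption: GPT-2's default 4x feed-forward width

# (in_features, out_features) of each adapted projection in one block.
# "c_proj" appears twice: once in attention and once in the MLP (assumed).
adapted = {
    "attn.c_attn": (hidden, 3 * hidden),
    "attn.c_proj": (hidden, hidden),
    "mlp.c_fc": (hidden, mlp_inner),
    "mlp.c_proj": (mlp_inner, hidden),
}

# LoRA adds two low-rank factors per adapted matrix:
# A with rank * in_features weights and B with out_features * rank weights.
per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in adapted.values())
total = per_layer * layers

print(f"{total:,} trainable LoRA parameters")  # 25,165,824
```

Under these assumptions the count comes to about 25.2M, in line with the figure quoted above and under 2% of the 1.42B base parameters.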

Prompt Format

Kazakh to Russian

### Аудар [KK>RU]:
Қазақстан Республикасы — Орталық Азиядағы мемлекет.
### Аударма:

Russian to Kazakh

### Аудар [RU>KK]:
Казахстан — государство в Центральной Азии.
### Аударма:

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "stukenov/sozkz-mgpt-1.3b-translate-kkru-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("stukenov/sozkz-mgpt-1.3b-translate-kkru-v1")

# Kazakh -> Russian
prompt = "### Аудар [KK>RU]:\nҚазақстан Республикасы — Орталық Азиядағы мемлекет.\n### Аударма:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=100, do_sample=False, repetition_penalty=1.2)

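# Decode only the newly generated tokens; out[0] also contains the prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
# -> Республика Казахстан - государство в Центральной Азии.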

Translation Examples

Direction | Input | Output
KK->RU | Қазақстан Республикасы — Орталық Азиядағы мемлекет. | Республика Казахстан - государство в Центральной Азии.
RU->KK | Казахстан — государство в Центральной Азии. | Қазақстанда Орталық Азияда мемлекет орналасады.
KK->RU | Абай Құнанбайұлы — ұлы қазақ ақыны, ағартушы, ойшыл. | Академик Абая Кунанбаева - выдающийся казахский поэт и просветитель.
RU->KK | Алматы — крупнейший город Казахстана и культурная столица. | Астана Қазақстанның елордасы және мәдени орталығы болып табылады.
RU->KK | Образование является важнейшим направлением политики. | Білім мемлекеттік саясаттың мақсаты болып табылады.

Limitations

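  • Longer outputs tend to repeat; generate with repetition_penalty set to 1.2 or higher.
  • Translations can contain factual errors, especially in named entities: in the examples above, Алматы comes out as Астана and Абай Құнанбайұлы gains the title "Академик".
  • A decoder-only architecture is generally less well suited to translation than encoder-decoder models such as T5 or NLLB.
  • Not yet evaluated on the FLORES+ benchmark.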

Citation

@misc{sozkz-mgpt-translate-2026,
  title={SozKZ mGPT 1.3B Translate KK-RU v1},
  author={Stukenov, Saken},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/stukenov/sozkz-mgpt-1.3b-translate-kkru-v1}
}