SozKZ mGPT 1.3B Kazakh Instruct v1
A Kazakh-language instruction-following model. Fine-tuned from mGPT-1.3B-kazakh on 52K Kazakh instruction-output pairs.
Model Details
| Property | Value |
|---|---|
| Base model | ai-forever/mGPT-1.3B-kazakh |
| Architecture | GPT-2 (24 layers, 2048 hidden, 16 heads) |
| Parameters | 1.42B |
| Language | Kazakh (kk) |
| License | MIT |
| Training data | AmanMussa/kazakh-instruction-v2 (52,201 examples) |
| Training method | Full-parameter SFT (no LoRA/adapters) |
| Precision | bf16 |
| Hardware | 1x NVIDIA H100 80GB SXM |
| Training time | 79 minutes |
| Eval loss | 0.919 |
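You can sanity-check the 1.42B figure from the architecture row. The sketch below counts the weights of a GPT-2-style decoder with tied input/output embeddings and a 4x MLP expansion. The vocabulary size (100,000) and context length (2,048) are assumptions that this card does not state. Read the real values from the model's `config.json` before relying on the result.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int, n_positions: int) -> int:
    """Count parameters of a GPT-2-style decoder with tied input/output embeddings."""
    d = d_model
    embeddings = vocab_size * d + n_positions * d  # token + learned position embeddings
    per_block = (
        2 * d                   # ln_1 (weight + bias)
        + d * 3 * d + 3 * d     # attention QKV projection
        + d * d + d             # attention output projection
        + 2 * d                 # ln_2
        + d * 4 * d + 4 * d     # MLP up-projection (4x expansion)
        + 4 * d * d + d         # MLP down-projection
    )
    final_ln = 2 * d
    # lm_head shares weights with the token embedding, so it adds no parameters
    return embeddings + n_layer * per_block + final_ln


# Layer count and hidden size come from the table; vocab_size and n_positions are ASSUMED
total = gpt2_param_count(n_layer=24, d_model=2048, vocab_size=100_000, n_positions=2048)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")  # → 1,417,596,928 parameters (~1.42B)
```

With these assumed values, the token embedding matrix alone accounts for about 205M of the total.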
Prompt Format
```
### Instruction:
{instruction}

### Response:
```
With optional input:
```
### Instruction:
{instruction}

### Input:
{input}

### Response:
```
The prompt template used during training uses Kazakh section headers (Нұсқаулық = "Instruction", Жауап = "Response"):
```
### Нұсқаулық:
{instruction}

### Жауап:
```
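Building prompts by hand is error-prone, so a small helper is useful. The one below is a minimal sketch for the instruction-only variant. It mirrors the exact string in the usage example: one newline after each header and a blank line before the response header. `build_prompt` is a hypothetical name, and stripping surrounding whitespace from the instruction is an added choice. The card does not document the Kazakh header for the optional input field, so that variant is left out.

```python
INSTRUCTION_HEADER = "### Нұсқаулық:"  # "Instruction"
RESPONSE_HEADER = "### Жауап:"         # "Response"


def build_prompt(instruction: str) -> str:
    """Hypothetical helper: format one instruction in the Kazakh training template."""
    # Layout copied from the usage example: header, text, blank line, response header
    return f"{INSTRUCTION_HEADER}\n{instruction.strip()}\n\n{RESPONSE_HEADER}\n"


prompt = build_prompt("Абай Құнанбайұлы кім?")  # "Who is Abai Kunanbayuly?"
print(prompt)
```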
Usage
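```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "stukenov/sozkz-mgpt-1.3b-kk-instruct-v1"

# Load in bf16 (the training precision); device_map="auto" requires the `accelerate` package
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Kazakh template: "### Нұсқаулық:" (Instruction) / "### Жауап:" (Response)
# Example instruction: "Who is Abai Kunanbayuly?"
prompt = "### Нұсқаулық:\nАбай Құнанбайұлы кім?\n\n### Жауап:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,          # sampling: output differs between runs
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
    )

# outputs[0] contains the prompt tokens followed by the generated answer
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```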
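The decoded string above still contains the prompt. The sketch below returns only the answer. `extract_response` is a hypothetical helper. Cutting at the next `\n###` is a heuristic assumption for cases where the model starts a new instruction block instead of emitting an end-of-sequence token.

```python
RESPONSE_HEADER = "### Жауап:"


def extract_response(decoded: str, header: str = RESPONSE_HEADER) -> str:
    """Hypothetical helper: keep only the model's answer from prompt + completion text."""
    # The prompt contains exactly one response header; the answer follows the first one
    _, sep, completion = decoded.partition(header)
    if not sep:
        return decoded.strip()  # header not found: return the text as-is (stripped)
    # Heuristic: stop if the model opens another "### ..." section by itself
    completion = completion.split("\n###", 1)[0]
    return completion.strip()
```

Alternatively, you can decode only the newly generated tokens with `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`. The header-based helper is still useful for trimming trailing blocks.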
Training Details
Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 4 |
| Gradient accumulation | 8 (effective batch = 32) |
| Learning rate | 2e-5 |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Max sequence length | 512 |
| Optimizer | AdamW |
| Gradient checkpointing | enabled |
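The warmup ratio and cosine scheduler together define the learning-rate curve. The sketch below assumes the Hugging Face `Trainer` convention, which this card does not confirm. Under that convention, warmup steps are `ceil(total_steps * warmup_ratio)`. The rate rises linearly to the peak during warmup, then follows a half-cosine decay to zero. The total of 4,797 steps is taken from the final row of the training log below.

```python
import math

BASE_LR = 2e-5
TOTAL_STEPS = 4797   # last step in the training log
WARMUP_RATIO = 0.05
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 240 under the assumed convention


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then one half-cosine decay to 0 (assumed HF-style schedule)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 120, WARMUP_STEPS, 2518, TOTAL_STEPS):
    print(s, f"{lr_at(s):.2e}")
```

The log's 4,797 steps over 3 epochs imply 1,599 optimizer steps per epoch. At an effective batch of 32, that is roughly 51,168 examples per epoch. The card does not document the train/eval split, but this suggests roughly 1K of the 52,201 examples were held out for evaluation.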
Training Progress
| Step | Train Loss | Eval Loss | Epoch |
|---|---|---|---|
| 500 | 1.043 | 0.976 | 0.31 |
| 1000 | 0.971 | 0.931 | 0.63 |
| 1500 | 0.958 | 0.921 | 0.94 |
| 2000 | 0.951 | 0.919 | 1.25 |
| 2500 | 0.950 | 0.921 | 1.56 |
| 3000 | 0.944 | 0.919 | 1.88 |
| 3500 | 0.949 | 0.920 | 2.19 |
| 4000 | 0.947 | 0.919 | 2.50 |
| 4500 | 0.950 | 0.919 | 2.81 |
| 4797 | 0.950 | 0.919 | 3.00 |
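The losses above are easier to interpret as perplexities. This assumes they are mean per-token cross-entropy in nats, which is what the Hugging Face `Trainer` logs. Under that assumption, the final eval loss of 0.919 corresponds to a perplexity of about 2.51. This figure is measured on held-out instruction data, and the card does not say whether prompt tokens were masked from the loss. It is therefore not directly comparable to the base-model perplexities in the next section.

```python
import math

# (step, train_loss, eval_loss) rows copied from the training log above
LOG = [
    (500, 1.043, 0.976),
    (2000, 0.951, 0.919),
    (4797, 0.950, 0.919),
]

for step, train_loss, eval_loss in LOG:
    # perplexity = exp(mean per-token cross-entropy in nats)
    print(f"step {step}: train PPL {math.exp(train_loss):.2f}, eval PPL {math.exp(eval_loss):.2f}")
```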
Why This Base Model?
The base model was selected using a benchmark (exp033) that evaluated 28 open-source models on Kazakh perplexity, tokenizer efficiency, and generation quality. mGPT-1.3B-kazakh had the lowest Kazakh perplexity of all models tested. It beat models roughly 10x its size, such as mGPT-13B and Qwen2.5-14B. The eight best results by Kazakh perplexity (lower is better):
| Rank | Model | Kazakh PPL | Params |
|---|---|---|---|
| 1 | mGPT-1.3B-kazakh (this model's base) | 2.0 | 1.4B |
| 2 | mGPT-13B | 2.4 | 13.1B |
| 3 | mGPT-1.3B | 2.7 | 1.4B |
| 4 | LLaMA-3.1-8B | 3.2 | 8.0B |
| 5 | LLaMA-3.2-3B | 3.5 | 3.2B |
| 6 | Qwen2.5-14B | 4.3 | 14.8B |
| 7 | LLaMA-3.2-1B | 4.4 | 1.2B |
| 8 | Gemma-2-9B | 4.8 | 9.2B |
The mGPT family (by ai-forever) was pre-trained on 60+ languages, including Kazakh. This multilingual pre-training gives it a stronger foundation for a low-resource language like Kazakh than larger, English-centric models have.
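Tokenizer efficiency matters alongside perplexity because per-token perplexity depends on how finely a tokenizer splits text. The card does not specify which efficiency metric exp033 used. The sketch below computes one common choice, fertility: the average number of tokens per whitespace-separated word. `fertility` is a hypothetical helper, and any function that maps a string to a list of tokens can be passed in.

```python
from typing import Callable, List


def fertility(texts: List[str], tokenize: Callable[[str], List]) -> float:
    """Hypothetical helper: average tokens per whitespace word (lower = more efficient)."""
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    if n_words == 0:
        raise ValueError("input contains no words")
    return n_tokens / n_words


sample = ["Абай Құнанбайұлы кім?"]
print(fertility(sample, str.split))  # word-level baseline: exactly 1.0
print(fertility(sample, list))       # character-level: characters (incl. spaces) per word
```

To measure a real model's tokenizer, pass its `tokenize` method, e.g. `fertility(texts, tokenizer.tokenize)` with a tokenizer loaded via `AutoTokenizer.from_pretrained`, over a representative Kazakh corpus.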
Limitations
- Limited reasoning capacity: at 1.4B parameters, the model is constrained in complex reasoning and factual recall
- Hallucinations: may generate plausible but incorrect facts, especially for specific dates, numbers, or niche topics
- Kazakh-focused: Russian and English capabilities are significantly degraded compared to base mGPT (PPL ru=21.6, en=18.2)
- No safety tuning: no RLHF, DPO, or content filtering has been applied
- Dataset quality: training data is machine-translated Alpaca-style instructions, not human-curated Kazakh content
Related Models
- ai-forever/mGPT - base multilingual model (60+ languages)
- ai-forever/mGPT-1.3B-kazakh - Kazakh-adapted base (this model's parent)
Citation
```bibtex
@misc{sozkz-mgpt-instruct-2026,
  title={SozKZ mGPT 1.3B Kazakh Instruct v1},
  author={Stukenov, Saken},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/stukenov/sozkz-mgpt-1.3b-kk-instruct-v1}
}
```