SozKZ mGPT 1.3B Kazakh Instruct v1

A Kazakh-language instruction-following model. Fine-tuned from mGPT-1.3B-kazakh on 52K Kazakh instruction-output pairs.

Model Details


Property	Value
Base model	ai-forever/mGPT-1.3B-kazakh
Architecture	GPT-2 (24 layers, 2048 hidden, 16 heads)
Parameters	1.42B
Language	Kazakh (kk)
License	MIT
Training data	AmanMussa/kazakh-instruction-v2 (52,201 examples)
Training method	Full-parameter SFT (no LoRA/adapters)
Precision	bf16
Hardware	1x NVIDIA H100 80GB SXM
Training time	79 minutes
Eval loss	0.919

Prompt Format

### Instruction:
{instruction}

### Response:

With optional input:

### Instruction:
{instruction}

### Input:
{input}

### Response:

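The Kazakh-language prompt template used during training (headers are transliterated here; the Usage example writes them in Cyrillic as "### Нұсқаулық:" and "### Жауап:"):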

### Nusqaulyq:
{instruction}

### Zhauap:
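
The templates above can be assembled programmatically. A minimal sketch, assuming a hypothetical helper named `build_prompt` (not part of the released code) and assuming the response header is followed by a single newline, as in the Usage example:

```python
def build_prompt(
    instruction: str,
    input_text: str = "",
    instruction_header: str = "### Instruction:",
    input_header: str = "### Input:",
    response_header: str = "### Response:",
) -> str:
    """Assemble a prompt in the template shown above.

    Hypothetical helper for illustration only. The trailing newline after
    the response header is an assumption taken from the Usage example.
    """
    parts = [f"{instruction_header}\n{instruction.strip()}"]
    # The input section is optional and is skipped when empty.
    if input_text.strip():
        parts.append(f"{input_header}\n{input_text.strip()}")
    parts.append(f"{response_header}\n")
    # Sections are separated by one blank line.
    return "\n\n".join(parts)
```

Passing `instruction_header="### Нұсқаулық:"` and `response_header="### Жауап:"` yields the Kazakh variant.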

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in bf16; device_map="auto" places them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "stukenov/sozkz-mgpt-1.3b-kk-instruct-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("stukenov/sozkz-mgpt-1.3b-kk-instruct-v1")

# Kazakh prompt template; the instruction asks "Who is Abai Qunanbaiuly?"
prompt = "### Нұсқаулық:\nАбай Құнанбайұлы кім?\n\n### Жауап:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 200 new tokens; repetition_penalty discourages repeated phrases.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
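
For a decoder-only model, `outputs[0]` holds the prompt tokens followed by the generated ones, so the decoded string repeats the prompt. A small post-processing sketch, assuming a hypothetical helper `extract_response` and assuming the model may run on into a further `###` section (this behaviour is not documented in the card):

```python
def extract_response(decoded: str, response_header: str = "### Жауап:") -> str:
    """Return the text after the first response header, cut at the next '###'.

    Hypothetical helper: truncating at '###' is an assumption about how
    the model may continue, not documented behaviour.
    """
    # Split at the first occurrence of the response header.
    _, found, tail = decoded.partition(response_header)
    if not found:
        # Header missing: return the text unchanged apart from whitespace.
        return decoded.strip()
    # Drop any follow-on section the model may have started.
    return tail.split("###", 1)[0].strip()
```

It would be applied to the decoded string, for example `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))`.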

Training Details

Hyperparameters

Parameter	Value
Epochs	3
Batch size	4
Gradient accumulation	8 (effective batch = 32)
Learning rate	2e-5
LR scheduler	cosine
Warmup ratio	0.05
Max sequence length	512
Optimizer	AdamW
Gradient checkpointing	enabled
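
The step counts reported under Training Progress follow from these settings. A quick arithmetic check, assuming a single GPU (as listed under Hardware) and that a final partial batch counts as a step. The train/eval split size is not stated in this card, so the training-set size below is only inferred from the 4,797 total steps:

```python
import math

per_device_batch = 4
grad_accum = 8
num_gpus = 1          # from the Hardware row
epochs = 3
total_steps = 4797    # final row of the Training Progress table

# Sequences consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = total_steps // epochs

# Approximate training-set size implied by that many steps per epoch
# (inferred, not stated in the card).
approx_train_examples = steps_per_epoch * effective_batch

# Steps per epoch if all 52,201 examples had been used for training.
steps_if_full_dataset = math.ceil(52201 / effective_batch)

print(effective_batch, steps_per_epoch, approx_train_examples, steps_if_full_dataset)
```

The gap between the observed steps per epoch and the full-dataset figure suggests that roughly a thousand examples were held out for evaluation, though the card does not confirm this.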

Training Progress

Step	Train Loss	Eval Loss	Epoch
500	1.043	0.976	0.31
1000	0.971	0.931	0.63
1500	0.958	0.921	0.94
2000	0.951	0.919	1.25
2500	0.950	0.921	1.56
3000	0.944	0.919	1.88
3500	0.949	0.920	2.19
4000	0.947	0.919	2.50
4500	0.950	0.919	2.81
4797	0.950	0.919	3.00
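
Reading the table, evaluation loss stops improving early in the run. A short sketch that locates the first step at which the minimum eval loss is reached, using the values above (variable names are illustrative):

```python
# (step, train_loss, eval_loss, epoch) rows copied from the table above
log = [
    (500, 1.043, 0.976, 0.31),
    (1000, 0.971, 0.931, 0.63),
    (1500, 0.958, 0.921, 0.94),
    (2000, 0.951, 0.919, 1.25),
    (2500, 0.950, 0.921, 1.56),
    (3000, 0.944, 0.919, 1.88),
    (3500, 0.949, 0.920, 2.19),
    (4000, 0.947, 0.919, 2.50),
    (4500, 0.950, 0.919, 2.81),
    (4797, 0.950, 0.919, 3.00),
]

# Lowest eval loss in the log, and the first row that reaches it.
best_eval = min(row[2] for row in log)
first_best = next(row for row in log if row[2] == best_eval)

print(f"best eval loss {best_eval} first reached at step {first_best[0]} (epoch {first_best[3]})")
```

On these numbers the minimum of 0.919 is first reached at step 2000 (epoch 1.25). Later checkpoints match it but do not beat it, which may indicate that a shorter run would have given a similar eval loss.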

Why This Base Model?

The base model was selected through a benchmark (exp033) that evaluated 28 open-source models on Kazakh-language perplexity, tokenizer efficiency, and generation quality. mGPT-1.3B-kazakh ranked first, with the lowest Kazakh perplexity of all tested models, including models roughly ten times its size. The eight lowest-perplexity models:

Rank	Model	Kazakh PPL	Params
1	mGPT-1.3B-kazakh (this model's base)	2.0	1.4B
2	mGPT-13B	2.4	13.1B
3	mGPT-1.3B	2.7	1.4B
4	LLaMA-3.1-8B	3.2	8.0B
5	LLaMA-3.2-3B	3.5	3.2B
6	Qwen2.5-14B	4.3	14.8B
7	LLaMA-3.2-1B	4.4	1.2B
8	Gemma-2-9B	4.8	9.2B

The mGPT family (by ai-forever) was pre-trained on 60+ languages including Kazakh, giving it a strong multilingual foundation that larger English-centric models lack for low-resource languages.
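
For reference, perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below shows that relationship in plain Python. It is not the exp033 benchmark code, whose methodology (corpus, tokenization, normalization) is not described here, and the function names are hypothetical:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood); inputs are natural logs."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

def perplexity_from_loss(mean_cross_entropy):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)

# A model that assigns probability 0.25 to every token has perplexity 4.
uniform = [math.log(0.25)] * 8
print(round(perplexity(uniform), 6))
```

By the same relation, and assuming the loss is measured in nats, the reported eval loss of 0.919 corresponds to a perplexity of about 2.5 on the instruction data. That figure is not directly comparable to the benchmark values in the table, which were measured on different text.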

Limitations

Limited reasoning capacity: the 1.4B-parameter size constrains complex reasoning and factual recall
Hallucinations: may generate plausible but incorrect facts, especially for specific dates, numbers, or niche topics
Kazakh-focused: Russian and English capabilities are significantly degraded compared to base mGPT (PPL ru=21.6, en=18.2)
No safety tuning: no RLHF, DPO, or content filtering has been applied
Dataset quality: training data is machine-translated Alpaca-style instructions, not human-curated Kazakh content

Related Models

ai-forever/mGPT - base multilingual model (60+ languages)
ai-forever/mGPT-1.3B-kazakh - Kazakh-adapted base (this model's parent)

Citation

@misc{sozkz-mgpt-instruct-2026,
  title={SozKZ mGPT 1.3B Kazakh Instruct v1},
  author={Stukenov, Saken},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/stukenov/sozkz-mgpt-1.3b-kk-instruct-v1}
}
