SozKZ mGPT 1.3B Kazakh Instruct v1

A Kazakh-language instruction-following model. Fine-tuned from mGPT-1.3B-kazakh on 52K Kazakh instruction-output pairs.

Model Details

| Property | Value |
|---|---|
| Base model | ai-forever/mGPT-1.3B-kazakh |
| Architecture | GPT-2 (24 layers, 2048 hidden, 16 heads) |
| Parameters | 1.42B |
| Language | Kazakh (kk) |
| License | MIT |
| Training data | AmanMussa/kazakh-instruction-v2 (52,201 examples) |
| Training method | Full-parameter SFT (no LoRA/adapters) |
| Precision | bf16 |
| Hardware | 1x NVIDIA H100 80GB SXM |
| Training time | 79 minutes |
| Eval loss | 0.919 |

Prompt Format

### Instruction:
{instruction}

### Response:

With optional input:

### Instruction:
{instruction}

### Input:
{input}

### Response:

The Kazakh-language prompt template used during training:

### Нұсқаулық:
{instruction}

### Жауап:
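
The templates above can be assembled programmatically. The following is a minimal sketch, not part of the original training code: `build_prompt` and its parameter names are hypothetical, and the defaults simply mirror the English headers shown above.

```python
from typing import Optional


def build_prompt(
    instruction: str,
    input_text: Optional[str] = None,
    instruction_header: str = "### Instruction:",
    input_header: str = "### Input:",
    response_header: str = "### Response:",
) -> str:
    """Assemble a prompt in the Alpaca-style layout shown above.

    Hypothetical helper: the header strings are parameters so that the
    Kazakh headers can be swapped in without changing the layout.
    """
    parts = [f"{instruction_header}\n{instruction.strip()}"]
    if input_text:
        # The optional input block sits between instruction and response.
        parts.append(f"{input_header}\n{input_text.strip()}")
    # End with the response header and a newline so that generation
    # starts on a fresh line.
    parts.append(f"{response_header}\n")
    return "\n\n".join(parts)
```

To reproduce the prompt from the Usage snippet, pass its Kazakh headers: `build_prompt("Абай Құнанбайұлы кім?", instruction_header="### Нұсқаулық:", response_header="### Жауап:")`.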

Usage

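from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in bf16 (the training precision) and let
# device_map="auto" place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "stukenov/sozkz-mgpt-1.3b-kk-instruct-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("stukenov/sozkz-mgpt-1.3b-kk-instruct-v1")

# Kazakh prompt: instruction header, the question ("Who is Abai Qunanbaiuly?"),
# then the response header followed by a newline.
prompt = "### Нұсқаулық:\nАбай Құнанбайұлы кім?\n\n### Жауап:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 200 new tokens; repetition_penalty discourages loops.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
    )

# The decoded string contains the prompt followed by the generated answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))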
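
Because the decoded output includes the prompt, callers usually want only the text after the response header. A minimal sketch of that post-processing step follows; `extract_response` is a hypothetical helper, not part of the model's API, and it assumes the response header appears verbatim in the decoded string.

```python
def extract_response(decoded: str, response_header: str = "### Жауап:") -> str:
    """Return the text after the last response header.

    Hypothetical helper: if the header is not found, the whole
    decoded string is returned (stripped) instead.
    """
    # rpartition splits on the LAST occurrence, so a header that the
    # model happens to repeat earlier in the text is ignored.
    _, sep, tail = decoded.rpartition(response_header)
    return tail.strip() if sep else decoded.strip()
```

An alternative that avoids string matching is to decode only the newly generated tokens, for example `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.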

Training Details

Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 4 |
| Gradient accumulation | 8 (effective batch = 32) |
| Learning rate | 2e-5 |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Max sequence length | 512 |
| Optimizer | AdamW |
| Gradient checkpointing | enabled |
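
To make the schedule concrete, the sketch below computes the effective batch size and a learning-rate curve from the values in the table and the final step count (4797) reported under Training Progress. It is an illustration under stated assumptions, not the training script: it assumes linear warmup followed by cosine decay to zero, and that the warmup step count is rounded up. `lr_at` is a hypothetical name.

```python
import math

PEAK_LR = 2e-5        # learning rate from the table
WARMUP_RATIO = 0.05   # warmup ratio from the table
TOTAL_STEPS = 4797    # final optimizer step reported under Training Progress

# Effective batch = per-device batch size x gradient accumulation steps.
EFFECTIVE_BATCH = 4 * 8

# Assumption: the number of warmup steps is rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed linear warmup + cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions, warmup covers the first 240 steps, after which the rate decays smoothly to zero at step 4797.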

Training Progress

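| Step | Train Loss | Eval Loss | Epoch |
|---|---|---|---|
| 500 | 1.043 | 0.976 | 0.31 |
| 1000 | 0.971 | 0.931 | 0.63 |
| 1500 | 0.958 | 0.921 | 0.94 |
| 2000 | 0.951 | 0.919 | 1.25 |
| 2500 | 0.950 | 0.921 | 1.56 |
| 3000 | 0.944 | 0.919 | 1.88 |
| 3500 | 0.949 | 0.920 | 2.19 |
| 4000 | 0.947 | 0.919 | 2.50 |
| 4500 | 0.950 | 0.919 | 2.81 |
| 4797 | 0.950 | 0.919 | 3.00 |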
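
The Epoch column follows from the step count. The sketch below reproduces it from the reported totals (4797 steps over 3 epochs, effective batch of 32). The bound on training examples is an inference, not something the card states: it assumes one optimizer step consumes one effective batch on a single device.

```python
TOTAL_STEPS = 4797      # final step in the table
EPOCHS = 3              # from the hyperparameters
EFFECTIVE_BATCH = 32    # 4 x 8 gradient accumulation
DATASET_SIZE = 52_201   # examples in the training dataset

# 4797 steps over 3 epochs gives the number of optimizer steps per epoch.
steps_per_epoch = TOTAL_STEPS // EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached at an optimizer step, rounded as in the table."""
    return round(step / steps_per_epoch, 2)


# Upper bound on examples seen per epoch, assuming one effective batch per step.
max_train_examples = steps_per_epoch * EFFECTIVE_BATCH

# Examples not accounted for by training steps; plausibly a held-out
# evaluation split, though the card does not state the split size.
unaccounted_examples = DATASET_SIZE - max_train_examples
```

The implied training set is at most 51,168 examples per epoch, slightly below the 52,201 examples in the dataset, which would be consistent with a small held-out evaluation split.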

Why This Base Model?

The base model was selected through a benchmark (exp033) that evaluated 28 open-source models on Kazakh-language perplexity, tokenizer efficiency, and generation quality. mGPT-1.3B-kazakh ranked first, with the lowest Kazakh perplexity of all tested models, including models roughly 10x its size. The top eight results:

| Rank | Model | Kazakh PPL | Params |
|---|---|---|---|
| 1 | mGPT-1.3B-kazakh (this model's base) | 2.0 | 1.4B |
| 2 | mGPT-13B | 2.4 | 13.1B |
| 3 | mGPT-1.3B | 2.7 | 1.4B |
| 4 | LLaMA-3.1-8B | 3.2 | 8.0B |
| 5 | LLaMA-3.2-3B | 3.5 | 3.2B |
| 6 | Qwen2.5-14B | 4.3 | 14.8B |
| 7 | LLaMA-3.2-1B | 4.4 | 1.2B |
| 8 | Gemma-2-9B | 4.8 | 9.2B |

The mGPT family (by ai-forever) was pre-trained on 60+ languages including Kazakh, giving it a strong multilingual foundation that larger English-centric models lack for low-resource languages.
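
For readers unfamiliar with the metric used in the ranking, perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below shows that relationship only. It is not the benchmark's code, whose exact evaluation setup is not described here, and `perplexity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity from per-token negative log-likelihoods (in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    # PPL = exp(mean NLL); lower means the model finds the text less surprising.
    return math.exp(sum(token_nlls) / len(token_nlls))
```

If the reported eval loss of 0.919 is a mean per-token cross-entropy in nats (an assumption), it corresponds to a perplexity of about 2.51 on the evaluation split. That figure is not directly comparable to the benchmark values in the table, which were presumably measured on different text.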

Limitations

  • Limited reasoning capacity: with only 1.4B parameters, the model is constrained in complex reasoning and factual recall
  • Hallucinations: may generate plausible but incorrect facts, especially for specific dates, numbers, or niche topics
  • Kazakh-focused: Russian and English capabilities are significantly degraded compared to base mGPT (PPL ru=21.6, en=18.2)
  • No safety tuning: no RLHF, DPO, or content filtering has been applied
  • Dataset quality: training data is machine-translated Alpaca-style instructions, not human-curated Kazakh content

Citation

@misc{sozkz-mgpt-instruct-2026,
  title={SozKZ mGPT 1.3B Kazakh Instruct v1},
  author={Stukenov, Saken},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/stukenov/sozkz-mgpt-1.3b-kk-instruct-v1}
}