ruGPT-3 XL GGUF

A 1.3B-parameter GPT-3-style language model for Russian, converted from the original ai-forever/rugpt3xl Megatron-LM checkpoint into HuggingFace transformers format and then quantized to GGUF for use with llama.cpp.

This is a base (pretrained) model, not instruction-tuned. It performs text completion and can be fine-tuned for downstream tasks.

For details, see the paper "A Family of Pretrained Transformer Language Models for Russian".

Model Details

Parameter               Value
Parameters              1.3B
Architecture            GPT-3 (decoder-only transformer)
Hidden size             2048
Layers                  24
Attention heads         16
FFN intermediate size   8192
Max sequence length     2048
Vocabulary              50,264 tokens (BPE)
Activation              GELU
Normalization           Pre-LayerNorm
Position encoding       Learned absolute
Precision               float16
Training data           80B tokens of Russian text (4 epochs)
Test perplexity         12.05
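As a sanity check, the hyperparameters above can be plugged into the standard GPT-2/GPT-3 parameter-count formula. This is a rough sketch; small bias and LayerNorm terms are ignored:

```python
# Rough parameter count from the table above (GPT-2-style decoder-only model).
vocab, n_pos, hidden, layers, ffn = 50_264, 2048, 2048, 24, 8192

embeddings = vocab * hidden     # token embeddings (tied with the output head)
positions = n_pos * hidden      # learned absolute position embeddings
attention = 4 * hidden * hidden # Q, K, V and output projections, per layer
mlp = 2 * hidden * ffn          # FFN up- and down-projections, per layer
per_layer = attention + mlp     # ignoring small bias/LayerNorm terms

total = embeddings + positions + layers * per_layer
print(f"{total / 1e9:.2f}B parameters")  # ≈ 1.32B, consistent with "1.3B"
```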

Quick Start

Example with Q4_K_M:

./llama.cpp/build/bin/llama-cli \
  -m ./ruGPT3XL-GGUF/ruGPT3XL-q4_k_m.gguf \
  -c 2048 \
  -p "Москва - столица" \
  -n 128 \
  --temp 0.7 \
  --top-p 0.9 \
  --repeat-penalty 1.2

Notes:

  • Use -c 2048 for the native context length.
  • Prefer ruGPT3XL-q4_k_m.gguf or ruGPT3XL-q8_0.gguf for CPU inference.
  • Use ruGPT3XL-f16.gguf mainly for GPU.
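The --top-p flag above can be illustrated with a toy implementation. This is a simplified sketch of nucleus (top-p) filtering, not llama.cpp's actual sampler:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p (nucleus sampling).
    Simplified sketch, not llama.cpp's implementation."""
    kept, cumulative = [], 0.0
    for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

# With --top-p 0.9, the low-probability tail is dropped before sampling:
probs = {"столица": 0.50, "город": 0.30, "центр": 0.15, "река": 0.05}
print(top_p_filter(probs, 0.9))  # ['столица', 'город', 'центр']
```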

Start server:

./llama.cpp/build/bin/llama-server \
  -m ./ruGPT3XL-GGUF/ruGPT3XL-q4_k_m.gguf \
  -c 2048 \
  --host 127.0.0.1 \
  --port 8080

Example request:

curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Вопрос: Какая столица России?\n\nОтвет: ",
    "n_predict": 128,
    "temperature": 0.7,
    "top_p": 0.9,
    "repeat_penalty": 1.2
  }'
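The same request can be issued from Python. A minimal sketch using only the standard library; the endpoint and field names mirror the curl example above:

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8080"  # llama-server address from above

def build_completion_request(prompt, n_predict=128):
    """Payload for llama-server's /completion endpoint,
    with the same sampling parameters as the curl example."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": 0.7,
        "top_p": 0.9,
        "repeat_penalty": 1.2,
    }

def complete(prompt):
    data = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"]  # generated text

if __name__ == "__main__":
    print(complete("Вопрос: Какая столица России?\n\nОтвет: "))
```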

Limitations

  • This is a base model trained on Russian internet text; it may generate biased, factually incorrect, or offensive content.
  • The training data is predominantly Russian, so capability in other languages is limited.
  • Maximum context length is 2048 tokens. Inputs longer than this will be truncated.
  • The model is not instruction-tuned and works best for text completion rather than following specific instructions.

Citation

@misc{rugpt3xl-gguf,
  title={ruGPT3XL-GGUF},
  author={Pavel Rykov},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/evilfreelancer/ruGPT3XL-GGUF}
}
