ruGPT-3 XL GGUF

A 1.3B-parameter GPT-3-style language model for Russian, converted from the original ai-forever/rugpt3xl Megatron-LM checkpoint into HuggingFace transformers format and then quantized to GGUF for use with llama.cpp.

This is a base (pretrained) model, not instruction-tuned. It performs text completion and can be fine-tuned for downstream tasks.

For details, see the paper "A Family of Pretrained Transformer Language Models for Russian".

Model Details

Parameter               Value
Parameters              1.3B
Architecture            GPT-3 (decoder-only transformer)
Hidden size             2048
Layers                  24
Attention heads         16
FFN intermediate size   8192
Max sequence length     2048
Vocabulary              50,264 tokens (BPE)
Activation              GELU
Normalization           Pre-LayerNorm
Position encoding       Learned absolute
Precision               float16
Training data           80B tokens of Russian text (4 epochs)
Test perplexity         12.05
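As a sanity check, the hyperparameters above can be plugged into the standard GPT-2/GPT-3 parameter-count formula. This is a rough sketch; small bias and LayerNorm terms are ignored:

```python
# Rough parameter count from the table above (GPT-2-style decoder-only model).
vocab, n_pos, hidden, layers, ffn = 50_264, 2048, 2048, 24, 8192

embeddings = vocab * hidden     # token embeddings (tied with the output head)
positions = n_pos * hidden      # learned absolute position embeddings
attention = 4 * hidden * hidden # Q, K, V and output projections, per layer
mlp = 2 * hidden * ffn          # FFN up- and down-projections, per layer
per_layer = attention + mlp     # ignoring small bias/LayerNorm terms

total = embeddings + positions + layers * per_layer
print(f"{total / 1e9:.2f}B parameters")  # ≈ 1.32B, consistent with "1.3B"
```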

Quick Start

Example with Q4_K_M:

./llama.cpp/build/bin/llama-cli \
  -m ./ruGPT3XL-GGUF/ruGPT3XL-q4_k_m.gguf \
  -c 2048 \
  -p "Москва - столица" \
  -n 128 \
  --temp 0.7 \
  --top-p 0.9 \
  --repeat-penalty 1.2

Notes:

  • Use -c 2048 for the native context length.
  • Prefer ruGPT3XL-q4_k_m.gguf or ruGPT3XL-q8_0.gguf for CPU inference.
  • Use ruGPT3XL-f16.gguf mainly for GPU.
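The --top-p flag above can be illustrated with a toy implementation. This is a simplified sketch of nucleus (top-p) filtering, not llama.cpp's actual sampler:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p (nucleus sampling).
    Simplified sketch, not llama.cpp's implementation."""
    kept, cumulative = [], 0.0
    for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

# With --top-p 0.9, the low-probability tail is dropped before sampling:
probs = {"столица": 0.50, "город": 0.30, "центр": 0.15, "река": 0.05}
print(top_p_filter(probs, 0.9))  # ['столица', 'город', 'центр']
```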

Start server:

./llama.cpp/build/bin/llama-server \
  -m ./ruGPT3XL-GGUF/ruGPT3XL-q4_k_m.gguf \
  -c 2048 \
  --host 127.0.0.1 \
  --port 8080

Example request:

curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Вопрос: Какая столица России?\n\nОтвет: ",
    "n_predict": 128,
    "temperature": 0.7,
    "top_p": 0.9,
    "repeat_penalty": 1.2
  }'
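The same request can be issued from Python. A minimal sketch using only the standard library; the endpoint and field names mirror the curl example above:

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8080"  # llama-server address from above

def build_completion_request(prompt, n_predict=128):
    """Payload for llama-server's /completion endpoint,
    with the same sampling parameters as the curl example."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": 0.7,
        "top_p": 0.9,
        "repeat_penalty": 1.2,
    }

def complete(prompt):
    data = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{SERVER}/completion",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"]  # generated text

if __name__ == "__main__":
    print(complete("Вопрос: Какая столица России?\n\nОтвет: "))
```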

Limitations

  • This is a base model trained on Russian internet text; it may generate biased, factually incorrect, or offensive content.
  • The training data is predominantly Russian, so capability in other languages is limited.
  • Maximum context length is 2048 tokens. Inputs longer than this will be truncated.
  • The model is not instruction-tuned and works best for text completion rather than following specific instructions.

Citation

@misc{rugpt3xl-gguf,
  title={ruGPT3XL-GGUF},
  author={Pavel Rykov},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/evilfreelancer/ruGPT3XL-GGUF}
}
