SinLlama GGUF – Sinhala Language Model for Low-End Hardware
Why This Model Exists
The original polyglots/SinLlama_v01 is a powerful Sinhala language model built on Meta-Llama-3-8B. However, it comes with a critical barrier:
The Problem with the Original Model
| Requirement | Original SinLlama | This GGUF Version |
|---|---|---|
| Model Size | ~16 GB (FP16) | ~4.65 GB |
| RAM Required | 16–32 GB+ | 6–8 GB |
| GPU Required | Yes (CUDA-capable, 16GB+ VRAM) | No (runs on CPU) |
| Software Stack | PyTorch, Transformers, CUDA | Ollama / llama.cpp |
| Setup Complexity | High (Python environment, CUDA drivers, etc.) | Minimal |
| Suitable for Low-End PCs | ❌ No | ✅ Yes |
The original HuggingFace model demands expensive GPU hardware and a complex Python/CUDA environment. Most Sri Lankan developers and students don't have access to high-end machines with 16GB+ VRAM GPUs to even load the model. This creates an accessibility gap: the people who need Sinhala AI the most are locked out from using it.
How This GGUF Model Solves It
This is a Q4_K_M quantized GGUF version of SinLlama that:
- Runs on ordinary laptops and desktops – no GPU required
- Reduced from ~16 GB to ~4.65 GB – fits in modest RAM
- Works with Ollama – one-command setup, no Python knowledge needed
- Works with llama.cpp – lightweight, cross-platform inference
- Preserves Sinhala language quality – Q4_K_M is the sweet spot between size and accuracy
- Extended Sinhala tokenizer – ~139K-token vocabulary for better Sinhala text handling
Model Details
| Property | Value |
|---|---|
| Base Model | Meta-Llama-3-8B |
| Fine-tuned Model | polyglots/SinLlama_v01 |
| Quantization | Q4_K_M (4-bit, mixed precision) |
| Format | GGUF (llama.cpp compatible) |
| File Size | 4.65 GB |
| Context Length | 2048 tokens |
| Vocabulary Size | ~139,000 tokens (extended Sinhala) |
| Languages | Sinhala (සිංහල), English |
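GGUF files start with a small fixed binary header, so a download can be sanity-checked before loading it into a runtime. A minimal sketch based on the published GGUF layout (magic `GGUF`, then a little-endian `u32` version, `u64` tensor count, and `u64` metadata key/value count); the filename is the one used elsewhere in this card:

```python
import struct

def read_gguf_header(path):
    """Read the fixed GGUF preamble: magic, version, tensor count, KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # Little-endian u32 version, u64 tensor_count, u64 metadata_kv_count.
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensors": tensor_count, "metadata_kvs": kv_count}

# Example: read_gguf_header("sinllama-q4_k_m.gguf")
```

If this raises, the file is likely corrupted or an incomplete download.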
Quick Start
Option 1: Using Ollama (Recommended – Easiest)
1. Install Ollama: https://ollama.com/download
2. Create a `Modelfile`:

```
FROM sinllama-q4_k_m.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 2048
SYSTEM You are SinLlama, a helpful AI assistant that can communicate in Sinhala (සිංහල). You are based on Meta-Llama-3-8B and have been specially trained on Sinhala language data.
```

3. Create and run the model:

```bash
ollama create sinllama -f Modelfile
ollama run sinllama
```

4. Start chatting in Sinhala:

```
>>> හෙලෝ, ඔබට කෙසේද?
```

("Hello, how are you?")
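Beyond the interactive prompt, Ollama serves a local REST API (default `http://localhost:11434`), so the model can also be called from scripts. A minimal standard-library sketch; the model name `sinllama` is the one given to `ollama create` above:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="sinllama"):
    """Build a non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt):
    """POST the prompt to a locally running Ollama and return the response text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = request.Request(OLLAMA_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

This requires Ollama to be running locally (`ollama serve` or the desktop app).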
Option 2: Using llama.cpp
```bash
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
# Note: recent llama.cpp versions build with CMake instead
# (cmake -B build && cmake --build build) and name the binary llama-cli.

# Run the model
./main -m sinllama-q4_k_m.gguf \
  -p "ශ්‍රී ලංකාව පිළිබඳ කියන්න" \
  -n 256 \
  --temp 0.7
```
Option 3: Using Python (llama-cpp-python)
```python
from llama_cpp import Llama

llm = Llama(
    model_path="sinllama-q4_k_m.gguf",
    n_ctx=2048,
    n_threads=4,  # Adjust to your CPU cores
)

output = llm(
    "ශ්‍රී ලංකාව පිළිබඳ කියන්න",  # "Tell me about Sri Lanka"
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
)
print(output["choices"][0]["text"])
```
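Since the base model is Meta-Llama-3-8B, chat-style prompts generally follow the Llama 3 header/`<|eot_id|>` token format rather than raw completion text. A sketch of that formatting in plain Python; whether the SinLlama fine-tune was trained against this exact template is an assumption worth verifying on your own outputs:

```python
def llama3_chat_prompt(user_message, system_message=None):
    """Format a single-turn prompt using the Llama 3 chat special tokens."""
    parts = ["<|begin_of_text|>"]
    if system_message:
        parts.append("<|start_header_id|>system<|end_header_id|>\n\n"
                     f"{system_message}<|eot_id|>")
    parts.append("<|start_header_id|>user<|end_header_id|>\n\n"
                 f"{user_message}<|eot_id|>")
    # Leave the assistant header open so the model continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

With llama-cpp-python, pass the result as the prompt and add `stop=["<|eot_id|>"]` so generation ends at the turn boundary.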
Intended Use
- Sinhala language text generation β articles, creative writing, summaries
- Conversational AI β chatbots and virtual assistants in Sinhala
- Education β Sinhala language learning tools
- Research β low-resource NLP research for Sinhala
- Customer service β automated Sinhala-language support systems
Limitations
- This is a quantized model; there may be minor quality loss compared to the full-precision original
- Context window is limited to 2048 tokens
- The model may occasionally generate incorrect or nonsensical Sinhala text
- Not suitable for critical applications without human oversight
- Inherits limitations and biases from the base Llama-3-8B model
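Because of the 2048-token context window noted above, long inputs need to be trimmed before inference. A rough sketch that keeps only the most recent words under a token budget; the tokens-per-word ratio is a heuristic (Sinhala text often splits into more tokens per word than English), so tune it for your data or use the model's actual tokenizer:

```python
def truncate_to_budget(text, max_tokens=2048, reserve_for_output=256,
                       tokens_per_word=2.0):
    """Keep only the most recent words that fit the context window.

    tokens_per_word is a rough heuristic, not an exact count; for precise
    budgeting, tokenize with the model's own tokenizer instead.
    """
    word_budget = int((max_tokens - reserve_for_output) / tokens_per_word)
    words = text.split()
    if len(words) <= word_budget:
        return text
    return " ".join(words[-word_budget:])
```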
Hardware Requirements
Minimum (CPU-only)
- RAM: 6 GB available
- Storage: 5 GB free disk space
- CPU: Any modern x86_64 processor (Intel/AMD)
- OS: Windows, macOS, or Linux
Recommended
- RAM: 8 GB+ available
- CPU: 4+ cores for faster inference
- Storage: SSD for faster model loading
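On the CPU-core recommendation above: setting `n_threads` (llama-cpp-python) or `-t` (llama.cpp) near the machine's core count, while leaving a core free for the OS, usually gives the best throughput. A small sketch:

```python
import os

def pick_threads(reserve=1):
    """Use most logical cores for inference, keeping one free for the OS/UI."""
    cores = os.cpu_count() or 4  # os.cpu_count() can return None
    return max(1, cores - reserve)
```

For example, `Llama(model_path=..., n_threads=pick_threads())`.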
No GPU required. This is the entire point of this GGUF release.
Quantization Details
The model was quantized using llama.cpp with the Q4_K_M method:
- Q4_K_M uses 4-bit "k-quant" blocks, with some tensors kept at higher precision; the "M" denotes the medium variant of the K-quant family
- Provides a strong balance between model size, inference speed, and output quality
- Widely recommended in the llama.cpp community as the default quantization for most use cases
| Quantization | Size | Quality | Speed |
|---|---|---|---|
| FP16 (Original) | ~16 GB | ★★★★★ | Slow (needs GPU) |
| Q8_0 | ~8.5 GB | ★★★★★ | Moderate |
| Q4_K_M (This) | ~4.65 GB | ★★★★☆ | Fast |
| Q4_0 | ~4.3 GB | ★★★☆☆ | Fastest |
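The sizes in the table follow roughly from bits-per-weight arithmetic. A sketch using approximate effective rates for llama.cpp quant types (these are assumptions; real GGUF files vary because some tensors stay at higher precision, and the extended 139K vocabulary adds embedding parameters on top of the ~8B base):

```python
# Rough size estimate: parameters x bits-per-weight / 8.
PARAMS = 8.0e9  # ~8B parameters (extended vocab adds a little more)
BPW = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85, "Q4_0": 4.55}  # approx.

def est_gb(quant):
    """Estimated file size in decimal GB for a given quantization type."""
    return PARAMS * BPW[quant] / 8 / 1e9

for q in BPW:
    print(f"{q:7s} ~{est_gb(q):.1f} GB")
```

The estimates land close to the table's figures, which is why halving the bits roughly halves the file.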
Copyright & License
Model License
This model is distributed under the Meta Llama 3 Community License. By downloading or using this model, you agree to the terms of the Meta Llama 3 Community License Agreement.
Quantization & Distribution
- The GGUF quantization and this distribution were prepared by the repository maintainer
- The original SinLlama fine-tuning was done by polyglots
- All rights to the base architecture belong to Meta Platforms, Inc.
Usage Terms
- ✅ Free for research and personal use
- ✅ Free for commercial use (subject to Meta Llama 3 license terms)
- ✅ Redistribution allowed with attribution
- ❌ Do not use for generating harmful, misleading, or illegal content
- ❌ Do not misrepresent the model's outputs as human-written content without disclosure
Attribution
If you use this model in your work, please cite:
```bibtex
@misc{sinllama-gguf,
  title={SinLlama GGUF - Quantized Sinhala Language Model},
  author={SanudaDev},
  year={2025},
  url={https://huggingface.co/SanudaDev/SinLlama-GGUF},
  note={Q4_K_M GGUF quantization of polyglots/SinLlama_v01}
}
```
Acknowledgments
- Meta AI β for the Llama 3 base model
- polyglots β for the original SinLlama Sinhala fine-tuning
- llama.cpp β for the GGUF quantization toolchain
- Ollama β for making local LLM deployment simple
Making Sinhala AI accessible to everyone – not just those with expensive hardware.