SinLlama GGUF β€” Sinhala Language Model for Low-End Hardware


Why This Model Exists

The original polyglots/SinLlama_v01 is a powerful Sinhala language model built on Meta-Llama-3-8B. However, it comes with a critical barrier:

The Problem with the Original Model

| Requirement | Original SinLlama | This GGUF Version |
|---|---|---|
| Model Size | ~16 GB (FP16) | ~4.65 GB |
| RAM Required | 16–32 GB+ | 6–8 GB |
| GPU Required | Yes (CUDA-capable, 16 GB+ VRAM) | No (runs on CPU) |
| Software Stack | PyTorch, Transformers, CUDA | Ollama / llama.cpp |
| Setup Complexity | High (Python environment, CUDA drivers, etc.) | Minimal |
| Suitable for Low-End PCs | ❌ No | βœ… Yes |

The original HuggingFace model demands expensive GPU hardware and a complex Python/CUDA environment. Most Sri Lankan developers and students don't have access to high-end machines with 16GB+ VRAM GPUs to even load the model. This creates an accessibility gap β€” the people who need Sinhala AI the most are locked out from using it.

How This GGUF Model Solves It

This is a Q4_K_M quantized GGUF version of SinLlama that:

  • Runs on ordinary laptops and desktops β€” no GPU required
  • Reduced from ~16 GB to ~4.65 GB β€” fits in modest RAM
  • Works with Ollama β€” one-command setup, no Python knowledge needed
  • Works with llama.cpp β€” lightweight, cross-platform inference
  • Preserves Sinhala language quality β€” Q4_K_M is the sweet spot between size and accuracy
  • Extended Sinhala tokenizer β€” optimized 139K vocabulary for better Sinhala text handling

Model Details

| Property | Value |
|---|---|
| Base Model | Meta-Llama-3-8B |
| Fine-tuned Model | polyglots/SinLlama_v01 |
| Quantization | Q4_K_M (4-bit, mixed precision) |
| Format | GGUF (llama.cpp compatible) |
| File Size | 4.65 GB |
| Context Length | 2048 tokens |
| Vocabulary Size | ~139,000 tokens (extended Sinhala) |
| Languages | Sinhala (ΰ·ƒΰ·’ΰΆ‚ΰ·„ΰΆ½), English |

Quick Start

Option 1: Using Ollama (Recommended β€” Easiest)

  1. Install Ollama β†’ https://ollama.com/download

  2. Create a Modelfile:

FROM sinllama-q4_k_m.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 2048

SYSTEM """You are SinLlama, a helpful AI assistant that can communicate in Sinhala (ΰ·ƒΰ·’ΰΆ‚ΰ·„ΰΆ½). You are based on Meta-Llama-3-8B and have been specially trained on Sinhala language data."""
  3. Create and run the model:
ollama create sinllama -f Modelfile
ollama run sinllama
  4. Start chatting in Sinhala (the example prompt means "Hello, how are you?"):
>>> හෙࢽෝ, ΰΆ”ΰΆΆΰΆ§ ΰΆšΰ·™ΰ·ƒΰ·šΰΆ―?
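If you maintain several parameter variants, the Modelfile above can also be generated from a short script rather than written by hand. A sketch; the values simply mirror the Modelfile shown above:

```python
# Generate the Ollama Modelfile shown above from Python (handy for variants).
params = {"temperature": 0.7, "top_p": 0.9, "top_k": 40, "num_ctx": 2048}
system = (
    "You are SinLlama, a helpful AI assistant that can communicate in "
    "Sinhala (ΰ·ƒΰ·’ΰΆ‚ΰ·„ΰΆ½)."
)

lines = ["FROM sinllama-q4_k_m.gguf", ""]
lines += [f"PARAMETER {k} {v}" for k, v in params.items()]
lines += ["", f'SYSTEM """{system}"""']

with open("Modelfile", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```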

Option 2: Using llama.cpp

# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Run the model; the prompt means "Tell me about Sri Lanka"
# (older llama.cpp builds used `make` and named the binary ./main)
./build/bin/llama-cli -m sinllama-q4_k_m.gguf \
  -p "ΰ·ΰ·Šβ€ΰΆ»ΰ·“ ΰΆ½ΰΆ‚ΰΆšΰ·ΰ·€ ΰΆ΄ΰ·’ΰ·…ΰ·’ΰΆΆΰΆ³ ΰΆšΰ·’ΰΆΊΰΆ±ΰ·ŠΰΆ±" \
  -n 256 \
  --temp 0.7

Option 3: Using Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama(
    model_path="sinllama-q4_k_m.gguf",
    n_ctx=2048,
    n_threads=4,  # Adjust to your CPU cores
)

output = llm(
    "ΰ·ΰ·Šβ€ΰΆ»ΰ·“ ΰΆ½ΰΆ‚ΰΆšΰ·ΰ·€ ΰΆ΄ΰ·’ΰ·…ΰ·’ΰΆΆΰΆ³ ΰΆšΰ·’ΰΆΊΰΆ±ΰ·ŠΰΆ±",
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
)

print(output["choices"][0]["text"])
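For multi-turn use, llama-cpp-python also exposes a chat API (`create_chat_completion`) that applies the model's chat template for you. A sketch: the model load is guarded so the script is harmless when the GGUF file is absent, and the `build_messages` helper is illustrative, not part of the library:

```python
import os

def build_messages(user_text: str) -> list:
    """Assemble a chat history for create_chat_completion (illustrative helper)."""
    return [
        {"role": "system",
         "content": "You are SinLlama, a helpful Sinhala-speaking assistant."},
        {"role": "user", "content": user_text},
    ]

MODEL = "sinllama-q4_k_m.gguf"
if os.path.exists(MODEL):  # guarded: only runs when the GGUF file is present
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL, n_ctx=2048, n_threads=4)
    reply = llm.create_chat_completion(
        messages=build_messages("ΰ·ΰ·Šβ€ΰΆ»ΰ·“ ΰΆ½ΰΆ‚ΰΆšΰ·ΰ·€ ΰΆ΄ΰ·’ΰ·…ΰ·’ΰΆΆΰΆ³ ΰΆšΰ·’ΰΆΊΰΆ±ΰ·ŠΰΆ±"),  # "Tell me about Sri Lanka"
        max_tokens=256,
        temperature=0.7,
    )
    print(reply["choices"][0]["message"]["content"])
```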

Intended Use

  • Sinhala language text generation β€” articles, creative writing, summaries
  • Conversational AI β€” chatbots and virtual assistants in Sinhala
  • Education β€” Sinhala language learning tools
  • Research β€” low-resource NLP research for Sinhala
  • Customer service β€” automated Sinhala-language support systems

Limitations

  • This is a quantized model; there may be minor quality loss compared to the full-precision original
  • Context window is limited to 2048 tokens
  • The model may occasionally generate incorrect or nonsensical Sinhala text
  • Not suitable for critical applications without human oversight
  • Inherits limitations and biases from the base Llama-3-8B model

Hardware Requirements

Minimum (CPU-only)

  • RAM: 6 GB available
  • Storage: 5 GB free disk space
  • CPU: Any modern x86_64 processor (Intel/AMD)
  • OS: Windows, macOS, or Linux

Recommended

  • RAM: 8 GB+ available
  • CPU: 4+ cores for faster inference
  • Storage: SSD for faster model loading

No GPU required. This is the entire point of this GGUF release.


Quantization Details

The model was quantized using llama.cpp with the Q4_K_M method:

  • Q4_K_M uses 4-bit quantization with medium-sized key-value cache
  • Provides the best balance between model size, inference speed, and output quality
  • Recommended by the llama.cpp community as the default quantization for most use cases

| Quantization | Size | Quality | Speed |
|---|---|---|---|
| FP16 (Original) | ~16 GB | β˜…β˜…β˜…β˜…β˜… | Slow (needs GPU) |
| Q8_0 | ~8.5 GB | β˜…β˜…β˜…β˜…β˜† | Moderate |
| Q4_K_M (This) | ~4.65 GB | β˜…β˜…β˜…β˜…β˜† | Fast |
| Q4_0 | ~4.3 GB | β˜…β˜…β˜…β˜†β˜† | Fastest |

Copyright & License

Model License

This model is distributed under the Meta Llama 3 Community License. By downloading or using this model, you agree to the terms of the Meta Llama 3 Community License Agreement.

Quantization & Distribution

  • The GGUF quantization and this distribution were prepared by the repository maintainer
  • The original SinLlama fine-tuning was done by polyglots
  • All rights to the base architecture belong to Meta Platforms, Inc.

Usage Terms

  • βœ… Free for research and personal use
  • βœ… Free for commercial use (subject to Meta Llama 3 license terms)
  • βœ… Redistribution allowed with attribution
  • ❌ Do not use for generating harmful, misleading, or illegal content
  • ❌ Do not misrepresent the model's outputs as human-written content without disclosure

Attribution

If you use this model in your work, please cite:

@misc{sinllama-gguf,
  title={SinLlama GGUF - Quantized Sinhala Language Model},
  author={SanudaDev},
  year={2025},
  url={https://huggingface.co/SanudaDev/SinLlama-GGUF},
  note={Q4_K_M GGUF quantization of polyglots/SinLlama_v01}
}

Acknowledgments

  • Meta AI β€” for the Llama 3 base model
  • polyglots β€” for the original SinLlama Sinhala fine-tuning
  • llama.cpp β€” for the GGUF quantization toolchain
  • Ollama β€” for making local LLM deployment simple

Making Sinhala AI accessible to everyone β€” not just those with expensive hardware.
