🌏 Mission Statement

Our mission is to preserve, elevate, and promote Philippine native languages through the power of Artificial Intelligence. By creating and open-sourcing AI models for underrepresented local languages, we aim to bridge the digital language divide, ensure our cultural heritage thrives in the digital age, and empower local communities.

Please support this mission and our continuous work by contributing to the PLTAT (Philippine Language Translation and AI Training) Fund.


Llama-3.1-8B-Ilocano-Alpaca (GGUF)

This repository contains GGUF format model files for Llama-3.1-8B-Ilocano, a fine-tuned version of Meta's LLaMA 3.1 (8 Billion parameters) designed to understand and generate text in Ilocano (Iloko), an Austronesian language primarily spoken in Northern Luzon, specifically in the Ilocos Region, Cagayan Valley, and the Cordillera Administrative Region in the Philippines.

Note on Training: This model was fine-tuned using the Alpaca dataset format. For the best responses, you must use the Alpaca prompt template when interacting with it.

These GGUF files are highly optimized for inference on consumer hardware (both CPU and GPU) using tools like llama.cpp, LM Studio, Ollama, and text-generation-webui.

🦙 Run with Ollama

You can easily run this model locally using Ollama. Ensure you have Ollama installed, and then run the following command in your terminal to chat with the model:

ollama run welyjesch/ilocano-llama-3

Ollama Hub Link: ollama.com/welyjesch/ilocano-llama-3
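If you prefer to call the model programmatically rather than through the interactive CLI, a locally running Ollama server exposes a REST API on port 11434. The sketch below uses only the Python standard library; the helper names are my own, and it assumes the model has already been pulled with the command above.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_request(prompt: str, model: str = "welyjesch/ilocano-llama-3") -> bytes:
    """Serialize a non-streaming generate request for Ollama's REST API."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def generate(prompt: str) -> str:
    """Send the prompt to a locally running Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` to be running locally):
# print(generate("Kumusta ka ita nga aldaw?"))
```

Ollama applies the model's stored prompt template automatically, so plain text is usually enough here; for raw completions you can still wrap your text in the Alpaca format shown below.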


Prompt Template (Alpaca Format)

Because this model was trained on an Alpaca-formatted dataset, you should structure your prompts like this.

Without Input:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
[Your Ilocano instruction here]

### Response:

With Context/Input:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
[Your Ilocano instruction here]

### Input:
[Optional context here]

### Response:
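To avoid hand-assembling these strings, the two templates above can be wrapped in a small helper. This function name is my own suggestion, not part of the model's tooling:

```python
def alpaca_prompt(instruction: str, context: str = "") -> str:
    """Build an Alpaca-format prompt, adding the ### Input block only when context is given."""
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```

The prompt should always end with `### Response:` followed by a newline, so the model's completion begins immediately after it.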

🚀 How to use for Inference in Google Colab

Alternatively, you can run the LoRA version of this model through Unsloth, which is faster and easier; a ready-made Python notebook is provided that you can open in Colab: Ilocano Llama 3.1 Inference Notebook

You can easily run this GGUF model in the free tier of Google Colab using llama-cpp-python with hardware acceleration.

  1. Open a new Google Colab notebook and set the runtime to T4 GPU.
  2. Run the following code block:
# 1. Install llama-cpp-python with CUDA (GPU) support
!CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
!pip install huggingface_hub

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# 2. Download the GGUF model from Hugging Face
repo_id = "welyjesch/ilocano_llama_3.1_FT_8B_GGUF" 
filename = "llama-3.1-8b-ilocano-alpaca-q4_k_m.gguf" 

model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# 3. Load the model
llm = Llama(
    model_path=model_path,
    n_gpu_layers=-1, # Offloads all layers to the GPU
    n_ctx=2048,      # Context window size
    verbose=False
)

# 4. Run Inference using the Alpaca Prompt Format
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Kumusta ka ita nga aldaw? Adda kadi naimbag a damagmo?

### Response:
"""

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    stop=["<|end_of_text|>", "<|eot_id|>"] # Standard Llama 3 end tokens
)

print(output["choices"][0]["text"])

🖥️ Fine-Tuning with the New Unsloth Studio (No-Code/UI)

Unsloth recently introduced Unsloth Studio, a powerful graphical interface that makes fine-tuning incredibly easy and fast without writing code. Here is how you can further fine-tune this model using the Studio:

  1. Download & Install: Get Unsloth Studio for your local machine or access it via their cloud environment.
  2. Start a New Project: Open the Studio and create a new project. Select meta-llama/Meta-Llama-3.1-8B as your base model.
  3. Upload Dataset: Upload your Ilocano Alpaca-formatted dataset (in CSV or JSON/JSONL format). Map your columns (Instruction, Input, Output) directly in the UI.
  4. Configure Settings: Unsloth Studio will automatically apply memory-saving optimizations (like QLoRA and 4-bit quantization). You can tweak parameters like batch size, learning rate, and epochs if needed.
  5. Train & Export: Click Start Training. Once the fine-tuning completes, you can use the Studio's built-in export feature to save your new model directly to GGUF format with just a click!
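For step 3, the dataset should follow the three-column Alpaca schema (instruction / input / output). The snippet below writes a minimal JSONL sample in that shape; the filename and the placeholder answer are illustrative only, and the instruction row reuses the example question from the inference section above:

```python
import json

# Three-column Alpaca schema: instruction / input / output.
# The output text here is a placeholder, not a real model answer.
rows = [
    {
        "instruction": "Kumusta ka ita nga aldaw? Adda kadi naimbag a damagmo?",
        "input": "",
        "output": "[Ilocano response here]",
    },
]

# Hypothetical filename for the Studio upload.
with open("ilocano_alpaca_sample.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

One JSON object per line (JSONL) keeps large datasets streamable; a CSV with the same three columns works equally well in the Studio UI.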

πŸ› οΈ How to do Further Fine-Tuning in Google Colab (Code)

Note: You cannot easily fine-tune a .gguf file directly. For further code-based fine-tuning, start from the original unquantized model weights (Safetensors) with Unsloth, which lets you fine-tune LLaMA 3.1 8B on a free Colab T4 GPU.

  1. Open a new Google Colab notebook and select T4 GPU.
  2. Install Unsloth and dependencies:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
  3. Run the fine-tuning script:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 2048
dtype = None # Auto detection
load_in_4bit = True # 4bit quantization to save memory

# 1. Load the fine-tuned model weights (the LoRA/Safetensors repo, not the GGUF)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "welyjesch/ilocano_llama_3.1_finetuned_lora", 
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# 2. Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# 3. Format dataset to Alpaca
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    # `inp` avoids shadowing Python's built-in input()
    for instruction, inp, output in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(instruction, inp, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

# Load and format your new Ilocano dataset
dataset = load_dataset("json", data_files="your_new_ilocano_alpaca_data.json", split="train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

# 4. Setup Trainer
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60, # Increase this for actual training
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

# 5. Start Training
trainer_stats = trainer.train()

# 6. Save the new model and export back to GGUF
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
# You can then upload the resulting GGUF file back to Hugging Face
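Before uploading the exported file, it can be worth a quick sanity check: every valid GGUF file begins with the 4-byte ASCII magic `GGUF`. The helper name and the example path below are my own assumptions:

```python
def looks_like_gguf(path: str) -> bool:
    """Quick sanity check: valid GGUF files begin with the 4-byte magic b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# After exporting with save_pretrained_gguf, verify the file before uploading:
# looks_like_gguf("model/unsloth.Q4_K_M.gguf")  # exact output path/name is an assumption
```

A corrupted or interrupted export will typically fail this check immediately, which is cheaper to catch locally than after a multi-gigabyte upload.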

Limitations and Bias

While this model has been fine-tuned to understand and generate Ilocano, it is still subject to the limitations of its base model (LLaMA 3.1).

  • Hallucinations: As a low-resource language, the model might occasionally mix Ilocano with Tagalog, Pangasinan, or English (code-switching), or hallucinate facts.
  • Cultural Nuance: The model may not fully capture the deep cultural nuances or specific regional dialect variations (e.g., Ilocos Norte vs. Ilocos Sur vs. La Union vs. Cordillera variations).
  • Users are advised to verify critical information generated by the model.

ilocano_llama_3.1_FT_8B_GGUF (GGUF)

This model was finetuned and converted to GGUF format using Unsloth.

Example usage:

  • For text only LLMs: llama-cli -hf welyjesch/ilocano_llama_3.1_FT_8B_GGUF --jinja

Available Model files:

  • llama-3.1-8b.Q8_0.gguf (Q8_0, 8-bit quantization)
