Model Loading

The snippet below loads the model in 4-bit NF4 precision via bitsandbytes; this requires the bitsandbytes and accelerate packages and typically a CUDA-capable GPU.

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "iCIIT/mmlu-fine-tuned-model-ris-sinhala-qwen2.5-1.5b-ft"

# Define 4-bit quantization config
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",  # NormalFloat4 data type
    bnb_4bit_compute_dtype="float16"  # Use "bfloat16" if your GPU supports it
)

# Load tokenizer and quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto"
)
model.eval()
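
Once loaded, the model can be queried like any other causal LM. The following is a minimal generation sketch rather than an official usage recipe: it assumes the tokenizer ships Qwen2.5's chat template (so apply_chat_template works), and the English prompt is purely illustrative for a model fine-tuned on Sinhala MMLU-style data.

import torch

# Minimal generation sketch (assumption: the tokenizer provides a chat template)
prompt = "What is the capital of Sri Lanka?"  # illustrative prompt only
messages = [{"role": "user", "content": prompt}]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=False
    )

# Decode only the newly generated tokens (skip the prompt portion)
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)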
---
Model size: 2B params (Safetensors; tensor types F32, F16, U8)