Model Loading
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "iCIIT/mmlu-fine-tuned-model-ris-sinhala-qwen2.5-1.5b-ft"

# Define 4-bit quantization config
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",  # Use "bfloat16" if your GPU supports it
)

# Load tokenizer and quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
model.eval()
```
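
Once loaded, the model can be queried like any causal LM. The snippet below is a minimal inference sketch; the Sinhala example prompt and the generation settings (`max_new_tokens`, greedy decoding) are illustrative assumptions, not values documented for this fine-tune.

```python
import torch

# Illustrative prompt ("What is the capital of Sri Lanka?"); replace with your own question.
prompt = "ශ්‍රී ලංකාවේ අගනුවර කුමක්ද?"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```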
---
Base model: Qwen/Qwen2.5-1.5B