# Bangla MixLoRA — Phase B (Instruction Tuned)

## Model Description

First MoE-style instruction-tuned LLM for Bangla, built with MixLoRA parameter-efficient fine-tuning on top of a frozen Mistral-7B base.
## Training Details
- Base model: mistralai/Mistral-7B-v0.1
- Phase A: MixLoRA CPT on 175K Bangla samples (~82M tokens)
- Phase B: SFT on 80K Bangla instruction pairs
- Framework: MoE-PEFT v2.0.2
- Hardware: AMD MI300X VF (205.8GB VRAM)
- Precision: bf16
- Final loss: 0.172
- Date: 2026-03-22
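
As a rough cross-check of the token figures above (the exact counting procedure is not documented in this card, so treat this as a sketch), the corpus statistics can be approximated with the base model's tokenizer:

```python
from transformers import AutoTokenizer

# Tokenizer of the frozen base model; assumed to be the one the token counts refer to.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

sample = "বাংলাদেশ দক্ষিণ এশিয়ার একটি দেশ।"  # "Bangladesh is a country in South Asia."
print(len(tokenizer(sample)["input_ids"]))  # token count for a single sample

# The Phase A numbers above imply roughly 470 tokens per CPT sample on average.
print(82_000_000 / 175_000)  # ~468.6
```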
## Architecture
- Base: Mistral-7B frozen
- Front layers (1-8): Single LoRA (rank 16)
- Middle layers (9-24): MixLoRA — 4 experts, top-2 routing (see the routing sketch after this list)
- Back layers (25-32): Single LoRA (rank 16)
- Trainable params: 120,586,240 (1.66%)
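
The sketch below shows what top-2-of-4 routing in the middle layers looks like conceptually; the module names, the expert rank (assumed equal to the rank 16 used in the single-LoRA layers), and the plain PyTorch implementation are illustrative assumptions, not the MoE-PEFT code:

```python
import torch
import torch.nn.functional as F

HIDDEN, RANK, NUM_EXPERTS, TOP_K = 4096, 16, 4, 2  # Mistral-7B hidden size; 4 experts, top-2

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)  # gating network
experts = [
    torch.nn.Sequential(
        torch.nn.Linear(HIDDEN, RANK, bias=False),  # LoRA A
        torch.nn.Linear(RANK, HIDDEN, bias=False),  # LoRA B
    )
    for _ in range(NUM_EXPERTS)
]

def mixlora_delta(x: torch.Tensor) -> torch.Tensor:
    """LoRA delta added on top of the frozen base projection, per token."""
    gate = F.softmax(router(x), dim=-1)            # (tokens, NUM_EXPERTS)
    weights, idx = gate.topk(TOP_K, dim=-1)        # keep the 2 highest-scoring experts
    weights = weights / weights.sum(dim=-1, keepdim=True)
    delta = torch.zeros_like(x)
    for k in range(TOP_K):
        for e in range(NUM_EXPERTS):
            mask = idx[:, k] == e                  # tokens whose k-th choice is expert e
            if mask.any():
                delta[mask] += weights[mask, k].unsqueeze(-1) * experts[e](x[mask])
    return delta

x = torch.randn(8, HIDDEN)       # 8 token embeddings
print(mixlora_delta(x).shape)    # torch.Size([8, 4096])
```

Only the router, the per-expert LoRA pairs, and the single-LoRA adapters in the front and back layers are trained, which is why the trainable fraction stays at 1.66% of the base model.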
## Benchmark Results

### BanglaMMLU

- Accuracy: 0.2969
- Correct: 38
- Total: 128
- Skipped: 72

### Indic-Squad-QA

- Exact match: 0.302
- Correct: 29
- Total: 96
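
The reported scores are the plain correct/total ratios:

```python
print(f"{38 / 128:.4f}")  # 0.2969 (BanglaMMLU accuracy)
print(f"{29 / 96:.3f}")   # 0.302 (Indic-Squad-QA exact match)
```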
## Dataset
- Pretraining: sahilfarib/bangla-pretraining-corpus-clean (700K samples, ~2B tokens)
- SFT: sahilfarib/bangla-mixlora-real/bangla_sft_80k.jsonl (80K pairs)
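
Both datasets are Hugging Face Hub repos, so they can be pulled with the `datasets` library; the split name and file layout below are assumptions, so check the dataset cards:

```python
from datasets import load_dataset

# Phase A pretraining corpus (assumed to live in the default "train" split).
cpt_corpus = load_dataset("sahilfarib/bangla-pretraining-corpus-clean", split="train")

# Phase B instruction pairs, stored as a JSONL file inside the repo.
sft_pairs = load_dataset(
    "sahilfarib/bangla-mixlora-real",
    data_files="bangla_sft_80k.jsonl",
    split="train",
)

print(len(cpt_corpus), len(sft_pairs))  # expected: ~700K and 80K
```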
## Usage

```python
from mixlora import MixLoraModelForCausalLM
from transformers import AutoTokenizer
import torch

# Load the frozen Mistral-7B base together with the MixLoRA adapter weights.
model, config = MixLoraModelForCausalLM.from_pretrained(
    "sahilfarib/bangla-mixlora-sft",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The tokenizer is unchanged from the base model.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "বাংলাদেশের রাজধানী কোথায়?"  # "Where is the capital of Bangladesh?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Citation

```bibtex
@misc{bangla-mixlora-2026,
  title={BanglaMixLoRA: Parameter-Efficient MoE Adaptation for Bangla},
  author={Sahil Farib},
  year={2026}
}
```