# CivicLens: Llama-3.2-3B Nepali Legal Assistant
A QLoRA fine-tune of meta-llama/Llama-3.2-3B-Instruct trained on a domain-specific Nepali legal Q&A dataset. The model is trained to answer questions about Nepal's laws, constitution, and governance documents accurately, cite its sources, and respond in the same language as the question.
## Model Details
| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit NF4 + PEFT) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training epochs | 3 |
| Learning rate | 2e-4 |
| Batch size | 4 (grad accum 4, effective 16) |
| Max sequence length | 512 |
| Trainable parameters | ~24M (<1% of base model) |
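The ~24M trainable-parameter figure can be sanity-checked from the LoRA shapes. The sketch below assumes the published Llama-3.2-3B dimensions (hidden size 3072, 28 layers, 8 KV heads with head dim 128, MLP intermediate size 8192); with r = 16 over the seven target projections, the adapter comes out just under 1% of the ~3.2B base parameters.

```python
# Back-of-envelope count of LoRA trainable parameters for Llama-3.2-3B.
# Each adapted Linear(d_in -> d_out) gains r * (d_in + d_out) parameters:
# an (r x d_in) A matrix plus a (d_out x r) B matrix.

hidden = 3072      # hidden_size
kv_dim = 8 * 128   # num_key_value_heads * head_dim (grouped-query attention)
inter = 8192       # intermediate_size of the MLP
layers = 28        # num_hidden_layers
r = 16             # LoRA rank

# (d_in, d_out) for the seven target modules in one decoder layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, inter),
    "up_proj": (hidden, inter),
    "down_proj": (inter, hidden),
}

per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * layers
print(f"{total:,} trainable params")  # 24,313,856 -> ~24M, <1% of ~3.2B
```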
## Dataset
Domain-specific Nepali legal Q&A pairs sourced from Nepal's constitution, acts, and governance documents. Dataset includes both Nepali and English language questions and answers with source citations.
| Split | Samples |
|---|---|
| Train | ~3,200 |
| Validation | ~400 |
| Test | ~430 |
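The card does not publish the dataset's exact serialization, but a typical QLoRA chat fine-tune renders each Q&A pair as a system/user/assistant message triple via the tokenizer's chat template. A minimal sketch of that formatting step — the field names `question`, `answer`, and `source` are illustrative, not the dataset's actual schema:

```python
# Hypothetical example record; the real dataset's field names may differ.
example = {
    "question": "What are the fundamental rights guaranteed by the Constitution of Nepal?",
    "answer": "Part 3 of the Constitution of Nepal (2015) guarantees fundamental rights ...",
    "source": "Constitution of Nepal 2015, Part 3",
}

SYSTEM_PROMPT = (
    "You are CivicLens, a legal assistant specialized in Nepal's laws, "
    "constitution, and governance documents."
)

def to_messages(rec):
    """Turn one Q&A record into the chat-message list used for supervised fine-tuning."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": rec["question"]},
        # The citation is appended to the target so the model learns to cite sources.
        {"role": "assistant", "content": f'{rec["answer"]}\n\nSource: {rec["source"]}'},
    ]

messages = to_messages(example)
print(messages[2]["content"].splitlines()[-1])  # Source: Constitution of Nepal 2015, Part 3
```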
## Evaluation Results
Evaluated on 50 held-out test samples against the base model.
| Metric | Base | Fine-tuned | Delta |
|---|---|---|---|
| ROUGE-L | 0.1975 | 0.2913 | +47.5% |
| BLEU (char bigram) | 0.3827 | 0.4798 | +25.4% |
| Semantic Similarity | 0.5400 | 0.6823 | +26.4% |
| LLM Judge (1-5) | 1.720 | 2.600 | +51.2% |
LLM-as-judge scoring was performed with `llama-3.3-70b-versatile` via the Groq API on the same 50 samples.
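ROUGE-L, reported above, is the F-measure over the longest common subsequence (LCS) of tokens shared by the model output and the reference answer. A dependency-free sketch follows; real evaluations typically use the `rouge-score` package, and the card's exact tokenization is not specified, so whitespace splitting here is an assumption.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """ROUGE-L F1 over whitespace-separated tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

score = rouge_l(
    "the constitution of nepal guarantees fundamental rights",
    "fundamental rights are guaranteed by the constitution of nepal",
)
print(round(score, 3))  # 0.5  (LCS "the constitution of nepal", len 4; P=4/7, R=4/9)
```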
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "Bibidh/civicLens-llama3.2-3b-nepali-legal"

# Load the base model in 4-bit NF4, matching the training configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

SYSTEM_PROMPT = (
    "You are CivicLens, a legal assistant specialized in Nepal's laws, "
    "constitution, and governance documents. Answer questions accurately, "
    "cite your sources, and respond in the same language as the question. "
    "If you don't know, say so."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What are the fundamental rights guaranteed by the Constitution of Nepal?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Limitations
- The absolute LLM Judge score of 2.6/5 reflects the inherent difficulty of legal reasoning for a 3B-parameter model
- Performance on rare or complex legal provisions may be unreliable
- Source citations are learned behavior and should be independently verified
- Evaluated on 50 samples; results on the full test set may vary
## Training Infrastructure
Trained on a single NVIDIA A100 GPU using Hugging Face `transformers`, `peft`, and `bitsandbytes`. Experiment tracking via Weights & Biases.