# CivicLens: Llama-3.2-3B Nepali Legal Assistant
A QLoRA fine-tune of meta-llama/Llama-3.2-3B-Instruct trained on a domain-specific Nepali legal Q&A dataset. The model is trained to answer questions about Nepal's laws, constitution, and governance documents accurately, cite its sources, and respond in the same language as the question.
## Model Details
| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit NF4 + PEFT) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training epochs | 3 |
| Learning rate | 2e-4 |
| Batch size | 4 (grad accum 4, effective 16) |
| Max sequence length | 512 |
| Trainable parameters | ~24M (<1% of base model) |
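The ~24M trainable-parameter figure can be sanity-checked from the LoRA shapes. The sketch below assumes the published Llama-3.2-3B dimensions (hidden size 3072, 28 layers, 8 KV heads with head dim 128, MLP intermediate size 8192); with r = 16 over the seven target projections, the adapter comes out just under 1% of the ~3.2B base parameters.

```python
# Back-of-envelope count of LoRA trainable parameters for Llama-3.2-3B.
# Each adapted Linear(d_in -> d_out) gains r * (d_in + d_out) parameters:
# an (r x d_in) A matrix plus a (d_out x r) B matrix.

hidden = 3072      # hidden_size
kv_dim = 8 * 128   # num_key_value_heads * head_dim (grouped-query attention)
inter = 8192       # intermediate_size of the MLP
layers = 28        # num_hidden_layers
r = 16             # LoRA rank

# (d_in, d_out) for the seven target modules in one decoder layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, inter),
    "up_proj": (hidden, inter),
    "down_proj": (inter, hidden),
}

per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * layers
print(f"{total:,} trainable params")  # 24,313,856 -> ~24M, <1% of ~3.2B
```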
## Dataset
Domain-specific Nepali legal Q&A pairs sourced from Nepal's constitution, acts, and governance documents. Dataset includes both Nepali and English language questions and answers with source citations.
| Split | Samples |
|---|---|
| Train | ~3,200 |
| Validation | ~400 |
| Test | ~430 |
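The card does not publish the dataset's exact serialization, but a typical QLoRA chat fine-tune renders each Q&A pair as a system/user/assistant message triple via the tokenizer's chat template. A minimal sketch of that formatting step — the field names `question`, `answer`, and `source` are illustrative, not the dataset's actual schema:

```python
# Hypothetical example record; the real dataset's field names may differ.
example = {
    "question": "What are the fundamental rights guaranteed by the Constitution of Nepal?",
    "answer": "Part 3 of the Constitution of Nepal (2015) guarantees fundamental rights ...",
    "source": "Constitution of Nepal 2015, Part 3",
}

SYSTEM_PROMPT = (
    "You are CivicLens, a legal assistant specialized in Nepal's laws, "
    "constitution, and governance documents."
)

def to_messages(rec):
    """Turn one Q&A record into the chat-message list used for supervised fine-tuning."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": rec["question"]},
        # The citation is appended to the target so the model learns to cite sources.
        {"role": "assistant", "content": f'{rec["answer"]}\n\nSource: {rec["source"]}'},
    ]

messages = to_messages(example)
print(messages[2]["content"].splitlines()[-1])  # Source: Constitution of Nepal 2015, Part 3
```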
## Evaluation Results
Evaluated on 50 held-out test samples against the base model.
| Metric | Base | Fine-tuned | Delta |
|---|---|---|---|
| ROUGE-L | 0.1975 | 0.2913 | +47.5% |
| BLEU (char bigram) | 0.3827 | 0.4798 | +25.4% |
| Semantic Similarity | 0.5400 | 0.6823 | +26.4% |
| LLM Judge (1-5) | 1.720 | 2.600 | +51.2% |
LLM-as-judge scoring was performed with `llama-3.3-70b-versatile` via the Groq API on the same 50 samples.
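ROUGE-L, reported above, is the F-measure over the longest common subsequence (LCS) of tokens shared by the model output and the reference answer. A dependency-free sketch follows; real evaluations typically use the `rouge-score` package, and the card's exact tokenization is not specified, so whitespace splitting here is an assumption.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """ROUGE-L F1 over whitespace-separated tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

score = rouge_l(
    "the constitution of nepal guarantees fundamental rights",
    "fundamental rights are guaranteed by the constitution of nepal",
)
print(round(score, 3))  # 0.5  (LCS "the constitution of nepal", len 4; P=4/7, R=4/9)
```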
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "Bibidh/civicLens-llama3.2-3b-nepali-legal"

# Load the base model in 4-bit NF4, matching the training configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

SYSTEM_PROMPT = (
    "You are CivicLens, a legal assistant specialized in Nepal's laws, "
    "constitution, and governance documents. Answer questions accurately, "
    "cite your sources, and respond in the same language as the question. "
    "If you don't know, say so."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What are the fundamental rights guaranteed by the Constitution of Nepal?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Limitations
- The absolute LLM Judge score of 2.6/5 reflects the inherent difficulty of legal reasoning for a 3B-parameter model
- Performance on rare or complex legal provisions may be unreliable
- Source citations are learned behavior and should be independently verified
- Evaluated on 50 samples; results on the full test set may vary
## Training Infrastructure
Trained on a single NVIDIA A100 GPU using Hugging Face `transformers`, `peft`, and `bitsandbytes`. Experiment tracking via Weights & Biases.