
Benchmarking BERT-based Models for Sentence-level Topic Classification in Nepali Language

Overview

This repository contains the implementation and experiments for benchmarking various BERT-based transformer models on sentence-level topic classification in Nepali, a low-resource language.

We evaluate multilingual, Indic, and language-specific (Hindi and Nepali) models, along with an English baseline, to understand how effectively they capture the linguistic nuances of Nepali text.


Objectives

  • Benchmark multiple BERT-based models on Nepali text classification
  • Analyze performance differences across multilingual, Indic, and monolingual models
  • Establish a strong baseline for future Nepali NLP tasks
  • Provide insights into low-resource language modeling

Dataset

The dataset consists of 25,006 Nepali sentences categorized into five domains:

  • 🌾 Agriculture
  • πŸ₯ Health
  • πŸŽ“ Education & Technology
  • πŸ”οΈ Culture & Tourism
  • πŸ’¬ General Communication

The dataset is approximately balanced, with roughly 5,000 sentences per category.

πŸ”— Dataset Link: https://huggingface.co/datasets/ilprl-docse/NepSen-Nepali-Categorical-Sentences-Corpus
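For classification, the five categories above need integer label ids. A minimal sketch of the mapping follows; note that the exact label strings and their ordering here are assumptions, so verify them against the dataset's features (e.g. after loading it with `datasets.load_dataset`) before use:

```python
# Map the five topic categories to integer ids for classification.
# NOTE: the label strings and their order below are illustrative
# assumptions -- check the actual dataset features before relying on them.
CATEGORIES = [
    "agriculture",
    "health",
    "education_technology",
    "culture_tourism",
    "general_communication",
]

label2id = {label: i for i, label in enumerate(CATEGORIES)}
id2label = {i: label for label, i in label2id.items()}
```

These mappings are what a Hugging Face classification head expects via the `label2id`/`id2label` config fields.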


Models Evaluated

We benchmarked the following transformer-based models:

Multilingual Models

  • mBERT
  • XLM-RoBERTa
  • mDeBERTa

Indic Models

  • MuRIL (base & large)
  • IndicBERT
  • DevBERT

Language-Specific Models

  • HindiBERT
  • NepBERTa

English Model

  • RoBERTa

πŸ”— Model Links: https://hf.co/collections/ilprl-docse/benchmarking-bert-based-models-for-topic-classification

πŸ”— Code Repository: https://github.com/ilprl/Benchmarking-BERT-based-Models-for-Sentence-level-Topic-Classification-in-Nepali-Language


Experimental Setup

  • Training samples: 20,005
  • Validation samples: 2,500
  • Test samples: 2,501
  • Epochs: 10
  • Learning rate: 2e-5
  • Batch size: 8
  • Gradient accumulation: 2
  • Max sequence length: 256

Framework: πŸ€— Hugging Face Transformers
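With a per-device batch size of 8 and gradient accumulation of 2, the effective batch size is 16. The resulting optimization schedule can be sketched as follows (assuming a single device and that the last partial batch is kept):

```python
import math

# Hyperparameters from the experimental setup above.
train_samples = 20_005
batch_size = 8
grad_accum = 2
epochs = 10

# Gradient accumulation multiplies the effective batch size.
effective_batch = batch_size * grad_accum  # 16

# Optimizer steps per epoch, keeping the final partial batch.
steps_per_epoch = math.ceil(train_samples / effective_batch)  # 1251
total_steps = steps_per_epoch * epochs  # 12510
```

This total-step count is what a linear learning-rate scheduler would decay over during fine-tuning.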


Evaluation Metrics

  • Accuracy
  • Precision (Weighted)
  • Recall (Weighted)
  • F1-score (Weighted)
  • AUROC

Key Results

  • MuRIL-large achieved the best performance:
    • F1-score: 90.60%
  • NepBERTa showed strong competitive performance:
    • F1-score: 88.26%
  • Indic models outperformed multilingual and general models overall

Key Insights

  • Region-specific (Indic) models perform better for Nepali
  • Monolingual pretraining (NepBERTa) is highly effective
  • General multilingual models are slightly less optimized for Nepali

Limitations

  • Limited to sentence-level classification
  • Dataset covers only five domains
  • No extensive error analysis performed
  • Results may not generalize to other NLP tasks

Future Work

  • Extend to document-level classification
  • Expand dataset with more domains
  • Perform detailed error analysis
  • Explore ensemble methods
  • Improve Nepali-specific pretraining

Citation

Paper Link: https://arxiv.org/abs/2602.23940

If you use this work, please cite:

@inproceedings{karki2026benchmarking,
  title={Benchmarking BERT-based Models for Sentence-level Topic Classification in Nepali Language},
  author={Karki, Nischal and Subedi, Bipesh and Poudyal, Prakash and Ghimire, Rupak Raj and Bal, Bal Krishna},
  booktitle={Proceedings of the Regional International Conference on Natural Language Processing (RegICON 2025)},
  year={2026},
  address={Guwahati, India},
  note={Gauhati University, November 27--29, 2025},
  url={https://arxiv.org/abs/2602.23940}
}