MSME Legal Dispute Classifier (Longformer, 6-Class)

Model Overview

This model is a multi-class legal document classifier that assigns MSME-related dispute cases to one of six statutory dispute categories. It is fine-tuned from allenai/longformer-base-4096 on long-form legal documents, with inputs truncated to 1200 tokens. It is intended for automated dispute categorization, legal triage, and decision support in MSME dispute resolution workflows.

Problem Statement

MSME dispute cases often involve lengthy legal narratives including:

Statement of claim
Buyer response
Case summary
Contractual and payment details

Manual classification is time-consuming and error-prone. This model automates dispute categorization into predefined legal classes.
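The sections listed above have to be combined into a single input string before tokenization. The card does not describe the actual input template, so the sketch below is hypothetical: the dictionary keys, the section order, and the helper build_case_text are illustrative assumptions, not the author's preprocessing.

```python
# Hypothetical section keys and display titles; the real input template is not documented.
SECTION_ORDER = [
    ("statement_of_claim", "Statement of claim"),
    ("buyer_response", "Buyer response"),
    ("case_summary", "Case summary"),
    ("contract_payment_details", "Contractual and payment details"),
]


def build_case_text(case: dict) -> str:
    """Join the non-empty sections into one labelled document string."""
    parts = []
    for key, title in SECTION_ORDER:
        value = (case.get(key) or "").strip()
        if value:  # skip missing or blank sections
            parts.append(f"{title}: {value}")
    return "\n\n".join(parts)


example_case = {
    "statement_of_claim": "Goods were delivered on 1 March and the invoice remains unpaid.",
    "buyer_response": "The buyer alleges that the goods were defective.",
    "case_summary": "",
}
print(build_case_text(example_case))
```

Labelling each section keeps the boundaries visible to the model after concatenation; whether the training data used such labels is not stated.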

Classification Labels

The model predicts one of the following six categories:

Label ID	Category
0	Delayed payment (no dispute)
1	Quality dispute
2	No formal contract
3	Partial payment dispute
4	Government procurement delay
5	Service-related dispute

Label mapping is included in label_mapping.json.
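The table can also be expressed as Python dictionaries. The schema of label_mapping.json is not documented here, so load_label_mapping assumes a JSON object mapping string IDs to category names (a hypothetical format); adjust it if the file is keyed the other way round.

```python
import json
import os
import tempfile

# Label IDs and category names, copied from the table above.
ID2LABEL = {
    0: "Delayed payment (no dispute)",
    1: "Quality dispute",
    2: "No formal contract",
    3: "Partial payment dispute",
    4: "Government procurement delay",
    5: "Service-related dispute",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def load_label_mapping(path: str) -> dict:
    """Read an id -> name mapping, assuming a {"0": "name", ...} JSON schema."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings, so convert them back to ints.
    return {int(k): v for k, v in raw.items()}


# Round trip through a temporary file to show the key conversion.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "label_mapping.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ID2LABEL, f, indent=2)
    loaded = load_label_mapping(path)

print(loaded == ID2LABEL)  # True
```

Dictionaries in this shape are what Transformers expects for the id2label and label2id configuration fields, so they can be passed to from_pretrained(..., id2label=ID2LABEL, label2id=LABEL2ID) if the published config does not already carry readable label names.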

Model Architecture

Base Model: Longformer
Checkpoint: allenai/longformer-base-4096
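Max Sequence Length: 1200 tokens (inputs truncated; the base checkpoint supports up to 4096)
Hidden Size: 768
Number of Layers: 12
Attention Type: Sliding-window local attention (512-token window) with global attention on the <s> (CLS) token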
Classification Head: Dense layer (768 to 768, tanh) followed by a linear output layer (6 outputs), applied to the <s> token representation

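Longformer was selected because legal dispute texts are long documents: its sparse attention scales linearly with sequence length, which allows inputs well beyond the 512-token limit of standard BERT-style encoders.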
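The attention pattern and sequence-length handling can be illustrated with plain arithmetic. This is a sketch, assuming the checkpoint's attention_window of 512 in every layer; in the Hugging Face implementation, inputs are padded internally to a multiple of the window, and LongformerForSequenceClassification places global attention on the first token when no global_attention_mask is supplied. The helper names are hypothetical.

```python
import math

# attention_window for allenai/longformer-base-4096 (assumed 512 for every layer).
ATTENTION_WINDOW = 512
MAX_LEN = 1200  # truncation length used by this model


def padded_length(seq_len: int, window: int = ATTENTION_WINDOW) -> int:
    """Length after padding up to the next multiple of the attention window."""
    return math.ceil(seq_len / window) * window


def local_window(i: int, seq_len: int, window: int = ATTENTION_WINDOW) -> range:
    """Positions a non-global token i attends to: window // 2 on each side, plus itself."""
    half = window // 2
    return range(max(0, i - half), min(seq_len, i + half + 1))


def cls_global_attention_mask(attention_mask: list) -> list:
    """Mark only position 0 (the <s>/CLS token) for global attention."""
    mask = [0] * len(attention_mask)
    if mask:
        mask[0] = 1
    return mask


print(padded_length(MAX_LEN))                   # 1536
print(len(local_window(600, 1536)))             # 513
print(cls_global_attention_mask([1, 1, 1, 1]))  # [1, 0, 0, 0]
```

Under this assumption, a 1200-token input is processed as 1536 positions, each non-global token attends to 256 neighbours on each side, and the <s> token attends to, and is attended by, every position.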

Dataset Information

Final Dataset Size (after cleaning): 2152 samples
Duplicates removed
Label conflicts resolved
Stratified 80–20 train/test split
5-fold stratified cross-validation
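A stratified split keeps each category's share roughly equal across the train and test portions, which matters with imbalanced classes. The sketch below is a dependency-free illustration with hypothetical helper names; in practice scikit-learn's train_test_split(..., stratify=labels) and StratifiedKFold are the usual tools. Their rounding differs from this per-class version: train_test_split holds out ceil(0.2 * 2152) = 431 samples, consistent with the test set size reported in the evaluation section. The card does not say how the 5-fold cross-validation was combined with the held-out split, so the two helpers are shown independently.

```python
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group sample indices by their label."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return groups


def stratified_split(labels, test_frac=0.2, seed=42):
    """Shuffle each class separately, then hold out about test_frac of it."""
    rng = random.Random(seed)
    train, test = [], []
    for _, idxs in sorted(_indices_by_class(labels).items()):
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_test = round(test_frac * len(idxs))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels, k=5, seed=42):
    """Assign each sample a fold id so every fold has a similar class mix."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for _, idxs in sorted(_indices_by_class(labels).items()):
        idxs = idxs[:]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[idx] = pos % k  # deal each class round-robin into the folds
    return folds


labels = [0] * 50 + [1] * 30 + [2] * 20  # toy imbalanced labels
train, test = stratified_split(labels)
folds = stratified_kfold(labels)
print(len(train), len(test))  # 80 20
```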

Class imbalance handled using weighted cross-entropy loss.
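The card does not say how the class weights were derived. A common choice, assumed here, is inverse-frequency ("balanced") weighting, w_c = n_samples / (n_classes * count_c), the formula scikit-learn's compute_class_weight("balanced") uses. The plain-Python loss below mirrors the reduction of torch.nn.CrossEntropyLoss(weight=...) with the default mean reduction: each sample's negative log-likelihood is scaled by the weight of its true class, and the sum is divided by the total weight. Function names are illustrative.

```python
import math
from collections import Counter


def balanced_class_weights(labels, n_classes):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (n_classes * counts[c]) if counts[c] else 0.0 for c in range(n_classes)]


def weighted_cross_entropy(logits_batch, targets, weights):
    """Weighted mean of -log softmax(logits)[y], normalised by the sum of w[y]."""
    total, norm = 0.0, 0.0
    for logits, y in zip(logits_batch, targets):
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += weights[y] * (log_z - logits[y])
        norm += weights[y]
    return total / norm


weights = balanced_class_weights([0, 0, 0, 1], n_classes=2)  # minority class weighted up
loss = weighted_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1], weights)
print(weights, loss)
```

In a PyTorch training loop the same effect comes from passing the weights as a tensor, for example torch.nn.CrossEntropyLoss(weight=torch.tensor(weights)).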

Training Configuration

Optimizer: AdamW
Learning Rate: 2e-5
Batch Size: 2
Gradient Accumulation Steps: 4
Effective Batch Size: 8
Epochs: 3
Warmup Steps: 200
Mixed Precision (FP16): Enabled
Loss Function: Weighted Cross Entropy
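These settings map onto Hugging Face TrainingArguments fields (learning_rate, per_device_train_batch_size, gradient_accumulation_steps, num_train_epochs, warmup_steps, fp16), although the card does not confirm that Trainer was used. The sketch below works through the resulting step arithmetic under two stated assumptions: training on the 1,721 non-test samples, and linear decay after warmup (the Trainer default schedule, not stated in the card). The step count is approximate, since frameworks differ in how they treat a final partial accumulation cycle.

```python
import math

LEARNING_RATE = 2e-5
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
EPOCHS = 3
WARMUP_STEPS = 200

# Gradients from 4 micro-batches of 2 are accumulated before each optimizer step.
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS


def optimizer_steps(n_train, batch=PER_DEVICE_BATCH, accum=GRAD_ACCUM_STEPS, epochs=EPOCHS):
    """Approximate optimizer updates, counting only full accumulation cycles."""
    micro_batches = math.ceil(n_train / batch)
    return (micro_batches // accum) * epochs


def linear_warmup_lr(step, total_steps, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS):
    """Linear warmup to base_lr, then linear decay to 0 (assumed schedule)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


total = optimizer_steps(2152 - 431)  # assumes training on the 1,721 non-test samples
print(EFFECTIVE_BATCH, total)  # 8 645
```

Under these assumptions, the 200 warmup steps cover roughly a third of the approximately 645 optimizer updates.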

Evaluation Results (Held-Out Test Set)

Test Set Size: 431 samples

Metric	Score
Accuracy	0.77
Macro Precision	0.76
Macro Recall	0.74
Macro F1 Score	0.75
Macro AUC-ROC (OvR)	0.948

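The macro AUC-ROC of 0.948 indicates good class separability in the model's scores, while an accuracy of 0.77 means about 23% of test documents are still misclassified at the top-1 decision. Per-class scores are not reported, so these macro averages alone do not establish that performance is uniform across all six categories.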
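The card does not state which library produced these numbers. For reference, the standard macro-averaged definitions can be written in plain Python: per-class precision, recall, and F1 averaged without class weighting, and one-vs-rest AUC computed per class (here via the rank-based probability that a positive outscores a negative) and then averaged. The function names are illustrative; scikit-learn's precision_recall_fscore_support(..., average="macro") and roc_auc_score(..., multi_class="ovr", average="macro") implement the same definitions.

```python
def accuracy(y_true, y_pred):
    """Fraction of exact label matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_prf(y_true, y_pred, n_classes):
    """Unweighted mean of per-class precision, recall and F1."""
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when the class is never predicted
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return sum(precisions) / n_classes, sum(recalls) / n_classes, sum(f1s) / n_classes


def binary_auc(scores, is_positive):
    """ROC AUC as the probability a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc_ovr(y_true, probs, n_classes):
    """One-vs-rest AUC for each class, then the unweighted mean."""
    aucs = [binary_auc([row[c] for row in probs], [t == c for t in y_true])
            for c in range(n_classes)]
    return sum(aucs) / n_classes


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), macro_prf(y_true, y_pred, n_classes=3))
```

Because macro F1 is the mean of per-class F1 scores, it is not in general equal to the harmonic mean of macro precision and macro recall.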

Intended Use

This model is suitable for:

Automated legal dispute classification
MSME case triage systems
Online Dispute Resolution (ODR) platforms
Legal analytics systems
Case routing and prioritization tools

Limitations

Inputs are truncated to 1200 tokens, so text beyond that point is ignored; performance may degrade for documents significantly exceeding this length.
Domain-specific to MSME dispute scenarios.
Not designed for general legal classification tasks.
Should not be used as a substitute for legal judgment.
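Because text past 1200 tokens is truncated, one common workaround (not evaluated in this card) is to split a long document into overlapping windows, classify each window, and aggregate the predictions. The helpers below are a hypothetical sketch operating on token ID lists: the 1198-token content budget assumes two special tokens (<s> and </s>) per window, and the 256-token overlap and mean-probability aggregation are illustrative choices.

```python
def chunk_token_ids(ids, max_content=1198, overlap=256):
    """Split token ids into overlapping windows of at most max_content tokens.

    max_content = 1200 - 2 leaves room for the <s> and </s> special tokens
    (an assumption about how each window is framed).
    """
    if overlap >= max_content:
        raise ValueError("overlap must be smaller than max_content")
    if len(ids) <= max_content:
        return [list(ids)]
    step = max_content - overlap
    chunks, start = [], 0
    while True:
        chunks.append(list(ids[start:start + max_content]))
        if start + max_content >= len(ids):  # last window reaches the end
            break
        start += step
    return chunks


def mean_probabilities(per_chunk_probs):
    """Average class probabilities across windows (one simple aggregation rule)."""
    n = len(per_chunk_probs)
    return [sum(col) / n for col in zip(*per_chunk_probs)]


chunks = chunk_token_ids(list(range(3000)))
print(len(chunks), [len(c) for c in chunks])
```

With a fast tokenizer, comparable windows can be produced directly by calling the tokenizer with truncation=True, max_length=1200, stride=..., and return_overflowing_tokens=True.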

Ethical Considerations

This model is intended as a decision-support tool. Human oversight is recommended for legal decision-making applications. It does not provide legal advice.
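One way to put the human-oversight recommendation into practice is to send low-confidence predictions to a reviewer instead of acting on them automatically. The sketch below is hypothetical: the 0.6 threshold is an arbitrary placeholder that would need tuning on validation data, and softmax scores from a fine-tuned classifier are not guaranteed to be calibrated probabilities.

```python
import math

REVIEW_THRESHOLD = 0.6  # hypothetical cut-off; tune on validation data


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def route(logits, threshold=REVIEW_THRESHOLD):
    """Return (label_id, confidence, needs_human_review) for one document."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[label_id]
    return label_id, confidence, confidence < threshold


print(route([4.0, 0.5, 0.2, 0.1, 0.0, 0.3]))  # confident: handled automatically
print(route([1.0, 0.9, 0.8, 0.0, 0.0, 0.0]))  # uncertain: flagged for review
```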

Usage Example

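from transformers import LongformerForSequenceClassification, AutoTokenizer
import torch

MODEL_ID = "abhinavdread/msme-legal-dispute-classifier-longformer"

# Label names from the Classification Labels table
ID2LABEL = {0: "Delayed payment (no dispute)", 1: "Quality dispute", 2: "No formal contract",
            3: "Partial payment dispute", 4: "Government procurement delay", 5: "Service-related dispute"}

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = LongformerForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()  # inference mode: disables dropout

text = "The buyer failed to release payment within the agreed 45-day period."

# Truncate to the 1200-token limit used during training
inputs = tokenizer(text, truncation=True, max_length=1200, return_tensors="pt")

# Global attention is placed on the first (<s>) token automatically
with torch.no_grad():  # no gradients needed for inference
    logits = model(**inputs).logits

predicted_id = torch.argmax(logits, dim=-1).item()
print("Predicted Label:", predicted_id, ID2LABEL[predicted_id])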

Model Files

Format: Safetensors
Model size: 0.1B params
Tensor type: F32
Repository: abhinavdread/msme-legal-dispute-classifier-longformer