MSME Legal Dispute Classifier (Longformer, 6-Class)

Model Overview

This model is a multi-class legal document classifier that assigns MSME-related dispute cases to one of six statutory dispute categories. It is fine-tuned from allenai/longformer-base-4096 with a maximum input length of 1200 tokens, which suits long-form legal documents. It is intended for automated dispute categorization, legal triage, and decision support in MSME dispute resolution workflows.

Problem Statement

MSME dispute cases often involve lengthy legal narratives including:

  • Statement of claim
  • Buyer response
  • Case summary
  • Contractual and payment details

Manual classification is time-consuming and error-prone. This model automates dispute categorization into predefined legal classes.

Classification Labels

The model predicts one of the following six categories:

Label ID   Category
0          Delayed payment (no dispute)
1          Quality dispute
2          No formal contract
3          Partial payment dispute
4          Government procurement delay
5          Service-related dispute

Label mapping is included in label_mapping.json.
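A minimal sketch of turning predicted IDs into category names. The layout of label_mapping.json is not documented here, so the sketch assumes a flat JSON object mapping string IDs to category names; the helper name `load_label_mapping` and the inline `EXAMPLE_JSON` string are hypothetical.

```python
import json

# ASSUMPTION: label_mapping.json is a flat object of the form {"0": "<category>", ...}.
EXAMPLE_JSON = """{
  "0": "Delayed payment (no dispute)",
  "1": "Quality dispute",
  "2": "No formal contract",
  "3": "Partial payment dispute",
  "4": "Government procurement delay",
  "5": "Service-related dispute"
}"""


def load_label_mapping(raw: str) -> dict[int, str]:
    """Parse the JSON text and convert its string keys to integer label IDs."""
    data = json.loads(raw)
    return {int(k): v for k, v in data.items()}


id2label = load_label_mapping(EXAMPLE_JSON)
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[3])  # Partial payment dispute
```

To read the real file instead, pass the contents of `open("label_mapping.json").read()` to the same helper, adjusting the parsing if the file turns out to use a different structure.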

Model Architecture

  • Base Model: Longformer
  • Checkpoint: allenai/longformer-base-4096
  • Max Sequence Length: 1200 tokens
  • Hidden Size: 768
  • Number of Layers: 12
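  • Attention Type: Sliding-window local attention, with global attention on the CLS token (used for classification)
  • Classification Head: Dense layer + tanh followed by a linear projection to 6 outputs (the default LongformerForSequenceClassification head)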

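Longformer was selected because legal dispute texts are long: its attention cost grows linearly with sequence length, so it can process inputs well beyond the 512-token limit of standard BERT-style encoders.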
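In the Hugging Face implementation, `LongformerForSequenceClassification` places global attention on the first (CLS) token when no `global_attention_mask` is passed; all other tokens use sliding-window attention. The sketch below builds that mask explicitly with plain PyTorch so the mechanism is visible. The helper name `cls_global_attention_mask` and the toy token IDs are illustrative, not taken from the model.

```python
import torch


def cls_global_attention_mask(input_ids: torch.Tensor) -> torch.Tensor:
    """Return a mask with 1 at position 0 (CLS) and 0 elsewhere.

    In Longformer, a 1 marks a token that attends to every token and is attended
    to by every token; tokens marked 0 use sliding-window (local) attention only.
    """
    mask = torch.zeros_like(input_ids)
    mask[:, 0] = 1
    return mask


# Toy batch of two sequences of length 6 (IDs are placeholders, not real vocabulary).
input_ids = torch.tensor([[0, 11, 12, 13, 14, 2],
                          [0, 21, 22, 2, 1, 1]])
global_mask = cls_global_attention_mask(input_ids)
print(global_mask)
```

Passing this tensor as `global_attention_mask=` alongside `input_ids` and `attention_mask` reproduces the model's default behaviour rather than changing it; marking additional tokens with 1 would give them global attention as well.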

Dataset Information

  • Final Dataset Size (after cleaning): 2152 samples
  • Duplicates removed
  • Label conflicts resolved
  • Stratified 80:20 train/test split
  • 5-fold stratified cross-validation

Class imbalance was handled with a weighted cross-entropy loss.
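How the class weights were derived is not stated. A common choice, assumed in this sketch, is inverse-frequency ("balanced") weighting, w_c = N / (K · n_c), where N is the number of training samples, K the number of classes and n_c the count of class c; the weights are then passed to PyTorch's `CrossEntropyLoss`. The helper name `balanced_class_weights` is hypothetical and the label counts are toy values, not the real class distribution.

```python
from collections import Counter

import torch
import torch.nn as nn

NUM_CLASSES = 6


def balanced_class_weights(labels: list[int], num_classes: int) -> torch.Tensor:
    """Inverse-frequency weights w_c = N / (K * n_c).

    ASSUMPTION: one common weighting scheme; the original training may have used another.
    Classes absent from `labels` get weight 0 to avoid division by zero.
    """
    counts = Counter(labels)
    n = len(labels)
    weights = [
        n / (num_classes * counts[c]) if counts[c] > 0 else 0.0
        for c in range(num_classes)
    ]
    return torch.tensor(weights, dtype=torch.float)


# Toy training labels (NOT the real dataset distribution): 24 samples, 6 classes.
train_labels = [0] * 6 + [1] * 3 + [2] * 3 + [3] * 6 + [4] * 3 + [5] * 3
weights = balanced_class_weights(train_labels, NUM_CLASSES)

# Weighted cross-entropy: errors on rarer classes contribute more to the loss.
loss_fn = nn.CrossEntropyLoss(weight=weights)
torch.manual_seed(0)
logits = torch.randn(4, NUM_CLASSES)
targets = torch.tensor([0, 1, 4, 5])
loss = loss_fn(logits, targets)
print(weights, loss.item())
```

With these toy counts, the two most frequent classes receive a weight of about 0.67 and the four rarer classes about 1.33.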

Training Configuration

  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Batch Size: 2
  • Gradient Accumulation Steps: 4
  • Effective Batch Size: 8
  • Epochs: 3
  • Warmup Steps: 200
  • Mixed Precision (FP16): Enabled
  • Loss Function: Weighted Cross Entropy
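The effective batch size is the per-device batch size multiplied by the gradient accumulation steps (2 × 4 = 8). The sketch below carries that arithmetic through to the number of optimizer steps and the learning rate during warmup. It assumes a single device, a training split of 2152 - 431 = 1721 samples, an optimizer step on any leftover partial accumulation window, and a linear decay after warmup (the schedule type is not stated above); `lr_at` is a hypothetical helper.

```python
import math

TRAIN_SAMPLES = 2152 - 431  # full dataset minus held-out test set (assumed single split)
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
EPOCHS = 3
WARMUP_STEPS = 200
PEAK_LR = 2e-5

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS                   # 8
micro_batches_per_epoch = math.ceil(TRAIN_SAMPLES / PER_DEVICE_BATCH)   # 861
# ASSUMPTION: a final partial accumulation window still triggers an optimizer step.
steps_per_epoch = math.ceil(micro_batches_per_epoch / GRAD_ACCUM_STEPS)  # 216
total_steps = steps_per_epoch * EPOCHS                                   # 648


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


print(effective_batch, total_steps, f"{WARMUP_STEPS / total_steps:.1%} of steps are warmup")
print(lr_at(100), lr_at(200), lr_at(total_steps))
```

The warmup-then-linear-decay shape matches what `transformers.get_linear_schedule_with_warmup` produces; under these assumptions, 200 warmup steps cover roughly 31% of all optimizer updates.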

Evaluation Results (Held-Out Test Set)

Test Set Size: 431 samples

Metric                  Score
Accuracy                0.77
Macro Precision         0.76
Macro Recall            0.74
Macro F1 Score          0.75
Macro AUC-ROC (OvR)     0.948

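The macro AUC-ROC of 0.948 indicates that the model's class scores separate the six categories well, although the accuracy of 0.77 means about 23% of test cases still receive the wrong top label. Because macro averaging weights every class equally, the macro F1 of 0.75 reflects performance on smaller classes as well as larger ones; per-class scores would be needed to confirm that no single category lags behind.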
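Macro averaging computes each metric per class and then takes the unweighted mean; one-vs-rest (OvR) AUC treats each class in turn as positive against all others. The pure-Python sketch below illustrates both on toy 3-class data. In practice `sklearn.metrics.precision_recall_fscore_support(..., average="macro")` and `sklearn.metrics.roc_auc_score(..., multi_class="ovr", average="macro")` compute the same quantities; whether the reported figures came from those functions is an assumption.

```python
def macro_prf(y_true: list[int], y_pred: list[int], num_classes: int) -> tuple[float, float, float]:
    """Unweighted mean of per-class precision, recall and F1 (0 when undefined)."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = num_classes
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def ovr_auc(y_true: list[int], scores: list[list[float]], c: int) -> float:
    """AUC for class c vs. rest: chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s[c] for t, s in zip(y_true, scores) if t == c]
    neg = [s[c] for t, s in zip(y_true, scores) if t != c]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: list[int], scores: list[list[float]], num_classes: int) -> float:
    """Unweighted mean of the per-class one-vs-rest AUCs."""
    return sum(ovr_auc(y_true, scores, c) for c in range(num_classes)) / num_classes


# Toy data: true labels and per-class probabilities (NOT real model outputs).
y_true = [0, 0, 1, 1, 2, 2]
scores = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
]
y_pred = [row.index(max(row)) for row in scores]  # argmax prediction per sample

print(macro_prf(y_true, y_pred, 3), macro_ovr_auc(y_true, scores, 3))
```

Note that macro F1 here is the mean of per-class F1 scores, not the harmonic mean of macro precision and macro recall, which is why the two can differ slightly.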

Intended Use

This model is suitable for:

  • Automated legal dispute classification
  • MSME case triage systems
  • Online Dispute Resolution (ODR) platforms
  • Legal analytics systems
  • Case routing and prioritization tools

Limitations

  • Inputs are truncated to 1200 tokens, so any text beyond that point is ignored; performance may degrade for documents significantly exceeding this length.
  • Domain-specific to MSME dispute scenarios.
  • Not designed for general legal classification tasks.
  • Should not be used as a substitute for legal judgment.
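Because inputs are truncated at 1200 tokens, text beyond that point never reaches the model. One possible workaround, not described or evaluated by the model's authors, is to split a long document into overlapping windows, classify each window, and average the class probabilities. The sketch below shows only the windowing and averaging logic in plain Python; the window size, stride, mean aggregation and both function names are assumptions.

```python
def sliding_windows(token_ids: list[int], window: int = 1200, stride: int = 1000) -> list[list[int]]:
    """Split token IDs into windows of at most `window` tokens.

    Consecutive windows overlap by (window - stride) tokens. In practice, reserve
    room for the tokenizer's special tokens (<s> and </s>) when choosing `window`.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("require window > 0 and 0 < stride <= window")
    if len(token_ids) <= window:
        return [token_ids]
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
        start += stride
    return windows


def mean_probabilities(per_window_probs: list[list[float]]) -> list[float]:
    """Average class-probability vectors across windows (hypothetical aggregation)."""
    n = len(per_window_probs)
    return [sum(col) / n for col in zip(*per_window_probs)]


doc = list(range(2500))  # stand-in for the token IDs of a 2500-token document
chunks = sliding_windows(doc)
print([len(c) for c in chunks])  # [1200, 1200, 500]
```

Mean aggregation treats every window equally; a case where one short passage decides the category might be better served by max-pooling, but neither choice has been validated for this model.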

Ethical Considerations

This model is intended as a decision-support tool. Human oversight is recommended for legal decision-making applications. It does not provide legal advice.

Usage Example

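from transformers import LongformerForSequenceClassification, AutoTokenizer
import torch

model_id = "abhinavdread/msme-legal-dispute-classifier-longformer"
model = LongformerForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()  # disable dropout for inference
# Category names in label-ID order (see Classification Labels)
categories = [
    "Delayed payment (no dispute)",
    "Quality dispute",
    "No formal contract",
    "Partial payment dispute",
    "Government procurement delay",
    "Service-related dispute",
]

text = "The buyer failed to release payment within the agreed 45-day period."

inputs = tokenizer(text, truncation=True, max_length=1200, return_tensors="pt")  # same limit as fine-tuning
with torch.no_grad():  # gradients are not needed for prediction
    outputs = model(**inputs)

predicted_class = torch.argmax(outputs.logits, dim=-1).item()
print("Predicted Label:", predicted_class, categories[predicted_class])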
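For triage it can help to inspect the model's confidence rather than only the top label. A minimal sketch, assuming a `(1, 6)` logits tensor such as `outputs.logits` from the example above; the function name `top_k_categories` is hypothetical and the logits in the usage lines are made up.

```python
import torch

CATEGORIES = [
    "Delayed payment (no dispute)",
    "Quality dispute",
    "No formal contract",
    "Partial payment dispute",
    "Government procurement delay",
    "Service-related dispute",
]


def top_k_categories(logits: torch.Tensor, k: int = 3) -> list[tuple[str, float]]:
    """Convert a (1, 6) logits tensor into the k most likely (category, probability) pairs."""
    probs = torch.softmax(logits, dim=-1)[0]
    values, indices = torch.topk(probs, k)
    return [(CATEGORIES[i], v) for i, v in zip(indices.tolist(), values.tolist())]


# Made-up logits standing in for outputs.logits from the model.
fake_logits = torch.tensor([[2.0, 0.5, -1.0, 1.0, -0.5, 0.0]])
for name, p in top_k_categories(fake_logits):
    print(f"{name}: {p:.3f}")
```

A routing system could, for example, send cases whose top probability falls below a chosen threshold to manual review; any such threshold would need to be tuned on held-out data.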

Model Files

  • Format: Safetensors
  • Model size: 0.1B parameters
  • Tensor type: F32