Intelligent Legal Document Analysis Classifier (Longformer)
This model is a fine-tuned version of Legal-Longformer-Base-4096 (Saibo-creator/legal-longformer-base-4096), developed at the Pune Institute of Computer Technology (PICT) as part of the research behind the paper "Intelligent Legal Document Analysis using NLP".
The framework leverages NLP and Information Retrieval (IR) to classify unstructured documents into three primary domains: Criminal Law, Contract Law, and Education Law.
Model Details
Model Description
Traditional NLP methods often struggle with the complexity of legal discourse and domain-specific jargon. This model narrows the problem to a tractable subset of twelve small-scale laws. It uses the Vector Space Model (VSM) for clause-level representation and Longformer's sliding-window attention to process documents spanning thousands of tokens.
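To make the sliding-window idea concrete, here is a minimal sketch in plain Python. It is not the Longformer implementation: the helper names and the window size of 512 are illustrative assumptions, and Longformer's global attention on selected tokens is omitted. It only counts how many (query, key) pairs are attended to, which shows why a local window scales roughly linearly with sequence length while full attention scales quadratically.

```python
def sliding_window_pairs(seq_len: int, window: int) -> int:
    """Count (query, key) pairs when each token attends only to tokens
    at most `window // 2` positions away on either side (itself included)."""
    half = window // 2
    total = 0
    for i in range(seq_len):
        lo = max(0, i - half)            # clip the window at the start
        hi = min(seq_len - 1, i + half)  # clip the window at the end
        total += hi - lo + 1
    return total


def full_attention_pairs(seq_len: int) -> int:
    """Count (query, key) pairs for standard full self-attention."""
    return seq_len * seq_len


# Illustrative values only; they are not read from the model's config.
n, w = 4096, 512
local = sliding_window_pairs(n, w)
full = full_attention_pairs(n)
print(f"local pairs: {local}, full pairs: {full}, ratio: {local / full:.3f}")
```

When the window is at least as wide as the sequence, the two counts coincide, so the saving only appears for long inputs.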
- Developed by: Tanishq Shinde, Nilakshi Sonawane, Sarang Joshi, Mansi Jangle, and Vaishnavi Madavi
- Institution: Pune Institute of Computer Technology (PICT), Pune, India
- Model type: Transformer-based Sequence Classifier
- Finetuned from model: Saibo-creator/legal-longformer-base-4096
Model Sources
- Research Project: Intelligent Legal Document Analysis using NLP (Paper ID 685)
- Repository: Hugging Face Model Hub (Tanishq77/legal-classifier-v1)
Uses
Direct Use
The model is optimized to categorize legal clauses into the following twelve sub-domains:
- Criminal Law: Traffic signal violations, drunk driving penalties, petty theft, and noise pollution laws.
- Contract Law: House rent agreements, lease termination, small loan disputes, and consumer redressal rules.
- Education Law: Teacher appointments, service statutes, wage rules, and leave regulations.
Engineering-Inspired Features
The classifier is designed to support several high-level analytical components described in the paper:
- Knapsack-based Term Selection: Selecting the most informative terms to maximize relevance within a "scope budget".
- Fuzzy Word Identification: Flagging ambiguous expressions (e.g., "reasonable time") to highlight legal uncertainties for human review.
- Finite State Machines (FSM): Modeling legal procedural flows, such as the transition from a "contract active" state to a "penalty imposed" state.
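The three components above can be sketched in a few lines each. This is a simplified illustration under stated assumptions, not the paper's implementation: the relevance scores, term costs, budget, the fuzzy phrases other than "reasonable time", and the FSM event names are all hypothetical.

```python
def knapsack_select(terms, budget):
    """0/1 knapsack: choose terms that maximize total relevance within `budget`.

    `terms` is a list of (term, relevance, cost) tuples with integer costs.
    Returns (best_relevance, chosen_terms_in_input_order).
    """
    n = len(terms)
    # best[i][b] = max relevance using the first i terms with capacity b
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, rel, cost) in enumerate(terms, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip term i
            if cost <= b:                # option 2: take term i if it fits
                best[i][b] = max(best[i][b], best[i - 1][b - cost] + rel)
    # Walk back through the table to recover which terms were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            term, _, cost = terms[i - 1]
            chosen.append(term)
            b -= cost
    return best[n][budget], chosen[::-1]


# Hypothetical phrase list; only "reasonable time" comes from the card.
FUZZY_PHRASES = ("reasonable time", "as soon as possible", "due care")


def flag_fuzzy(clause):
    """Return the fuzzy phrases that occur in `clause` (case-insensitive)."""
    lowered = clause.lower()
    return [p for p in FUZZY_PHRASES if p in lowered]


# States follow the card's example; the event names are hypothetical.
TRANSITIONS = {
    ("contract_active", "breach_detected"): "penalty_imposed",
    ("penalty_imposed", "penalty_paid"): "contract_active",
}


def step(state, event):
    """Advance the FSM; unknown (state, event) pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Hypothetical (term, relevance, cost) triples and scope budget.
terms = [("lessee", 6, 2), ("termination", 10, 3), ("notice", 4, 1), ("hereinafter", 1, 2)]
print(knapsack_select(terms, budget=4))
print(flag_fuzzy("The tenant shall vacate within a reasonable time."))
print(step("contract_active", "breach_detected"))
```

Flagged phrases and FSM states are meant as aids for human review, in line with the oversight assumption stated later in this card.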
Training Details
Training Data
The model was trained on a balanced dataset of 30,000 rows of legal text (10,000 per primary domain).
- Preprocessing: Text normalization, tokenization, and retention of critical legal abbreviations and Latin expressions.
- Representation: Documents were segmented into individual clauses, and each clause was represented as a vector in a high-dimensional space.
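As a rough illustration of the clause-as-vector idea, the sketch below builds sparse term vectors and compares them with cosine similarity. The card does not say which term weighting was used, so the smoothed TF-IDF weighting, the regex tokenizer, and the sample clauses are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(clauses):
    """Map each clause to a sparse {term: weight} vector using smoothed TF-IDF."""
    docs = [Counter(tokenize(c)) for c in clauses]
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(d.keys())  # document frequency: one count per clause
    idf = {t: math.log((1 + n) / (1 + f)) + 1.0 for t, f in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical clauses, written for this example.
clauses = [
    "The lessee shall pay rent monthly.",
    "The lessee shall pay all utility charges.",
    "Drunk driving attracts a fine.",
]
vecs = tfidf_vectors(clauses)
print(f"rent vs utilities: {cosine(vecs[0], vecs[1]):.3f}")
print(f"rent vs driving:   {cosine(vecs[0], vecs[2]):.3f}")
```

Clauses that share vocabulary end up closer together in this space, which is the property a clause-level classifier or retrieval step can exploit.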
Training Procedure
- Hardware: Single NVIDIA P100 GPU (Kaggle).
- Precision: FP16 Mixed Precision for accelerated computation.
- Epochs: 1.
- Effective Batch Size: 32 (Batch Size 16 with Gradient Accumulation steps: 2).
- Final Training Loss: 0.0842.
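The batch-size figures above follow from simple arithmetic, sketched here. The step count is only an estimate: it assumes all 30,000 rows were used for training (the card does not state a train/validation split) and that a final partial batch still triggers an optimizer update. The function names are illustrative.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples contributing to each optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def optimizer_steps_per_epoch(num_examples, effective_batch):
    """Optimizer updates per epoch, counting a final partial batch as one update."""
    return math.ceil(num_examples / effective_batch)


eff = effective_batch_size(per_device_batch=16, grad_accum_steps=2)  # single GPU
steps = optimizer_steps_per_epoch(num_examples=30_000, effective_batch=eff)
print(f"effective batch size: {eff}, optimizer steps per epoch: {steps}")
```

Gradient accumulation lets a single P100 emulate a batch of 32 while holding only 16 examples in memory at a time.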
Evaluation
Results
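The final training loss was 0.084 after one epoch. Because this is a training metric, it indicates that the model fit the Criminal, Contract, and Education Law training data closely, but it does not by itself establish accuracy on unseen documents; no held-out evaluation metrics are reported in this card.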
How to Get Started with the Model
```python
from transformers import pipeline

# Load the fine-tuned legal classifier
classifier = pipeline("text-classification", model="Tanishq77/legal-classifier-v1")

# Test on a Contract Law clause
text = "The lessee shall be responsible for all utility payments during the lease term."
result = classifier(text)
print(f"Domain: {result[0]['label']} | Confidence: {result[0]['score']:.4f}")
```
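Since the model works at clause level, one way to label a whole document is to classify each clause and take a majority vote. The sketch below is an assumption about usage, not something the card prescribes. `classify_document` and `fake_classifier` are hypothetical names, and the label strings are placeholders because the model's actual label names are not listed here. The stub stands in for the real pipeline so the example runs offline; a `transformers` text-classification pipeline called with a list of strings returns the same list-of-dicts shape.

```python
from collections import Counter


def classify_document(clauses, classify):
    """Label each clause, then return (majority_label, vote_share).

    `classify` is any callable that maps a list of strings to a list of
    {"label": str, "score": float} dicts, such as a transformers pipeline.
    """
    if not clauses:
        raise ValueError("clauses must not be empty")
    predictions = classify(clauses)
    votes = Counter(p["label"] for p in predictions)
    label, count = votes.most_common(1)[0]
    return label, count / len(predictions)


def fake_classifier(texts):
    """Offline stub with placeholder labels; replace with the real pipeline."""
    return [
        {"label": "Contract Law" if "lease" in t.lower() else "Criminal Law", "score": 0.9}
        for t in texts
    ]


document = [
    "The lease term is one year.",
    "Lease termination requires thirty days' notice.",
    "Drunk driving is punishable with a fine.",
]
label, share = classify_document(document, fake_classifier)
print(f"Document label: {label} | Vote share: {share:.2f}")
```

To use the real model, pass the `classifier` pipeline object in place of `fake_classifier`. A low vote share is a signal that the document mixes domains and should go to human review.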
Ethical Considerations
As outlined in the research methodology, this model is intended for educational and analytical purposes only. It is not a substitute for professional legal advice. The framework assumes human oversight at every stage, with legal experts expected to validate outputs to prevent misinterpretation and misuse.
Citation
If you use this model or refer to the intelligent legal document analysis framework in your research, please cite it as follows:
BibTeX
@article{shinde2026intelligent,
title={Intelligent Legal Document Analysis using NLP},
author={Shinde, Tanishq and Sonawane, Nilakshi and Joshi, Sarang and Jangle, Mansi and Madavi, Vaishnavi},
journal={Dept. of Computer Engineering, Pune Institute of Computer Technology},
year={2026}
}