LegalBERT · Contract Clause Classifier (LoRA)
A LoRA adapter fine-tuned on top of
nlpaueb/legal-bert-base-uncased
for multi-class contract clause classification across all 41 CUAD clause types.
The model significantly outperforms the non-fine-tuned baseline (accuracy 3.28% → 71.46%, macro F1 0.005 → 0.502, weighted F1 0.008 → 0.677) after 5 epochs of LoRA fine-tuning.
Model Details
| Property | Value |
|---|---|
| Base model | nlpaueb/legal-bert-base-uncased |
| Adapter type | LoRA (PEFT) |
| Task | Multi-class sequence classification |
| Classes | 41 CUAD clause types |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.1 |
| Target modules | query, value |
| Max sequence length | 256 tokens |
| Epochs | 5 |
| Learning rate | 2e-4 |
| Batch size | 16 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Optimizer | AdamW (default HF Trainer) |
| Hardware | Kaggle GPU (T4) |
| PEFT version | 0.18.1 |
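As a sanity check on adapter size, the settings above imply roughly 590K trainable LoRA parameters for BERT-base (12 encoder layers, hidden size 768, query and value projections targeted), plus a small classification head. A back-of-the-envelope estimate (the exact count depends on the PEFT configuration, so treat this as approximate):

```python
# Estimate LoRA trainable parameters for BERT-base with r=16 and
# target modules = query, value. Each targeted 768x768 projection
# gains two low-rank matrices: A (r x 768) and B (768 x r).
hidden = 768      # BERT-base hidden size
layers = 12       # BERT-base encoder layers
r = 16            # LoRA rank
targets = 2       # query and value projections per layer
num_labels = 41   # CUAD clause types

lora_params = layers * targets * (r * hidden + hidden * r)
head_params = hidden * num_labels + num_labels  # classifier weight + bias

print(f"LoRA adapter params: {lora_params:,}")     # 589,824
print(f"Classifier head params: {head_params:,}")  # 31,529
```

Only these ~620K parameters are updated during training; the ~110M base model weights stay frozen.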
Training
The adapter was trained for 5 epochs on the CUAD dataset, which contains expert-labelled contract clauses across 41 legal categories. The dataset was split 80/20 (train/test) with stratification across all 41 labels.
- Train size: ~7,930 examples
- Test size: 1,983 examples
- Split strategy: Stratified random split (random_state=42)
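The stratified split described above can be reproduced with scikit-learn (already listed under Installation). The `texts` and `labels` lists here are hypothetical stand-ins for the extracted CUAD examples:

```python
from sklearn.model_selection import train_test_split

# Hypothetical inputs: `texts` holds clause strings and `labels` the
# corresponding 0-40 class IDs extracted from CUAD.
texts = ["Clause A", "Clause B", "Clause C", "Clause D"] * 50
labels = [0, 1, 2, 3] * 50

# 80/20 stratified split, matching the setup described above.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels,
    test_size=0.2,
    stratify=labels,   # preserve class proportions in both splits
    random_state=42,
)

print(len(train_texts), len(test_texts))  # 160 40
```

Stratification matters here because several CUAD classes are rare; a plain random split could leave some classes absent from the test set entirely.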
Training Curve
| Epoch | Train Loss | Val Loss | Accuracy | Weighted F1 | Macro F1 |
|---|---|---|---|---|---|
| 1 | 5.992 | 4.285 | 43.22% | 0.316 | 0.158 |
| 2 | 2.881 | 2.485 | 65.81% | 0.601 | 0.382 |
| 3 | 2.203 | 2.124 | 69.79% | 0.651 | 0.448 |
| 4 | 1.958 | 2.005 | 71.05% | 0.668 | 0.488 |
| 5 | 1.852 | 1.944 | 71.46% | 0.677 | 0.502 |
Baseline Comparison
| Metric | Baseline (untrained) | Fine-Tuned (this model) |
|---|---|---|
| Accuracy | 3.28% | 71.46% |
| Weighted F1 | 0.0082 | 0.6771 |
| Macro F1 | 0.0053 | 0.5016 |
The baseline was evaluated by running
nlpaueb/legal-bert-base-uncased directly on the test set, with a randomly initialized classification head and no fine-tuning. Its near-random accuracy (3.28%, versus 2.44% for uniform guessing over 41 classes) confirms that the clause-classification ability is learned entirely during fine-tuning.
General Benchmark: Catastrophic Forgetting Check
To verify the model did not lose general language understanding after fine-tuning, it was evaluated on a 100-sample subset of the MMLU Abstract Algebra benchmark:
| Metric | Base Model | Fine-Tuned |
|---|---|---|
| MMLU Abstract Algebra Accuracy | 19.00% | 24.00% |
No catastrophic forgetting was detected: the fine-tuned model scored 5 percentage points higher on this general-knowledge benchmark than the base model, indicating that domain-specific fine-tuning did not degrade general language ability. Note that a 100-sample subset gives only a rough signal, so the improvement itself should not be over-interpreted.
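With only 100 evaluation questions, the 5-point gap carries wide error bars. A rough normal-approximation check (not part of the original evaluation) shows the difference is well within sampling noise:

```python
import math

n = 100          # MMLU Abstract Algebra subset size
p_base = 0.19    # base model accuracy
p_ft = 0.24      # fine-tuned accuracy

# Standard error of each proportion, and of their difference
se_base = math.sqrt(p_base * (1 - p_base) / n)
se_ft = math.sqrt(p_ft * (1 - p_ft) / n)
se_diff = math.sqrt(se_base**2 + se_ft**2)

z = (p_ft - p_base) / se_diff
print(f"difference = {p_ft - p_base:.2f}, SE = {se_diff:.3f}, z = {z:.2f}")
# z is well below 1.96, so the gap is not statistically significant;
# the safe conclusion is "no degradation", not "improvement".
```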
Evaluation Results (Per-Class)
Selected high-performing classes from the classification report:
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Non-Disparagement (13) | 1.00 | 1.00 | 1.00 | 89 |
| Termination for Convenience (14) | 0.96 | 0.95 | 0.96 | 109 |
| Expiration Date (4) | 0.92 | 0.97 | 0.95 | 127 |
| Irrevocable or Perpetual License (29) | 0.72 | 0.79 | 0.75 | 89 |
| Audit Rights (32) | 0.84 | 0.93 | 0.88 | 82 |
| Effective Date (3) | 0.88 | 0.90 | 0.89 | 125 |
| Renewal Term (5) | 0.72 | 0.97 | 0.83 | 133 |
| Insurance (37)* | 0.00 | 0.00 | 0.00 | 33 |
* Some rare classes (e.g. Insurance, label index 37, as well as classes 0, 1, and 2) have very few training examples and score near zero; see the Limitations section below.
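The gap between macro F1 (0.502) and weighted F1 (0.677) comes directly from these zero-scoring rare classes: macro F1 averages per-class scores equally, while weighted F1 weights each class by its support. A small illustration with toy numbers (not the actual CUAD report):

```python
# Toy per-class F1 scores and supports: two common classes do well,
# one rare class scores zero.
f1_scores = [0.90, 0.80, 0.00]
supports = [100, 80, 5]

macro_f1 = sum(f1_scores) / len(f1_scores)
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)

print(f"macro F1:    {macro_f1:.3f}")     # 0.567 -- dragged down by the rare class
print(f"weighted F1: {weighted_f1:.3f}")  # 0.832 -- dominated by common classes
```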
Example Inference Results
Real predictions from the fine-tuned model on unseen clauses:
| Clause | Predicted Type | Confidence |
|---|---|---|
| "Either party may terminate this Agreement upon 30 days written notice." | Termination for Convenience | 79.50% |
| "Licensee shall not transfer or sublicense any rights granted herein." | Anti-Assignment | 61.04% |
| "This Agreement shall be governed by the laws of California." | Governing Law | 96.87% |
| "The Company shall maintain insurance coverage of at least $1,000,000." | Insurance | 97.44% |
| "Neither party shall disclose confidential information to third parties." | Anti-Assignment | 41.98% |
Usage
This is a PEFT LoRA adapter: load it on top of the base model using the peft library.
Installation
```bash
pip install torch transformers peft scikit-learn
```
Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

base_model_id = "nlpaueb/legal-bert-base-uncased"
adapter_id = "Mokshith31/legalbert-contract-clause-classification"

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=41,
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Label mapping (ID -> clause type name)
id2label = {
    0: "Document Name", 1: "Parties", 2: "Agreement Date",
    3: "Effective Date", 4: "Expiration Date", 5: "Renewal Term",
    6: "Notice Period to Terminate Renewal", 7: "Governing Law",
    8: "Most Favored Nation", 9: "Non-Compete", 10: "Exclusivity",
    11: "No-Solicit of Customers", 12: "No-Solicit of Employees",
    13: "Non-Disparagement", 14: "Termination for Convenience",
    15: "ROFR / ROFO / ROFN", 16: "Change of Control",
    17: "Anti-Assignment", 18: "Revenue / Profit Sharing",
    19: "Price Restriction", 20: "Minimum Commitment",
    21: "Volume Restriction", 22: "IP Ownership Assignment",
    23: "Joint IP Ownership", 24: "License Grant",
    25: "Non-Transferable License", 26: "Affiliate License-Licensor",
    27: "Affiliate License-Licensee",
    28: "Unlimited / All-You-Can-Eat License",
    29: "Irrevocable or Perpetual License", 30: "Source Code Escrow",
    31: "Post-Termination Services", 32: "Audit Rights",
    33: "Uncapped Liability", 34: "Cap on Liability",
    35: "Liquidated Damages", 36: "Warranty Duration",
    37: "Insurance", 38: "Covenant Not to Sue",
    39: "Third Party Beneficiary", 40: "Other",
}

# Run inference
clause = "This Agreement shall be governed by the laws of California."
inputs = tokenizer(
    clause,
    return_tensors="pt",
    truncation=True,
    max_length=256,
)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)
pred_id = outputs.logits.argmax(dim=-1).item()
confidence = probs.max().item()

print(f"Predicted clause type: {id2label[pred_id]}")
print(f"Confidence: {confidence:.2%}")
```
With Merged Weights (pipeline API)
```python
import torch
from peft import PeftModel
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, pipeline)

base = AutoModelForSequenceClassification.from_pretrained(
    "nlpaueb/legal-bert-base-uncased", num_labels=41
)
model = PeftModel.from_pretrained(
    base,
    "Mokshith31/legalbert-contract-clause-classification",
)
model = model.merge_and_unload()  # fuse the LoRA weights into the base model

tokenizer = AutoTokenizer.from_pretrained(
    "nlpaueb/legal-bert-base-uncased"
)

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)

# Note: unless id2label is set in the model config, the pipeline returns
# generic labels such as "LABEL_14"; map them using the CUAD schema below.
result = classifier(
    "Either party may terminate upon 30 days written notice.",
    truncation=True,
    max_length=256,
)
print(result)
```
CUAD Label Schema
The model predicts one of the following 41 clause categories:
| ID | Clause Type |
|---|---|
| 0 | Document Name |
| 1 | Parties |
| 2 | Agreement Date |
| 3 | Effective Date |
| 4 | Expiration Date |
| 5 | Renewal Term |
| 6 | Notice Period to Terminate Renewal |
| 7 | Governing Law |
| 8 | Most Favored Nation |
| 9 | Non-Compete |
| 10 | Exclusivity |
| 11 | No-Solicit of Customers |
| 12 | No-Solicit of Employees |
| 13 | Non-Disparagement |
| 14 | Termination for Convenience |
| 15 | ROFR / ROFO / ROFN |
| 16 | Change of Control |
| 17 | Anti-Assignment |
| 18 | Revenue / Profit Sharing |
| 19 | Price Restriction |
| 20 | Minimum Commitment |
| 21 | Volume Restriction |
| 22 | IP Ownership Assignment |
| 23 | Joint IP Ownership |
| 24 | License Grant |
| 25 | Non-Transferable License |
| 26 | Affiliate License-Licensor |
| 27 | Affiliate License-Licensee |
| 28 | Unlimited / All-You-Can-Eat License |
| 29 | Irrevocable or Perpetual License |
| 30 | Source Code Escrow |
| 31 | Post-Termination Services |
| 32 | Audit Rights |
| 33 | Uncapped Liability |
| 34 | Cap on Liability |
| 35 | Liquidated Damages |
| 36 | Warranty Duration |
| 37 | Insurance |
| 38 | Covenant Not to Sue |
| 39 | Third Party Beneficiary |
| 40 | Other |
Limitations and Bias
- Trained exclusively on English-language commercial contracts from the CUAD dataset. Performance may degrade on other legal domains (e.g. employment, real estate) or non-US contract styles.
- Some CUAD classes have very few training examples (e.g. class 2, Agreement Date, has only 1 support sample), which leads to near-zero per-class performance on rare clause types. Classes 0, 1, 2, 7, 9, 21, 22, 27, 37, and 38 scored F1 = 0.00 due to insufficient training data.
- Class imbalance in the CUAD dataset means the model favours more common clause types (e.g. Renewal Term, Effective Date).
- The model is not a substitute for legal advice. Predictions should be reviewed by qualified legal professionals before use in any legal workflow.
- Max sequence length is 256 tokens; longer clauses will be truncated and may lose important context.
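One way to work around the 256-token limit is to split long clauses into overlapping windows and aggregate the per-window predictions. A minimal sliding-window sketch over token IDs (this chunking strategy is an assumption, not something the model was trained with):

```python
def chunk_token_ids(token_ids, max_length=256, stride=128):
    """Split a token ID sequence into overlapping windows.

    Each window holds at most `max_length` tokens; consecutive windows
    start `stride` tokens apart, so adjacent windows overlap by
    `max_length - stride` tokens and context is less likely to be cut.
    """
    if len(token_ids) <= max_length:
        return [token_ids]
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break
    return chunks

# Example: a 600-token clause becomes four overlapping windows
ids = list(range(600))
windows = chunk_token_ids(ids)
print([len(w) for w in windows])  # [256, 256, 256, 216]
```

Per-window logits can then be combined, for example by averaging the softmax probabilities before taking the argmax.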
Citation
If you use this model, please cite the original CUAD dataset:
```bibtex
@article{hendrycks2021cuad,
  title={CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review},
  author={Hendrycks, Dan and Burns, Collin and Chen, Anya and Ball, Spencer},
  journal={arXiv preprint arXiv:2103.06268},
  year={2021}
}
```
And the LegalBERT base model:
```bibtex
@inproceedings{chalkidis-etal-2020-legal,
  title={LEGAL-BERT: The Muppets straight out of Law School},
  author={Chalkidis, Ilias and Fergadiotis, Manos and Malakasiotis,
          Prodromos and Aletras, Nikolaos and Androutsopoulos, Ion},
  booktitle={Findings of EMNLP},
  year={2020}
}
```
Experiment Tracking
Training was tracked using Weights & Biases:
W&B Project: contract-intelligence
Framework Versions
| Library | Version |
|---|---|
| Transformers | latest |
| PEFT | 0.18.1 |
| PyTorch | latest |
| Datasets | latest |
| scikit-learn | latest |
| Accelerate | latest |