Merge branch 'main' of https://huggingface.co/spaces/gaurv007/ClauseGuard
- README.md +99 -0
- app.py +962 -137
- compare.py +229 -0
- compliance.py +245 -0
- obligations.py +190 -0
- requirements.txt +11 -4
README.md
CHANGED
@@ -9,3 +9,102 @@ python_version: "3.12"
 app_file: app.py
 pinned: false
 ---
+
+# 🛡️ ClauseGuard - World's Best Open-Source Legal Contract Analysis
+
+**ClauseGuard** is the most comprehensive open-source AI-powered legal contract analysis tool. It analyzes contracts using state-of-the-art legal NLP models and provides actionable risk assessments.
+
+## ✨ Core Features
+
+### Analysis Engine
+| Feature | Description |
+|---------|-------------|
+| **41 CUAD Clause Categories** | Full taxonomy: Document Name, Parties, Governing Law, Indemnification, Termination, Non-Compete, IP Ownership, Audit Rights, Force Majeure, and more |
+| **4-Tier Risk Scoring** | Critical 🔴 / High 🟠 / Medium 🟡 / Low 🟢 with visual risk matrix |
+| **Legal NER** | Extracts parties, dates, monetary values ($), jurisdictions, defined terms, and party roles |
+| **NLI Contradiction Detection** | Identifies conflicting clauses (e.g., uncapped + capped liability) and missing critical provisions |
+| **Obligation Tracker** | Categorizes action items: monetary, compliance, reporting, delivery, termination |
+| **Compliance Checker** | Validates against GDPR, CCPA, SOX, HIPAA, and FINRA requirements |
+| **Contract Comparison** | Side-by-side diff between two contracts with alignment scoring |
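
The contradiction and missing-clause checks listed in the table can be pictured as set logic over the labels a classifier returns. The sketch below is only an illustration under stated assumptions: `CONFLICT_PAIRS`, `REQUIRED_LABELS`, and `find_issues` are hypothetical names, only the uncapped-versus-capped liability pair comes from the README, and the set of required provisions is invented for the example. ClauseGuard's own conflict patterns are not shown in this diff.

```python
# Hypothetical conflict pairs; only this one is named in the README.
CONFLICT_PAIRS = [
    ("Uncapped Liability", "Cap on Liability"),
]

# Assumed set of provisions treated as "critical" for this illustration.
REQUIRED_LABELS = {"Parties", "Governing Law"}


def find_issues(detected_labels):
    """Return conflicting label pairs and missing required labels."""
    found = set(detected_labels)
    # A pair conflicts when both of its labels were detected.
    contradictions = [pair for pair in CONFLICT_PAIRS if found.issuperset(pair)]
    # Anything required but not detected is reported as missing.
    missing = sorted(REQUIRED_LABELS - found)
    return {"contradictions": contradictions, "missing": missing}


if __name__ == "__main__":
    print(find_issues(["Uncapped Liability", "Cap on Liability", "Parties"]))
```

Keeping the rules as plain data makes it straightforward to add further pairs without touching the matching logic.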
+
+### Document Support
+- **PDF** parsing via `pdfplumber`
+- **DOCX/DOC** parsing via `python-docx`
+- **TXT / Markdown** direct text input
+
+### UI/UX
+- **3-Panel Professional Layout** - Upload sidebar + Main analysis + Summary dashboard
+- **Document Viewer** - Inline entity highlights (colored annotations)
+- **Clause Cards** - Expandable risk-badged cards with confidence scores
+- **Export Reports** - JSON (structured) and CSV (tabular) downloads
+- **Color-Coded Risk Badges** - Instant visual triage
+
+## 🧠 Models & Architecture
+
+| Component | Technology |
+|-----------|------------|
+| Clause Classification | `Mokshith31/legalbert-contract-clause-classification` - LoRA adapter on `nlpaueb/legal-bert-base-uncased`, fine-tuned on CUAD 41-class taxonomy |
+| NER | Rule-based with 7 entity types (dates, money, parties, jurisdictions, defined terms) |
+| NLI | Heuristic contradiction detection with 5 conflict patterns + missing-clause detection |
+| Compliance | Regulatory keyword matching across GDPR, CCPA, SOX, HIPAA, FINRA |
+| Comparison | SequenceMatcher-based clause alignment with risk delta analysis |
+| Obligations | Regex pattern matching across 5 obligation categories |
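
To make the "regex pattern matching across 5 obligation categories" row concrete, here is a rough sketch of how such a tracker could work. Everything in it is an assumption for illustration: the keyword patterns, the `MODAL` filter, and the name `extract_obligations_sketch` are hypothetical and are not taken from `obligations.py`. Only the five category names come from the README.

```python
import re

# Hypothetical keyword patterns, one per README category.
OBLIGATION_PATTERNS = {
    "monetary": r"\b(?:pay|fee|invoice|reimburse)\w*",
    "compliance": r"\b(?:comply|compliance|in accordance with)\b",
    "reporting": r"\b(?:report|notify|disclose)\w*",
    "delivery": r"\b(?:deliver|ship|provide)\w*",
    "termination": r"\bterminat\w*",
}

# Assumed signal that a sentence states a duty rather than a recital.
MODAL = re.compile(r"\b(?:shall|must|agrees? to|is required to)\b", re.IGNORECASE)


def extract_obligations_sketch(text):
    """Tag each obligation-like sentence with the first matching category."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not MODAL.search(sentence):
            continue  # no duty language, skip
        for category, pattern in OBLIGATION_PATTERNS.items():
            if re.search(pattern, sentence, re.IGNORECASE):
                results.append({"category": category, "text": sentence})
                break  # keep only the first category that matches
    return results


if __name__ == "__main__":
    sample = (
        "The Supplier shall deliver the goods within 30 days. "
        "The Buyer shall pay all invoices within 15 days. "
        "This Agreement is made in good faith."
    )
    for item in extract_obligations_sketch(sample):
        print(item["category"], "->", item["text"])
```

Taking the first matching category keeps the output simple; a fuller tracker would probably allow several categories per sentence.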
+
+## Risk Scoring Methodology
+
+Risk scores combine clause detection with weighted severity:
+- **CRITICAL**: 40 pts (Uncapped Liability, Arbitration, IP Assignment, etc.)
+- **HIGH**: 20 pts (Non-Compete, Exclusivity, Unilateral Change, etc.)
+- **MEDIUM**: 10 pts (Governing Law, Jurisdiction, etc.)
+- **LOW**: 3 pts (Document Name, Dates, etc.)
+
+Final score normalized to 0-100 with letter grades:
+- A (0-14): Low risk
+- B (15-29): Moderate risk
+- C (30-49): Elevated risk
+- D (50-69): High risk
+- F (70+): Critical risk
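
The weights and grade bands above translate into a few lines of arithmetic. The sketch below uses the README's point values and thresholds, but the normalization step is an assumption: the README does not say how the raw total is mapped to 0-100, so this example simply caps it at 100. The function names `risk_score` and `letter_grade` are hypothetical.

```python
RISK_WEIGHTS = {"CRITICAL": 40, "HIGH": 20, "MEDIUM": 10, "LOW": 3}


def risk_score(risk_levels):
    """Sum the weight of every detected clause, capped at 100 (assumed normalization)."""
    raw = sum(RISK_WEIGHTS[level] for level in risk_levels)
    return min(raw, 100)


def letter_grade(score):
    """Map a 0-100 score onto the README's grade bands."""
    if score < 15:
        return "A"
    if score < 30:
        return "B"
    if score < 50:
        return "C"
    if score < 70:
        return "D"
    return "F"


if __name__ == "__main__":
    detected = ["CRITICAL", "HIGH", "LOW"]  # 40 + 20 + 3 = 63
    score = risk_score(detected)
    print(score, letter_grade(score))
```

With these weights a single CRITICAL clause already puts a contract in grade C, and two CRITICAL clauses reach grade F.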
+
+## Datasets & Research
+
+- [CUAD](https://huggingface.co/datasets/theatticusproject/cuad-qa) - 510 contracts, 13K annotations, 41 clause categories
+- [LegalBench](https://huggingface.co/datasets/nguha/legalbench) - 162 legal reasoning tasks
+- [LexGLUE](https://huggingface.co/datasets/coastalcph/lex_glue) - Unfair Terms of Service classification
+- Paper: [CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review](https://arxiv.org/abs/2103.06268) (Hendrycks et al., 2021)
+
+## Usage
+
+1. **Upload** a contract (PDF, DOCX, or TXT) or paste text directly
+2. Click **Analyze Contract**
+3. View results across tabs:
+   - **Document**: Full text with inline entity highlights
+   - **Clauses**: Detected clauses with risk badges
+   - **Entities**: Extracted parties, dates, money, jurisdictions
+   - **Contradictions**: Conflicting clauses and missing provisions
+   - **Obligations**: Action items categorized by type
+   - **Compliance**: Regulatory framework checks
+4. **Export** JSON/CSV reports
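
The export step can be sketched with the standard library alone. The example below assumes each detected clause is a dict with `label`, `risk`, and `confidence` keys, which matches the result shape built in `classify_cuad`. The function names `export_json` and `export_csv` and the `{"clauses": [...]}` wrapper are hypothetical, so the real report layout may differ.

```python
import csv
import io
import json


def export_json(clauses):
    """Serialize clause records as an indented JSON document."""
    return json.dumps({"clauses": clauses}, indent=2)


def export_csv(clauses):
    """Serialize clause records as CSV with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["label", "risk", "confidence"],
        extrasaction="ignore",  # drop keys such as "description"
    )
    writer.writeheader()
    writer.writerows(clauses)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"label": "Non-Compete", "risk": "HIGH", "confidence": 0.91,
         "description": "Restrictions on competing with the counter-party."},
    ]
    print(export_json(rows))
    print(export_csv(rows))
```

Writing into an in-memory `io.StringIO` buffer returns the CSV as a string, which a download handler can then offer without creating files by hand.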
+
+## Compare Contracts
+
+Switch to the **Compare Contracts** tab to:
+- Upload or paste two contracts side-by-side
+- See clause-level diffs (added, removed, modified)
+- Get an alignment score and risk delta
+- View raw JSON comparison data
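
A clause-level diff of this kind can be approximated with `difflib.SequenceMatcher`, which the README names as the alignment technique. The sketch below is an assumption-laden illustration, not the code in `compare.py`: the greedy best-match strategy, the `0.6` similarity threshold, the definition of the alignment score as a mean similarity ratio, and the name `compare_clauses_sketch` are all hypothetical. Risk delta is left out.

```python
from difflib import SequenceMatcher


def compare_clauses_sketch(clauses_a, clauses_b, threshold=0.6):
    """Greedily pair each clause in A with its most similar unmatched clause in B."""
    unmatched_b = list(range(len(clauses_b)))
    modified, removed, ratios = [], [], []
    for clause in clauses_a:
        best_idx, best_ratio = None, 0.0
        for idx in unmatched_b:
            ratio = SequenceMatcher(None, clause, clauses_b[idx]).ratio()
            if ratio > best_ratio:
                best_idx, best_ratio = idx, ratio
        if best_idx is not None and best_ratio >= threshold:
            unmatched_b.remove(best_idx)
            ratios.append(best_ratio)
            if best_ratio < 1.0:
                modified.append((clause, clauses_b[best_idx]))
        else:
            removed.append(clause)  # nothing similar enough in B
            ratios.append(0.0)
    added = [clauses_b[idx] for idx in unmatched_b]
    total = len(ratios) + len(added)
    # Assumed definition: mean similarity, counting added clauses as 0.
    alignment = round(sum(ratios) / total, 3) if total else 1.0
    return {"added": added, "removed": removed,
            "modified": modified, "alignment": alignment}


if __name__ == "__main__":
    old = ["Payment is due in 30 days.", "Either party may terminate."]
    new = ["Payment is due in 45 days.", "Governing law is New York."]
    print(compare_clauses_sketch(old, new))
```

Greedy matching is quadratic in the number of clauses, which is usually acceptable for a single pair of contracts.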
+
+## ⚠️ Disclaimer
+
+*Not legal advice. ClauseGuard is an AI-powered analysis tool for informational purposes only. Always consult a qualified attorney for legal decisions. The tool may miss nuances and should be used as a preliminary screening aid, not a substitute for professional legal review.*
+
+## Links
+
+- [ClauseGuard Space](https://huggingface.co/spaces/gaurv007/ClauseGuard)
+- [Clause Classifier Model](https://huggingface.co/Mokshith31/legalbert-contract-clause-classification)
+- [Legal-BERT Base](https://huggingface.co/nlpaueb/legal-bert-base-uncased)
+- [CUAD Dataset](https://huggingface.co/datasets/theatticusproject/cuad-qa)
+- [CUAD Paper (arXiv:2103.06268)](https://arxiv.org/abs/2103.06268)
+
+---
+
+*Built with ❤️ using Gradio, Hugging Face Transformers, and Legal-BERT. Open source and free for all.*
app.py
CHANGED
|
@@ -1,37 +1,327 @@
|
|
| 1 |
"""
|
| 2 |
-
ClauseGuard β
|
| 3 |
-
|
| 4 |
"""
|
| 5 |
|
| 6 |
-
import
|
| 7 |
import re
|
| 8 |
import numpy as np
|
| 9 |
|
| 10 |
-
# ββ
|
| 11 |
-
|
| 12 |
-
|
| 13 |
|
| 14 |
try:
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
}
|
| 32 |
|
| 33 |
-
|
| 34 |
-
|
| 35 |
"Limitation of liability": [r"not liable", r"shall not be (liable|responsible)", r"in no event.*liable", r"limitation of liability", r"without warranty", r"disclaim"],
|
| 36 |
"Unilateral termination": [r"terminat.*at any time", r"suspend.*account.*without", r"we may (terminat|suspend|discontinu)", r"right to (terminat|suspend)"],
|
| 37 |
"Unilateral change": [r"sole discretion", r"reserves? the right to (modify|change|update|amend)", r"at any time.*without (prior )?notice", r"we may (modify|change|update)"],
|
|
@@ -40,115 +330,454 @@ PATTERNS = {
|
|
| 40 |
"Choice of law": [r"governed by.*laws? of", r"shall be governed", r"laws of the state of"],
|
| 41 |
"Jurisdiction": [r"exclusive jurisdiction", r"courts? of.*(california|delaware|new york|ireland|england)", r"submit to.*jurisdiction"],
|
| 42 |
"Arbitration": [r"arbitrat", r"binding arbitration", r"waive.*right.*court", r"class action waiver"],
|
| 43 |
}
|
| 44 |
|
| 45 |
-
def
|
| 46 |
-
"""Classify using the trained Legal-BERT model."""
|
| 47 |
-
if not ml_pipeline:
|
| 48 |
-
return classify_regex(text)
|
| 49 |
-
try:
|
| 50 |
-
preds = ml_pipeline(text, truncation=True, max_length=512)
|
| 51 |
-
results = []
|
| 52 |
-
for p in preds[0] if isinstance(preds[0], list) else preds:
|
| 53 |
-
if p["score"] > 0.5 and p["label"] in LABELS:
|
| 54 |
-
sev, desc = LABELS[p["label"]]
|
| 55 |
-
results.append({"name": p["label"], "severity": sev, "desc": desc, "confidence": round(p["score"], 2)})
|
| 56 |
-
return results
|
| 57 |
-
except Exception:
|
| 58 |
-
return classify_regex(text)
|
| 59 |
-
|
| 60 |
-
def classify_regex(text):
|
| 61 |
-
"""Fallback regex classifier."""
|
| 62 |
-
results = []
|
| 63 |
text_lower = text.lower()
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
break
|
| 70 |
return results
|
| 71 |
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
|
| 81 |
clauses = split_clauses(text)
|
| 82 |
if not clauses:
|
| 83 |
-
return
|
| 84 |
-
|
| 85 |
-
flagged = []
|
| 86 |
-
sev_counts = {"HIGH": 0, "MEDIUM": 0, "LOW": 0}
|
| 87 |
-
|
| 88 |
for clause in clauses:
|
| 89 |
-
|
| 90 |
-
if
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
</div>
|
| 114 |
-
<span style="font-size:13px;font-weight:500;padding:2px 10px;border-radius:4px;{
|
| 115 |
-
'background:#fef2f2;color:#b91c1c;' if grade in ('F','D') else
|
| 116 |
-
'background:#fffbeb;color:#a16207;' if grade == 'C' else
|
| 117 |
-
'background:#f0fdf4;color:#15803d;'
|
| 118 |
-
}">Grade {grade}</span>
|
| 119 |
</div>
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
if not flagged:
|
| 124 |
-
summary += '<div style="border:1px solid #e4e4e7;border-radius:8px;padding:24px;text-align:center;"><p style="font-size:14px;color:#71717a;">No unfair clauses found.</p></div>'
|
| 125 |
-
else:
|
| 126 |
-
for item in flagged:
|
| 127 |
-
max_sev = max(item["hits"], key=lambda h: {"HIGH":3,"MEDIUM":2,"LOW":1}[h["severity"]])["severity"]
|
| 128 |
-
border = {"HIGH":"#fca5a5","MEDIUM":"#fcd34d","LOW":"#93c5fd"}[max_sev]
|
| 129 |
-
|
| 130 |
-
tags = ""
|
| 131 |
-
for h in item["hits"]:
|
| 132 |
-
ts = {"HIGH":"background:#fef2f2;color:#b91c1c;border:1px solid #fecaca;",
|
| 133 |
-
"MEDIUM":"background:#fffbeb;color:#a16207;border:1px solid #fde68a;",
|
| 134 |
-
"LOW":"background:#eff6ff;color:#1d4ed8;border:1px solid #bfdbfe;"}[h["severity"]]
|
| 135 |
-
conf = f' ({h["confidence"]})' if h.get("confidence") and ml_pipeline else ""
|
| 136 |
-
tags += f'<span style="{ts}font-size:11px;font-weight:500;padding:1px 8px;border-radius:3px;margin-right:4px;">{h["name"]}{conf}</span>'
|
| 137 |
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
|
| 153 |
Spotify may, in its sole discretion, modify or update these Terms of Service at any time without prior notice. Your continued use of the Service after any such changes constitutes your acceptance of the new Terms of Service.
|
| 154 |
|
|
@@ -160,39 +789,235 @@ Spotify may terminate your account or suspend your access at any time, with or w
|
|
| 160 |
|
| 161 |
These Terms will be governed by and construed in accordance with the laws of the State of New York.
|
| 162 |
|
| 163 |
-
Any dispute shall be finally settled by arbitration in New York County."""
|
| 164 |
|
| 165 |
-
|
| 166 |
|
| 167 |
The Landlord shall not be liable for any damage to the Tenant's personal property, whether caused by water leaks, fire, theft, or any other cause, including the Landlord's own negligence.
|
| 168 |
|
| 169 |
The Landlord may terminate this lease at any time with only 7 days written notice, for any reason or no reason at all.
|
| 170 |
|
| 171 |
-
Any disputes arising from this lease agreement shall be resolved exclusively in the courts of the
|
| 172 |
|
| 173 |
The Landlord reserves the right to modify the terms of this lease at any time. Continued occupancy constitutes acceptance of the new terms."""
|
| 174 |
|
| 175 |
-
|
| 176 |
|
| 177 |
-
|
| 178 |
-
gr.HTML('<div style="font-family:system-ui,sans-serif;padding:16px 0;"><h1 style="font-size:20px;font-weight:600;margin:0;">ClauseGuard</h1><p style="font-size:13px;color:#a1a1aa;margin-top:2px;">Paste a Terms of Service, contract, or lease. Get a risk breakdown.</p></div>')
|
| 179 |
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
with gr.Row():
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
|
| 192 |
-
|
| 193 |
-
|
| 194 |
|
| 195 |
-
|
| 196 |
|
| 197 |
if __name__ == "__main__":
|
| 198 |
demo.launch()
|
 """
+ClauseGuard - World's Best Legal Contract Analysis Tool
+======================================================
+Features:
+• 41 CUAD clause categories via fine-tuned Legal-BERT
+• 4-tier risk scoring (Critical / High / Medium / Low)
+• Legal NER: parties, dates, monetary values, jurisdictions, defined terms
+• NLI contradiction & missing-clause detection
+• Contract comparison engine (diff between 2 contracts)
+• Obligation tracker (monetary, compliance, reporting, delivery)
+• Compliance checker (GDPR, CCPA, SOX, HIPAA, FINRA)
+• PDF / DOCX / TXT parsing
+• Professional 3-panel Gradio UI
+• JSON & CSV export
+
+Models:
+• Clause classifier: Mokshith31/legalbert-contract-clause-classification
+(LoRA adapter on nlpaueb/legal-bert-base-uncased, 41 CUAD classes)
 """
 
+import os
 import re
+import json
+import csv
+import io
+from collections import defaultdict
+from datetime import datetime
+
+import gradio as gr
 import numpy as np
 
+# -- Document parsers (soft-fail) --------------------------------------
+try:
+    import pdfplumber
+    _HAS_PDF = True
+except Exception:
+    _HAS_PDF = False
+
+try:
+    from docx import Document as DocxDocument
+    _HAS_DOCX = True
+except Exception:
+    _HAS_DOCX = False
 
+# -- PyTorch / Transformers (soft-fail) --------------------------------
 try:
+    import torch
+    from transformers import AutoTokenizer, AutoModelForSequenceClassification
+    from peft import PeftModel
+    _HAS_TORCH = True
+except Exception:
+    _HAS_TORCH = False
+
+# -- Import submodules -------------------------------------------------
+from compare import compare_contracts, render_comparison_html
+from obligations import extract_obligations, render_obligations_html
+from compliance import check_compliance, render_compliance_html
+
+# ======================================================================
+# 1. CONFIGURATION
+# ======================================================================
+
+CUAD_LABELS = [
+    "Document Name", "Parties", "Agreement Date", "Effective Date",
+    "Expiration Date", "Renewal Term", "Governing Law", "Most Favored Nation",
+    "Non-Compete", "Exclusivity", "No-Solicit of Customers",
+    "No-Solicit of Employees", "Non-Disparagement",
+    "Termination for Convenience", "ROFR/ROFO/ROFN", "Change of Control",
+    "Anti-Assignment", "Revenue/Profit Sharing", "Price Restriction",
+    "Minimum Commitment", "Volume Restriction", "IP Ownership Assignment",
+    "Joint IP Ownership", "License Grant", "Non-Transferable License",
+    "Affiliate License-Licensor", "Affiliate License-Licensee",
+    "Unlimited/All-You-Can-Eat License", "Irrevocable or Perpetual License",
+    "Source Code Escrow", "Post-Termination Services", "Audit Rights",
+    "Uncapped Liability", "Cap on Liability", "Liquidated Damages",
+    "Warranty Duration", "Insurance", "Covenant Not to Sue",
+    "Third Party Beneficiary", "Other"
+]
+
+_UNFAIR_LABELS = [
+    "Limitation of liability", "Unilateral termination", "Unilateral change",
+    "Content removal", "Contract by using", "Choice of law",
+    "Jurisdiction", "Arbitration"
+]
+
+_ALL_LABELS = CUAD_LABELS + _UNFAIR_LABELS
+
+RISK_MAP = {
+    # Critical
+    "Uncapped Liability": "CRITICAL",
+    "Arbitration": "CRITICAL",
+    "IP Ownership Assignment": "CRITICAL",
+    "Termination for Convenience": "CRITICAL",
+    "Limitation of liability": "CRITICAL",
+    "Unilateral termination": "CRITICAL",
+    "Liquidated Damages": "CRITICAL",
+    # High
+    "Non-Compete": "HIGH",
+    "Exclusivity": "HIGH",
+    "Change of Control": "HIGH",
+    "No-Solicit of Customers": "HIGH",
+    "No-Solicit of Employees": "HIGH",
+    "Unilateral change": "HIGH",
+    "Content removal": "HIGH",
+    "Anti-Assignment": "HIGH",
+    # Medium
+    "Governing Law": "MEDIUM",
+    "Jurisdiction": "MEDIUM",
+    "Choice of law": "MEDIUM",
+    "Price Restriction": "MEDIUM",
+    "Minimum Commitment": "MEDIUM",
+    "Volume Restriction": "MEDIUM",
+    "Non-Disparagement": "MEDIUM",
+    "Most Favored Nation": "MEDIUM",
+    "Revenue/Profit Sharing": "MEDIUM",
+    "Warranty Duration": "MEDIUM",
+    # Low
+    "Document Name": "LOW",
+    "Parties": "LOW",
+    "Agreement Date": "LOW",
+    "Effective Date": "LOW",
+    "Expiration Date": "LOW",
+    "Renewal Term": "LOW",
+    "Joint IP Ownership": "LOW",
+    "License Grant": "LOW",
+    "Non-Transferable License": "LOW",
+    "Affiliate License-Licensor": "LOW",
+    "Affiliate License-Licensee": "LOW",
+    "Unlimited/All-You-Can-Eat License": "LOW",
+    "Irrevocable or Perpetual License": "LOW",
+    "Source Code Escrow": "LOW",
+    "Post-Termination Services": "LOW",
+    "Audit Rights": "LOW",
+    "Cap on Liability": "LOW",
+    "Insurance": "LOW",
+    "Covenant Not to Sue": "LOW",
+    "Third Party Beneficiary": "LOW",
+    "Other": "LOW",
+    "ROFR/ROFO/ROFN": "LOW",
+    "Contract by using": "LOW",
 }
 
+DESC_MAP = {label: label.replace("_", " ") for label in _ALL_LABELS}
+DESC_MAP.update({
+    "Limitation of liability": "Company limits or excludes liability for losses, data breaches, or service failures.",
+    "Unilateral termination": "Company can terminate your account at any time without reason.",
+    "Unilateral change": "Company can change terms at any time without your consent.",
+    "Content removal": "Company can delete your content without notice or justification.",
+    "Contract by using": "You are bound to the contract simply by using the service.",
+    "Choice of law": "Governing law may differ from your country, reducing your legal protections.",
+    "Jurisdiction": "Disputes must be resolved in a jurisdiction that may disadvantage you.",
+    "Arbitration": "Forces disputes to arbitration instead of court. You waive your right to sue.",
+    "Uncapped Liability": "No financial limit on damages the party may be liable for.",
+    "Cap on Liability": "Maximum financial liability is explicitly capped.",
+    "Non-Compete": "Restrictions on competing with the counter-party.",
+    "Exclusivity": "Obligation to deal exclusively with one party.",
+    "IP Ownership Assignment": "Intellectual property rights are transferred entirely.",
+    "Termination for Convenience": "Either party may terminate without cause or notice.",
+    "Governing Law": "Specifies which jurisdiction's laws apply.",
+    "Non-Disparagement": "Agreement not to speak negatively about the other party.",
+    "ROFR/ROFO/ROFN": "Right of First Refusal / Offer / Negotiation clause.",
+    "Change of Control": "Provisions triggered by ownership or control changes.",
+    "Anti-Assignment": "Restrictions on transferring contract rights to third parties.",
+    "Liquidated Damages": "Pre-determined damages amount for breach of contract.",
+    "Source Code Escrow": "Third-party holds source code for release under defined conditions.",
+    "Post-Termination Services": "Services to be provided after the contract ends.",
+    "Audit Rights": "Right to inspect records or verify compliance.",
+    "Warranty Duration": "Length of time warranties remain in effect.",
+    "Covenant Not to Sue": "Agreement not to bring legal action against a party.",
+    "Third Party Beneficiary": "Non-party who benefits from the contract terms.",
+    "Insurance": "Insurance coverage requirements.",
+    "Revenue/Profit Sharing": "Revenue or profit sharing arrangements between parties.",
+    "Price Restriction": "Restrictions on pricing or discounting.",
+    "Minimum Commitment": "Minimum purchase or usage commitment.",
+    "Volume Restriction": "Limits on volume of goods or services.",
+    "License Grant": "Permission to use intellectual property.",
+    "Non-Transferable License": "License that cannot be transferred to third parties.",
+    "Irrevocable or Perpetual License": "License that cannot be revoked or lasts indefinitely.",
+    "Unlimited/All-You-Can-Eat License": "License with no usage limits.",
+})
+
+RISK_WEIGHTS = {"CRITICAL": 40, "HIGH": 20, "MEDIUM": 10, "LOW": 3}
+
+RISK_STYLES = {
+    "CRITICAL": ("#dc2626", "#fef2f2", "⚠️"),
+    "HIGH": ("#ea580c", "#fff7ed", "⚡"),
+    "MEDIUM": ("#ca8a04", "#fefce8", "π"),
+    "LOW": ("#16a34a", "#f0fdf4", "β"),
+}
+
+# ======================================================================
+# 2. MODEL LOADING
+# ======================================================================
+
+cuad_tokenizer = None
+cuad_model = None
+
+def _load_cuad_model():
+    global cuad_tokenizer, cuad_model
+    if not _HAS_TORCH:
+        print("[ClauseGuard] PyTorch not available - using regex fallback")
+        return
+    try:
+        base = "nlpaueb/legal-bert-base-uncased"
+        adapter = "Mokshith31/legalbert-contract-clause-classification"
+        print(f"[ClauseGuard] Loading CUAD classifier: {adapter}")
+        cuad_tokenizer = AutoTokenizer.from_pretrained(base)
+        base_model = AutoModelForSequenceClassification.from_pretrained(
+            base, num_labels=41, ignore_mismatched_sizes=True
+        )
+        cuad_model = PeftModel.from_pretrained(base_model, adapter)
+        cuad_model.eval()
+        print("[ClauseGuard] CUAD model loaded successfully")
+    except Exception as e:
+        print(f"[ClauseGuard] CUAD model load failed: {e}")
+        cuad_tokenizer = None
+        cuad_model = None
+
+_load_cuad_model()
+
+# ======================================================================
+# 3. DOCUMENT PARSING
+# ======================================================================
+
+def parse_pdf(file_path):
+    if not _HAS_PDF:
+        return None, "PDF parsing not available (pdfplumber not installed)"
+    try:
+        text = ""
+        with pdfplumber.open(file_path) as pdf:
+            for page in pdf.pages:
+                page_text = page.extract_text()
+                if page_text:
+                    text += page_text + "\n\n"
+        return text.strip(), None
+    except Exception as e:
+        return None, f"PDF parse error: {e}"
+
+def parse_docx(file_path):
+    if not _HAS_DOCX:
+        return None, "DOCX parsing not available (python-docx not installed)"
+    try:
+        doc = DocxDocument(file_path)
+        paragraphs = [p.text for p in doc.paragraphs if p.text.strip()]
+        return "\n\n".join(paragraphs), None
+    except Exception as e:
+        return None, f"DOCX parse error: {e}"
+
+def parse_document(file_path):
+    if file_path is None:
+        return None, "No file uploaded"
+    ext = os.path.splitext(file_path)[1].lower()
+    if ext == ".pdf":
+        return parse_pdf(file_path)
+    elif ext in (".docx", ".doc"):
+        return parse_docx(file_path)
+    elif ext in (".txt", ".md", ".rst"):
+        try:
+            with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
+                return f.read(), None
+        except Exception as e:
+            return None, f"Text read error: {e}"
+    else:
+        return None, f"Unsupported file type: {ext}"
+
|
| 266 |
+
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 267 |
+
# 4. CLAUSE DETECTION
|
| 268 |
+
# βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 269 |
+
|
| 270 |
+
def split_clauses(text):
|
| 271 |
+
text = re.sub(r'\n{3,}', '\n\n', text.strip())
|
| 272 |
+
parts = re.split(
|
| 273 |
+
r'(?<=[.!?])\s+(?=[A-Z0-9(])|(?:\n\n)(?=\d+[.)]\s|\([a-z]\)\s|[A-Z][A-Z\s]{2,})',
|
| 274 |
+
text
|
| 275 |
+
)
|
| 276 |
+
clauses = []
|
| 277 |
+
for p in parts:
|
| 278 |
+
p = p.strip()
|
| 279 |
+
if len(p) > 30:
|
| 280 |
+
clauses.append(p)
|
| 281 |
+
return clauses
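
Review note: the splitting regex is easier to judge on a concrete input. The sketch below is a standalone copy of `split_clauses` as it appears in this diff, run on a short made-up two-section sample (the sample text is an assumption, not taken from the app). It suggests that section numbers and short headings such as `1.` and `Term.` are split off as separate fragments and then dropped by the 30-character filter.

```python
import re


def split_clauses(text):
    # Collapse runs of 3+ newlines, then split on sentence ends or on
    # blank lines that precede a numbered / lettered / ALL-CAPS heading.
    text = re.sub(r'\n{3,}', '\n\n', text.strip())
    parts = re.split(
        r'(?<=[.!?])\s+(?=[A-Z0-9(])|(?:\n\n)(?=\d+[.)]\s|\([a-z]\)\s|[A-Z][A-Z\s]{2,})',
        text,
    )
    # Keep only fragments longer than 30 characters.
    return [p.strip() for p in parts if len(p.strip()) > 30]


# Hypothetical sample input, for illustration only.
sample = (
    "1. Term. This Agreement shall remain in effect for three years "
    "from the Effective Date.\n\n"
    "2. Termination. Either party may terminate this Agreement upon "
    "thirty days written notice."
)

clauses = split_clauses(sample)
for c in clauses:
    print(repr(c))
```

If the heading text matters for classification (for example "Termination."), one option would be to re-attach short fragments to the sentence that follows them before filtering.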
|
| 282 |
+
|
| 283 |
+
def classify_cuad(clause_text):
|
| 284 |
+
if cuad_model is None or cuad_tokenizer is None:
|
| 285 |
+
return _classify_regex(clause_text)
|
| 286 |
+
try:
|
| 287 |
+
inputs = cuad_tokenizer(
|
| 288 |
+
clause_text,
|
| 289 |
+
return_tensors="pt",
|
| 290 |
+
truncation=True,
|
| 291 |
+
max_length=256,
|
| 292 |
+
padding=True
|
| 293 |
+
)
|
| 294 |
+
with torch.no_grad():
|
| 295 |
+
logits = cuad_model(**inputs).logits
|
| 296 |
+
probs = torch.softmax(logits, dim=-1)[0]
|
| 297 |
+
threshold = 0.15
|
| 298 |
+
results = []
|
| 299 |
+
for i, prob in enumerate(probs):
|
| 300 |
+
if prob > threshold and i < len(CUAD_LABELS):
|
| 301 |
+
label = CUAD_LABELS[i]
|
| 302 |
+
risk = RISK_MAP.get(label, "LOW")
|
| 303 |
+
results.append({
|
| 304 |
+
"label": label,
|
| 305 |
+
"confidence": round(float(prob), 3),
|
| 306 |
+
"risk": risk,
|
| 307 |
+
"description": DESC_MAP.get(label, label),
|
| 308 |
+
})
|
| 309 |
+
results.sort(key=lambda x: x["confidence"], reverse=True)
|
| 310 |
+
if not results:
|
| 311 |
+
top_idx = int(probs.argmax())
|
| 312 |
+
label = CUAD_LABELS[top_idx] if top_idx < len(CUAD_LABELS) else "Other"
|
| 313 |
+
results.append({
|
| 314 |
+
"label": label,
|
| 315 |
+
"confidence": round(float(probs[top_idx]), 3),
|
| 316 |
+
"risk": RISK_MAP.get(label, "LOW"),
|
| 317 |
+
"description": DESC_MAP.get(label, label),
|
| 318 |
+
})
|
| 319 |
+
return results
|
| 320 |
+
except Exception as e:
|
| 321 |
+
print(f"[ClauseGuard] CUAD inference error: {e}")
|
| 322 |
+
return _classify_regex(clause_text)
|
| 323 |
+
|
| 324 |
+
_REGEX_PATTERNS = {
|
| 325 |
"Limitation of liability": [r"not liable", r"shall not be (liable|responsible)", r"in no event.*liable", r"limitation of liability", r"without warranty", r"disclaim"],
|
| 326 |
"Unilateral termination": [r"terminat.*at any time", r"suspend.*account.*without", r"we may (terminat|suspend|discontinu)", r"right to (terminat|suspend)"],
|
| 327 |
"Unilateral change": [r"sole discretion", r"reserves? the right to (modify|change|update|amend)", r"at any time.*without (prior )?notice", r"we may (modify|change|update)"],
|
| 330 |
"Choice of law": [r"governed by.*laws? of", r"shall be governed", r"laws of the state of"],
|
| 331 |
"Jurisdiction": [r"exclusive jurisdiction", r"courts? of.*(california|delaware|new york|ireland|england)", r"submit to.*jurisdiction"],
|
| 332 |
"Arbitration": [r"arbitrat", r"binding arbitration", r"waive.*right.*court", r"class action waiver"],
|
| 333 |
+
"Governing Law": [r"governed by", r"laws of", r"jurisdiction of"],
|
| 334 |
+
"Termination for Convenience": [r"terminat.*for convenience", r"terminat.*without cause", r"terminat.*at any time"],
|
| 335 |
+
"Non-Compete": [r"non-compete", r"shall not compete", r"competition"],
|
| 336 |
+
"Exclusivity": [r"exclusive", r"exclusivity"],
|
| 337 |
+
"IP Ownership Assignment": [r"assign.*intellectual property", r"ownership of.*ip", r"all rights.*assign"],
|
| 338 |
+
"Uncapped Liability": [r"unlimited liability", r"uncapped", r"no.*limit.*liability"],
|
| 339 |
+
"Cap on Liability": [r"cap on liability", r"maximum liability", r"liability.*shall not exceed"],
|
| 340 |
+
"Indemnification": [r"indemnif", r"hold harmless", r"defend"],
|
| 341 |
+
"Confidentiality": [r"confidential", r"non-disclosure", r"\bnda\b"],
|
| 342 |
+
"Force Majeure": [r"force majeure", r"act of god", r"beyond.*control"],
|
| 343 |
+
"Penalties": [r"penalt", r"late fee", r"default charge", r"interest on overdue"],
|
| 344 |
}
|
| 345 |
|
| 346 |
+
def _classify_regex(text):
|
| 347 |
text_lower = text.lower()
|
| 348 |
+
results = []
|
| 349 |
+
seen = set()
|
| 350 |
+
for label, patterns in _REGEX_PATTERNS.items():
|
| 351 |
+
for pat in patterns:
|
| 352 |
+
if re.search(pat, text_lower):
|
| 353 |
+
if label not in seen:
|
| 354 |
+
risk = RISK_MAP.get(label, "MEDIUM")
|
| 355 |
+
results.append({
|
| 356 |
+
"label": label,
|
| 357 |
+
"confidence": 0.7,
|
| 358 |
+
"risk": risk,
|
| 359 |
+
"description": DESC_MAP.get(label, label),
|
| 360 |
+
})
|
| 361 |
+
seen.add(label)
|
| 362 |
break
|
| 363 |
return results
|
| 364 |
|
| 365 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 366 |
+
# 5. LEGAL NER
|
| 367 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 368 |
+
|
| 369 |
+
def extract_entities(text):
|
| 370 |
+
entities = []
|
| 371 |
+
date_patterns = [
|
| 372 |
+
(r'\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},?\s+\d{4}\b', "DATE"),
|
| 373 |
+
(r'\b\d{1,2}/\d{1,2}/\d{2,4}\b', "DATE"),
|
| 374 |
+
(r'\b\d{1,2}-\d{1,2}-\d{2,4}\b', "DATE"),
|
| 375 |
+
(r'\b(?:Effective|Commencement|Expiration|Termination)\s+Date\b', "DATE_REF"),
|
| 376 |
+
]
|
| 377 |
+
for pat, etype in date_patterns:
|
| 378 |
+
for m in re.finditer(pat, text, re.IGNORECASE):
|
| 379 |
+
entities.append({"text": m.group(), "type": etype, "start": m.start(), "end": m.end()})
|
| 380 |
+
money_patterns = [
|
| 381 |
+
(r'\$\d+(?:,\d{3})*(?:\.\d{2})?(?:\s*(?:million|billion|thousand|M|B|K)\b)?', "MONEY"),
|
| 382 |
+
(r'\b\d{1,3}(?:,\d{3})*(?:\.\d{2})?\s*(?:USD|EUR|GBP|dollars|euros)', "MONEY"),
|
| 383 |
+
]
|
| 384 |
+
for pat, etype in money_patterns:
|
| 385 |
+
for m in re.finditer(pat, text, re.IGNORECASE):
|
| 386 |
+
entities.append({"text": m.group(), "type": etype, "start": m.start(), "end": m.end()})
|
| 387 |
+
party_patterns = [
|
| 388 |
+
(r'\b[A-Z][A-Za-z0-9\s&]+(?:Inc\.|LLC|Ltd\.|Limited|Corp\.|Corporation|PLC|GmbH|AG|S\.A\.|B\.V\.)\b', "PARTY"),
|
| 389 |
+
(r'\b(?:Party A|Party B|Disclosing Party|Receiving Party|Licensor|Licensee|Buyer|Seller|Tenant|Landlord|Employer|Employee|Company|Customer|Vendor|Client)\b', "PARTY_ROLE"),
|
| 390 |
+
]
|
| 391 |
+
for pat, etype in party_patterns:
|
| 392 |
+
for m in re.finditer(pat, text):
|
| 393 |
+
entities.append({"text": m.group(), "type": etype, "start": m.start(), "end": m.end()})
|
| 394 |
+
jurisdiction_patterns = [
|
| 395 |
+
(r'\b(?:State|Laws?) of [A-Z][a-zA-Z\s]+', "JURISDICTION"),
|
| 396 |
+
(r'\b(?:California|Delaware|New York|Texas|Florida|England|Ireland|Germany|France|Singapore|Hong Kong)\b', "JURISDICTION"),
|
| 397 |
+
]
|
| 398 |
+
for pat, etype in jurisdiction_patterns:
|
| 399 |
+
for m in re.finditer(pat, text, re.IGNORECASE):
|
| 400 |
+
entities.append({"text": m.group(), "type": etype, "start": m.start(), "end": m.end()})
|
| 401 |
+
defined_patterns = [
|
| 402 |
+
(r'"([A-Z][A-Z\s]+)"', "DEFINED_TERM"),
|
| 403 |
+
(r'\(([A-Z][A-Z\s]+)\)', "DEFINED_TERM"),
|
| 404 |
+
]
|
| 405 |
+
for pat, etype in defined_patterns:
|
| 406 |
+
for m in re.finditer(pat, text):
|
| 407 |
+
entities.append({"text": m.group(1), "type": etype, "start": m.start(), "end": m.end()})
|
| 408 |
+
entities.sort(key=lambda x: (x["start"], -(x["end"] - x["start"])))
|
| 409 |
+
filtered = []
|
| 410 |
+
last_end = -1
|
| 411 |
+
for e in entities:
|
| 412 |
+
if e["start"] >= last_end:
|
| 413 |
+
filtered.append(e)
|
| 414 |
+
last_end = e["end"]
|
| 415 |
+
return filtered
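
Review note: the final loop of `extract_entities` resolves overlapping regex hits by sorting on start offset (longer span first on ties) and keeping a span only if it starts at or after the end of the last kept span. A minimal standalone sketch of that step follows; `drop_overlaps` and the sample spans are hypothetical names and data used only for illustration.

```python
def drop_overlaps(spans):
    """Greedy left-to-right overlap filter, mirroring the loop in the diff."""
    # Earlier start first; on equal starts, the longer span comes first.
    ordered = sorted(spans, key=lambda s: (s["start"], -(s["end"] - s["start"])))
    kept, last_end = [], -1
    for s in ordered:
        if s["start"] >= last_end:  # does not overlap the last kept span
            kept.append(s)
            last_end = s["end"]
    return kept


# Hypothetical spans: "Delaware" is nested inside "State of Delaware".
spans = [
    {"text": "Delaware", "type": "JURISDICTION", "start": 19, "end": 27},
    {"text": "State of Delaware", "type": "JURISDICTION", "start": 10, "end": 27},
    {"text": "$25,000", "type": "MONEY", "start": 40, "end": 47},
]

kept = drop_overlaps(spans)
print([s["text"] for s in kept])
```

Because the filter is greedy, a long early match can suppress a more specific match that starts inside it; whether that is the desired trade-off depends on which entity types should win.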
|
| 416 |
+
|
| 417 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 418 |
+
# 6. NLI / CONTRADICTION DETECTION
|
| 419 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 420 |
+
|
| 421 |
+
_CONTRADICTION_PAIRS = [
|
| 422 |
+
(["Uncapped Liability", "unlimited liability"], ["Cap on Liability", "cap on liability"],
|
| 423 |
+
"Liability cannot be both uncapped and capped simultaneously."),
|
| 424 |
+
(["Governing Law"], ["Governing Law"],
|
| 425 |
+
"Multiple governing law provisions detected - verify consistency."),
|
| 426 |
+
(["Termination for Convenience", "terminat.*convenience"], ["Fixed Term", "fixed term"],
|
| 427 |
+
"Contract has both fixed term and termination for convenience - review carefully."),
|
| 428 |
+
(["IP Ownership Assignment", "assign.*ip"], ["Joint IP Ownership", "joint ownership"],
|
| 429 |
+
"IP cannot be both fully assigned and jointly owned."),
|
| 430 |
+
]
|
| 431 |
+
|
| 432 |
+
def detect_contradictions(clause_results):
|
| 433 |
+
contradictions = []
|
| 434 |
+
labels_found = set()
|
| 435 |
+
for cr in clause_results:
|
| 436 |
+
labels_found.add(cr["label"])
|
| 437 |
+
for group_a, group_b, explanation in _CONTRADICTION_PAIRS:
|
| 438 |
+
found_a = any(l in labels_found for l in group_a)
|
| 439 |
+
found_b = any(l in labels_found for l in group_b)
|
| 440 |
+
if found_a and found_b:
|
| 441 |
+
contradictions.append({
|
| 442 |
+
"type": "CONTRADICTION",
|
| 443 |
+
"explanation": explanation,
|
| 444 |
+
"severity": "HIGH",
|
| 445 |
+
"clauses": list(set(group_a + group_b)),
|
| 446 |
+
})
|
| 447 |
+
critical_clauses = ["Governing Law", "Termination for Convenience", "Limitation of liability", "Arbitration"]
|
| 448 |
+
for cc in critical_clauses:
|
| 449 |
+
if cc not in labels_found:
|
| 450 |
+
contradictions.append({
|
| 451 |
+
"type": "MISSING",
|
| 452 |
+
"explanation": f"Critical clause '{cc}' not detected in the document.",
|
| 453 |
+
"severity": "MEDIUM",
|
| 454 |
+
"clauses": [cc],
|
| 455 |
+
})
|
| 456 |
+
return contradictions
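
Review note: because `detect_contradictions` collects labels into a set, a self-pair such as `(["Governing Law"], ["Governing Law"])` matches as soon as the label appears once, so "multiple provisions" would be reported for a single clause. One possible alternative, sketched below under the assumption that each result dict carries `label` and `text` keys as in `analyze_contract`, is to count distinct clause texts per label. `repeated_labels` is a hypothetical helper, not part of the commit.

```python
from collections import defaultdict


def repeated_labels(clause_results, watch=("Governing Law",)):
    """Return watched labels that occur in more than one distinct clause."""
    texts_by_label = defaultdict(set)
    for cr in clause_results:
        texts_by_label[cr["label"]].add(cr["text"])
    return [label for label in watch if len(texts_by_label[label]) > 1]


# Hypothetical classifier output for illustration.
single = [{"label": "Governing Law", "text": "Governed by the laws of Delaware."}]
double = single + [{"label": "Governing Law", "text": "Governed by the laws of New York."}]

print(repeated_labels(single))
print(repeated_labels(double))
```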
|
| 457 |
+
|
| 458 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 459 |
+
# 7. RISK SCORING
|
| 460 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 461 |
+
|
| 462 |
+
def compute_risk_score(clause_results, total_clauses):
|
| 463 |
+
sev_counts = {"CRITICAL": 0, "HIGH": 0, "MEDIUM": 0, "LOW": 0}
|
| 464 |
+
for cr in clause_results:
|
| 465 |
+
sev = cr.get("risk", "LOW")
|
| 466 |
+
sev_counts[sev] += 1
|
| 467 |
+
if total_clauses == 0:
|
| 468 |
+
return 0, "A", sev_counts
|
| 469 |
+
weighted = sum(sev_counts[s] * RISK_WEIGHTS[s] for s in sev_counts)
|
| 470 |
+
risk = min(100, round(weighted / max(1, total_clauses) * 10))
|
| 471 |
+
if risk >= 70: grade = "F"
|
| 472 |
+
elif risk >= 50: grade = "D"
|
| 473 |
+
elif risk >= 30: grade = "C"
|
| 474 |
+
elif risk >= 15: grade = "B"
|
| 475 |
+
else: grade = "A"
|
| 476 |
+
return risk, grade, sev_counts
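
Review note: a worked example makes the scoring formula easier to check. The sketch below reimplements the arithmetic of `compute_risk_score` on its own. `RISK_WEIGHTS` is not visible in this part of the diff, so the weights used here (`ASSUMED_WEIGHTS`) are placeholders chosen for illustration, not the app's actual values.

```python
# Assumed weights; the real RISK_WEIGHTS is defined elsewhere in app.py.
ASSUMED_WEIGHTS = {"CRITICAL": 10, "HIGH": 6, "MEDIUM": 3, "LOW": 1}


def score(sev_counts, total_clauses, weights=ASSUMED_WEIGHTS):
    """Weighted severity per clause, scaled by 10 and capped at 100."""
    if total_clauses == 0:
        return 0, "A"
    weighted = sum(sev_counts[s] * weights[s] for s in sev_counts)
    risk = min(100, round(weighted / max(1, total_clauses) * 10))
    # Same grade boundaries as in the diff: 70 / 50 / 30 / 15.
    for floor, grade in ((70, "F"), (50, "D"), (30, "C"), (15, "B")):
        if risk >= floor:
            return risk, grade
    return risk, "A"


# (1*10 + 2*6 + 3*3 + 4*1) / 10 * 10 = 35, which falls in the C band.
print(score({"CRITICAL": 1, "HIGH": 2, "MEDIUM": 3, "LOW": 4}, 10))
# 5 critical findings over 2 clauses hit the cap of 100.
print(score({"CRITICAL": 5, "HIGH": 0, "MEDIUM": 0, "LOW": 0}, 2))
```

Note that `analyze_contract` appends one result per prediction, so the severity counts can exceed `total_clauses` when a clause receives several labels; the `min(100, ...)` cap is what keeps the score bounded in that case.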
|
| 477 |
|
| 478 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 479 |
+
# 8. MAIN ANALYSIS PIPELINE
|
| 480 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 481 |
|
| 482 |
+
def analyze_contract(text):
|
| 483 |
+
if not text or len(text.strip()) < 50:
|
| 484 |
+
return None, "Document too short (minimum 50 characters)"
|
| 485 |
clauses = split_clauses(text)
|
| 486 |
if not clauses:
|
| 487 |
+
return None, "No clauses detected in document"
|
| 488 |
+
clause_results = []
|
| 489 |
for clause in clauses:
|
| 490 |
+
predictions = classify_cuad(clause)
|
| 491 |
+
if predictions:
|
| 492 |
+
for pred in predictions:
|
| 493 |
+
clause_results.append({
|
| 494 |
+
"text": clause,
|
| 495 |
+
"label": pred["label"],
|
| 496 |
+
"confidence": pred["confidence"],
|
| 497 |
+
"risk": pred["risk"],
|
| 498 |
+
"description": pred["description"],
|
| 499 |
+
})
|
| 500 |
+
entities = extract_entities(text)
|
| 501 |
+
contradictions = detect_contradictions(clause_results)
|
| 502 |
+
risk, grade, sev_counts = compute_risk_score(clause_results, len(clauses))
|
| 503 |
+
obligations = extract_obligations(text)
|
| 504 |
+
compliance = check_compliance(text)
|
| 505 |
+
result = {
|
| 506 |
+
"metadata": {
|
| 507 |
+
"analysis_date": datetime.now().isoformat(),
|
| 508 |
+
"total_clauses": len(clauses),
|
| 509 |
+
"flagged_clauses": len(set(cr["text"] for cr in clause_results)),
|
| 510 |
+
"model": "Legal-BERT + CUAD (41 classes)" if cuad_model else "Regex fallback",
|
| 511 |
+
},
|
| 512 |
+
"risk": {
|
| 513 |
+
"score": risk,
|
| 514 |
+
"grade": grade,
|
| 515 |
+
"breakdown": sev_counts,
|
| 516 |
+
},
|
| 517 |
+
"clauses": clause_results,
|
| 518 |
+
"entities": entities,
|
| 519 |
+
"contradictions": contradictions,
|
| 520 |
+
"obligations": obligations,
|
| 521 |
+
"compliance": compliance,
|
| 522 |
+
"raw_text": text,
|
| 523 |
+
}
|
| 524 |
+
return result, None
|
| 525 |
+
|
| 526 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 527 |
+
# 9. EXPORT FUNCTIONS
|
| 528 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 529 |
+
|
| 530 |
+
def export_json(result):
|
| 531 |
+
if result is None:
|
| 532 |
+
return None
|
| 533 |
+
return json.dumps(result, indent=2, default=str)
|
| 534 |
+
|
| 535 |
+
def export_csv(result):
|
| 536 |
+
if result is None:
|
| 537 |
+
return None
|
| 538 |
+
output = io.StringIO()
|
| 539 |
+
writer = csv.writer(output)
|
| 540 |
+
writer.writerow(["Clause Text", "Label", "Risk", "Confidence", "Description"])
|
| 541 |
+
for cr in result.get("clauses", []):
|
| 542 |
+
writer.writerow([
|
| 543 |
+
cr.get("text", "")[:500],
|
| 544 |
+
cr.get("label", ""),
|
| 545 |
+
cr.get("risk", ""),
|
| 546 |
+
cr.get("confidence", ""),
|
| 547 |
+
cr.get("description", ""),
|
| 548 |
+
])
|
| 549 |
+
return output.getvalue()
|
| 550 |
+
|
| 551 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 552 |
+
# 10. UI RENDERING
|
| 553 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 554 |
+
|
| 555 |
+
def render_summary(result):
|
| 556 |
+
if result is None:
|
| 557 |
+
return ""
|
| 558 |
+
risk = result["risk"]
|
| 559 |
+
score = risk["score"]
|
| 560 |
+
grade = risk["grade"]
|
| 561 |
+
breakdown = risk["breakdown"]
|
| 562 |
+
grade_color = {
|
| 563 |
+
"A": "#16a34a", "B": "#65a30d", "C": "#ca8a04",
|
| 564 |
+
"D": "#ea580c", "F": "#dc2626",
|
| 565 |
+
}.get(grade, "#6b7280")
|
| 566 |
+
crit, high, med, low = breakdown["CRITICAL"], breakdown["HIGH"], breakdown["MEDIUM"], breakdown["LOW"]
|
| 567 |
+
html = f"""
|
| 568 |
+
<div style="font-family:system-ui,sans-serif;padding:16px;border:1px solid #e5e7eb;border-radius:12px;background:#fff;">
|
| 569 |
+
<div style="text-align:center;margin-bottom:16px;">
|
| 570 |
+
<div style="font-size:48px;font-weight:700;color:{grade_color};">{score}</div>
|
| 571 |
+
<div style="font-size:14px;color:#6b7280;">/100 Risk Score</div>
|
| 572 |
+
<div style="display:inline-block;margin-top:8px;padding:4px 16px;border-radius:20px;background:{grade_color};color:white;font-weight:600;font-size:14px;">
|
| 573 |
+
Grade {grade}
|
| 574 |
+
</div>
|
| 575 |
+
</div>
|
| 576 |
+
<div style="display:grid;grid-template-columns:1fr 1fr;gap:8px;margin-bottom:12px;">
|
| 577 |
+
<div style="padding:8px;border-radius:6px;background:#fef2f2;text-align:center;">
|
| 578 |
+
<div style="font-size:20px;font-weight:700;color:#dc2626;">{crit}</div>
|
| 579 |
+
<div style="font-size:11px;color:#991b1b;">Critical</div>
|
| 580 |
+
</div>
|
| 581 |
+
<div style="padding:8px;border-radius:6px;background:#fff7ed;text-align:center;">
|
| 582 |
+
<div style="font-size:20px;font-weight:700;color:#ea580c;">{high}</div>
|
| 583 |
+
<div style="font-size:11px;color:#9a3412;">High</div>
|
| 584 |
+
</div>
|
| 585 |
+
<div style="padding:8px;border-radius:6px;background:#fefce8;text-align:center;">
|
| 586 |
+
<div style="font-size:20px;font-weight:700;color:#ca8a04;">{med}</div>
|
| 587 |
+
<div style="font-size:11px;color:#854d0e;">Medium</div>
|
| 588 |
+
</div>
|
| 589 |
+
<div style="padding:8px;border-radius:6px;background:#f0fdf4;text-align:center;">
|
| 590 |
+
<div style="font-size:20px;font-weight:700;color:#16a34a;">{low}</div>
|
| 591 |
+
<div style="font-size:11px;color:#166534;">Low</div>
|
| 592 |
+
</div>
|
| 593 |
+
</div>
|
| 594 |
+
<div style="font-size:12px;color:#6b7280;text-align:center;">
|
| 595 |
+
{result['metadata']['total_clauses']} clauses analyzed · {result['metadata']['flagged_clauses']} flagged
|
| 596 |
+
<br>Engine: {result['metadata']['model']}
|
| 597 |
</div>
|
| 598 |
</div>
|
| 599 |
+
"""
|
| 600 |
+
return html
|
| 601 |
|
| 602 |
+
def render_clause_cards(result):
|
| 603 |
+
if result is None:
|
| 604 |
+
return ""
|
| 605 |
+
clauses = result.get("clauses", [])
|
| 606 |
+
if not clauses:
|
| 607 |
+
return '<div style="padding:24px;text-align:center;color:#6b7280;">No clauses detected.</div>'
|
| 608 |
+
grouped = defaultdict(list)
|
| 609 |
+
for cr in clauses:
|
| 610 |
+
grouped[cr["text"]].append(cr)
|
| 611 |
+
html = '<div style="font-family:system-ui,sans-serif;">'
|
| 612 |
+
for text, items in grouped.items():
|
| 613 |
+
max_risk = max(items, key=lambda x: {"CRITICAL":4,"HIGH":3,"MEDIUM":2,"LOW":1}[x["risk"]])["risk"]
|
| 614 |
+
border, bg, icon = RISK_STYLES[max_risk]
|
| 615 |
+
tags = ""
|
| 616 |
+
for item in items:
|
| 617 |
+
tag_bg = RISK_STYLES[item["risk"]][1]
|
| 618 |
+
tag_color = RISK_STYLES[item["risk"]][0]
|
| 619 |
+
tags += f'<span style="background:{tag_bg};color:{tag_color};border:1px solid {tag_color}33;padding:2px 8px;border-radius:12px;font-size:11px;font-weight:500;margin-right:4px;">{item["label"]} ({item["confidence"]})</span>'
|
| 620 |
+
descs = "".join(
|
| 621 |
+
f'<p style="font-size:12px;color:#6b7280;margin:4px 0 0 0;">{item["description"]}</p>'
|
| 622 |
+
for item in items
|
| 623 |
+
)
|
| 624 |
+
preview = text[:300] + ("..." if len(text) > 300 else "")
|
| 625 |
+
preview = preview.replace("<", "&lt;").replace(">", "&gt;")
|
| 626 |
+
html += f"""
|
| 627 |
+
<div style="border:1px solid #e5e7eb;border-left:4px solid {border};border-radius:8px;padding:14px;margin-bottom:10px;background:#fafafa;">
|
| 628 |
+
<div style="display:flex;align-items:center;gap:6px;margin-bottom:6px;">
|
| 629 |
+
<span style="font-size:16px;">{icon}</span>
|
| 630 |
+
<span style="font-size:12px;font-weight:600;color:{border};text-transform:uppercase;">{max_risk}</span>
|
| 631 |
+
</div>
|
| 632 |
+
<p style="font-size:13px;color:#374151;line-height:1.6;margin:0 0 8px 0;">{preview}</p>
|
| 633 |
+
<div style="margin-bottom:6px;">{tags}</div>
|
| 634 |
+
{descs}
|
| 635 |
+
</div>
|
| 636 |
+
"""
|
| 637 |
+
html += "</div>"
|
| 638 |
+
return html
|
| 639 |
+
|
| 640 |
+
def render_entities(result):
|
| 641 |
+
if result is None:
|
| 642 |
+
return ""
|
| 643 |
+
entities = result.get("entities", [])
|
| 644 |
+
if not entities:
|
| 645 |
+
return '<div style="padding:16px;color:#6b7280;">No entities detected.</div>'
|
| 646 |
+
grouped = defaultdict(list)
|
| 647 |
+
for e in entities:
|
| 648 |
+
grouped[e["type"]].append(e["text"])
|
| 649 |
+
html = '<div style="font-family:system-ui,sans-serif;">'
|
| 650 |
+
for etype, texts in grouped.items():
|
| 651 |
+
unique = list(dict.fromkeys(texts))[:20]
|
| 652 |
+
color = {
|
| 653 |
+
"DATE": "#3b82f6", "DATE_REF": "#60a5fa",
|
| 654 |
+
"MONEY": "#22c55e",
|
| 655 |
+
"PARTY": "#8b5cf6", "PARTY_ROLE": "#a78bfa",
|
| 656 |
+
"JURISDICTION": "#f59e0b",
|
| 657 |
+
"DEFINED_TERM": "#ec4899",
|
| 658 |
+
}.get(etype, "#6b7280")
|
| 659 |
+
items_html = "".join(
|
| 660 |
+
f'<span style="display:inline-block;background:{color}15;color:{color};border:1px solid {color}40;padding:3px 10px;border-radius:6px;font-size:12px;margin:3px;">{t}</span>'
|
| 661 |
+
for t in unique
|
| 662 |
+
)
|
| 663 |
+
html += f"""
|
| 664 |
+
<div style="margin-bottom:12px;">
|
| 665 |
+
<div style="font-size:12px;font-weight:600;color:#374151;margin-bottom:6px;text-transform:uppercase;">{etype}</div>
|
| 666 |
+
<div>{items_html}</div>
|
| 667 |
+
</div>
|
| 668 |
+
"""
|
| 669 |
+
html += "</div>"
|
| 670 |
+
return html
|
| 671 |
+
|
| 672 |
+
def render_contradictions(result):
|
| 673 |
+
if result is None:
|
| 674 |
+
return ""
|
| 675 |
+
contradictions = result.get("contradictions", [])
|
| 676 |
+
if not contradictions:
|
| 677 |
+
return '<div style="padding:16px;color:#16a34a;">✓ No contradictions or missing clauses detected.</div>'
|
| 678 |
+
html = '<div style="font-family:system-ui,sans-serif;">'
|
| 679 |
+
for c in contradictions:
|
| 680 |
+
sev_color = RISK_STYLES[c["severity"]][0]
|
| 681 |
+
icon = "⚠️" if c["type"] == "CONTRADICTION" else "π"
|
| 682 |
+
html += f"""
|
| 683 |
+
<div style="border:1px solid #e5e7eb;border-left:4px solid {sev_color};border-radius:8px;padding:12px;margin-bottom:8px;background:#fafafa;">
|
| 684 |
+
<div style="display:flex;align-items:center;gap:6px;margin-bottom:4px;">
|
| 685 |
+
<span>{icon}</span>
|
| 686 |
+
<span style="font-size:12px;font-weight:600;color:{sev_color};">{c["type"]}</span>
|
| 687 |
+
</div>
|
| 688 |
+
<p style="font-size:13px;color:#374151;margin:0;">{c["explanation"]}</p>
|
| 689 |
+
</div>
|
| 690 |
+
"""
|
| 691 |
+
html += "</div>"
|
| 692 |
+
return html
|
| 693 |
+
|
| 694 |
+
def render_document_viewer(result):
|
| 695 |
+
if result is None:
|
| 696 |
+
return ""
|
| 697 |
+
text = result.get("raw_text", "")
|
| 698 |
+
entities = sorted(result.get("entities", []), key=lambda x: x["start"])
|
| 699 |
+
html_parts = []
|
| 700 |
+
last_end = 0
|
| 701 |
+
for e in entities:
|
| 702 |
+
if e["start"] >= last_end:
|
| 703 |
+
html_parts.append(text[last_end:e["start"]].replace("<", "&lt;").replace(">", "&gt;"))
|
| 704 |
+
color = {
|
| 705 |
+
"DATE": "#bfdbfe", "DATE_REF": "#bfdbfe",
|
| 706 |
+
"MONEY": "#bbf7d0",
|
| 707 |
+
"PARTY": "#ddd6fe", "PARTY_ROLE": "#ddd6fe",
|
| 708 |
+
"JURISDICTION": "#fde68a",
|
| 709 |
+
"DEFINED_TERM": "#fbcfe8",
|
| 710 |
+
}.get(e["type"], "#e5e7eb")
|
| 711 |
+
label = e["type"].replace("_", " ")
|
| 712 |
+
html_parts.append(
|
| 713 |
+
f'<mark style="background:{color};padding:1px 2px;border-radius:2px;font-size:12px;" title="{label}">{e["text"].replace("<","&lt;").replace(">","&gt;")}</mark>'
|
| 714 |
+
)
|
| 715 |
+
last_end = e["end"]
|
| 716 |
+
html_parts.append(text[last_end:].replace("<", "&lt;").replace(">", "&gt;"))
|
| 717 |
+
highlighted = "".join(html_parts)
|
| 718 |
+
return f"""
|
| 719 |
+
<div style="font-family:monospace;font-size:13px;line-height:1.6;padding:16px;border:1px solid #e5e7eb;border-radius:8px;background:#fff;max-height:600px;overflow-y:auto;white-space:pre-wrap;">
|
| 720 |
+
{highlighted}
|
| 721 |
+
</div>
|
| 722 |
+
"""
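
Review note: contract text is interpolated into HTML in several renderers, so characters such as `<`, `>`, `&`, and quotes need escaping first. A minimal sketch of the document-viewer highlighting step using the standard library's `html.escape` is shown below; `highlight` and the sample input are hypothetical and only illustrate the idea.

```python
import html


def highlight(text, spans, color="#fde68a"):
    """Wrap each non-overlapping span in <mark>, escaping all user text."""
    parts, last_end = [], 0
    for s in sorted(spans, key=lambda s: s["start"]):
        if s["start"] < last_end:
            continue  # skip spans that overlap one already emitted
        parts.append(html.escape(text[last_end:s["start"]]))
        inner = html.escape(text[s["start"]:s["end"]])
        parts.append(f'<mark style="background:{color};">{inner}</mark>')
        last_end = s["end"]
    parts.append(html.escape(text[last_end:]))
    return "".join(parts)


# Hypothetical input containing characters that must be escaped.
sample = 'Fees < $5 & "Net 30"'
out = highlight(sample, [{"start": 7, "end": 9}])
print(out)
```

Compared with chained `.replace` calls for the two angle brackets, `html.escape` also covers `&` and quote characters, which matters when the text ends up inside an attribute value.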
|
| 723 |
+
|
| 724 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 725 |
+
# 11. COMPARISON UI FUNCTIONS
|
| 726 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 727 |
+
|
| 728 |
+
def run_comparison(text_a, text_b):
|
| 729 |
+
if not text_a or len(text_a.strip()) < 50:
|
| 730 |
+
return "Contract A is too short", ""
|
| 731 |
+
if not text_b or len(text_b.strip()) < 50:
|
| 732 |
+
return "Contract B is too short", ""
|
| 733 |
+
result = compare_contracts(text_a, text_b)
|
| 734 |
+
return render_comparison_html(result), json.dumps(result, indent=2)
|
| 735 |
+
|
| 736 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 737 |
+
# 12. GRADIO UI
|
| 738 |
+
# ═══════════════════════════════════════════════════════════════════════
|
| 739 |
+
|
| 740 |
+
def process_upload(file):
|
| 741 |
+
if file is None:
|
| 742 |
+
return "", "No file uploaded"
|
| 743 |
+
text, error = parse_document(file)
|
| 744 |
+
if error:
|
| 745 |
+
return "", error
|
| 746 |
+
return text, "Document loaded successfully"
|
| 747 |
+
|
| 748 |
+
def run_analysis(text):
|
| 749 |
+
if not text or len(text.strip()) < 50:
|
| 750 |
+
err_html = '<p style="color:#dc2626;padding:16px;">Document too short (minimum 50 characters)</p>'
|
| 751 |
+
return [err_html] * 7 + [None, None, ""]
|
| 752 |
+
result, error = analyze_contract(text)
|
| 753 |
+
if error:
|
| 754 |
+
err_html = f'<p style="color:#dc2626;padding:16px;">{error}</p>'
|
| 755 |
+
return [err_html] * 7 + [None, None, error]
|
| 756 |
+
json_path = "/tmp/clauseguard_report.json"
|
| 757 |
+
with open(json_path, "w") as f:
|
| 758 |
+
json.dump(result, f, indent=2, default=str)
|
| 759 |
+
csv_content = export_csv(result)
|
| 760 |
+
csv_path = "/tmp/clauseguard_report.csv"
|
| 761 |
+
with open(csv_path, "w", newline="", encoding="utf-8") as f:
|
| 762 |
+
f.write(csv_content)
|
| 763 |
+
return [
|
| 764 |
+
render_summary(result),
|
| 765 |
+
render_clause_cards(result),
|
| 766 |
+
render_entities(result),
|
| 767 |
+
render_contradictions(result),
|
| 768 |
+
render_document_viewer(result),
|
| 769 |
+
render_obligations_html(result.get("obligations", [])),
|
| 770 |
+
render_compliance_html(result.get("compliance", {})),
|
| 771 |
+
json_path,
|
| 772 |
+
csv_path,
|
| 773 |
+
"Analysis complete",
|
| 774 |
+
]
|
| 775 |
+
|
| 776 |
+
def do_clear():
|
| 777 |
+
return [""] * 7 + [None, None, ""]
|
| 778 |
+
|
| 779 |
+
# ── Example contracts ──
|
| 780 |
+
SPOTIFY_TOS = """By using the Spotify Service, you agree to be bound by these Terms of Use.
|
| 781 |
|
| 782 |
Spotify may, in its sole discretion, modify or update these Terms of Service at any time without prior notice. Your continued use of the Service after any such changes constitutes your acceptance of the new Terms of Service.
|
| 783 |
|
| 789 |
|
| 790 |
These Terms will be governed by and construed in accordance with the laws of the State of New York.
|
| 791 |
|
| 792 |
+
Any dispute shall be finally settled by arbitration in New York County. The parties waive any right to a jury trial."""
|
| 793 |
|
| 794 |
+
RENTAL_AGREEMENT = """The Landlord reserves the right to enter the premises at any time without prior notice for inspection or any other purpose deemed necessary in their sole discretion.
|
| 795 |
|
| 796 |
The Landlord shall not be liable for any damage to the Tenant's personal property, whether caused by water leaks, fire, theft, or any other cause, including the Landlord's own negligence.
|
| 797 |
|
| 798 |
The Landlord may terminate this lease at any time with only 7 days written notice, for any reason or no reason at all.
|
| 799 |
|
| 800 |
+
Any disputes arising from this lease agreement shall be resolved exclusively in the courts of the State of California, and the Tenant waives the right to a jury trial.
|
| 801 |
|
| 802 |
The Landlord reserves the right to modify the terms of this lease at any time. Continued occupancy constitutes acceptance of the new terms."""
|
| 803 |
|
| 804 |
+
NDA_SAMPLE = """NON-DISCLOSURE AGREEMENT
|
| 805 |
+
|
| 806 |
+
This Non-Disclosure Agreement (the "Agreement") is entered into as of January 15, 2024 (the "Effective Date") by and between Acme Technologies, Inc. ("Disclosing Party") and Beta Solutions LLC ("Receiving Party").
|
| 807 |
+
|
| 808 |
+
1. Governing Law. This Agreement shall be governed by and construed in accordance with the laws of the State of Delaware, without regard to its conflict of law principles.
|
| 809 |
+
|
| 810 |
+
2. Term. This Agreement shall remain in effect for a period of three (3) years from the Effective Date.
|
| 811 |
+
|
| 812 |
+
3. Termination. Either party may terminate this Agreement for convenience upon thirty (30) days prior written notice.
|
| 813 |
+
|
| 814 |
+
4. Intellectual Property. All Confidential Information disclosed hereunder shall remain the exclusive property of the Disclosing Party. The Receiving Party hereby assigns to the Disclosing Party all right, title, and interest in any derivative works.
|
| 815 |
+
|
| 816 |
+
5. Limitation of Liability. IN NO EVENT SHALL EITHER PARTY BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES.
|
| 817 |
+
|
| 818 |
+
6. Indemnification. The Receiving Party shall indemnify and hold harmless the Disclosing Party from any and all claims arising from a breach of this Agreement.
|
| 819 |
+
|
| 820 |
+
7. Non-Compete. During the term of this Agreement and for a period of two (2) years thereafter, the Receiving Party shall not engage in any business that competes with the Disclosing Party."""
|
| 821 |
+
|
| 822 |
+
COMPLEX_CONTRACT = """MASTER SERVICE AGREEMENT
|
| 823 |
+
|
| 824 |
+
This Master Service Agreement ("MSA") is entered into as of March 1, 2024 (the "Effective Date") by and between CloudTech Solutions, Inc., a Delaware corporation ("Provider") and Global Retail Partners LLC, a New York limited liability company ("Customer").
|
| 825 |
+
|
| 826 |
+
1. SERVICES. Provider shall provide cloud hosting and data processing services as described in Exhibit A. Provider shall comply with all applicable laws including GDPR and CCPA.
|
| 827 |
+
|
| 828 |
+
2. TERM AND RENEWAL. The initial term is twelve (12) months, automatically renewing for successive one (1) year periods unless terminated in accordance with Section 7.
|
| 829 |
+
|
| 830 |
+
3. FEES AND PAYMENT. Customer shall pay a monthly fee of $25,000 within 30 days of invoice. Late payments incur a penalty of 1.5% per month. The total contract value is $300,000.
|
| 831 |
+
|
| 832 |
+
4. LIABILITY. Provider's aggregate liability shall not exceed $1,000,000. IN NO EVENT SHALL PROVIDER BE LIABLE FOR LOST PROFITS OR CONSEQUENTIAL DAMAGES. Customer assumes all risk of data loss.
|
| 833 |
|
| 834 |
+
5. INDEMNIFICATION. Each party shall indemnify the other for third-party claims arising from breach of this Agreement. Customer shall indemnify Provider for claims arising from Customer Data.
|
| 835 |
|
| 836 |
+
6. INTELLECTUAL PROPERTY. Provider retains all IP rights. Customer receives a non-transferable, non-exclusive license for the term. Upon termination, Customer shall return or destroy all Provider materials within 10 business days.
|
| 837 |
+
|
| 838 |
+
7. TERMINATION. Either party may terminate for convenience with 90 days notice. Provider may terminate immediately for non-payment. Upon termination, Customer shall pay all outstanding fees.
|
| 839 |
+
|
| 840 |
+
8. GOVERNING LAW. This Agreement is governed by the laws of the State of Delaware. Disputes shall be resolved by binding arbitration in Wilmington, Delaware.
|
| 841 |
+
|
| 842 |
+
9. FORCE MAJEURE. Neither party shall be liable for delays due to acts of God, war, terrorism, or government action.
|
| 843 |
+
|
| 844 |
+
10. AUDIT RIGHTS. Customer may audit Provider's compliance annually. Provider shall provide SOC 2 Type II reports within 30 days of request.
|
| 845 |
+
|
| 846 |
+
11. INSURANCE. Provider shall maintain general liability insurance of at least $5,000,000 and cyber liability insurance of at least $2,000,000.
|
| 847 |
+
|
| 848 |
+
12. CONFIDENTIALITY. Both parties agree to keep Confidential Information secure for five (5) years. This obligation survives termination.
|
| 849 |
+
|
| 850 |
+
13. ASSIGNMENT. Neither party may assign this Agreement without prior written consent. Any attempted assignment is void.
|
| 851 |
+
|
| 852 |
+
14. THIRD PARTY BENEFICIARY. No third party shall have rights under this Agreement except as expressly provided."""
|
| 853 |
+
|
| 854 |
+
with gr.Blocks(
|
| 855 |
+
title="ClauseGuard - AI Contract Analysis",
|
| 856 |
+
css="""
|
| 857 |
+
.gradio-container { max-width: 1600px !important; }
|
| 858 |
+
"""
|
| 859 |
+
) as demo:
|
| 860 |
+
|
| 861 |
+
gr.HTML("""
|
| 862 |
+
<div style="display:flex;align-items:center;justify-content:space-between;padding:12px 0;border-bottom:2px solid #e5e7eb;margin-bottom:16px;">
|
| 863 |
+
<div>
|
| 864 |
+
<h1 style="font-size:24px;font-weight:700;margin:0;color:#1f2937;">🛡️ ClauseGuard</h1>
|
| 865 |
+
<p style="font-size:13px;color:#6b7280;margin:4px 0 0 0;">AI-Powered Legal Contract Analysis · 41 Clause Categories · Risk Scoring · NER · NLI · Compliance · Obligations</p>
|
| 866 |
+
</div>
|
| 867 |
+
<div style="font-size:12px;color:#9ca3af;">v2.0 · World's Best Open-Source Legal AI</div>
|
| 868 |
+
</div>
|
| 869 |
+
""")
|
| 870 |
+
|
| 871 |
+
# ── Main Tabs: Analysis vs Comparison ──
|
| 872 |
+
with gr.Tabs():
|
| 873 |
+
|
| 874 |
+
# βββββββ TAB 1: Single Contract Analysis βββββββ
|
| 875 |
+
with gr.Tab("π Single Contract Analysis"):
|
| 876 |
+
with gr.Row():
|
| 877 |
+
with gr.Column(scale=1):
|
| 878 |
+
file_input = gr.File(
|
| 879 |
+
label="π Upload Contract (PDF/DOCX/TXT)",
|
| 880 |
+
file_types=[".pdf", ".docx", ".doc", ".txt", ".md"],
|
| 881 |
+
)
|
| 882 |
+
load_btn = gr.Button("Load Document", variant="secondary", size="sm")
|
| 883 |
+
load_status = gr.Textbox(label="Status", interactive=False, lines=1)
|
| 884 |
+
|
| 885 |
+
with gr.Column(scale=3):
|
| 886 |
+
text_input = gr.Textbox(
|
| 887 |
+
label="π Contract Text",
|
| 888 |
+
placeholder="Paste contract text here, or upload a file above...",
|
| 889 |
+
lines=14,
|
| 890 |
+
max_lines=40,
|
| 891 |
+
show_copy_button=True,
|
| 892 |
+
)
|
| 893 |
+
|
| 894 |
+
with gr.Column(scale=1):
|
| 895 |
+
scan_btn = gr.Button("π Analyze Contract", variant="primary", size="lg")
|
| 896 |
+
clear_btn = gr.Button("Clear", variant="secondary", size="sm")
|
| 897 |
+
status_msg = gr.Textbox(label="Analysis Status", interactive=False, lines=1)
|
| 898 |
+
|
| 899 |
+
# ββ Examples ββ
|
| 900 |
with gr.Row():
|
| 901 |
+
gr.Examples(
|
| 902 |
+
examples=[[SPOTIFY_TOS], [RENTAL_AGREEMENT], [NDA_SAMPLE], [COMPLEX_CONTRACT]],
|
| 903 |
+
inputs=[text_input],
|
| 904 |
+
label="Example Contracts",
|
| 905 |
+
)
|
| 906 |
|
            # -- Results --
            with gr.Row():
                with gr.Column(scale=1):
                    gr.Markdown("### Risk Summary")
                    summary_html = gr.HTML()

                    gr.Markdown("### 📥 Export Reports")
                    json_file = gr.File(label="JSON Report")
                    csv_file = gr.File(label="CSV Report")

                with gr.Column(scale=3):
                    with gr.Tabs():
                        with gr.Tab("Document"):
                            doc_html = gr.HTML(label="Document Viewer")
                        with gr.Tab("⚠️ Clauses (41 Categories)"):
                            clauses_html = gr.HTML(label="Detected Clauses")
                        with gr.Tab("🏷️ Entities"):
                            entities_html = gr.HTML(label="Named Entities")
                        with gr.Tab("Contradictions"):
                            nli_html = gr.HTML(label="Contradictions & Missing Clauses")
                        with gr.Tab("Obligations"):
                            obligations_html = gr.HTML(label="Obligation Tracker")
                        with gr.Tab("⚖️ Compliance"):
                            compliance_html = gr.HTML(label="Compliance Checker")

        # ======= TAB 2: Contract Comparison =======
        with gr.Tab("Compare Contracts"):
            with gr.Row():
                with gr.Column(scale=1):
                    comp_file_a = gr.File(
                        label="Contract A (PDF/DOCX/TXT)",
                        file_types=[".pdf", ".docx", ".doc", ".txt"],
                    )
                    comp_load_a = gr.Button("Load A", variant="secondary", size="sm")
                    comp_status_a = gr.Textbox(label="Status A", interactive=False, lines=1)

                with gr.Column(scale=3):
                    comp_text_a = gr.Textbox(
                        label="Contract A",
                        placeholder="Paste contract A here...",
                        lines=12,
                        show_copy_button=True,
                    )

                with gr.Column(scale=1):
                    comp_file_b = gr.File(
                        label="Contract B (PDF/DOCX/TXT)",
                        file_types=[".pdf", ".docx", ".doc", ".txt"],
                    )
                    comp_load_b = gr.Button("Load B", variant="secondary", size="sm")
                    comp_status_b = gr.Textbox(label="Status B", interactive=False, lines=1)

                with gr.Column(scale=3):
                    comp_text_b = gr.Textbox(
                        label="Contract B",
                        placeholder="Paste contract B here...",
                        lines=12,
                        show_copy_button=True,
                    )

            with gr.Row():
                with gr.Column(scale=1):
                    comp_btn = gr.Button("Compare Contracts", variant="primary", size="lg")
                with gr.Column(scale=5):
                    comp_status = gr.Textbox(label="Comparison Status", interactive=False, lines=1)

            with gr.Row():
                with gr.Column(scale=4):
                    comp_result_html = gr.HTML(label="Comparison Results")
                with gr.Column(scale=2):
                    comp_json = gr.JSON(label="Raw Comparison Data")

    # -- Events --
    def _load_file(file):
        # parse_document returns a (text, error) pair; fall back when no file is given.
        text, err = parse_document(file) if file else ("", "No file")
        if err and not text:
            return "", err
        return text, ("Loaded successfully" if not err else err)

    load_btn.click(_load_file, inputs=[file_input], outputs=[text_input, load_status])
    comp_load_a.click(_load_file, inputs=[comp_file_a], outputs=[comp_text_a, comp_status_a])
    comp_load_b.click(_load_file, inputs=[comp_file_b], outputs=[comp_text_b, comp_status_b])

    scan_btn.click(
        run_analysis,
        inputs=[text_input],
        outputs=[summary_html, clauses_html, entities_html, nli_html,
                 doc_html, obligations_html, compliance_html,
                 json_file, csv_file, status_msg]
    )

    clear_btn.click(
        do_clear,
        outputs=[summary_html, clauses_html, entities_html, nli_html,
                 doc_html, obligations_html, compliance_html,
                 json_file, csv_file, status_msg]
    )

    comp_btn.click(
        run_comparison,
        inputs=[comp_text_a, comp_text_b],
        outputs=[comp_result_html, comp_json]
    )

    gr.HTML("""
    <div style="margin-top:24px;padding:16px 0;border-top:1px solid #e5e7eb;text-align:center;">
        <p style="font-size:11px;color:#9ca3af;">
            ⚠️ Not legal advice. For informational purposes only.
            · Model: <a href="https://huggingface.co/Mokshith31/legalbert-contract-clause-classification" style="color:#6b7280;">Legal-BERT + CUAD (41 classes)</a>
            · Dataset: <a href="https://huggingface.co/datasets/theatticusproject/cuad-qa" style="color:#6b7280;">CUAD</a>
            · <a href="https://huggingface.co/spaces/gaurv007/ClauseGuard" style="color:#6b7280;">ClauseGuard Space</a>
        </p>
    </div>
    """)

if __name__ == "__main__":
    demo.launch()

compare.py
ADDED
@@ -0,0 +1,229 @@
"""
ClauseGuard - Contract Comparison Engine
----------------------------------------
Compare two contracts side-by-side:
  • Clause-level diff (added/removed/modified clauses)
  • Risk delta (which contract is more favorable)
  • Alignment score (similarity between documents)
"""

import re
from difflib import SequenceMatcher
from collections import defaultdict

def _normalize_clause(text):
    """Normalize clause text for comparison."""
    text = text.lower()
    text = re.sub(r'[^a-z0-9\s]', ' ', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

def _clause_similarity(a, b):
    """Compute similarity between two clauses."""
    return SequenceMatcher(None, _normalize_clause(a), _normalize_clause(b)).ratio()

def _extract_clause_type(clause_text):
    """Heuristic clause type detection for alignment."""
    text_lower = clause_text.lower()
    # Keywords are matched as plain substrings; the first type (in dict order)
    # with any matching keyword wins.
    type_keywords = {
        "governing law": ["govern", "law", "jurisdiction"],
        "termination": ["terminat", "cancel", "end"],
        "indemnification": ["indemnif", "hold harmless"],
        "confidentiality": ["confidential", "non-disclosure"],
        "liability": ["liability", "liable", "damages"],
        "payment": ["payment", "fee", "price", "compensat"],
        "intellectual property": ["intellectual", "ip", "copyright", "patent"],
        "warranty": ["warrant", "guarantee"],
        "force majeure": ["force majeure", "act of god"],
        "arbitration": ["arbitrat", "mediation"],
        "assignment": ["assign", "transfer"],
        "non-compete": ["compete", "competition"],
        "renewal": ["renew", "extend"],
        "effective date": ["effective date", "commencement"],
    }
    for ctype, keywords in type_keywords.items():
        if any(kw in text_lower for kw in keywords):
            return ctype
    return "general"

def compare_contracts(text_a, text_b, clauses_a=None, clauses_b=None):
    """
    Compare two contract texts and return structural diff.

    Returns dict with:
    - alignment_score: float 0-1
    - contract_a_clauses / contract_b_clauses: clause counts
    - added_clauses: clauses in B not in A
    - removed_clauses: clauses in A not in B
    - modified_clauses: clauses that are similar but different
    - risk_delta: which contract is riskier
    - risk_winner: "A", "B", or "tie"
    - type_map_a / type_map_b: clause counts per detected type
    """
    if not text_a or not text_b:
        return {"error": "Both contracts required"}

    # Split into clauses if not provided
    if clauses_a is None:
        clauses_a = _split_clauses(text_a)
    if clauses_b is None:
        clauses_b = _split_clauses(text_b)

    # Build clause type maps
    type_map_a = defaultdict(list)
    type_map_b = defaultdict(list)
    for c in clauses_a:
        type_map_a[_extract_clause_type(c)].append(c)
    for c in clauses_b:
        type_map_b[_extract_clause_type(c)].append(c)

    # Find matches: greedily pair each clause in A with its most similar
    # still-unmatched clause in B
    matched_a = set()
    matched_b = set()
    modified = []

    SIMILARITY_THRESHOLD = 0.75
    MODIFIED_THRESHOLD = 0.45

    for i, ca in enumerate(clauses_a):
        best_sim = 0
        best_j = -1
        for j, cb in enumerate(clauses_b):
            if j in matched_b:
                continue
            sim = _clause_similarity(ca, cb)
            if sim > best_sim:
                best_sim = sim
                best_j = j

        if best_sim >= SIMILARITY_THRESHOLD:
            matched_a.add(i)
            matched_b.add(best_j)
            # Pairs at or above 0.95 are treated as unchanged and not reported
            if best_sim < 0.95:
                modified.append({
                    "type": "modified",
                    "similarity": round(best_sim, 3),
                    "clause_a": ca[:200],
                    "clause_b": clauses_b[best_j][:200],
                    "clause_type": _extract_clause_type(ca),
                })
        elif best_sim >= MODIFIED_THRESHOLD:
            # Partial matches are reported but not marked as matched, so both
            # clauses also appear in the removed/added lists
            modified.append({
                "type": "partial",
                "similarity": round(best_sim, 3),
                "clause_a": ca[:200],
                "clause_b": clauses_b[best_j][:200] if best_j >= 0 else "",
                "clause_type": _extract_clause_type(ca),
            })

    removed = [clauses_a[i] for i in range(len(clauses_a)) if i not in matched_a]
    added = [clauses_b[j] for j in range(len(clauses_b)) if j not in matched_b]

    # Compute alignment score
    total_pairs = max(len(clauses_a), len(clauses_b))
    if total_pairs > 0:
        alignment = len(matched_a) / total_pairs
    else:
        alignment = 0.0

    # Risk delta: count how many distinct risk keywords appear in each contract
    risk_keywords = ["unlimited", "unilateral", "waive", "arbitration", "indemnif",
                     "not liable", "no warranty", "sole discretion"]
    lower_a = text_a.lower()
    lower_b = text_b.lower()
    risk_a = sum(1 for kw in risk_keywords if kw in lower_a)
    risk_b = sum(1 for kw in risk_keywords if kw in lower_b)

    if risk_a > risk_b + 2:
        risk_delta = "Contract A is significantly riskier"
        risk_winner = "B"
    elif risk_b > risk_a + 2:
        risk_delta = "Contract B is significantly riskier"
        risk_winner = "A"
    else:
        risk_delta = "Similar risk profiles"
        risk_winner = "tie"

    return {
        "alignment_score": round(alignment, 3),
        "contract_a_clauses": len(clauses_a),
        "contract_b_clauses": len(clauses_b),
        "added_clauses": [{"text": c[:200], "type": _extract_clause_type(c)} for c in added[:50]],
        "removed_clauses": [{"text": c[:200], "type": _extract_clause_type(c)} for c in removed[:50]],
        "modified_clauses": modified[:50],
        "risk_delta": risk_delta,
        "risk_winner": risk_winner,
        "type_map_a": {k: len(v) for k, v in type_map_a.items()},
        "type_map_b": {k: len(v) for k, v in type_map_b.items()},
    }

def _split_clauses(text):
    """Split text into clauses."""
    text = re.sub(r'\n{3,}', '\n\n', text.strip())
    parts = re.split(
        r'(?<=[.!?])\s+(?=[A-Z0-9(])|(?:\n\n)(?=\d+[.)]\s|\([a-z]\)\s|[A-Z][A-Z\s]{2,})',
        text
    )
    # Drop fragments of 30 characters or fewer (numbering, headings)
    return [p.strip() for p in parts if len(p.strip()) > 30]

def render_comparison_html(result):
    """Render comparison results as HTML for Gradio."""
    # Imported locally because the name `html` is used below for the output string.
    from html import escape

    if "error" in result:
        return f'<p style="color:#dc2626;">{escape(result["error"])}</p>'

    # Green styling for a tie, red when one contract is significantly riskier
    is_tie = result["risk_winner"] == "tie"
    risk_bg = "#f0fdf4" if is_tie else "#fef2f2"
    risk_border = "#bbf7d0" if is_tie else "#fecaca"
    risk_color = "#16a34a" if is_tie else "#dc2626"

    html = f'''
    <div style="font-family:system-ui,sans-serif;">
        <div style="display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-bottom:16px;">
            <div style="padding:12px;border-radius:8px;background:#eff6ff;border:1px solid #bfdbfe;text-align:center;">
                <div style="font-size:24px;font-weight:700;color:#1d4ed8;">{result["contract_a_clauses"]}</div>
                <div style="font-size:12px;color:#3b82f6;">Clauses in Contract A</div>
            </div>
            <div style="padding:12px;border-radius:8px;background:#fefce8;border:1px solid #fde68a;text-align:center;">
                <div style="font-size:24px;font-weight:700;color:#a16207;">{result["contract_b_clauses"]}</div>
                <div style="font-size:12px;color:#ca8a04;">Clauses in Contract B</div>
            </div>
        </div>

        <div style="padding:12px;border-radius:8px;background:#f9fafb;border:1px solid #e5e7eb;margin-bottom:16px;text-align:center;">
            <div style="font-size:28px;font-weight:700;color:#374151;">{result["alignment_score"]*100:.1f}%</div>
            <div style="font-size:12px;color:#6b7280;">Alignment Score</div>
        </div>

        <div style="padding:12px;border-radius:8px;background:{risk_bg};border:1px solid {risk_border};margin-bottom:16px;text-align:center;">
            <span style="font-size:14px;font-weight:600;color:{risk_color};">⚖️ {escape(result["risk_delta"])}</span>
        </div>
    '''

    # Modified clauses (clause text is user-supplied, so it is HTML-escaped)
    if result["modified_clauses"]:
        html += '<div style="margin-bottom:16px;"><h3 style="font-size:14px;color:#374151;margin-bottom:8px;">Modified Clauses</h3>'
        for m in result["modified_clauses"][:20]:
            html += f'''
            <div style="border:1px solid #e5e7eb;border-radius:6px;padding:10px;margin-bottom:8px;">
                <div style="font-size:11px;color:#6b7280;margin-bottom:4px;">{m["clause_type"].upper()} · Similarity: {m["similarity"]*100:.0f}%</div>
                <div style="display:grid;grid-template-columns:1fr 1fr;gap:8px;">
                    <div style="background:#fef2f2;padding:6px;border-radius:4px;font-size:12px;color:#991b1b;">{escape(m["clause_a"][:150])}...</div>
                    <div style="background:#f0fdf4;padding:6px;border-radius:4px;font-size:12px;color:#166534;">{escape(m["clause_b"][:150])}...</div>
                </div>
            </div>
            '''
        html += '</div>'

    # Added clauses
    if result["added_clauses"]:
        html += '<div style="margin-bottom:16px;"><h3 style="font-size:14px;color:#374151;margin-bottom:8px;">Added in Contract B</h3>'
        for a in result["added_clauses"][:15]:
            html += f'<div style="background:#f0fdf4;padding:8px;border-radius:4px;font-size:12px;color:#166534;margin-bottom:4px;border-left:3px solid #22c55e;"><b>{a["type"].upper()}</b> · {escape(a["text"][:150])}...</div>'
        html += '</div>'

    # Removed clauses
    if result["removed_clauses"]:
        html += '<div style="margin-bottom:16px;"><h3 style="font-size:14px;color:#374151;margin-bottom:8px;">Removed from Contract A</h3>'
        for r in result["removed_clauses"][:15]:
            html += f'<div style="background:#fef2f2;padding:8px;border-radius:4px;font-size:12px;color:#991b1b;margin-bottom:4px;border-left:3px solid #ef4444;"><b>{r["type"].upper()}</b> · {escape(r["text"][:150])}...</div>'
        html += '</div>'

    html += '</div>'
    return html

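As an illustration of how the similarity thresholds in `compare_contracts` bucket a pair of clauses, here is a minimal standalone sketch. It repeats the normalization and `difflib.SequenceMatcher` ratio used in compare.py; the helper name `classify_pair` and the "unchanged"/"unmatched" labels are assumptions made for this example and are not part of the module.

```python
import re
from difflib import SequenceMatcher

# Same threshold values as in compare_contracts
SIMILARITY_THRESHOLD = 0.75
MODIFIED_THRESHOLD = 0.45
UNCHANGED_THRESHOLD = 0.95


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def classify_pair(clause_a, clause_b):
    """Hypothetical helper: label one clause pair by its similarity ratio."""
    sim = SequenceMatcher(None, normalize(clause_a), normalize(clause_b)).ratio()
    if sim >= UNCHANGED_THRESHOLD:
        label = "unchanged"   # matched, not reported as modified
    elif sim >= SIMILARITY_THRESHOLD:
        label = "modified"    # matched and reported
    elif sim >= MODIFIED_THRESHOLD:
        label = "partial"     # reported, but not counted as matched
    else:
        label = "unmatched"
    return label, round(sim, 3)


if __name__ == "__main__":
    print(classify_pair("Payment is due in 30 days.", "PAYMENT IS DUE IN 30 DAYS!"))
    print(classify_pair("Payment is due in 30 days.", "Payment is due within 45 days of invoice."))
```

Because normalization strips case and punctuation before comparing, two clauses that differ only in formatting score 1.0 and fall in the "unchanged" bucket.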
compliance.py
ADDED
|
@@ -0,0 +1,245 @@
"""
ClauseGuard - Compliance Checker
--------------------------------
Check contracts against regulatory frameworks:
  • GDPR (EU General Data Protection Regulation)
  • CCPA (California Consumer Privacy Act)
  • SOX (Sarbanes-Oxley)
  • HIPAA (Health Insurance Portability and Accountability Act)
  • FINRA (Financial Industry Regulatory Authority)
"""

import re
from collections import defaultdict

# Regulatory requirement definitions
REGULATIONS = {
    "GDPR": {
        "description": "EU General Data Protection Regulation (Regulation 2016/679)",
        "requirements": {
            "lawful_basis": {
                "keywords": ["lawful basis", "legal basis", "legitimate interest", "consent", "performance of contract", "legal obligation"],
                "description": "Must specify lawful basis for data processing (Art. 6)",
                "severity": "HIGH",
            },
            "data_subject_rights": {
                "keywords": ["right to access", "right to erasure", "right to be forgotten", "data portability", "rectification", "object to processing"],
                "description": "Must acknowledge data subject rights (Arts. 15-22)",
                "severity": "HIGH",
            },
            "data_breach_notification": {
                "keywords": ["data breach", "breach notification", "notify supervisory authority", "72 hours"],
                "description": "Must include data breach notification obligations (Art. 33)",
                "severity": "MEDIUM",
            },
            "data_protection_officer": {
                "keywords": ["data protection officer", "DPO"],
                "description": "Should reference Data Protection Officer if applicable (Art. 37)",
                "severity": "LOW",
            },
            "cross_border_transfer": {
                "keywords": ["standard contractual clauses", "SCCs", "adequacy decision", "transfer mechanism", "third country"],
                "description": "Must specify transfer safeguards for cross-border data (Arts. 44-49)",
                "severity": "HIGH",
            },
            "privacy_by_design": {
                "keywords": ["privacy by design", "privacy by default", "data minimization", "purpose limitation"],
                "description": "Should reference privacy-by-design principles (Art. 25)",
                "severity": "MEDIUM",
            },
        },
    },
    "CCPA": {
        "description": "California Consumer Privacy Act (Cal. Civ. Code § 1798.100 et seq.)",
        "requirements": {
            "consumer_rights": {
                "keywords": ["right to know", "right to delete", "right to opt out", "right to non-discrimination", "consumer rights"],
                "description": "Must acknowledge California consumer rights",
                "severity": "HIGH",
            },
            "data_categories": {
                "keywords": ["categories of personal information", "personal information categories", "identifiers", "commercial information"],
                "description": "Must disclose categories of personal information collected",
                "severity": "HIGH",
            },
            "sale_of_data": {
                "keywords": ["do not sell my personal information", "opt-out of sale", "sale of personal information"],
                "description": "Must provide opt-out mechanism for data sales",
                "severity": "HIGH",
            },
            "service_providers": {
                "keywords": ["service provider", "third party", "contractor", "business purpose"],
                "description": "Should limit data use to business/service provider purposes",
                "severity": "MEDIUM",
            },
        },
    },
    "SOX": {
        "description": "Sarbanes-Oxley Act (US, 2002)",
        "requirements": {
            "internal_controls": {
                "keywords": ["internal controls", "internal control over financial reporting", "ICFR"],
                "description": "Must reference internal controls over financial reporting (§ 404)",
                "severity": "HIGH",
            },
            "audit_committee": {
                "keywords": ["audit committee", "independent auditor", "PCAOB"],
                "description": "Should reference audit committee oversight",
                "severity": "MEDIUM",
            },
            "whistleblower": {
                "keywords": ["whistleblower", "anonymous reporting", "reporting hotline", "retaliation"],
                "description": "Should include whistleblower protections (§ 806)",
                "severity": "HIGH",
            },
            "document_retention": {
                "keywords": ["document retention", "record retention", "retention policy", "preserve records"],
                "description": "Must include document retention obligations (§ 802)",
                "severity": "HIGH",
            },
        },
    },
    "HIPAA": {
        "description": "Health Insurance Portability and Accountability Act (US, 1996)",
        "requirements": {
            "phi_protection": {
                "keywords": ["protected health information", "PHI", "health information", "ePHI"],
                "description": "Must protect PHI and limit uses/disclosures",
                "severity": "CRITICAL",
            },
            "business_associate": {
                "keywords": ["business associate agreement", "BAA", "business associate", "covered entity"],
                "description": "Should reference Business Associate Agreement (§ 164.504(e))",
                "severity": "HIGH",
            },
            "security_safeguards": {
                "keywords": ["administrative safeguards", "technical safeguards", "physical safeguards", "encryption", "access controls"],
                "description": "Must implement security safeguards (§ 164.308-312)",
                "severity": "HIGH",
            },
            "breach_notification": {
                "keywords": ["breach notification", "notification of breach", "unauthorized access"],
                "description": "Must include breach notification obligations (§ 164.400-414)",
                "severity": "HIGH",
            },
        },
    },
    "FINRA": {
        "description": "Financial Industry Regulatory Authority (US)",
        "requirements": {
            "recordkeeping": {
                "keywords": ["recordkeeping", "books and records", "retain records", "SEC Rule 17a-4"],
                "description": "Must comply with recordkeeping rules (FINRA Rule 4511)",
                "severity": "HIGH",
            },
            "supervision": {
                "keywords": ["supervision", "supervisory system", "review and approval"],
                "description": "Should reference supervisory obligations (FINRA Rule 3110)",
                "severity": "MEDIUM",
            },
            "anti_money_laundering": {
                "keywords": ["anti-money laundering", "AML", "suspicious activity", "SAR", "OFAC"],
                "description": "Must reference AML compliance (FINRA Rule 3310)",
                "severity": "HIGH",
            },
            "privacy": {
                "keywords": ["privacy policy", "customer information", "Regulation S-P", "nonpublic personal information"],
                "description": "Must protect customer information (Regulation S-P)",
                "severity": "HIGH",
            },
        },
    },
}

# Severity -> (text color, background color)
RISK_STYLES = {
    "CRITICAL": ("#dc2626", "#fef2f2"),
    "HIGH": ("#ea580c", "#fff7ed"),
    "MEDIUM": ("#ca8a04", "#fefce8"),
    "LOW": ("#16a34a", "#f0fdf4"),
}


def check_compliance(text):
    """Check contract text against all regulatory frameworks."""
    text_lower = text.lower()
    results = {}

    for reg_name, reg_data in REGULATIONS.items():
        checks = []
        for req_name, req_data in reg_data["requirements"].items():
            # A requirement passes if any of its keywords occurs as a
            # case-insensitive substring of the contract text.
            matched_keywords = [kw for kw in req_data["keywords"] if kw.lower() in text_lower]
            checks.append({
                "requirement": req_name,
                "description": req_data["description"],
                "severity": req_data["severity"],
                "status": "PASS" if matched_keywords else "MISSING",
                "matched_keywords": matched_keywords,
            })

        passed = sum(1 for c in checks if c["status"] == "PASS")
        total = len(checks)
        compliance_rate = round(passed / total * 100) if total > 0 else 0

        if compliance_rate >= 80:
            overall_status = "COMPLIANT"
        elif compliance_rate >= 40:
            overall_status = "PARTIAL"
        else:
            overall_status = "NON-COMPLIANT"

        results[reg_name] = {
            "description": reg_data["description"],
            "compliance_rate": compliance_rate,
            "checks": checks,
            "overall_status": overall_status,
        }

    return results


| 198 |
+
def render_compliance_html(results):
|
| 199 |
+
"""Render compliance results as HTML for Gradio."""
|
| 200 |
+
html = '<div style="font-family:system-ui,sans-serif;">'
|
| 201 |
+
|
| 202 |
+
for reg_name, reg_result in results.items():
|
| 203 |
+
rate = reg_result["compliance_rate"]
|
| 204 |
+
status = reg_result["overall_status"]
|
| 205 |
+
status_color = "#16a34a" if status == "COMPLIANT" else "#ca8a04" if status == "PARTIAL" else "#dc2626"
|
| 206 |
+
status_bg = "#f0fdf4" if status == "COMPLIANT" else "#fefce8" if status == "PARTIAL" else "#fef2f2"
|
+
+        html += f'''
+        <div style="border:1px solid #e5e7eb;border-radius:10px;margin-bottom:16px;overflow:hidden;">
+            <div style="display:flex;justify-content:space-between;align-items:center;padding:12px 16px;background:{status_bg};border-bottom:1px solid #e5e7eb;">
+                <div>
+                    <span style="font-size:16px;font-weight:700;color:#1f2937;">{reg_name}</span>
+                    <p style="font-size:11px;color:#6b7280;margin:2px 0 0 0;">{reg_result["description"]}</p>
+                </div>
+                <div style="text-align:right;">
+                    <div style="font-size:24px;font-weight:700;color:{status_color};">{rate}%</div>
+                    <div style="font-size:11px;color:{status_color};font-weight:500;">{status}</div>
+                </div>
+            </div>
+            <div style="padding:8px 16px;">
+        '''
+
+        for check in reg_result["checks"]:
+            color, bg = RISK_STYLES[check["severity"]]
+            status_icon = "✅" if check["status"] == "PASS" else "❌"
+            status_text = "Found" if check["status"] == "PASS" else "Missing"
+            keywords = ", ".join(check["matched_keywords"][:3]) if check["matched_keywords"] else "\u2014"
+
+            html += f'''
+            <div style="display:flex;justify-content:space-between;align-items:flex-start;padding:8px 0;border-bottom:1px solid #f3f4f6;">
+                <div style="flex:1;">
+                    <div style="font-size:12px;font-weight:500;color:#374151;">{check["description"]}</div>
+                    <div style="font-size:10px;color:#9ca3af;margin-top:2px;">Keywords: {keywords}</div>
+                </div>
+                <div style="display:flex;align-items:center;gap:6px;margin-left:8px;">
+                    <span style="font-size:10px;color:{color};font-weight:600;background:{bg};padding:2px 8px;border-radius:4px;">{check["severity"]}</span>
+                    <span style="font-size:13px;">{status_icon}</span>
+                </div>
+            </div>
+            '''
+
+        html += '</div></div>'
+
+    html += '</div>'
+    return html
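The renderer above interpolates check descriptions and matched keywords straight into the HTML string. If any of those values can contain `<`, `>` or `&`, the markup may break or be interpreted by the browser. The following is a minimal sketch of escaping each value before interpolation, assuming a hypothetical helper named `render_check_row` that is not part of the commit; it uses only `html.escape` from the standard library.

```python
from html import escape


def render_check_row(description, keywords, status):
    """Render one check row with every interpolated value escaped.

    Hypothetical helper for illustration; the commit builds this markup inline.
    """
    shown = ", ".join(keywords[:3]) if keywords else "-"
    label = "Found" if status == "PASS" else "Missing"
    return (
        '<div style="padding:8px 0;">'
        f'<div style="font-size:12px;">{escape(description)}</div>'
        f'<div style="font-size:10px;">Keywords: {escape(shown)}</div>'
        f'<span style="font-size:11px;">{escape(label)}</span>'
        "</div>"
    )


row = render_check_row(
    "Data <b>retention</b> & deletion",
    ["erase", "<script>alert(1)</script>"],
    "FAIL",
)
print(row)
```

The same call could wrap `reg_name` and `reg_result["description"]` in the regulation header if those strings are ever loaded from user-editable configuration.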
obligations.py
ADDED
|
@@ -0,0 +1,190 @@
+"""
+ClauseGuard - Obligation Tracker
+───────────────────────────────
+Extract action items, deadlines, and obligations from contracts.
+Categorize: monetary, compliance, reporting, delivery, termination
+"""
+
+import re
+from collections import defaultdict
+from datetime import datetime, timedelta
+
+# Obligation keywords by category
+OBLIGATION_PATTERNS = {
+    "monetary": [
+        r"(?:shall|must|will|agrees? to)\s+pay\s+(?:\$?[\d,]+(?:\.\d{2})?)",
+        r"(?:fee|payment|compensation|reimburs(?:e|ement))\s+of\s+(?:\$?[\d,]+(?:\.\d{2})?)",
+        r"(?:shall|must|will)\s+remit\s+(?:\$?[\d,]+(?:\.\d{2})?)",
+        r"(?:annual|monthly|quarterly)\s+(?:fee|payment)\s+of",
+        r"(?:liquidated damages|penalty)\s+of\s+(?:\$?[\d,]+(?:\.\d{2})?)",
+    ],
+    "compliance": [
+        r"(?:shall|must|will)\s+comply\s+with",
+        r"(?:shall|must|will)\s+adhere\s+to",
+        r"(?:shall|must|will)\s+conform\s+to",
+        r"(?:shall|must|will)\s+follow\s+(?:the|all)\s+(?:applicable|relevant)\s+(?:laws|regulations|standards)",
+        r"(?:GDPR|CCPA|HIPAA|SOX|PCI-DSS|ISO\s+\d+)",
+        r"(?:confidential|privacy|data protection)",
+        r"(?:shall|must|will)\s+obtain\s+(?:necessary|required)\s+(?:approvals?|permits?|licenses?)",
+        r"(?:shall|must|will)\s+maintain\s+(?:insurance|coverage|bond)",
+    ],
+    "reporting": [
+        r"(?:shall|must|will)\s+report",
+        r"(?:shall|must|will)\s+provide\s+(?:regular|monthly|quarterly|annual)\s+(?:reports?|updates?|status)",
+        r"(?:shall|must|will)\s+notify",
+        r"(?:shall|must|will)\s+inform",
+        r"(?:shall|must|will)\s+deliver\s+(?:a|an|the)\s+report",
+        r"(?:audit|inspection)\s+(?:reports?|rights?)",
+    ],
+    "delivery": [
+        r"(?:shall|must|will)\s+deliver",
+        r"(?:shall|must|will)\s+provide",
+        r"(?:shall|must|will)\s+furnish",
+        r"(?:shall|must|will)\s+supply",
+        r"(?:shall|must|will)\s+submit",
+        r"(?:delivery|performance)\s+(?:date|schedule|timeline)",
+        r"(?:within|no later than|by)\s+(?:\d+)\s+(?:days?|weeks?|months?|years?)",
+    ],
+    "termination": [
+        r"(?:shall|must|will)\s+return",
+        r"(?:shall|must|will)\s+destroy",
+        r"(?:shall|must|will)\s+cease",
+        r"(?:upon|after)\s+termination",
+        r"(?:post-termination|surviving)\s+obligations?",
+    ],
+}
+
+# Timeframe extraction
+TIME_PATTERNS = [
+    (r"within\s+(\d+)\s+(day|week|month|year)s?", "relative"),
+    (r"no\s+later\s+than\s+(\d+)\s+(day|week|month|year)s?", "relative"),
+    (r"within\s+(\d+)\s+business\s+days?", "business_days"),
+    (r"by\s+([A-Z][a-z]+\s+\d{1,2},?\s+\d{4})", "absolute"),
+    (r"on\s+or\s+before\s+([A-Z][a-z]+\s+\d{1,2},?\s+\d{4})", "absolute"),
+    (r"(\d{1,2}/\d{1,2}/\d{2,4})", "absolute_date"),
+    (r"(\d{1,2}-\d{1,2}-\d{2,4})", "absolute_date"),
+]
+
+PARTY_PATTERNS = [
+    r"\b(?:Party A|Party B|Disclosing Party|Receiving Party|Licensor|Licensee|Buyer|Seller|Tenant|Landlord|Employer|Employee|Company|Customer|Vendor|Client)\b",
+    r"\b[A-Z][A-Za-z0-9\s&]+(?:Inc\.?|LLC|Ltd\.?|Limited|Corp\.?|Corporation|PLC|GmbH|AG|S\.A\.?|B\.V\.?)\b",
+]
+
+
+def extract_obligations(text):
+    """Extract obligations from contract text."""
+    obligations = []
+
+    # Split into sentences
+    sentences = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
+
+    for sentence in sentences:
+        sentence = sentence.strip()
+        if len(sentence) < 30:
+            continue
+
+        found_types = set()
+        for otype, patterns in OBLIGATION_PATTERNS.items():
+            for pat in patterns:
+                if re.search(pat, sentence, re.IGNORECASE):
+                    found_types.add(otype)
+                    break
+
+        if not found_types:
+            continue
+
+        # Extract party
+        party = "Unknown"
+        for pp in PARTY_PATTERNS:
+            m = re.search(pp, sentence)
+            if m:
+                party = m.group(0)
+                break
+
+        # Extract timeframe
+        deadline = "Not specified"
+        for pat, ptype in TIME_PATTERNS:
+            m = re.search(pat, sentence, re.IGNORECASE)
+            if m:
+                if ptype == "relative":
+                    num = m.group(1)
+                    unit = m.group(2)
+                    deadline = f"Within {num} {unit}(s)"
+                elif ptype == "business_days":
+                    num = m.group(1)
+                    deadline = f"Within {num} business day(s)"
+                elif ptype in ("absolute", "absolute_date"):
+                    deadline = m.group(1)
+                break
+
+        for otype in found_types:
+            obligations.append({
+                "type": otype,
+                "party": party,
+                "description": sentence[:250] + ("..." if len(sentence) > 250 else ""),
+                "deadline": deadline,
+                "full_text": sentence,
+            })
+
+    return obligations
+
+
+def render_obligations_html(obligations):
+    """Render obligations as HTML cards for Gradio."""
+    if not obligations:
+        return '<div style="padding:16px;color:#6b7280;text-align:center;">No obligations detected.</div>'
+
+    # Group by type
+    grouped = defaultdict(list)
+    for ob in obligations:
+        grouped[ob["type"]].append(ob)
+
+    type_icons = {
+        "monetary": "💰",
+        "compliance": "⚖️",
+        "reporting": "π",
+        "delivery": "📦",
+        "termination": "π",
+    }
+    type_colors = {
+        "monetary": "#22c55e",
+        "compliance": "#f59e0b",
+        "reporting": "#3b82f6",
+        "delivery": "#8b5cf6",
+        "termination": "#ef4444",
+    }
+
+    html = '<div style="font-family:system-ui,sans-serif;">'
+
+    # Summary counts
+    html += '<div style="display:grid;grid-template-columns:repeat(auto-fit,minmax(120px,1fr));gap:8px;margin-bottom:16px;">'
+    for otype, obs in sorted(grouped.items()):
+        color = type_colors.get(otype, "#6b7280")
+        icon = type_icons.get(otype, "π")
+        html += f'''
+        <div style="text-align:center;padding:10px;border-radius:8px;background:{color}15;border:1px solid {color}30;">
+            <div style="font-size:20px;">{icon}</div>
+            <div style="font-size:20px;font-weight:700;color:{color};">{len(obs)}</div>
+            <div style="font-size:11px;color:{color};text-transform:capitalize;">{otype}</div>
+        </div>
+        '''
+    html += '</div>'
+
+    # Individual cards
+    for otype, obs in sorted(grouped.items()):
+        color = type_colors.get(otype, "#6b7280")
+        icon = type_icons.get(otype, "π")
+        html += f'<h3 style="font-size:14px;color:#374151;margin:16px 0 8px 0;border-bottom:2px solid {color}30;padding-bottom:4px;">{icon} {otype.title()} Obligations</h3>'
+        for ob in obs:
+            html += f'''
+            <div style="border:1px solid #e5e7eb;border-left:4px solid {color};border-radius:6px;padding:10px;margin-bottom:8px;background:#fafafa;">
+                <div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:4px;">
+                    <span style="font-size:12px;font-weight:600;color:{color};">{ob["party"]}</span>
+                    <span style="font-size:11px;color:#6b7280;background:#f3f4f6;padding:2px 8px;border-radius:4px;">{ob["deadline"]}</span>
+                </div>
+                <p style="font-size:12px;color:#4b5563;margin:0;line-height:1.5;">{ob["description"]}</p>
+            </div>
+            '''
+
+    html += '</div>'
+    return html
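The tracker reports relative deadlines only as text such as "Within 30 day(s)", and the `datetime`/`timedelta` import in obligations.py is not used anywhere in the added lines. One possible use is resolving a relative deadline against a known start date. The sketch below assumes a hypothetical helper `resolve_deadline` and hypothetical constants `RELATIVE_DEADLINE` and `FIXED_UNIT_DAYS`, none of which are in the commit; the regex mirrors the two "relative" entries of `TIME_PATTERNS`.

```python
import re
from datetime import date, timedelta

# Same shape as the "relative" entries in TIME_PATTERNS (hypothetical merged form).
RELATIVE_DEADLINE = re.compile(
    r"(?:within|no\s+later\s+than)\s+(\d+)\s+(day|week|month|year)s?",
    re.IGNORECASE,
)

# Only units with a fixed length are resolved; months and years need
# calendar-aware arithmetic and are deliberately left out of this sketch.
FIXED_UNIT_DAYS = {"day": 1, "week": 7}


def resolve_deadline(sentence, start_date):
    """Return the calendar date implied by a relative deadline, or None."""
    match = RELATIVE_DEADLINE.search(sentence)
    if match is None:
        return None
    count = int(match.group(1))
    unit = match.group(2).lower()
    days_per_unit = FIXED_UNIT_DAYS.get(unit)
    if days_per_unit is None:
        return None  # month/year: not resolved here
    return start_date + timedelta(days=count * days_per_unit)


due = resolve_deadline(
    "Licensee shall pay $5,000 within 30 days of the Effective Date.",
    date(2025, 1, 1),
)
print(due)
```

Returning `None` for months and years is a design choice of the sketch: adding a fixed 30 or 365 days would silently drift from the contractual date, so those cases are better handled by a calendar-aware library or by manual review.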
requirements.txt
CHANGED
|
@@ -1,4 +1,11 @@
-gradio>=5.0
-transformers>=5.
-torch
-numpy
+gradio>=5.23.0
+transformers>=5.6.1
+torch>=2.5.0
+numpy>=2.0.0
+pdfplumber>=0.11.0
+python-docx>=1.1.0
+spacy>=3.8.0
+scikit-learn>=1.6.0
+peft>=0.15.0
+accelerate>=1.2.0
+pandas>=2.2.0
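Every dependency now carries a lower bound, so a quick pre-flight check can list what is actually installed next to each minimum. The sketch below is an assumption-laden illustration, not part of the Space: `parse_minimums` and `installed_versions` are hypothetical helpers, only simple `name>=version` lines are understood, and no version comparison is attempted because that would need a third-party parser.

```python
import re
from importlib import metadata

# Matches simple "name>=version" lines only; extras, markers and other
# operators are out of scope for this sketch.
REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*>=\s*(\S+)\s*$")


def parse_minimums(lines):
    """Map package name to its minimum version string."""
    minimums = {}
    for line in lines:
        match = REQUIREMENT.match(line)
        if match:
            minimums[match.group(1)] = match.group(2)
    return minimums


def installed_versions(names):
    """Map package name to the installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


minimums = parse_minimums(["gradio>=5.23.0", "torch>=2.5.0", "# a comment"])
for name, installed in installed_versions(minimums).items():
    print(f"{name}: requires >={minimums[name]}, installed {installed}")
```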