anky2002 committed on
Commit 51ac51a · 2 Parent(s): dbd088a be855a6

Merge branch 'main' of https://huggingface.co/spaces/gaurv007/ClauseGuard

Files changed (6)
  1. README.md +99 -0
  2. app.py +962 -137
  3. compare.py +229 -0
  4. compliance.py +245 -0
  5. obligations.py +190 -0
  6. requirements.txt +11 -4
README.md CHANGED
@@ -9,3 +9,102 @@ python_version: "3.12"
  app_file: app.py
  pinned: false
  ---
+
+ # 🛡️ ClauseGuard - World's Best Open-Source Legal Contract Analysis
+
+ **ClauseGuard** is the most comprehensive open-source AI-powered legal contract analysis tool. It analyzes contracts using state-of-the-art legal NLP models and provides actionable risk assessments.
+
+ ## ✨ Core Features
+
+ ### Analysis Engine
+ | Feature | Description |
+ |---------|-------------|
+ | **41 CUAD Clause Categories** | Full taxonomy: Document Name, Parties, Governing Law, Indemnification, Termination, Non-Compete, IP Ownership, Audit Rights, Force Majeure, and more |
+ | **4-Tier Risk Scoring** | Critical 🔴 / High 🟠 / Medium 🟡 / Low 🟢 with visual risk matrix |
+ | **Legal NER** | Extracts parties, dates, monetary values ($), jurisdictions, defined terms, and party roles |
+ | **NLI Contradiction Detection** | Identifies conflicting clauses (e.g., uncapped + capped liability) and missing critical provisions |
+ | **Obligation Tracker** | Categorizes action items: monetary 💰, compliance ⚖️, reporting 📊, delivery 📦, termination 🛑 |
+ | **Compliance Checker** | Validates against GDPR, CCPA, SOX, HIPAA, and FINRA requirements |
+ | **Contract Comparison** | Side-by-side diff between two contracts with alignment scoring |
+
+ ### Document Support
+ - **PDF** parsing via `pdfplumber`
+ - **DOCX/DOC** parsing via `python-docx`
+ - **TXT / Markdown** direct text input
+
+ ### UI/UX
+ - **3-Panel Professional Layout**: Upload sidebar + Main analysis + Summary dashboard
+ - **Document Viewer**: Inline entity highlights (colored annotations)
+ - **Clause Cards**: Expandable risk-badged cards with confidence scores
+ - **Export Reports**: JSON (structured) and CSV (tabular) downloads
+ - **Color-Coded Risk Badges**: Instant visual triage
+
+ ## 🧠 Models & Architecture
+
+ | Component | Technology |
+ |-----------|------------|
+ | Clause Classification | `Mokshith31/legalbert-contract-clause-classification`, a LoRA adapter on `nlpaueb/legal-bert-base-uncased`, fine-tuned on CUAD 41-class taxonomy |
+ | NER | Rule-based with 7 entity types (dates, money, parties, jurisdictions, defined terms) |
+ | NLI | Heuristic contradiction detection with 5 conflict patterns + missing-clause detection |
+ | Compliance | Regulatory keyword matching across GDPR, CCPA, SOX, HIPAA, FINRA |
+ | Comparison | SequenceMatcher-based clause alignment with risk delta analysis |
+ | Obligations | Regex pattern matching across 5 obligation categories |
+
+ ## 📊 Risk Scoring Methodology
+
+ Risk scores combine clause detection with weighted severity:
+ - **CRITICAL**: 40 pts (Uncapped Liability, Arbitration, IP Assignment, etc.)
+ - **HIGH**: 20 pts (Non-Compete, Exclusivity, Unilateral Change, etc.)
+ - **MEDIUM**: 10 pts (Governing Law, Jurisdiction, etc.)
+ - **LOW**: 3 pts (Document Name, Dates, etc.)
+
+ Final score normalized to 0-100 with letter grades:
+ - A (0-14): Low risk
+ - B (15-29): Moderate risk
+ - C (30-49): Elevated risk
+ - D (50-69): High risk
+ - F (70+): Critical risk
+
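The weights and grade bands listed in the README can be sketched as follows. The README does not say how the weighted sum is normalized to 0-100, so capping the raw sum at 100 is an assumption made here purely for illustration, and `risk_score` and `letter_grade` are hypothetical names rather than functions from the commit.

```python
# Weights and grade bands are taken from the README text in this diff.
RISK_WEIGHTS = {"CRITICAL": 40, "HIGH": 20, "MEDIUM": 10, "LOW": 3}


def risk_score(tiers):
    """Sum the tier weights of detected clauses.

    ASSUMPTION: the result is capped at 100; the real normalization
    used by the app is not visible in this part of the diff.
    """
    raw = sum(RISK_WEIGHTS[tier] for tier in tiers)
    return min(100, raw)


def letter_grade(score):
    """Map a 0-100 score to the README's letter bands."""
    if score >= 70:
        return "F"
    if score >= 50:
        return "D"
    if score >= 30:
        return "C"
    if score >= 15:
        return "B"
    return "A"


if __name__ == "__main__":
    detected = ["CRITICAL", "LOW"]  # e.g. Uncapped Liability + Document Name
    score = risk_score(detected)
    print(score, letter_grade(score))
```

Under this assumed cap, two CRITICAL clauses already reach grade F, which may or may not match the behavior the author intended.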
+ ## 📚 Datasets & Research
+
+ - [CUAD](https://huggingface.co/datasets/theatticusproject/cuad-qa): 510 contracts, 13K annotations, 41 clause categories
+ - [LegalBench](https://huggingface.co/datasets/nguha/legalbench): 162 legal reasoning tasks
+ - [LexGLUE](https://huggingface.co/datasets/coastalcph/lex_glue): Unfair Terms of Service classification
+ - Paper: [CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review](https://arxiv.org/abs/2103.06268) (Hendrycks et al., 2021)
+
+ ## 🚀 Usage
+
+ 1. **Upload** a contract (PDF, DOCX, or TXT) or paste text directly
+ 2. Click **Analyze Contract**
+ 3. View results across tabs:
+    - **Document**: Full text with inline entity highlights
+    - **Clauses**: Detected clauses with risk badges
+    - **Entities**: Extracted parties, dates, money, jurisdictions
+    - **Contradictions**: Conflicting clauses and missing provisions
+    - **Obligations**: Action items categorized by type
+    - **Compliance**: Regulatory framework checks
+ 4. **Export** JSON/CSV reports
+
+ ## 🔀 Compare Contracts
+
+ Switch to the **Compare Contracts** tab to:
+ - Upload or paste two contracts side-by-side
+ - See clause-level diffs (added, removed, modified)
+ - Get an alignment score and risk delta
+ - View raw JSON comparison data
+
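The README describes the comparison as SequenceMatcher-based clause alignment, but `compare.py` itself is not shown in this part of the diff. The following is therefore only a minimal sketch of how such an alignment could work with the standard-library `difflib.SequenceMatcher`; `align_clauses`, the greedy matching strategy, and the `0.6` threshold are all assumptions, not the commit's implementation.

```python
from difflib import SequenceMatcher


def align_clauses(old_clauses, new_clauses, threshold=0.6):
    """Greedily pair each old clause with its most similar unused new clause.

    Returns a list of (status, old_text, new_text) tuples where status is
    "unchanged", "modified", "removed", or "added".
    """
    results = []
    matched_new = set()
    for old in old_clauses:
        best_j, best_ratio = None, 0.0
        for j, new in enumerate(new_clauses):
            if j in matched_new:
                continue
            ratio = SequenceMatcher(None, old, new).ratio()
            if ratio > best_ratio:
                best_j, best_ratio = j, ratio
        if best_j is not None and best_ratio >= threshold:
            matched_new.add(best_j)
            status = "unchanged" if best_ratio == 1.0 else "modified"
            results.append((status, old, new_clauses[best_j]))
        else:
            results.append(("removed", old, None))
    # Anything in the new contract that was never matched counts as added.
    for j, new in enumerate(new_clauses):
        if j not in matched_new:
            results.append(("added", None, new))
    return results


if __name__ == "__main__":
    old = ["Payment is due within 30 days of invoice."]
    new = ["Payment is due within 60 days of invoice.",
           "The Supplier shall maintain liability insurance."]
    for status, a, b in align_clauses(old, new):
        print(status, "|", a, "|", b)
```

Greedy matching is simple but order-dependent; an optimal assignment would need something like the Hungarian algorithm, which is beyond this sketch.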
+ ## ⚠️ Disclaimer
+
+ *Not legal advice. ClauseGuard is an AI-powered analysis tool for informational purposes only. Always consult a qualified attorney for legal decisions. The tool may miss nuances and should be used as a preliminary screening aid, not a substitute for professional legal review.*
+
+ ## 🔗 Links
+
+ - [ClauseGuard Space](https://huggingface.co/spaces/gaurv007/ClauseGuard)
+ - [Clause Classifier Model](https://huggingface.co/Mokshith31/legalbert-contract-clause-classification)
+ - [Legal-BERT Base](https://huggingface.co/nlpaueb/legal-bert-base-uncased)
+ - [CUAD Dataset](https://huggingface.co/datasets/theatticusproject/cuad-qa)
+ - [CUAD Paper (arXiv:2103.06268)](https://arxiv.org/abs/2103.06268)
+
+ ---
+
+ *Built with ❤️ using Gradio, Hugging Face Transformers, and Legal-BERT. Open source and free for all.*
app.py CHANGED
@@ -1,37 +1,327 @@
1
  """
2
- ClauseGuard - AI Fine Print Scanner
3
- Uses Legal-BERT fine-tuned on CLAUDETTE/LexGLUE unfair_tos (8 categories).
4
  """
5
 
6
- import gradio as gr
7
  import re
8
  import numpy as np
9
 
10
- # ─── Load ML model ───
11
- MODEL_ID = "gaurv007/clauseguard-legal-bert"
12
- ml_pipeline = None
13
 
 
14
  try:
15
- from transformers import pipeline
16
- ml_pipeline = pipeline("text-classification", model=MODEL_ID, top_k=None, device=-1)
17
- print(f"Loaded model: {MODEL_ID}")
18
- except Exception as e:
19
- print(f"Model load failed ({e}), using regex fallback")
20
-
21
- # ─── Label metadata ───
22
- LABELS = {
23
- "Limitation of liability": ("HIGH", "Company avoids responsibility for damages or losses."),
24
- "Unilateral termination": ("HIGH", "They can close your account without reason."),
25
- "Unilateral change": ("MEDIUM", "Terms can change without your consent."),
26
- "Content removal": ("MEDIUM", "Your content can be deleted without notice."),
27
- "Contract by using": ("LOW", "You agree just by visiting or using the site."),
28
- "Choice of law": ("MEDIUM", "Foreign law applies instead of your local protections."),
29
- "Jurisdiction": ("MEDIUM", "Disputes handled in their preferred court, not yours."),
30
- "Arbitration": ("HIGH", "You waive your right to sue in court."),
31
  }
32
 
33
- # ─── Regex fallback ───
34
- PATTERNS = {
35
  "Limitation of liability": [r"not liable", r"shall not be (liable|responsible)", r"in no event.*liable", r"limitation of liability", r"without warranty", r"disclaim"],
36
  "Unilateral termination": [r"terminat.*at any time", r"suspend.*account.*without", r"we may (terminat|suspend|discontinu)", r"right to (terminat|suspend)"],
37
  "Unilateral change": [r"sole discretion", r"reserves? the right to (modify|change|update|amend)", r"at any time.*without (prior )?notice", r"we may (modify|change|update)"],
@@ -40,115 +330,454 @@ PATTERNS = {
40
  "Choice of law": [r"governed by.*laws? of", r"shall be governed", r"laws of the state of"],
41
  "Jurisdiction": [r"exclusive jurisdiction", r"courts? of.*(california|delaware|new york|ireland|england)", r"submit to.*jurisdiction"],
42
  "Arbitration": [r"arbitrat", r"binding arbitration", r"waive.*right.*court", r"class action waiver"],
43
  }
44
 
45
- def classify_ml(text):
46
- """Classify using the trained Legal-BERT model."""
47
- if not ml_pipeline:
48
- return classify_regex(text)
49
- try:
50
- preds = ml_pipeline(text, truncation=True, max_length=512)
51
- results = []
52
- for p in preds[0] if isinstance(preds[0], list) else preds:
53
- if p["score"] > 0.5 and p["label"] in LABELS:
54
- sev, desc = LABELS[p["label"]]
55
- results.append({"name": p["label"], "severity": sev, "desc": desc, "confidence": round(p["score"], 2)})
56
- return results
57
- except Exception:
58
- return classify_regex(text)
59
-
60
- def classify_regex(text):
61
- """Fallback regex classifier."""
62
- results = []
63
  text_lower = text.lower()
64
- for name, pats in PATTERNS.items():
65
- for p in pats:
66
- if re.search(p, text_lower):
67
- sev, desc = LABELS[name]
68
- results.append({"name": name, "severity": sev, "desc": desc, "confidence": 0.7})
69
  break
70
  return results
71
 
72
- def split_clauses(text):
73
- text = re.sub(r'\n{2,}', '\n', text.strip())
74
- parts = re.split(r'(?<=[.!?])\s+(?=[A-Z0-9(])|(?:\n)(?=\d+[.)]\s|\([a-z]\)\s)', text)
75
- return [c.strip() for c in parts if len(c.strip()) > 30]
76
 
77
- def analyze(text):
78
- if not text or len(text.strip()) < 50:
79
- return "", ""
80
 
 
 
 
81
  clauses = split_clauses(text)
82
  if not clauses:
83
- return "", ""
84
-
85
- flagged = []
86
- sev_counts = {"HIGH": 0, "MEDIUM": 0, "LOW": 0}
87
-
88
  for clause in clauses:
89
- hits = classify_ml(clause)
90
- if hits:
91
- flagged.append({"text": clause, "hits": hits})
92
- for h in hits:
93
- sev_counts[h["severity"]] += 1
94
-
95
- total = len(clauses)
96
- risk = min(100, round((sev_counts["HIGH"] * 20 + sev_counts["MEDIUM"] * 10 + sev_counts["LOW"] * 5) / max(1, total) * 100))
97
-
98
- if risk >= 60: grade = "F"
99
- elif risk >= 40: grade = "D"
100
- elif risk >= 20: grade = "C"
101
- elif risk >= 10: grade = "B"
102
- else: grade = "A"
103
-
104
- engine = "Legal-BERT" if ml_pipeline else "Pattern matching"
105
-
106
- # Build HTML
107
- summary = f"""<div style="font-family:system-ui,sans-serif;">
108
- <div style="border:1px solid #e4e4e7;border-radius:8px;padding:20px;margin-bottom:16px;">
109
- <div style="display:flex;justify-content:space-between;align-items:baseline;">
110
- <div>
111
- <span style="font-size:32px;font-weight:600;">{risk}</span>
112
- <span style="font-size:13px;color:#a1a1aa;">/100 risk</span>
113
  </div>
114
- <span style="font-size:13px;font-weight:500;padding:2px 10px;border-radius:4px;{
115
- 'background:#fef2f2;color:#b91c1c;' if grade in ('F','D') else
116
- 'background:#fffbeb;color:#a16207;' if grade == 'C' else
117
- 'background:#f0fdf4;color:#15803d;'
118
- }">Grade {grade}</span>
119
  </div>
120
- <p style="margin-top:8px;font-size:12px;color:#a1a1aa;">{total} clauses · {len(flagged)} flagged · {sev_counts['HIGH']} high · {sev_counts['MEDIUM']} medium · {sev_counts['LOW']} low · Engine: {engine}</p>
121
- </div>"""
122
-
123
- if not flagged:
124
- summary += '<div style="border:1px solid #e4e4e7;border-radius:8px;padding:24px;text-align:center;"><p style="font-size:14px;color:#71717a;">No unfair clauses found.</p></div>'
125
- else:
126
- for item in flagged:
127
- max_sev = max(item["hits"], key=lambda h: {"HIGH":3,"MEDIUM":2,"LOW":1}[h["severity"]])["severity"]
128
- border = {"HIGH":"#fca5a5","MEDIUM":"#fcd34d","LOW":"#93c5fd"}[max_sev]
129
-
130
- tags = ""
131
- for h in item["hits"]:
132
- ts = {"HIGH":"background:#fef2f2;color:#b91c1c;border:1px solid #fecaca;",
133
- "MEDIUM":"background:#fffbeb;color:#a16207;border:1px solid #fde68a;",
134
- "LOW":"background:#eff6ff;color:#1d4ed8;border:1px solid #bfdbfe;"}[h["severity"]]
135
- conf = f' ({h["confidence"]})' if h.get("confidence") and ml_pipeline else ""
136
- tags += f'<span style="{ts}font-size:11px;font-weight:500;padding:1px 8px;border-radius:3px;margin-right:4px;">{h["name"]}{conf}</span>'
137
 
138
- descs = "".join(f'<p style="font-size:12px;color:#71717a;margin-top:4px;">{h["desc"]}</p>' for h in item["hits"])
139
- preview = item["text"][:200] + ("..." if len(item["text"]) > 200 else "")
140
-
141
- summary += f'''<div style="border:1px solid #e4e4e7;border-left:3px solid {border};border-radius:8px;padding:14px;margin-bottom:8px;">
142
- <p style="font-size:13px;color:#3f3f46;line-height:1.6;">{preview}</p>
143
- <div style="margin-top:8px;">{tags}</div>
144
- {descs}
145
- </div>'''
146
-
147
- summary += "</div>"
148
- return summary, ""
149
-
150
-
151
- SPOTIFY = """By using the Spotify Service, you agree to be bound by these Terms of Use.
152
 
153
  Spotify may, in its sole discretion, modify or update these Terms of Service at any time without prior notice. Your continued use of the Service after any such changes constitutes your acceptance of the new Terms of Service.
154
 
@@ -160,39 +789,235 @@ Spotify may terminate your account or suspend your access at any time, with or w
160
 
161
  These Terms will be governed by and construed in accordance with the laws of the State of New York.
162
 
163
- Any dispute shall be finally settled by arbitration in New York County."""
164
 
165
- RENTAL = """The Landlord reserves the right to enter the premises at any time without prior notice for inspection or any other purpose deemed necessary in their sole discretion.
166
 
167
  The Landlord shall not be liable for any damage to the Tenant's personal property, whether caused by water leaks, fire, theft, or any other cause, including the Landlord's own negligence.
168
 
169
  The Landlord may terminate this lease at any time with only 7 days written notice, for any reason or no reason at all.
170
 
171
- Any disputes arising from this lease agreement shall be resolved exclusively in the courts of the Landlord's choosing, and the Tenant waives the right to a jury trial.
172
 
173
  The Landlord reserves the right to modify the terms of this lease at any time. Continued occupancy constitutes acceptance of the new terms."""
174
 
175
- demo = gr.Blocks(title="ClauseGuard")
176
 
177
- with demo:
178
- gr.HTML('<div style="font-family:system-ui,sans-serif;padding:16px 0;"><h1 style="font-size:20px;font-weight:600;margin:0;">ClauseGuard</h1><p style="font-size:13px;color:#a1a1aa;margin-top:2px;">Paste a Terms of Service, contract, or lease. Get a risk breakdown.</p></div>')
179
 
180
- with gr.Row():
181
- with gr.Column(scale=1):
182
- text_input = gr.Textbox(label="Document text", placeholder="Paste here...", lines=14, max_lines=40)
183
  with gr.Row():
184
- scan_btn = gr.Button("Scan", variant="primary")
185
- clear_btn = gr.Button("Clear", variant="secondary")
186
- gr.Examples(examples=[[SPOTIFY], [RENTAL]], inputs=[text_input], label="Examples")
187
 
188
- with gr.Column(scale=1):
189
- results_html = gr.HTML(label="Results")
190
- hidden = gr.HTML(visible=False)
191
 
192
- scan_btn.click(fn=analyze, inputs=[text_input], outputs=[results_html, hidden])
193
- clear_btn.click(fn=lambda: ("", "", ""), outputs=[text_input, results_html, hidden])
194
 
195
- gr.HTML('<p style="font-family:system-ui,sans-serif;font-size:11px;color:#a1a1aa;text-align:center;padding:16px 0;border-top:1px solid #f4f4f5;margin-top:16px;">Not legal advice. Model: Legal-BERT fine-tuned on CLAUDETTE. <a href="https://huggingface.co/gaurv007/clauseguard-legal-bert" style="color:#71717a;">Model</a> · <a href="https://huggingface.co/datasets/coastalcph/lex_glue" style="color:#71717a;">Dataset</a></p>')
196
 
197
  if __name__ == "__main__":
198
  demo.launch()
 
  """
+ ClauseGuard - World's Best Legal Contract Analysis Tool
+ ════════════════════════════════════════════════════════
+ Features:
+   • 41 CUAD clause categories via fine-tuned Legal-BERT
+   • 4-tier risk scoring (Critical / High / Medium / Low)
+   • Legal NER: parties, dates, monetary values, jurisdictions, defined terms
+   • NLI contradiction & missing-clause detection
+   • Contract comparison engine (diff between 2 contracts)
+   • Obligation tracker (monetary, compliance, reporting, delivery)
+   • Compliance checker (GDPR, CCPA, SOX, HIPAA, FINRA)
+   • PDF / DOCX / TXT parsing
+   • Professional 3-panel Gradio UI
+   • JSON & CSV export
+
+ Models:
+   • Clause classifier: Mokshith31/legalbert-contract-clause-classification
+     (LoRA adapter on nlpaueb/legal-bert-base-uncased, 41 CUAD classes)
  """

+ import os
  import re
+ import json
+ import csv
+ import io
+ from collections import defaultdict
+ from datetime import datetime
+
+ import gradio as gr
  import numpy as np

+ # ── Document parsers (soft-fail) ──────────────────────────────────────
+ try:
+     import pdfplumber
+     _HAS_PDF = True
+ except Exception:
+     _HAS_PDF = False
+
+ try:
+     from docx import Document as DocxDocument
+     _HAS_DOCX = True
+ except Exception:
+     _HAS_DOCX = False

+ # ── PyTorch / Transformers (soft-fail) ────────────────────────────────
  try:
+     import torch
+     from transformers import AutoTokenizer, AutoModelForSequenceClassification
+     from peft import PeftModel
+     _HAS_TORCH = True
+ except Exception:
+     _HAS_TORCH = False
+
+ # ── Import submodules ─────────────────────────────────────────────────
+ from compare import compare_contracts, render_comparison_html
+ from obligations import extract_obligations, render_obligations_html
+ from compliance import check_compliance, render_compliance_html
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # 1. CONFIGURATION
+ # ═══════════════════════════════════════════════════════════════════════
+
+ CUAD_LABELS = [
+     "Document Name", "Parties", "Agreement Date", "Effective Date",
+     "Expiration Date", "Renewal Term", "Governing Law", "Most Favored Nation",
+     "Non-Compete", "Exclusivity", "No-Solicit of Customers",
+     "No-Solicit of Employees", "Non-Disparagement",
+     "Termination for Convenience", "ROFR/ROFO/ROFN", "Change of Control",
+     "Anti-Assignment", "Revenue/Profit Sharing", "Price Restriction",
+     "Minimum Commitment", "Volume Restriction", "IP Ownership Assignment",
+     "Joint IP Ownership", "License Grant", "Non-Transferable License",
+     "Affiliate License-Licensor", "Affiliate License-Licensee",
+     "Unlimited/All-You-Can-Eat License", "Irrevocable or Perpetual License",
+     "Source Code Escrow", "Post-Termination Services", "Audit Rights",
+     "Uncapped Liability", "Cap on Liability", "Liquidated Damages",
+     "Warranty Duration", "Insurance", "Covenant Not to Sue",
+     "Third Party Beneficiary", "Other"
+ ]
+
+ _UNFAIR_LABELS = [
+     "Limitation of liability", "Unilateral termination", "Unilateral change",
+     "Content removal", "Contract by using", "Choice of law",
+     "Jurisdiction", "Arbitration"
+ ]
+
+ _ALL_LABELS = CUAD_LABELS + _UNFAIR_LABELS
+
+ RISK_MAP = {
+     # Critical
+     "Uncapped Liability": "CRITICAL",
+     "Arbitration": "CRITICAL",
+     "IP Ownership Assignment": "CRITICAL",
+     "Termination for Convenience": "CRITICAL",
+     "Limitation of liability": "CRITICAL",
+     "Unilateral termination": "CRITICAL",
+     "Liquidated Damages": "CRITICAL",
+     # High
+     "Non-Compete": "HIGH",
+     "Exclusivity": "HIGH",
+     "Change of Control": "HIGH",
+     "No-Solicit of Customers": "HIGH",
+     "No-Solicit of Employees": "HIGH",
+     "Unilateral change": "HIGH",
+     "Content removal": "HIGH",
+     "Anti-Assignment": "HIGH",
+     # Medium
+     "Governing Law": "MEDIUM",
+     "Jurisdiction": "MEDIUM",
+     "Choice of law": "MEDIUM",
+     "Price Restriction": "MEDIUM",
+     "Minimum Commitment": "MEDIUM",
+     "Volume Restriction": "MEDIUM",
+     "Non-Disparagement": "MEDIUM",
+     "Most Favored Nation": "MEDIUM",
+     "Revenue/Profit Sharing": "MEDIUM",
+     "Warranty Duration": "MEDIUM",
+     # Low
+     "Document Name": "LOW",
+     "Parties": "LOW",
+     "Agreement Date": "LOW",
+     "Effective Date": "LOW",
+     "Expiration Date": "LOW",
+     "Renewal Term": "LOW",
+     "Joint IP Ownership": "LOW",
+     "License Grant": "LOW",
+     "Non-Transferable License": "LOW",
+     "Affiliate License-Licensor": "LOW",
+     "Affiliate License-Licensee": "LOW",
+     "Unlimited/All-You-Can-Eat License": "LOW",
+     "Irrevocable or Perpetual License": "LOW",
+     "Source Code Escrow": "LOW",
+     "Post-Termination Services": "LOW",
+     "Audit Rights": "LOW",
+     "Cap on Liability": "LOW",
+     "Insurance": "LOW",
+     "Covenant Not to Sue": "LOW",
+     "Third Party Beneficiary": "LOW",
+     "Other": "LOW",
+     "ROFR/ROFO/ROFN": "LOW",
+     "Contract by using": "LOW",
  }

+ DESC_MAP = {label: label.replace("_", " ") for label in _ALL_LABELS}
+ DESC_MAP.update({
+     "Limitation of liability": "Company limits or excludes liability for losses, data breaches, or service failures.",
+     "Unilateral termination": "Company can terminate your account at any time without reason.",
+     "Unilateral change": "Company can change terms at any time without your consent.",
+     "Content removal": "Company can delete your content without notice or justification.",
+     "Contract by using": "You are bound to the contract simply by using the service.",
+     "Choice of law": "Governing law may differ from your country, reducing your legal protections.",
+     "Jurisdiction": "Disputes must be resolved in a jurisdiction that may disadvantage you.",
+     "Arbitration": "Forces disputes to arbitration instead of court. You waive your right to sue.",
+     "Uncapped Liability": "No financial limit on damages the party may be liable for.",
+     "Cap on Liability": "Maximum financial liability is explicitly capped.",
+     "Non-Compete": "Restrictions on competing with the counter-party.",
+     "Exclusivity": "Obligation to deal exclusively with one party.",
+     "IP Ownership Assignment": "Intellectual property rights are transferred entirely.",
+     "Termination for Convenience": "Either party may terminate without cause or notice.",
+     "Governing Law": "Specifies which jurisdiction's laws apply.",
+     "Non-Disparagement": "Agreement not to speak negatively about the other party.",
+     "ROFR/ROFO/ROFN": "Right of First Refusal / Offer / Negotiation clause.",
+     "Change of Control": "Provisions triggered by ownership or control changes.",
+     "Anti-Assignment": "Restrictions on transferring contract rights to third parties.",
+     "Liquidated Damages": "Pre-determined damages amount for breach of contract.",
+     "Source Code Escrow": "Third-party holds source code for release under defined conditions.",
+     "Post-Termination Services": "Services to be provided after the contract ends.",
+     "Audit Rights": "Right to inspect records or verify compliance.",
+     "Warranty Duration": "Length of time warranties remain in effect.",
+     "Covenant Not to Sue": "Agreement not to bring legal action against a party.",
+     "Third Party Beneficiary": "Non-party who benefits from the contract terms.",
+     "Insurance": "Insurance coverage requirements.",
+     "Revenue/Profit Sharing": "Revenue or profit sharing arrangements between parties.",
+     "Price Restriction": "Restrictions on pricing or discounting.",
+     "Minimum Commitment": "Minimum purchase or usage commitment.",
+     "Volume Restriction": "Limits on volume of goods or services.",
+     "License Grant": "Permission to use intellectual property.",
+     "Non-Transferable License": "License that cannot be transferred to third parties.",
+     "Irrevocable or Perpetual License": "License that cannot be revoked or lasts indefinitely.",
+     "Unlimited/All-You-Can-Eat License": "License with no usage limits.",
+ })
+
+ RISK_WEIGHTS = {"CRITICAL": 40, "HIGH": 20, "MEDIUM": 10, "LOW": 3}
+
+ RISK_STYLES = {
+     "CRITICAL": ("#dc2626", "#fef2f2", "⚠️"),
+     "HIGH": ("#ea580c", "#fff7ed", "⚡"),
+     "MEDIUM": ("#ca8a04", "#fefce8", "📋"),
+     "LOW": ("#16a34a", "#f0fdf4", "✓"),
+ }
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # 2. MODEL LOADING
+ # ═══════════════════════════════════════════════════════════════════════
+
+ cuad_tokenizer = None
+ cuad_model = None
+
+ def _load_cuad_model():
+     global cuad_tokenizer, cuad_model
+     if not _HAS_TORCH:
+         print("[ClauseGuard] PyTorch not available - using regex fallback")
+         return
+     try:
+         base = "nlpaueb/legal-bert-base-uncased"
+         adapter = "Mokshith31/legalbert-contract-clause-classification"
+         print(f"[ClauseGuard] Loading CUAD classifier: {adapter}")
+         cuad_tokenizer = AutoTokenizer.from_pretrained(base)
+         base_model = AutoModelForSequenceClassification.from_pretrained(
+             base, num_labels=41, ignore_mismatched_sizes=True
+         )
+         cuad_model = PeftModel.from_pretrained(base_model, adapter)
+         cuad_model.eval()
+         print("[ClauseGuard] CUAD model loaded successfully")
+     except Exception as e:
+         print(f"[ClauseGuard] CUAD model load failed: {e}")
+         cuad_tokenizer = None
+         cuad_model = None
+
+ _load_cuad_model()
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # 3. DOCUMENT PARSING
+ # ═══════════════════════════════════════════════════════════════════════
+
+ def parse_pdf(file_path):
+     if not _HAS_PDF:
+         return None, "PDF parsing not available (pdfplumber not installed)"
+     try:
+         text = ""
+         with pdfplumber.open(file_path) as pdf:
+             for page in pdf.pages:
+                 page_text = page.extract_text()
+                 if page_text:
+                     text += page_text + "\n\n"
+         return text.strip(), None
+     except Exception as e:
+         return None, f"PDF parse error: {e}"
+
+ def parse_docx(file_path):
+     if not _HAS_DOCX:
+         return None, "DOCX parsing not available (python-docx not installed)"
+     try:
+         doc = DocxDocument(file_path)
+         paragraphs = [p.text for p in doc.paragraphs if p.text.strip()]
+         return "\n\n".join(paragraphs), None
+     except Exception as e:
+         return None, f"DOCX parse error: {e}"
+
+ def parse_document(file_path):
+     if file_path is None:
+         return None, "No file uploaded"
+     ext = os.path.splitext(file_path)[1].lower()
+     if ext == ".pdf":
+         return parse_pdf(file_path)
+     elif ext in (".docx", ".doc"):
+         return parse_docx(file_path)
+     elif ext in (".txt", ".md", ".rst"):
+         try:
+             with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
+                 return f.read(), None
+         except Exception as e:
+             return None, f"Text read error: {e}"
+     else:
+         return None, f"Unsupported file type: {ext}"
+
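One thing worth checking in `parse_document`: `.doc` is routed to `parse_docx`, but python-docx reads only the Office Open XML `.docx` format, so a legacy binary `.doc` upload would most likely come back as a "DOCX parse error". Below is a minimal sketch of a dispatch that rejects `.doc` up front. `pick_parser`, `_PARSERS`, the `read_text` placeholder name, and the error wording are hypothetical; the sketch returns parser names rather than calling the real functions so that it runs on its own.

```python
import os

# Extension -> name of the handler used in the diff ("read_text" stands in
# for the inline open()/read() branch; it is not a function in the commit).
_PARSERS = {
    ".pdf": "parse_pdf",
    ".docx": "parse_docx",
    ".txt": "read_text",
    ".md": "read_text",
    ".rst": "read_text",
}


def pick_parser(file_path):
    """Return (parser_name, error), mirroring the diff's (value, error) convention."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".doc":
        # python-docx cannot open the legacy binary format.
        return None, "Legacy .doc is not supported; convert to .docx first"
    if ext in _PARSERS:
        return _PARSERS[ext], None
    return None, f"Unsupported file type: {ext}"


if __name__ == "__main__":
    for name in ("lease.PDF", "nda.doc", "terms.md", "scan.png"):
        print(name, "->", pick_parser(name))
```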
+ # ═══════════════════════════════════════════════════════════════════════
+ # 4. CLAUSE DETECTION
+ # ═══════════════════════════════════════════════════════════════════════
+
+ def split_clauses(text):
+     text = re.sub(r'\n{3,}', '\n\n', text.strip())
+     parts = re.split(
+         r'(?<=[.!?])\s+(?=[A-Z0-9(])|(?:\n\n)(?=\d+[.)]\s|\([a-z]\)\s|[A-Z][A-Z\s]{2,})',
+         text
+     )
+     clauses = []
+     for p in parts:
+         p = p.strip()
+         if len(p) > 30:
+             clauses.append(p)
+     return clauses
+
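To see what the splitter does in practice, here is the same regex applied to a small made-up sample. The function body is copied from the diff with indentation restored (the final loop is condensed into a list comprehension); the sample text is an assumption for illustration only.

```python
import re


def split_clauses(text):
    # Collapse runs of 3+ newlines, then split at sentence ends followed by
    # a capital/digit/"(", or at blank lines before numbered or ALL-CAPS headings.
    text = re.sub(r'\n{3,}', '\n\n', text.strip())
    parts = re.split(
        r'(?<=[.!?])\s+(?=[A-Z0-9(])|(?:\n\n)(?=\d+[.)]\s|\([a-z]\)\s|[A-Z][A-Z\s]{2,})',
        text
    )
    # Fragments of 30 characters or fewer are discarded.
    return [p.strip() for p in parts if len(p.strip()) > 30]


sample = (
    "The Supplier shall deliver the goods within thirty days. "
    "Fine. "
    "The Buyer shall pay all invoices within sixty days of receipt."
)

for clause in split_clauses(sample):
    print(clause)
```

Because the split fires after any period followed by whitespace and a capital letter or digit, abbreviations such as "Inc." or "No." in front of a capitalized word will also split a sentence, and the short fragment "Fine." is silently dropped by the length filter.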
+ def classify_cuad(clause_text):
+     if cuad_model is None or cuad_tokenizer is None:
+         return _classify_regex(clause_text)
+     try:
+         inputs = cuad_tokenizer(
+             clause_text,
+             return_tensors="pt",
+             truncation=True,
+             max_length=256,
+             padding=True
+         )
+         with torch.no_grad():
+             logits = cuad_model(**inputs).logits
+         probs = torch.softmax(logits, dim=-1)[0]
+         threshold = 0.15
+         results = []
+         for i, prob in enumerate(probs):
+             if prob > threshold and i < len(CUAD_LABELS):
+                 label = CUAD_LABELS[i]
+                 risk = RISK_MAP.get(label, "LOW")
+                 results.append({
+                     "label": label,
+                     "confidence": round(float(prob), 3),
+                     "risk": risk,
+                     "description": DESC_MAP.get(label, label),
+                 })
+         results.sort(key=lambda x: x["confidence"], reverse=True)
+         if not results:
+             top_idx = int(probs.argmax())
+             label = CUAD_LABELS[top_idx] if top_idx < len(CUAD_LABELS) else "Other"
+             results.append({
+                 "label": label,
+                 "confidence": round(float(probs[top_idx]), 3),
+                 "risk": RISK_MAP.get(label, "LOW"),
+                 "description": DESC_MAP.get(label, label),
+             })
+         return results
+     except Exception as e:
+         print(f"[ClauseGuard] CUAD inference error: {e}")
+         return _classify_regex(clause_text)
+
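The selection logic in `classify_cuad` (keep every class whose softmax probability exceeds 0.15, otherwise fall back to the arg-max) can be exercised without loading the model. The sketch below is plain Python over a list of logits; `softmax` and `select_labels` are hypothetical helpers, not part of the commit. Note that `CUAD_LABELS` as shown in the diff has 40 entries while the model is created with `num_labels=41`, so index 40 is skipped by the threshold loop and reported as "Other" by the fallback. The sketch reproduces that behavior rather than fixing it, since the adapter's actual label order is not visible here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_labels(logits, labels, threshold=0.15):
    """Return [(label, confidence)] for classes above threshold.

    Falls back to the single most probable class when nothing clears the
    threshold. Indices beyond len(labels) are ignored by the threshold
    pass and named "Other" by the fallback, as in the diff.
    """
    probs = softmax(logits)
    picked = [
        (labels[i], round(p, 3))
        for i, p in enumerate(probs)
        if p > threshold and i < len(labels)
    ]
    picked.sort(key=lambda item: item[1], reverse=True)
    if not picked:
        top = max(range(len(probs)), key=probs.__getitem__)
        name = labels[top] if top < len(labels) else "Other"
        picked.append((name, round(probs[top], 3)))
    return picked


if __name__ == "__main__":
    demo_labels = ["Governing Law", "Non-Compete", "Audit Rights"]
    print(select_labels([2.0, 1.5, -3.0], demo_labels))
    # One more logit than labels, dominated by the unlabeled index:
    print(select_labels([0.0, 0.0, 0.0, 9.0], demo_labels))
```

With 41 classes a 0.15 threshold is fairly permissive, so several labels can be returned for one clause; whether that is desirable depends on how the risk score treats multiple hits per clause.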
+ _REGEX_PATTERNS = {
      "Limitation of liability": [r"not liable", r"shall not be (liable|responsible)", r"in no event.*liable", r"limitation of liability", r"without warranty", r"disclaim"],
      "Unilateral termination": [r"terminat.*at any time", r"suspend.*account.*without", r"we may (terminat|suspend|discontinu)", r"right to (terminat|suspend)"],
      "Unilateral change": [r"sole discretion", r"reserves? the right to (modify|change|update|amend)", r"at any time.*without (prior )?notice", r"we may (modify|change|update)"],
 
330
  "Choice of law": [r"governed by.*laws? of", r"shall be governed", r"laws of the state of"],
331
  "Jurisdiction": [r"exclusive jurisdiction", r"courts? of.*(california|delaware|new york|ireland|england)", r"submit to.*jurisdiction"],
332
  "Arbitration": [r"arbitrat", r"binding arbitration", r"waive.*right.*court", r"class action waiver"],
333
+ "Governing Law": [r"governed by", r"laws of", r"jurisdiction of"],
334
+ "Termination for Convenience": [r"terminat.*for convenience", r"terminat.*without cause", r"terminat.*at any time"],
335
+ "Non-Compete": [r"non-compete", r"shall not compete", r"competition"],
336
+ "Exclusivity": [r"exclusive", r"exclusivity"],
337
+ "IP Ownership Assignment": [r"assign.*intellectual property", r"ownership of.*\bip\b", r"all rights.*assign"],
338
+ "Uncapped Liability": [r"unlimited liability", r"uncapped", r"no.*limit.*liability"],
339
+ "Cap on Liability": [r"cap on liability", r"maximum liability", r"liability.*shall not exceed"],
340
+ "Indemnification": [r"indemnif", r"hold harmless", r"defend"],
341
+ "Confidentiality": [r"confidential", r"non-disclosure", r"\bnda\b"],
342
+ "Force Majeure": [r"force majeure", r"act of god", r"beyond.*control"],
343
+ "Penalties": [r"penalt", r"late fee", r"default charge", r"interest on overdue"],
344
  }
345
 
346
+ def _classify_regex(text):
347
  text_lower = text.lower()
348
+ results = []
349
+ seen = set()
350
+ for label, patterns in _REGEX_PATTERNS.items():
351
+ for pat in patterns:
352
+ if re.search(pat, text_lower):
353
+ if label not in seen:
354
+ risk = RISK_MAP.get(label, "MEDIUM")
355
+ results.append({
356
+ "label": label,
357
+ "confidence": 0.7,
358
+ "risk": risk,
359
+ "description": DESC_MAP.get(label, label),
360
+ })
361
+ seen.add(label)
362
  break
363
  return results
364
 
365
+ # ═══════════════════════════════════════════════════════════════════════
366
+ # 5. LEGAL NER
367
+ # ═══════════════════════════════════════════════════════════════════════
368
+
369
+ def extract_entities(text):
370
+ entities = []
371
+ date_patterns = [
372
+ (r'\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},?\s+\d{4}\b', "DATE"),
373
+ (r'\b\d{1,2}/\d{1,2}/\d{2,4}\b', "DATE"),
374
+ (r'\b\d{1,2}-\d{1,2}-\d{2,4}\b', "DATE"),
375
+ (r'\b(?:Effective|Commencement|Expiration|Termination)\s+Date\b', "DATE_REF"),
376
+ ]
377
+ for pat, etype in date_patterns:
378
+ for m in re.finditer(pat, text, re.IGNORECASE):
379
+ entities.append({"text": m.group(), "type": etype, "start": m.start(), "end": m.end()})
380
+ money_patterns = [
381
+ (r'\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?(?:\s*(?:million|billion|thousand|M|B|K))?', "MONEY"),
382
+ (r'\b\d{1,3}(?:,\d{3})*(?:\.\d{2})?\s*(?:USD|EUR|GBP|dollars|euros)', "MONEY"),
383
+ ]
384
+ for pat, etype in money_patterns:
385
+ for m in re.finditer(pat, text, re.IGNORECASE):
386
+ entities.append({"text": m.group(), "type": etype, "start": m.start(), "end": m.end()})
387
+ party_patterns = [
388
+ (r'\b[A-Z][A-Za-z0-9&]*(?:\s+[A-Z][A-Za-z0-9&]*)*,?\s+(?:Inc\.|LLC|Ltd\.|Limited|Corp\.|Corporation|PLC|GmbH|AG|S\.A\.|B\.V\.)(?!\w)', "PARTY"),
389
+ (r'\b(?:Party A|Party B|Disclosing Party|Receiving Party|Licensor|Licensee|Buyer|Seller|Tenant|Landlord|Employer|Employee|Company|Customer|Vendor|Client)\b', "PARTY_ROLE"),
390
+ ]
391
+ for pat, etype in party_patterns:
392
+ for m in re.finditer(pat, text):
393
+ entities.append({"text": m.group(), "type": etype, "start": m.start(), "end": m.end()})
394
+ jurisdiction_patterns = [
395
+ (r'\b(?:State|Laws?) of (?-i:[A-Z][a-zA-Z]+(?:\s[A-Z][a-zA-Z]+)*)', "JURISDICTION"),
396
+ (r'\b(?:California|Delaware|New York|Texas|Florida|England|Ireland|Germany|France|Singapore|Hong Kong)\b', "JURISDICTION"),
397
+ ]
398
+ for pat, etype in jurisdiction_patterns:
399
+ for m in re.finditer(pat, text, re.IGNORECASE):
400
+ entities.append({"text": m.group(), "type": etype, "start": m.start(), "end": m.end()})
401
+ defined_patterns = [
402
+ (r'"([A-Z][A-Z\s]+)"', "DEFINED_TERM"),
403
+ (r'\(([A-Z][A-Z\s]+)\)', "DEFINED_TERM"),
404
+ ]
405
+ for pat, etype in defined_patterns:
406
+ for m in re.finditer(pat, text):
407
+ entities.append({"text": m.group(1), "type": etype, "start": m.start(), "end": m.end()})
408
+ entities.sort(key=lambda x: (x["start"], -(x["end"] - x["start"])))
409
+ filtered = []
410
+ last_end = -1
411
+ for e in entities:
412
+ if e["start"] >= last_end:
413
+ filtered.append(e)
414
+ last_end = e["end"]
415
+ return filtered
416
+
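The last step of `extract_entities` resolves overlapping regex hits. As a small illustration (a sketch with a hypothetical helper name, `drop_overlaps`, and made-up spans), sorting by start offset and then by descending length means the longest match at a given position wins, and any later span that starts inside an accepted span is dropped.

```python
def drop_overlaps(spans):
    """Keep non-overlapping spans, preferring earlier and then longer matches."""
    ordered = sorted(spans, key=lambda s: (s["start"], -(s["end"] - s["start"])))
    kept, last_end = [], -1
    for span in ordered:
        if span["start"] >= last_end:
            kept.append(span)
            last_end = span["end"]
    return kept


spans = [
    {"text": "New York", "start": 10, "end": 18},
    {"text": "State of New York", "start": 1, "end": 18},
    {"text": "$5", "start": 20, "end": 22},
]
print([s["text"] for s in drop_overlaps(spans)])
```

Here "New York" is discarded because it lies inside the longer "State of New York" span.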
417
+ # ═══════════════════════════════════════════════════════════════════════
418
+ # 6. NLI / CONTRADICTION DETECTION
419
+ # ═══════════════════════════════════════════════════════════════════════
420
+
421
+ _CONTRADICTION_PAIRS = [
422
+ (["Uncapped Liability", "unlimited liability"], ["Cap on Liability", "cap on liability"],
423
+ "Liability cannot be both uncapped and capped simultaneously."),
424
+ (["Governing Law"], ["Governing Law"],
425
+ "Multiple governing law provisions detected; verify consistency."),
426
+ (["Termination for Convenience", "terminat.*convenience"], ["Fixed Term", "fixed term"],
427
+ "Contract has both fixed term and termination for convenience; review carefully."),
428
+ (["IP Ownership Assignment", "assign.*ip"], ["Joint IP Ownership", "joint ownership"],
429
+ "IP cannot be both fully assigned and jointly owned."),
430
+ ]
431
+
432
+ def detect_contradictions(clause_results):
433
+     contradictions = []
434
+     # A list keeps duplicates, so a pair that names the same label on both sides needs two hits.
435
+     labels_found = [cr["label"] for cr in clause_results]
436
+     for group_a, group_b, explanation in _CONTRADICTION_PAIRS:
437
+         found_a = any(l in labels_found for l in group_a)
438
+         found_b = any(l in labels_found for l in group_b)
439
+         if group_a == group_b:
440
+             found_b = sum(labels_found.count(l) for l in group_a) >= 2
441
+         if found_a and found_b:
442
+             contradictions.append({
443
+                 "type": "CONTRADICTION", "severity": "HIGH",
444
+                 "explanation": explanation,
445
+                 "clauses": sorted(set(group_a + group_b)),
446
+             })
447
+     critical_clauses = ["Governing Law", "Termination for Convenience", "Limitation of liability", "Arbitration"]
448
+     for cc in critical_clauses:
449
+         if cc not in labels_found:
450
+             contradictions.append({
451
+                 "type": "MISSING",
452
+                 "explanation": f"Critical clause '{cc}' not detected in the document.",
453
+                 "severity": "MEDIUM",
454
+                 "clauses": [cc],
455
+             })
456
+     return contradictions
457
+
458
+ # ═══════════════════════════════════════════════════════════════════════
459
+ # 7. RISK SCORING
460
+ # ═══════════════════════════════════════════════════════════════════════
461
+
462
+ def compute_risk_score(clause_results, total_clauses):
463
+ sev_counts = {"CRITICAL": 0, "HIGH": 0, "MEDIUM": 0, "LOW": 0}
464
+ for cr in clause_results:
465
+ sev = cr.get("risk", "LOW")
466
+ sev_counts[sev] += 1
467
+ if total_clauses == 0:
468
+ return 0, "A", sev_counts
469
+ weighted = sum(sev_counts[s] * RISK_WEIGHTS[s] for s in sev_counts)
470
+ risk = min(100, round(weighted / max(1, total_clauses) * 10))
471
+ if risk >= 70: grade = "F"
472
+ elif risk >= 50: grade = "D"
473
+ elif risk >= 30: grade = "C"
474
+ elif risk >= 15: grade = "B"
475
+ else: grade = "A"
476
+ return risk, grade, sev_counts
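To make the scoring formula concrete, here is a worked sketch. `RISK_WEIGHTS` is defined outside this part of the diff, so the weights below (`ASSUMED_WEIGHTS`) are purely illustrative assumptions, and `score` is a hypothetical stand-in for `compute_risk_score` that skips the per-clause counting.

```python
# Assumed weights for illustration only; the real RISK_WEIGHTS may differ.
ASSUMED_WEIGHTS = {"CRITICAL": 10, "HIGH": 7, "MEDIUM": 4, "LOW": 1}


def score(sev_counts, total_clauses, weights=ASSUMED_WEIGHTS):
    """Weighted severity per clause, scaled by 10 and capped at 100."""
    if total_clauses == 0:
        return 0, "A"
    weighted = sum(sev_counts[s] * weights[s] for s in sev_counts)
    risk = min(100, round(weighted / max(1, total_clauses) * 10))
    for floor, grade in ((70, "F"), (50, "D"), (30, "C"), (15, "B")):
        if risk >= floor:
            return risk, grade
    return risk, "A"


# 1 critical + 2 high + 1 medium over 8 clauses:
# (10 + 14 + 4) / 8 * 10 = 35, which falls in the C band (30 to 49).
print(score({"CRITICAL": 1, "HIGH": 2, "MEDIUM": 1, "LOW": 0}, 8))
```

Because the sum is divided by the total number of clauses, a long contract with a few risky clauses scores lower than a short one with the same findings.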
477
 
478
+ # ═══════════════════════════════════════════════════════════════════════
479
+ # 8. MAIN ANALYSIS PIPELINE
480
+ # ═══════════════════════════════════════════════════════════════════════
481
 
482
+ def analyze_contract(text):
483
+ if not text or len(text.strip()) < 50:
484
+ return None, "Document too short (minimum 50 characters)"
485
  clauses = split_clauses(text)
486
  if not clauses:
487
+ return None, "No clauses detected in document"
488
+ clause_results = []
489
  for clause in clauses:
490
+ predictions = classify_cuad(clause)
491
+ if predictions:
492
+ for pred in predictions:
493
+ clause_results.append({
494
+ "text": clause,
495
+ "label": pred["label"],
496
+ "confidence": pred["confidence"],
497
+ "risk": pred["risk"],
498
+ "description": pred["description"],
499
+ })
500
+ entities = extract_entities(text)
501
+ contradictions = detect_contradictions(clause_results)
502
+ risk, grade, sev_counts = compute_risk_score(clause_results, len(clauses))
503
+ obligations = extract_obligations(text)
504
+ compliance = check_compliance(text)
505
+ result = {
506
+ "metadata": {
507
+ "analysis_date": datetime.now().isoformat(),
508
+ "total_clauses": len(clauses),
509
+ "flagged_clauses": len(set(cr["text"] for cr in clause_results)),
510
+ "model": "Legal-BERT + CUAD (41 classes)" if cuad_model else "Regex fallback",
511
+ },
512
+ "risk": {
513
+ "score": risk,
514
+ "grade": grade,
515
+ "breakdown": sev_counts,
516
+ },
517
+ "clauses": clause_results,
518
+ "entities": entities,
519
+ "contradictions": contradictions,
520
+ "obligations": obligations,
521
+ "compliance": compliance,
522
+ "raw_text": text,
523
+ }
524
+ return result, None
525
+
526
+ # ═══════════════════════════════════════════════════════════════════════
527
+ # 9. EXPORT FUNCTIONS
528
+ # ═══════════════════════════════════════════════════════════════════════
529
+
530
+ def export_json(result):
531
+ if result is None:
532
+ return None
533
+ return json.dumps(result, indent=2, default=str)
534
+
535
+ def export_csv(result):
536
+ if result is None:
537
+ return None
538
+ output = io.StringIO()
539
+ writer = csv.writer(output)
540
+ writer.writerow(["Clause Text", "Label", "Risk", "Confidence", "Description"])
541
+ for cr in result.get("clauses", []):
542
+ writer.writerow([
543
+ cr.get("text", "")[:500],
544
+ cr.get("label", ""),
545
+ cr.get("risk", ""),
546
+ cr.get("confidence", ""),
547
+ cr.get("description", ""),
548
+ ])
549
+ return output.getvalue()
550
+
551
+ # ═══════════════════════════════════════════════════════════════════════
552
+ # 10. UI RENDERING
553
+ # ═══════════════════════════════════════════════════════════════════════
554
+
555
+ def render_summary(result):
556
+ if result is None:
557
+ return ""
558
+ risk = result["risk"]
559
+ score = risk["score"]
560
+ grade = risk["grade"]
561
+ breakdown = risk["breakdown"]
562
+ grade_color = {
563
+ "A": "#16a34a", "B": "#65a30d", "C": "#ca8a04",
564
+ "D": "#ea580c", "F": "#dc2626",
565
+ }.get(grade, "#6b7280")
566
+ crit, high, med, low = breakdown["CRITICAL"], breakdown["HIGH"], breakdown["MEDIUM"], breakdown["LOW"]
567
+ html = f"""
568
+ <div style="font-family:system-ui,sans-serif;padding:16px;border:1px solid #e5e7eb;border-radius:12px;background:#fff;">
569
+ <div style="text-align:center;margin-bottom:16px;">
570
+ <div style="font-size:48px;font-weight:700;color:{grade_color};">{score}</div>
571
+ <div style="font-size:14px;color:#6b7280;">/100 Risk Score</div>
572
+ <div style="display:inline-block;margin-top:8px;padding:4px 16px;border-radius:20px;background:{grade_color};color:white;font-weight:600;font-size:14px;">
573
+ Grade {grade}
574
+ </div>
575
+ </div>
576
+ <div style="display:grid;grid-template-columns:1fr 1fr;gap:8px;margin-bottom:12px;">
577
+ <div style="padding:8px;border-radius:6px;background:#fef2f2;text-align:center;">
578
+ <div style="font-size:20px;font-weight:700;color:#dc2626;">{crit}</div>
579
+ <div style="font-size:11px;color:#991b1b;">Critical</div>
580
+ </div>
581
+ <div style="padding:8px;border-radius:6px;background:#fff7ed;text-align:center;">
582
+ <div style="font-size:20px;font-weight:700;color:#ea580c;">{high}</div>
583
+ <div style="font-size:11px;color:#9a3412;">High</div>
584
+ </div>
585
+ <div style="padding:8px;border-radius:6px;background:#fefce8;text-align:center;">
586
+ <div style="font-size:20px;font-weight:700;color:#ca8a04;">{med}</div>
587
+ <div style="font-size:11px;color:#854d0e;">Medium</div>
588
+ </div>
589
+ <div style="padding:8px;border-radius:6px;background:#f0fdf4;text-align:center;">
590
+ <div style="font-size:20px;font-weight:700;color:#16a34a;">{low}</div>
591
+ <div style="font-size:11px;color:#166534;">Low</div>
592
+ </div>
593
+ </div>
594
+ <div style="font-size:12px;color:#6b7280;text-align:center;">
595
+ {result['metadata']['total_clauses']} clauses analyzed · {result['metadata']['flagged_clauses']} flagged
596
+ <br>Engine: {result['metadata']['model']}
597
  </div>
598
  </div>
599
+ """
600
+ return html
601
 
602
+ def render_clause_cards(result):
603
+ if result is None:
604
+ return ""
605
+ clauses = result.get("clauses", [])
606
+ if not clauses:
607
+ return '<div style="padding:24px;text-align:center;color:#6b7280;">No clauses detected.</div>'
608
+ grouped = defaultdict(list)
609
+ for cr in clauses:
610
+ grouped[cr["text"]].append(cr)
611
+ html = '<div style="font-family:system-ui,sans-serif;">'
612
+ for text, items in grouped.items():
613
+ max_risk = max(items, key=lambda x: {"CRITICAL":4,"HIGH":3,"MEDIUM":2,"LOW":1}[x["risk"]])["risk"]
614
+ border, bg, icon = RISK_STYLES[max_risk]
615
+ tags = ""
616
+ for item in items:
617
+ tag_bg = RISK_STYLES[item["risk"]][1]
618
+ tag_color = RISK_STYLES[item["risk"]][0]
619
+ tags += f'<span style="background:{tag_bg};color:{tag_color};border:1px solid {tag_color}33;padding:2px 8px;border-radius:12px;font-size:11px;font-weight:500;margin-right:4px;">{item["label"]} ({item["confidence"]})</span>'
620
+ descs = "".join(
621
+ f'<p style="font-size:12px;color:#6b7280;margin:4px 0 0 0;">{item["description"]}</p>'
622
+ for item in items
623
+ )
624
+ preview = text[:300] + ("..." if len(text) > 300 else "")
625
+ preview = preview.replace("<", "&lt;").replace(">", "&gt;")
626
+ html += f"""
627
+ <div style="border:1px solid #e5e7eb;border-left:4px solid {border};border-radius:8px;padding:14px;margin-bottom:10px;background:#fafafa;">
628
+ <div style="display:flex;align-items:center;gap:6px;margin-bottom:6px;">
629
+ <span style="font-size:16px;">{icon}</span>
630
+ <span style="font-size:12px;font-weight:600;color:{border};text-transform:uppercase;">{max_risk}</span>
631
+ </div>
632
+ <p style="font-size:13px;color:#374151;line-height:1.6;margin:0 0 8px 0;">{preview}</p>
633
+ <div style="margin-bottom:6px;">{tags}</div>
634
+ {descs}
635
+ </div>
636
+ """
637
+ html += "</div>"
638
+ return html
639
+
640
+ def render_entities(result):
641
+ if result is None:
642
+ return ""
643
+ entities = result.get("entities", [])
644
+ if not entities:
645
+ return '<div style="padding:16px;color:#6b7280;">No entities detected.</div>'
646
+ grouped = defaultdict(list)
647
+ for e in entities:
648
+ grouped[e["type"]].append(e["text"])
649
+ html = '<div style="font-family:system-ui,sans-serif;">'
650
+ for etype, texts in grouped.items():
651
+ unique = list(dict.fromkeys(texts))[:20]
652
+ color = {
653
+ "DATE": "#3b82f6", "DATE_REF": "#60a5fa",
654
+ "MONEY": "#22c55e",
655
+ "PARTY": "#8b5cf6", "PARTY_ROLE": "#a78bfa",
656
+ "JURISDICTION": "#f59e0b",
657
+ "DEFINED_TERM": "#ec4899",
658
+ }.get(etype, "#6b7280")
659
+ items_html = "".join(
660
+ f'<span style="display:inline-block;background:{color}15;color:{color};border:1px solid {color}40;padding:3px 10px;border-radius:6px;font-size:12px;margin:3px;">{t}</span>'
661
+ for t in unique
662
+ )
663
+ html += f"""
664
+ <div style="margin-bottom:12px;">
665
+ <div style="font-size:12px;font-weight:600;color:#374151;margin-bottom:6px;text-transform:uppercase;">{etype}</div>
666
+ <div>{items_html}</div>
667
+ </div>
668
+ """
669
+ html += "</div>"
670
+ return html
671
+
672
+ def render_contradictions(result):
673
+ if result is None:
674
+ return ""
675
+ contradictions = result.get("contradictions", [])
676
+ if not contradictions:
677
+ return '<div style="padding:16px;color:#16a34a;">✓ No contradictions or missing clauses detected.</div>'
678
+ html = '<div style="font-family:system-ui,sans-serif;">'
679
+ for c in contradictions:
680
+ sev_color = RISK_STYLES[c["severity"]][0]
681
+ icon = "⚠️" if c["type"] == "CONTRADICTION" else "📋"
682
+ html += f"""
683
+ <div style="border:1px solid #e5e7eb;border-left:4px solid {sev_color};border-radius:8px;padding:12px;margin-bottom:8px;background:#fafafa;">
684
+ <div style="display:flex;align-items:center;gap:6px;margin-bottom:4px;">
685
+ <span>{icon}</span>
686
+ <span style="font-size:12px;font-weight:600;color:{sev_color};">{c["type"]}</span>
687
+ </div>
688
+ <p style="font-size:13px;color:#374151;margin:0;">{c["explanation"]}</p>
689
+ </div>
690
+ """
691
+ html += "</div>"
692
+ return html
693
+
694
+ def render_document_viewer(result):
695
+ if result is None:
696
+ return ""
697
+ text = result.get("raw_text", "")
698
+ entities = sorted(result.get("entities", []), key=lambda x: x["start"])
699
+ html_parts = []
700
+ last_end = 0
701
+ for e in entities:
702
+ if e["start"] >= last_end:
703
+ html_parts.append(text[last_end:e["start"]].replace("<", "&lt;").replace(">", "&gt;"))
704
+ color = {
705
+ "DATE": "#bfdbfe", "DATE_REF": "#bfdbfe",
706
+ "MONEY": "#bbf7d0",
707
+ "PARTY": "#ddd6fe", "PARTY_ROLE": "#ddd6fe",
708
+ "JURISDICTION": "#fde68a",
709
+ "DEFINED_TERM": "#fbcfe8",
710
+ }.get(e["type"], "#e5e7eb")
711
+ label = e["type"].replace("_", " ")
712
+ html_parts.append(
713
+ f'<mark style="background:{color};padding:1px 2px;border-radius:2px;font-size:12px;" title="{label}">{e["text"].replace("<","&lt;").replace(">","&gt;")}</mark>'
714
+ )
715
+ last_end = e["end"]
716
+ html_parts.append(text[last_end:].replace("<", "&lt;").replace(">", "&gt;"))
717
+ highlighted = "".join(html_parts)
718
+ return f"""
719
+ <div style="font-family:monospace;font-size:13px;line-height:1.6;padding:16px;border:1px solid #e5e7eb;border-radius:8px;background:#fff;max-height:600px;overflow-y:auto;white-space:pre-wrap;">
720
+ {highlighted}
721
+ </div>
722
+ """
723
+
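The renderers above escape only `<` and `>` by hand, and some interpolate entity text without escaping. A possible alternative, sketched here under the assumption that the standard library's `html.escape` is acceptable for this UI, also covers `&` and quotes. `highlight` is a hypothetical, simplified version of `render_document_viewer` with a single fixed color.

```python
import html


def highlight(text, entities, color="#fde68a"):
    """Wrap entity spans in <mark> tags, escaping all document text."""
    parts, last_end = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < last_end:
            continue  # skip spans that overlap one already emitted
        parts.append(html.escape(text[last_end:ent["start"]]))
        label = html.escape(ent["type"].replace("_", " "), quote=True)
        body = html.escape(text[ent["start"]:ent["end"]])
        parts.append(f'<mark style="background:{color};" title="{label}">{body}</mark>')
        last_end = ent["end"]
    parts.append(html.escape(text[last_end:]))
    return "".join(parts)


sample = "A & B <x> Delaware"
print(highlight(sample, [{"type": "JURISDICTION", "start": 10, "end": 18}]))
```

Escaping matters here because contract text is user-supplied and is rendered through `gr.HTML`.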
724
+ # ═══════════════════════════════════════════════════════════════════════
725
+ # 11. COMPARISON UI FUNCTIONS
726
+ # ═══════════════════════════════════════════════════════════════════════
727
+
728
+ def run_comparison(text_a, text_b):
729
+ if not text_a or len(text_a.strip()) < 50:
730
+ return "Contract A is too short", None
731
+ if not text_b or len(text_b.strip()) < 50:
732
+ return "Contract B is too short", None
733
+ result = compare_contracts(text_a, text_b)
734
+ return render_comparison_html(result), json.dumps(result, indent=2)
735
+
736
+ # ═══════════════════════════════════════════════════════════════════════
737
+ # 12. GRADIO UI
738
+ # ═══════════════════════════════════════════════════════════════════════
739
+
740
+ def process_upload(file):
741
+ if file is None:
742
+ return "", "No file uploaded"
743
+ text, error = parse_document(file)
744
+ if error:
745
+ return "", error
746
+ return text, "Document loaded successfully"
747
+
748
+ def run_analysis(text):
749
+ if not text or len(text.strip()) < 50:
750
+ err_html = '<p style="color:#dc2626;padding:16px;">Document too short (minimum 50 characters)</p>'
751
+ return [err_html] * 7 + [None, None, ""]
752
+ result, error = analyze_contract(text)
753
+ if error:
754
+ err_html = f'<p style="color:#dc2626;padding:16px;">{error}</p>'
755
+ return [err_html] * 7 + [None, None, error]
756
+ json_path = "/tmp/clauseguard_report.json"
757
+ with open(json_path, "w") as f:
758
+ json.dump(result, f, indent=2, default=str)
759
+ csv_content = export_csv(result)
760
+ csv_path = "/tmp/clauseguard_report.csv"
761
+ with open(csv_path, "w") as f:
762
+ f.write(csv_content)
763
+ return [
764
+ render_summary(result),
765
+ render_clause_cards(result),
766
+ render_entities(result),
767
+ render_contradictions(result),
768
+ render_document_viewer(result),
769
+ render_obligations_html(result.get("obligations", [])),
770
+ render_compliance_html(result.get("compliance", {})),
771
+ json_path,
772
+ csv_path,
773
+ "Analysis complete",
774
+ ]
775
+
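`run_analysis` writes both reports to fixed paths under `/tmp`, so two concurrent requests could overwrite each other's files. One way around this, shown as a hedged sketch rather than the Space's actual behavior, is to create a unique file per request with `tempfile.mkstemp`. The helper name `write_report` and the `clauseguard_` prefix are illustrative assumptions.

```python
import json
import os
import tempfile


def write_report(result, suffix=".json"):
    """Write result to a uniquely named temp file and return its path."""
    fd, path = tempfile.mkstemp(prefix="clauseguard_", suffix=suffix)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(result, handle, indent=2, default=str)
    return path


first = write_report({"score": 35})
second = write_report({"score": 80})
print(first != second)  # each call gets its own file
```

The returned path can be handed to a `gr.File` output in the same way as the fixed path. Cleanup of old files is left out of this sketch.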
776
+ def do_clear():
777
+ return [""] * 7 + [None, None, ""]
778
+
779
+ # ── Example contracts ──
780
+ SPOTIFY_TOS = """By using the Spotify Service, you agree to be bound by these Terms of Use.
781
 
782
  Spotify may, in its sole discretion, modify or update these Terms of Service at any time without prior notice. Your continued use of the Service after any such changes constitutes your acceptance of the new Terms of Service.
783
 
 
789
 
790
  These Terms will be governed by and construed in accordance with the laws of the State of New York.
791
 
792
+ Any dispute shall be finally settled by arbitration in New York County. The parties waive any right to a jury trial."""
793
 
794
+ RENTAL_AGREEMENT = """The Landlord reserves the right to enter the premises at any time without prior notice for inspection or any other purpose deemed necessary in their sole discretion.
795
 
796
  The Landlord shall not be liable for any damage to the Tenant's personal property, whether caused by water leaks, fire, theft, or any other cause, including the Landlord's own negligence.
797
 
798
  The Landlord may terminate this lease at any time with only 7 days written notice, for any reason or no reason at all.
799
 
800
+ Any disputes arising from this lease agreement shall be resolved exclusively in the courts of the State of California, and the Tenant waives the right to a jury trial.
801
 
802
  The Landlord reserves the right to modify the terms of this lease at any time. Continued occupancy constitutes acceptance of the new terms."""
803
 
804
+ NDA_SAMPLE = """NON-DISCLOSURE AGREEMENT
805
+
806
+ This Non-Disclosure Agreement (the "Agreement") is entered into as of January 15, 2024 (the "Effective Date") by and between Acme Technologies, Inc. ("Disclosing Party") and Beta Solutions LLC ("Receiving Party").
807
+
808
+ 1. Governing Law. This Agreement shall be governed by and construed in accordance with the laws of the State of Delaware, without regard to its conflict of law principles.
809
+
810
+ 2. Term. This Agreement shall remain in effect for a period of three (3) years from the Effective Date.
811
+
812
+ 3. Termination. Either party may terminate this Agreement for convenience upon thirty (30) days prior written notice.
813
+
814
+ 4. Intellectual Property. All Confidential Information disclosed hereunder shall remain the exclusive property of the Disclosing Party. The Receiving Party hereby assigns to the Disclosing Party all right, title, and interest in any derivative works.
815
+
816
+ 5. Limitation of Liability. IN NO EVENT SHALL EITHER PARTY BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES.
817
+
818
+ 6. Indemnification. The Receiving Party shall indemnify and hold harmless the Disclosing Party from any and all claims arising from a breach of this Agreement.
819
+
820
+ 7. Non-Compete. During the term of this Agreement and for a period of two (2) years thereafter, the Receiving Party shall not engage in any business that competes with the Disclosing Party."""
821
+
822
+ COMPLEX_CONTRACT = """MASTER SERVICE AGREEMENT
823
+
824
+ This Master Service Agreement ("MSA") is entered into as of March 1, 2024 (the "Effective Date") by and between CloudTech Solutions, Inc., a Delaware corporation ("Provider") and Global Retail Partners LLC, a New York limited liability company ("Customer").
825
+
826
+ 1. SERVICES. Provider shall provide cloud hosting and data processing services as described in Exhibit A. Provider shall comply with all applicable laws including GDPR and CCPA.
827
+
828
+ 2. TERM AND RENEWAL. The initial term is twelve (12) months, automatically renewing for successive one (1) year periods unless terminated in accordance with Section 7.
829
+
830
+ 3. FEES AND PAYMENT. Customer shall pay a monthly fee of $25,000 within 30 days of invoice. Late payments incur a penalty of 1.5% per month. The total contract value is $300,000.
831
+
832
+ 4. LIABILITY. Provider's aggregate liability shall not exceed $1,000,000. IN NO EVENT SHALL PROVIDER BE LIABLE FOR LOST PROFITS OR CONSEQUENTIAL DAMAGES. Customer assumes all risk of data loss.
833
 
834
+ 5. INDEMNIFICATION. Each party shall indemnify the other for third-party claims arising from breach of this Agreement. Customer shall indemnify Provider for claims arising from Customer Data.
 
835
 
836
+ 6. INTELLECTUAL PROPERTY. Provider retains all IP rights. Customer receives a non-transferable, non-exclusive license for the term. Upon termination, Customer shall return or destroy all Provider materials within 10 business days.
837
+
838
+ 7. TERMINATION. Either party may terminate for convenience with 90 days notice. Provider may terminate immediately for non-payment. Upon termination, Customer shall pay all outstanding fees.
839
+
840
+ 8. GOVERNING LAW. This Agreement is governed by the laws of the State of Delaware. Disputes shall be resolved by binding arbitration in Wilmington, Delaware.
841
+
842
+ 9. FORCE MAJEURE. Neither party shall be liable for delays due to acts of God, war, terrorism, or government action.
843
+
844
+ 10. AUDIT RIGHTS. Customer may audit Provider's compliance annually. Provider shall provide SOC 2 Type II reports within 30 days of request.
845
+
846
+ 11. INSURANCE. Provider shall maintain general liability insurance of at least $5,000,000 and cyber liability insurance of at least $2,000,000.
847
+
848
+ 12. CONFIDENTIALITY. Both parties agree to keep Confidential Information secure for five (5) years. This obligation survives termination.
849
+
850
+ 13. ASSIGNMENT. Neither party may assign this Agreement without prior written consent. Any attempted assignment is void.
851
+
852
+ 14. THIRD PARTY BENEFICIARY. No third party shall have rights under this Agreement except as expressly provided."""
853
+
854
+ with gr.Blocks(
855
+ title="ClauseGuard - AI Contract Analysis",
856
+ css="""
857
+ .gradio-container { max-width: 1600px !important; }
858
+ """
859
+ ) as demo:
860
+
861
+ gr.HTML("""
862
+ <div style="display:flex;align-items:center;justify-content:space-between;padding:12px 0;border-bottom:2px solid #e5e7eb;margin-bottom:16px;">
863
+ <div>
864
+ <h1 style="font-size:24px;font-weight:700;margin:0;color:#1f2937;">🛡️ ClauseGuard</h1>
865
+ <p style="font-size:13px;color:#6b7280;margin:4px 0 0 0;">AI-Powered Legal Contract Analysis · 41 Clause Categories · Risk Scoring · NER · NLI · Compliance · Obligations</p>
866
+ </div>
867
+ <div style="font-size:12px;color:#9ca3af;">v2.0 · World's Best Open-Source Legal AI</div>
868
+ </div>
869
+ """)
870
+
871
+ # ── Main Tabs: Analysis vs Comparison ──
872
+ with gr.Tabs():
873
+
874
+ # ═══════ TAB 1: Single Contract Analysis ═══════
875
+ with gr.Tab("📄 Single Contract Analysis"):
876
+ with gr.Row():
877
+ with gr.Column(scale=1):
878
+ file_input = gr.File(
879
+ label="📁 Upload Contract (PDF/DOCX/TXT)",
880
+ file_types=[".pdf", ".docx", ".doc", ".txt", ".md"],
881
+ )
882
+ load_btn = gr.Button("Load Document", variant="secondary", size="sm")
883
+ load_status = gr.Textbox(label="Status", interactive=False, lines=1)
884
+
885
+ with gr.Column(scale=3):
886
+ text_input = gr.Textbox(
887
+ label="📄 Contract Text",
888
+ placeholder="Paste contract text here, or upload a file above...",
889
+ lines=14,
890
+ max_lines=40,
891
+ show_copy_button=True,
892
+ )
893
+
894
+ with gr.Column(scale=1):
895
+ scan_btn = gr.Button("🔍 Analyze Contract", variant="primary", size="lg")
896
+ clear_btn = gr.Button("Clear", variant="secondary", size="sm")
897
+ status_msg = gr.Textbox(label="Analysis Status", interactive=False, lines=1)
898
+
899
+ # ── Examples ──
900
  with gr.Row():
901
+ gr.Examples(
902
+ examples=[[SPOTIFY_TOS], [RENTAL_AGREEMENT], [NDA_SAMPLE], [COMPLEX_CONTRACT]],
903
+ inputs=[text_input],
904
+ label="Example Contracts",
905
+ )
906
 
907
+ # ── Results ──
908
+ with gr.Row():
909
+ with gr.Column(scale=1):
910
+ gr.Markdown("### 📊 Risk Summary")
911
+ summary_html = gr.HTML()
912
+
913
+ gr.Markdown("### 📥 Export Reports")
914
+ json_file = gr.File(label="JSON Report")
915
+ csv_file = gr.File(label="CSV Report")
916
+
917
+ with gr.Column(scale=3):
918
+ with gr.Tabs():
919
+ with gr.Tab("📄 Document"):
920
+ doc_html = gr.HTML(label="Document Viewer")
921
+ with gr.Tab("⚠️ Clauses (41 Categories)"):
922
+ clauses_html = gr.HTML(label="Detected Clauses")
923
+ with gr.Tab("🏷️ Entities"):
924
+ entities_html = gr.HTML(label="Named Entities")
925
+ with gr.Tab("🔍 Contradictions"):
926
+ nli_html = gr.HTML(label="Contradictions & Missing Clauses")
927
+ with gr.Tab("📋 Obligations"):
928
+ obligations_html = gr.HTML(label="Obligation Tracker")
929
+ with gr.Tab("⚖️ Compliance"):
930
+ compliance_html = gr.HTML(label="Compliance Checker")
931
+
932
+ # ═══════ TAB 2: Contract Comparison ═══════
933
+ with gr.Tab("🔀 Compare Contracts"):
934
+ with gr.Row():
935
+ with gr.Column(scale=1):
936
+ comp_file_a = gr.File(
937
+ label="📁 Contract A (PDF/DOCX/TXT)",
938
+ file_types=[".pdf", ".docx", ".doc", ".txt"],
939
+ )
940
+ comp_load_a = gr.Button("Load A", variant="secondary", size="sm")
941
+ comp_status_a = gr.Textbox(label="Status A", interactive=False, lines=1)
942
+
943
+ with gr.Column(scale=3):
944
+ comp_text_a = gr.Textbox(
945
+ label="Contract A",
946
+ placeholder="Paste contract A here...",
947
+ lines=12,
948
+ show_copy_button=True,
949
+ )
950
+
951
+ with gr.Column(scale=1):
952
+ comp_file_b = gr.File(
953
+ label="📁 Contract B (PDF/DOCX/TXT)",
954
+ file_types=[".pdf", ".docx", ".doc", ".txt"],
955
+ )
956
+ comp_load_b = gr.Button("Load B", variant="secondary", size="sm")
957
+ comp_status_b = gr.Textbox(label="Status B", interactive=False, lines=1)
958
+
959
+ with gr.Column(scale=3):
960
+ comp_text_b = gr.Textbox(
961
+ label="Contract B",
962
+ placeholder="Paste contract B here...",
963
+ lines=12,
964
+ show_copy_button=True,
965
+ )
966
 
967
+ with gr.Row():
968
+ with gr.Column(scale=1):
969
+ comp_btn = gr.Button("🔀 Compare Contracts", variant="primary", size="lg")
970
+ with gr.Column(scale=5):
971
+ comp_status = gr.Textbox(label="Comparison Status", interactive=False, lines=1)
972
 
973
+ with gr.Row():
974
+ with gr.Column(scale=4):
975
+ comp_result_html = gr.HTML(label="Comparison Results")
976
+ with gr.Column(scale=2):
977
+ comp_json = gr.JSON(label="Raw Comparison Data")
978
+
979
+ # ── Events ──
980
+ def _load_file(file):
981
+ text, err = parse_document(file) if file else ("", "No file")
982
+ if err and not text:
983
+ return "", err
984
+ return text, "Loaded successfully" if not err else err
985
+
986
+ load_btn.click(_load_file, inputs=[file_input], outputs=[text_input, load_status])
987
+ comp_load_a.click(_load_file, inputs=[comp_file_a], outputs=[comp_text_a, comp_status_a])
988
+ comp_load_b.click(_load_file, inputs=[comp_file_b], outputs=[comp_text_b, comp_status_b])
989
+
990
+ scan_btn.click(
991
+ run_analysis,
992
+ inputs=[text_input],
993
+ outputs=[summary_html, clauses_html, entities_html, nli_html,
994
+ doc_html, obligations_html, compliance_html,
995
+ json_file, csv_file, status_msg]
996
+ )
997
+
998
+ clear_btn.click(
999
+ do_clear,
1000
+ outputs=[summary_html, clauses_html, entities_html, nli_html,
1001
+ doc_html, obligations_html, compliance_html,
1002
+ json_file, csv_file, status_msg]
1003
+ )
1004
+
1005
+ comp_btn.click(
1006
+ run_comparison,
1007
+ inputs=[comp_text_a, comp_text_b],
1008
+ outputs=[comp_result_html, comp_json]
1009
+ )
1010
+
1011
+ gr.HTML("""
1012
+ <div style="margin-top:24px;padding:16px 0;border-top:1px solid #e5e7eb;text-align:center;">
1013
+ <p style="font-size:11px;color:#9ca3af;">
1014
+ ⚠️ Not legal advice. For informational purposes only.
1015
+ · Model: <a href="https://huggingface.co/Mokshith31/legalbert-contract-clause-classification" style="color:#6b7280;">Legal-BERT + CUAD (41 classes)</a>
1016
+ · Dataset: <a href="https://huggingface.co/datasets/theatticusproject/cuad-qa" style="color:#6b7280;">CUAD</a>
1017
+ · <a href="https://huggingface.co/spaces/gaurv007/ClauseGuard" style="color:#6b7280;">ClauseGuard Space</a>
1018
+ </p>
1019
+ </div>
1020
+ """)
1021
 
1022
  if __name__ == "__main__":
1023
  demo.launch()
compare.py ADDED
@@ -0,0 +1,229 @@
+"""
+ClauseGuard — Contract Comparison Engine
+═══════════════════════════════════════
+Compare two contracts side-by-side:
+  • Clause-level diff (added/removed/modified clauses)
+  • Risk delta (which contract is more favorable)
+  • Alignment score (similarity between documents)
+"""
+
+import re
+from difflib import SequenceMatcher
+from collections import defaultdict
+
+def _normalize_clause(text):
+    """Normalize clause text for comparison."""
+    text = text.lower()
+    text = re.sub(r'[^a-z0-9\s]', ' ', text)
+    text = re.sub(r'\s+', ' ', text).strip()
+    return text
+
+def _clause_similarity(a, b):
+    """Compute similarity between two clauses."""
+    return SequenceMatcher(None, _normalize_clause(a), _normalize_clause(b)).ratio()
+
+def _extract_clause_type(clause_text):
+    """Heuristic clause type detection for alignment."""
+    text_lower = clause_text.lower()
+    type_keywords = {
+        "governing law": ["govern", "law", "jurisdiction"],
+        "termination": ["terminat", "cancel", "end"],
+        "indemnification": ["indemnif", "hold harmless"],
+        "confidentiality": ["confidential", "non-disclosure"],
+        "liability": ["liability", "liable", "damages"],
+        "payment": ["payment", "fee", "price", "compensat"],
+        "intellectual property": ["intellectual", "ip", "copyright", "patent"],
+        "warranty": ["warrant", "guarantee"],
+        "force majeure": ["force majeure", "act of god"],
+        "arbitration": ["arbitrat", "mediation"],
+        "assignment": ["assign", "transfer"],
+        "non-compete": ["compete", "competition"],
+        "renewal": ["renew", "extend"],
+        "effective date": ["effective date", "commencement"],
+    }
+    for ctype, keywords in type_keywords.items():
+        # Match keywords only at the start of a word, so short stems such as
+        # "ip" or "end" do not fire inside "recipient" or "vendor".
+        if any(re.search(r'\b' + re.escape(kw), text_lower) for kw in keywords):
+            return ctype
+    return "general"
+
+def compare_contracts(text_a, text_b, clauses_a=None, clauses_b=None):
+    """
+    Compare two contract texts and return structural diff.
+
+    Returns dict with:
+      - alignment_score: float 0-1
+      - contract_a_clauses / contract_b_clauses: clause counts
+      - added_clauses: clauses in B not in A
+      - removed_clauses: clauses in A not in B
+      - modified_clauses: clauses that are similar but different
+      - risk_delta: which contract is riskier
+      - risk_winner: "A", "B", or "tie" (the less risky contract)
+      - type_map_a / type_map_b: clause counts per type for each contract
+    """
+    if not text_a or not text_b:
+        return {"error": "Both contracts required"}
+
+    # Split into clauses if not provided
+    if clauses_a is None:
+        clauses_a = _split_clauses(text_a)
+    if clauses_b is None:
+        clauses_b = _split_clauses(text_b)
+
+    # Build clause type maps
+    type_map_a = defaultdict(list)
+    type_map_b = defaultdict(list)
+    for c in clauses_a:
+        type_map_a[_extract_clause_type(c)].append(c)
+    for c in clauses_b:
+        type_map_b[_extract_clause_type(c)].append(c)
+
+    # Find matches
+    matched_a = set()
+    matched_b = set()
+    modified = []
+
+    SIMILARITY_THRESHOLD = 0.75
+    MODIFIED_THRESHOLD = 0.45
+
+    for i, ca in enumerate(clauses_a):
+        best_sim = 0
+        best_j = -1
+        for j, cb in enumerate(clauses_b):
+            if j in matched_b:
+                continue
+            sim = _clause_similarity(ca, cb)
+            if sim > best_sim:
+                best_sim = sim
+                best_j = j
+
+        if best_sim >= SIMILARITY_THRESHOLD:
+            matched_a.add(i)
+            matched_b.add(best_j)
+            if best_sim < 0.95:
+                modified.append({
+                    "type": "modified",
+                    "similarity": round(best_sim, 3),
+                    "clause_a": ca[:200],
+                    "clause_b": clauses_b[best_j][:200],
+                    "clause_type": _extract_clause_type(ca),
+                })
+        elif best_sim >= MODIFIED_THRESHOLD:
+            modified.append({
+                "type": "partial",
+                "similarity": round(best_sim, 3),
+                "clause_a": ca[:200],
+                "clause_b": clauses_b[best_j][:200] if best_j >= 0 else "",
+                "clause_type": _extract_clause_type(ca),
+            })
+
+    removed = [clauses_a[i] for i in range(len(clauses_a)) if i not in matched_a]
+    added = [clauses_b[j] for j in range(len(clauses_b)) if j not in matched_b]
+
+    # Compute alignment score
+    total_pairs = max(len(clauses_a), len(clauses_b))
+    if total_pairs > 0:
+        alignment = len(matched_a) / total_pairs
+    else:
+        alignment = 0.0
+
+    # Risk delta: count how many distinct risk keywords appear in each contract
+    risk_keywords = ["unlimited", "unilateral", "waive", "arbitration", "indemnif",
+                     "not liable", "no warranty", "sole discretion"]
+    risk_a = sum(1 for kw in risk_keywords if kw in text_a.lower())
+    risk_b = sum(1 for kw in risk_keywords if kw in text_b.lower())
+
+    if risk_a > risk_b + 2:
+        risk_delta = "Contract A is significantly riskier"
+        risk_winner = "B"
+    elif risk_b > risk_a + 2:
+        risk_delta = "Contract B is significantly riskier"
+        risk_winner = "A"
+    else:
+        risk_delta = "Similar risk profiles"
+        risk_winner = "tie"
+
+    return {
+        "alignment_score": round(alignment, 3),
+        "contract_a_clauses": len(clauses_a),
+        "contract_b_clauses": len(clauses_b),
+        "added_clauses": [{"text": c[:200], "type": _extract_clause_type(c)} for c in added[:50]],
+        "removed_clauses": [{"text": c[:200], "type": _extract_clause_type(c)} for c in removed[:50]],
+        "modified_clauses": modified[:50],
+        "risk_delta": risk_delta,
+        "risk_winner": risk_winner,
+        "type_map_a": {k: len(v) for k, v in type_map_a.items()},
+        "type_map_b": {k: len(v) for k, v in type_map_b.items()},
+    }
+
+def _split_clauses(text):
+    """Split text into clauses."""
+    text = re.sub(r'\n{3,}', '\n\n', text.strip())
+    parts = re.split(
+        r'(?<=[.!?])\s+(?=[A-Z0-9(])|(?:\n\n)(?=\d+[.)]\s|\([a-z]\)\s|[A-Z][A-Z\s]{2,})',
+        text
+    )
+    return [p.strip() for p in parts if len(p.strip()) > 30]
+
+def render_comparison_html(result):
+    """Render comparison results as HTML for Gradio."""
+    # Local import: the name "html" is reused below for the output string.
+    from html import escape
+
+    if "error" in result:
+        return f'<p style="color:#dc2626;">{result["error"]}</p>'
+
+    html = f'''
+    <div style="font-family:system-ui,sans-serif;">
+      <div style="display:grid;grid-template-columns:1fr 1fr;gap:12px;margin-bottom:16px;">
+        <div style="padding:12px;border-radius:8px;background:#eff6ff;border:1px solid #bfdbfe;text-align:center;">
+          <div style="font-size:24px;font-weight:700;color:#1d4ed8;">{result["contract_a_clauses"]}</div>
+          <div style="font-size:12px;color:#3b82f6;">Clauses in Contract A</div>
+        </div>
+        <div style="padding:12px;border-radius:8px;background:#fefce8;border:1px solid #fde68a;text-align:center;">
+          <div style="font-size:24px;font-weight:700;color:#a16207;">{result["contract_b_clauses"]}</div>
+          <div style="font-size:12px;color:#ca8a04;">Clauses in Contract B</div>
+        </div>
+      </div>
+
+      <div style="padding:12px;border-radius:8px;background:#f9fafb;border:1px solid #e5e7eb;margin-bottom:16px;text-align:center;">
+        <div style="font-size:28px;font-weight:700;color:#374151;">{result["alignment_score"]*100:.1f}%</div>
+        <div style="font-size:12px;color:#6b7280;">Alignment Score</div>
+      </div>
+
+      <div style="padding:12px;border-radius:8px;background:{
+        "#fef2f2" if result["risk_winner"] != "tie" else "#f0fdf4"
+      };border:1px solid {
+        "#fecaca" if result["risk_winner"] != "tie" else "#bbf7d0"
+      };margin-bottom:16px;text-align:center;">
+        <span style="font-size:14px;font-weight:600;color:{
+          "#dc2626" if result["risk_winner"] != "tie" else "#16a34a"
+        };">⚖️ {result["risk_delta"]}</span>
+      </div>
+    '''
+
+    # Modified clauses (contract text is escaped before it is placed in HTML)
+    if result["modified_clauses"]:
+        html += '<div style="margin-bottom:16px;"><h3 style="font-size:14px;color:#374151;margin-bottom:8px;">📝 Modified Clauses</h3>'
+        for m in result["modified_clauses"][:20]:
+            html += f'''
+            <div style="border:1px solid #e5e7eb;border-radius:6px;padding:10px;margin-bottom:8px;">
+              <div style="font-size:11px;color:#6b7280;margin-bottom:4px;">{m["clause_type"].upper()} · Similarity: {m["similarity"]*100:.0f}%</div>
+              <div style="display:grid;grid-template-columns:1fr 1fr;gap:8px;">
+                <div style="background:#fef2f2;padding:6px;border-radius:4px;font-size:12px;color:#991b1b;">{escape(m["clause_a"][:150])}...</div>
+                <div style="background:#f0fdf4;padding:6px;border-radius:4px;font-size:12px;color:#166534;">{escape(m["clause_b"][:150])}...</div>
+              </div>
+            </div>
+            '''
+        html += '</div>'
+
+    # Added clauses
+    if result["added_clauses"]:
+        html += '<div style="margin-bottom:16px;"><h3 style="font-size:14px;color:#374151;margin-bottom:8px;">➕ Added in Contract B</h3>'
+        for a in result["added_clauses"][:15]:
+            html += f'<div style="background:#f0fdf4;padding:8px;border-radius:4px;font-size:12px;color:#166534;margin-bottom:4px;border-left:3px solid #22c55e;"><b>{a["type"].upper()}</b> · {escape(a["text"][:150])}...</div>'
+        html += '</div>'
+
+    # Removed clauses
+    if result["removed_clauses"]:
+        html += '<div style="margin-bottom:16px;"><h3 style="font-size:14px;color:#374151;margin-bottom:8px;">➖ Removed from Contract A</h3>'
+        for r in result["removed_clauses"][:15]:
+            html += f'<div style="background:#fef2f2;padding:8px;border-radius:4px;font-size:12px;color:#991b1b;margin-bottom:4px;border-left:3px solid #ef4444;"><b>{r["type"].upper()}</b> · {escape(r["text"][:150])}...</div>'
+        html += '</div>'
+
+    html += '</div>'
+    return html
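Reviewer note (a sketch, not part of this commit): the matching step in `compare_contracts` labels a clause pair from a normalized `difflib.SequenceMatcher` ratio and two thresholds. The standalone snippet below reproduces that labeling for a single pair so the thresholds can be tried on sample text. The helper names `normalize` and `classify_pair` and the `"unchanged"`/`"unmatched"` labels are illustrative assumptions; only the threshold values and the 0.95 cut-off come from the diff above.

```python
import re
from difflib import SequenceMatcher

# Threshold values copied from compare.py; everything else here is illustrative.
SIMILARITY_THRESHOLD = 0.75
MODIFIED_THRESHOLD = 0.45


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def classify_pair(clause_a, clause_b):
    """Return (label, similarity) for one pair of clauses."""
    sim = SequenceMatcher(None, normalize(clause_a), normalize(clause_b)).ratio()
    if sim >= 0.95:
        label = "unchanged"   # matched and not reported as modified
    elif sim >= SIMILARITY_THRESHOLD:
        label = "modified"    # matched, reported with type "modified"
    elif sim >= MODIFIED_THRESHOLD:
        label = "partial"     # reported, but neither clause is marked matched
    else:
        label = "unmatched"
    return label, round(sim, 3)
```

One consequence worth keeping in mind: because "partial" pairs are not added to the matched sets, the same clause can appear both under modified clauses and under added or removed clauses in the rendered report.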
compliance.py ADDED
@@ -0,0 +1,245 @@
+"""
+ClauseGuard — Compliance Checker
+════════════════════════════════
+Check contracts against regulatory frameworks:
+  • GDPR (EU General Data Protection Regulation)
+  • CCPA (California Consumer Privacy Act)
+  • SOX (Sarbanes-Oxley)
+  • HIPAA (Health Insurance Portability and Accountability Act)
+  • FINRA (Financial Industry Regulatory Authority)
+"""
+
+import re
+from collections import defaultdict
+
+# Regulatory requirement definitions
+REGULATIONS = {
+    "GDPR": {
+        "description": "EU General Data Protection Regulation (Regulation 2016/679)",
+        "requirements": {
+            "lawful_basis": {
+                "keywords": ["lawful basis", "legal basis", "legitimate interest", "consent", "performance of contract", "legal obligation"],
+                "description": "Must specify lawful basis for data processing (Art. 6)",
+                "severity": "HIGH",
+            },
+            "data_subject_rights": {
+                "keywords": ["right to access", "right to erasure", "right to be forgotten", "data portability", "rectification", "object to processing"],
+                "description": "Must acknowledge data subject rights (Arts. 15-22)",
+                "severity": "HIGH",
+            },
+            "data_breach_notification": {
+                "keywords": ["data breach", "breach notification", "notify supervisory authority", "72 hours"],
+                "description": "Must include data breach notification obligations (Art. 33)",
+                "severity": "MEDIUM",
+            },
+            "data_protection_officer": {
+                "keywords": ["data protection officer", "DPO"],
+                "description": "Should reference Data Protection Officer if applicable (Art. 37)",
+                "severity": "LOW",
+            },
+            "cross_border_transfer": {
+                "keywords": ["standard contractual clauses", "SCCs", "adequacy decision", "transfer mechanism", "third country"],
+                "description": "Must specify transfer safeguards for cross-border data (Arts. 44-49)",
+                "severity": "HIGH",
+            },
+            "privacy_by_design": {
+                "keywords": ["privacy by design", "privacy by default", "data minimization", "purpose limitation"],
+                "description": "Should reference privacy-by-design principles (Art. 25)",
+                "severity": "MEDIUM",
+            },
+        },
+    },
+    "CCPA": {
+        "description": "California Consumer Privacy Act (Cal. Civ. Code § 1798.100 et seq.)",
+        "requirements": {
+            "consumer_rights": {
+                "keywords": ["right to know", "right to delete", "right to opt out", "right to non-discrimination", "consumer rights"],
+                "description": "Must acknowledge California consumer rights",
+                "severity": "HIGH",
+            },
+            "data_categories": {
+                "keywords": ["categories of personal information", "personal information categories", "identifiers", "commercial information"],
+                "description": "Must disclose categories of personal information collected",
+                "severity": "HIGH",
+            },
+            "sale_of_data": {
+                "keywords": ["do not sell my personal information", "opt-out of sale", "sale of personal information"],
+                "description": "Must provide opt-out mechanism for data sales",
+                "severity": "HIGH",
+            },
+            "service_providers": {
+                "keywords": ["service provider", "third party", "contractor", "business purpose"],
+                "description": "Should limit data use to business/service provider purposes",
+                "severity": "MEDIUM",
+            },
+        },
+    },
+    "SOX": {
+        "description": "Sarbanes-Oxley Act (US, 2002)",
+        "requirements": {
+            "internal_controls": {
+                "keywords": ["internal controls", "internal control over financial reporting", "ICFR"],
+                "description": "Must reference internal controls over financial reporting (§ 404)",
+                "severity": "HIGH",
+            },
+            "audit_committee": {
+                "keywords": ["audit committee", "independent auditor", "PCAOB"],
+                "description": "Should reference audit committee oversight",
+                "severity": "MEDIUM",
+            },
+            "whistleblower": {
+                "keywords": ["whistleblower", "anonymous reporting", "reporting hotline", "retaliation"],
+                "description": "Should include whistleblower protection provisions (§ 806)",
+                "severity": "HIGH",
+            },
+            "document_retention": {
+                "keywords": ["document retention", "record retention", "retention policy", "preserve records"],
+                "description": "Must include document retention obligations (§ 802)",
+                "severity": "HIGH",
+            },
+        },
+    },
+    "HIPAA": {
+        "description": "Health Insurance Portability and Accountability Act (US, 1996)",
+        "requirements": {
+            "phi_protection": {
+                "keywords": ["protected health information", "PHI", "health information", "ePHI"],
+                "description": "Must protect PHI and limit uses/disclosures",
+                "severity": "CRITICAL",
+            },
+            "business_associate": {
+                "keywords": ["business associate agreement", "BAA", "business associate", "covered entity"],
+                "description": "Should reference Business Associate Agreement (§ 164.504(e))",
+                "severity": "HIGH",
+            },
+            "security_safeguards": {
+                "keywords": ["administrative safeguards", "technical safeguards", "physical safeguards", "encryption", "access controls"],
+                "description": "Must implement security safeguards (§ 164.308-312)",
+                "severity": "HIGH",
+            },
+            "breach_notification": {
+                "keywords": ["breach notification", "notification of breach", "unauthorized access"],
+                "description": "Must include breach notification obligations (§ 164.400-414)",
+                "severity": "HIGH",
+            },
+        },
+    },
+    "FINRA": {
+        "description": "Financial Industry Regulatory Authority (US)",
+        "requirements": {
+            "recordkeeping": {
+                "keywords": ["recordkeeping", "books and records", "retain records", "SEC Rule 17a-4"],
+                "description": "Must comply with recordkeeping rules (FINRA Rule 4511)",
+                "severity": "HIGH",
+            },
+            "supervision": {
+                "keywords": ["supervision", "supervisory system", "review and approval"],
+                "description": "Should reference supervisory obligations (FINRA Rule 3110)",
+                "severity": "MEDIUM",
+            },
+            "anti_money_laundering": {
+                "keywords": ["anti-money laundering", "AML", "suspicious activity", "SAR", "OFAC"],
+                "description": "Must reference AML compliance (FINRA Rule 3310)",
+                "severity": "HIGH",
+            },
+            "privacy": {
+                "keywords": ["privacy policy", "customer information", "Regulation S-P", "nonpublic personal information"],
+                "description": "Must protect customer information (Regulation S-P)",
+                "severity": "HIGH",
+            },
+        },
+    },
+}
+
+RISK_STYLES = {
+    "CRITICAL": ("#dc2626", "#fef2f2"),
+    "HIGH": ("#ea580c", "#fff7ed"),
+    "MEDIUM": ("#ca8a04", "#fefce8"),
+    "LOW": ("#16a34a", "#f0fdf4"),
+}
+
+
+def check_compliance(text):
+    """Check contract text against all regulatory frameworks."""
+    text_lower = text.lower()
+    results = {}
+
+    for reg_name, reg_data in REGULATIONS.items():
+        checks = []
+        for req_name, req_data in reg_data["requirements"].items():
+            matched = False
+            matched_keywords = []
+            for kw in req_data["keywords"]:
+                # Whole-word match (optional plural "s") so short acronyms such as
+                # "SAR" or "PHI" do not fire inside "necessary" or "philosophy".
+                if re.search(r'\b' + re.escape(kw.lower()) + r's?\b', text_lower):
+                    matched = True
+                    matched_keywords.append(kw)
+            checks.append({
+                "requirement": req_name,
+                "description": req_data["description"],
+                "severity": req_data["severity"],
+                "status": "PASS" if matched else "MISSING",
+                "matched_keywords": matched_keywords,
+            })
+
+        passed = sum(1 for c in checks if c["status"] == "PASS")
+        total = len(checks)
+        compliance_rate = round(passed / total * 100) if total > 0 else 0
+
+        results[reg_name] = {
+            "description": reg_data["description"],
+            "compliance_rate": compliance_rate,
+            "checks": checks,
+            "overall_status": "COMPLIANT" if compliance_rate >= 80 else "PARTIAL" if compliance_rate >= 40 else "NON-COMPLIANT",
+        }
+
+    return results
+
+
+def render_compliance_html(results):
+    """Render compliance results as HTML for Gradio."""
+    html = '<div style="font-family:system-ui,sans-serif;">'
+
+    for reg_name, reg_result in results.items():
+        rate = reg_result["compliance_rate"]
+        status = reg_result["overall_status"]
+        status_color = "#16a34a" if status == "COMPLIANT" else "#ca8a04" if status == "PARTIAL" else "#dc2626"
+        status_bg = "#f0fdf4" if status == "COMPLIANT" else "#fefce8" if status == "PARTIAL" else "#fef2f2"
+
+        html += f'''
+        <div style="border:1px solid #e5e7eb;border-radius:10px;margin-bottom:16px;overflow:hidden;">
+          <div style="display:flex;justify-content:space-between;align-items:center;padding:12px 16px;background:{status_bg};border-bottom:1px solid #e5e7eb;">
+            <div>
+              <span style="font-size:16px;font-weight:700;color:#1f2937;">{reg_name}</span>
+              <p style="font-size:11px;color:#6b7280;margin:2px 0 0 0;">{reg_result["description"]}</p>
+            </div>
+            <div style="text-align:right;">
+              <div style="font-size:24px;font-weight:700;color:{status_color};">{rate}%</div>
+              <div style="font-size:11px;color:{status_color};font-weight:500;">{status}</div>
+            </div>
+          </div>
+          <div style="padding:8px 16px;">
+        '''
+
+        for check in reg_result["checks"]:
+            color, bg = RISK_STYLES[check["severity"]]
+            status_icon = "✅" if check["status"] == "PASS" else "❌"
+            status_text = "Found" if check["status"] == "PASS" else "Missing"
+            keywords = ", ".join(check["matched_keywords"][:3]) if check["matched_keywords"] else "—"
+
+            html += f'''
+            <div style="display:flex;justify-content:space-between;align-items:flex-start;padding:8px 0;border-bottom:1px solid #f3f4f6;">
+              <div style="flex:1;">
+                <div style="font-size:12px;font-weight:500;color:#374151;">{check["description"]}</div>
+                <div style="font-size:10px;color:#9ca3af;margin-top:2px;">Keywords: {keywords}</div>
+              </div>
+              <div style="display:flex;align-items:center;gap:6px;margin-left:8px;">
+                <span style="font-size:10px;color:{color};font-weight:600;background:{bg};padding:2px 8px;border-radius:4px;">{check["severity"]}</span>
+                <span style="font-size:13px;">{status_icon}</span>
+              </div>
+            </div>
+            '''
+
+        html += '</div></div>'
+
+    html += '</div>'
+    return html
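Reviewer note (a sketch, not part of this commit): the compliance check is a keyword presence test, so how a keyword is matched decides the result. Plain substring tests treat the letters of a short acronym as a hit wherever they occur, for example "sar" inside "necessary". The snippet below shows one possible whole-word alternative built on `re` word boundaries with an optional plural "s". The function name `find_keywords` is a hypothetical helper used only for illustration.

```python
import re


def find_keywords(text, keywords):
    """Return the keywords that occur in text as whole words or phrases."""
    text_lower = text.lower()
    hits = []
    for kw in keywords:
        # \b anchors keep short acronyms from matching inside longer words;
        # the optional "s" still accepts simple plurals ("service providers").
        pattern = r"\b" + re.escape(kw.lower()) + r"s?\b"
        if re.search(pattern, text_lower):
            hits.append(kw)
    return hits
```

A passing check under either matching scheme only shows that a term appears in the text; it says nothing about whether the clause actually satisfies the cited provision.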
obligations.py ADDED
@@ -0,0 +1,190 @@
+"""
+ClauseGuard — Obligation Tracker
+═══════════════════════════════
+Extract action items, deadlines, and obligations from contracts.
+Categorize: monetary, compliance, reporting, delivery, termination
+"""
+
+import re
+from collections import defaultdict
+from datetime import datetime, timedelta
+
+# Obligation keywords by category
+OBLIGATION_PATTERNS = {
+    "monetary": [
+        r"(?:shall|must|will|agrees? to)\s+pay\s+(?:\$?[\d,]+(?:\.\d{2})?)",
+        r"(?:fee|payment|compensation|reimburs(?:e|ement))\s+of\s+(?:\$?[\d,]+(?:\.\d{2})?)",
+        r"(?:shall|must|will)\s+remit\s+(?:\$?[\d,]+(?:\.\d{2})?)",
+        r"(?:annual|monthly|quarterly)\s+(?:fee|payment)\s+of",
+        r"(?:liquidated damages|penalty)\s+of\s+(?:\$?[\d,]+(?:\.\d{2})?)",
+    ],
+    "compliance": [
+        r"(?:shall|must|will)\s+comply\s+with",
+        r"(?:shall|must|will)\s+adhere\s+to",
+        r"(?:shall|must|will)\s+conform\s+to",
+        r"(?:shall|must|will)\s+follow\s+(?:the|all)\s+(?:applicable|relevant)\s+(?:laws|regulations|standards)",
+        r"(?:GDPR|CCPA|HIPAA|SOX|PCI-DSS|ISO\s+\d+)",
+        r"(?:confidential|privacy|data protection)",
+        r"(?:shall|must|will)\s+obtain\s+(?:necessary|required)\s+(?:approvals?|permits?|licenses?)",
+        r"(?:shall|must|will)\s+maintain\s+(?:insurance|coverage|bond)",
+    ],
+    "reporting": [
+        r"(?:shall|must|will)\s+report",
+        r"(?:shall|must|will)\s+provide\s+(?:regular|monthly|quarterly|annual)\s+(?:reports?|updates?|status)",
+        r"(?:shall|must|will)\s+notify",
+        r"(?:shall|must|will)\s+inform",
+        r"(?:shall|must|will)\s+deliver\s+(?:a|an|the)\s+report",
+        r"(?:audit|inspection)\s+(?:reports?|rights?)",
+    ],
+    "delivery": [
+        r"(?:shall|must|will)\s+deliver",
+        r"(?:shall|must|will)\s+provide",
+        r"(?:shall|must|will)\s+furnish",
+        r"(?:shall|must|will)\s+supply",
+        r"(?:shall|must|will)\s+submit",
+        r"(?:delivery|performance)\s+(?:date|schedule|timeline)",
+        r"(?:within|no later than|by)\s+(?:\d+)\s+(?:days?|weeks?|months?|years?)",
+    ],
+    "termination": [
+        r"(?:shall|must|will)\s+return",
+        r"(?:shall|must|will)\s+destroy",
+        r"(?:shall|must|will)\s+cease",
+        r"(?:upon|after)\s+termination",
+        r"(?:post-termination|surviving)\s+obligations?",
+    ],
+}
+
+# Timeframe extraction
+TIME_PATTERNS = [
+    (r"within\s+(\d+)\s+(day|week|month|year)s?", "relative"),
+    (r"no\s+later\s+than\s+(\d+)\s+(day|week|month|year)s?", "relative"),
+    (r"within\s+(\d+)\s+business\s+days?", "business_days"),
+    (r"by\s+([A-Z][a-z]+\s+\d{1,2},?\s+\d{4})", "absolute"),
+    (r"on\s+or\s+before\s+([A-Z][a-z]+\s+\d{1,2},?\s+\d{4})", "absolute"),
+    (r"(\d{1,2}/\d{1,2}/\d{2,4})", "absolute_date"),
+    (r"(\d{1,2}-\d{1,2}-\d{2,4})", "absolute_date"),
+]
+
+PARTY_PATTERNS = [
+    r"\b(?:Party A|Party B|Disclosing Party|Receiving Party|Licensor|Licensee|Buyer|Seller|Tenant|Landlord|Employer|Employee|Company|Customer|Vendor|Client)\b",
+    r"\b[A-Z][A-Za-z0-9\s&]+(?:Inc\.?|LLC|Ltd\.?|Limited|Corp\.?|Corporation|PLC|GmbH|AG|S\.A\.?|B\.V\.)\b",
+]
+
+
+def extract_obligations(text):
+    """Extract obligations from contract text."""
+    obligations = []
+
+    # Split into sentences
+    sentences = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
+
+    for sentence in sentences:
+        sentence = sentence.strip()
+        if len(sentence) < 30:
+            continue
+
+        found_types = set()
+        for otype, patterns in OBLIGATION_PATTERNS.items():
+            for pat in patterns:
+                if re.search(pat, sentence, re.IGNORECASE):
+                    found_types.add(otype)
+                    break
+
+        if not found_types:
+            continue
+
+        # Extract party
+        party = "Unknown"
+        for pp in PARTY_PATTERNS:
+            m = re.search(pp, sentence)
+            if m:
+                party = m.group(0)
+                break
+
+        # Extract timeframe
+        deadline = "Not specified"
+        for pat, ptype in TIME_PATTERNS:
+            m = re.search(pat, sentence, re.IGNORECASE)
+            if m:
+                if ptype == "relative":
+                    num = m.group(1)
+                    unit = m.group(2)
+                    deadline = f"Within {num} {unit}(s)"
+                elif ptype == "business_days":
+                    num = m.group(1)
+                    deadline = f"Within {num} business day(s)"
+                elif ptype in ("absolute", "absolute_date"):
+                    deadline = m.group(1)
+                break
+
+        for otype in found_types:
+            obligations.append({
+                "type": otype,
+                "party": party,
+                "description": sentence[:250] + ("..." if len(sentence) > 250 else ""),
+                "deadline": deadline,
+                "full_text": sentence,
+            })
+
+    return obligations
+
+
+def render_obligations_html(obligations):
+    """Render obligations as HTML cards for Gradio."""
+    # Local import: the name "html" is reused below for the output string.
+    from html import escape
+
+    if not obligations:
+        return '<div style="padding:16px;color:#6b7280;text-align:center;">No obligations detected.</div>'
+
+    # Group by type
+    grouped = defaultdict(list)
+    for ob in obligations:
+        grouped[ob["type"]].append(ob)
+
+    type_icons = {
+        "monetary": "💰",
+        "compliance": "⚖️",
+        "reporting": "📊",
+        "delivery": "📦",
+        "termination": "🛑",
+    }
+    type_colors = {
+        "monetary": "#22c55e",
+        "compliance": "#f59e0b",
+        "reporting": "#3b82f6",
+        "delivery": "#8b5cf6",
+        "termination": "#ef4444",
+    }
+
+    html = '<div style="font-family:system-ui,sans-serif;">'
+
+    # Summary counts
+    html += '<div style="display:grid;grid-template-columns:repeat(auto-fit,minmax(120px,1fr));gap:8px;margin-bottom:16px;">'
+    for otype, obs in sorted(grouped.items()):
+        color = type_colors.get(otype, "#6b7280")
+        icon = type_icons.get(otype, "📋")
+        html += f'''
+        <div style="text-align:center;padding:10px;border-radius:8px;background:{color}15;border:1px solid {color}30;">
+          <div style="font-size:20px;">{icon}</div>
+          <div style="font-size:20px;font-weight:700;color:{color};">{len(obs)}</div>
+          <div style="font-size:11px;color:{color};text-transform:capitalize;">{otype}</div>
+        </div>
+        '''
+    html += '</div>'
+
+    # Individual cards (contract text is escaped before it is placed in HTML)
+    for otype, obs in sorted(grouped.items()):
+        color = type_colors.get(otype, "#6b7280")
+        icon = type_icons.get(otype, "📋")
+        html += f'<h3 style="font-size:14px;color:#374151;margin:16px 0 8px 0;border-bottom:2px solid {color}30;padding-bottom:4px;">{icon} {otype.title()} Obligations</h3>'
+        for ob in obs:
+            html += f'''
+            <div style="border:1px solid #e5e7eb;border-left:4px solid {color};border-radius:6px;padding:10px;margin-bottom:8px;background:#fafafa;">
+              <div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:4px;">
+                <span style="font-size:12px;font-weight:600;color:{color};">{escape(ob["party"])}</span>
+                <span style="font-size:11px;color:#6b7280;background:#f3f4f6;padding:2px 8px;border-radius:4px;">{escape(ob["deadline"])}</span>
+              </div>
+              <p style="font-size:12px;color:#4b5563;margin:0;line-height:1.5;">{escape(ob["description"])}</p>
+            </div>
+            '''
+
+    html += '</div>'
+    return html
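Reviewer note (a sketch, not part of this commit): deadline extraction in `extract_obligations` walks an ordered list of regex patterns and stops at the first one that matches, so pattern order matters. The snippet below isolates that step with a subset of the patterns from the diff. The function name `extract_deadline` is a hypothetical helper introduced only for this illustration.

```python
import re

# A subset of TIME_PATTERNS from obligations.py, in the same relative order.
TIME_PATTERNS = [
    (r"within\s+(\d+)\s+(day|week|month|year)s?", "relative"),
    (r"within\s+(\d+)\s+business\s+days?", "business_days"),
    (r"on\s+or\s+before\s+([A-Z][a-z]+\s+\d{1,2},?\s+\d{4})", "absolute"),
]


def extract_deadline(sentence):
    """Return a display string for the first deadline pattern that matches."""
    for pattern, kind in TIME_PATTERNS:
        m = re.search(pattern, sentence, re.IGNORECASE)
        if not m:
            continue
        if kind == "relative":
            return f"Within {m.group(1)} {m.group(2)}(s)"
        if kind == "business_days":
            return f"Within {m.group(1)} business day(s)"
        return m.group(1)  # absolute date, returned as written
    return "Not specified"
```

For instance, "within 5 business days" falls through the first pattern, because "business" is not one of the listed units, and is picked up by the business-days pattern.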
requirements.txt CHANGED
@@ -1,4 +1,11 @@
-gradio>=5.0
-transformers>=5.0
-torch
-numpy
+gradio>=5.23.0
+transformers>=5.6.1
+torch>=2.5.0
+numpy>=2.0.0
+pdfplumber>=0.11.0
+python-docx>=1.1.0
+spacy>=3.8.0
+scikit-learn>=1.6.0
+peft>=0.15.0
+accelerate>=1.2.0
+pandas>=2.2.0