Commit d3099a5 by gaurv007 · verified · 1 Parent(s): e24206c

Upload README.md

Files changed (1)
  1. README.md +76 -31
README.md CHANGED
@@ -10,56 +10,101 @@ app_file: app.py
  pinned: false
  ---

- # 🛡️ ClauseGuard: AI Contract Analysis

- **The world's most comprehensive open-source legal contract analysis tool.**

- ClauseGuard automatically analyzes legal contracts using state-of-the-art NLP models fine-tuned on legal data.

- ## ✨ Features

- ### Core Analysis
- - **41 CUAD Clause Categories**: Detects and classifies clauses across the full [CUAD](https://huggingface.co/datasets/theatticusproject/cuad-qa) taxonomy (Document Name, Parties, Governing Law, Indemnification, Termination, Non-Compete, IP Ownership, etc.)
- - **4-Tier Risk Scoring**: Critical / High / Medium / Low with visual risk matrix
- - **Legal NER**: Extracts parties, dates, monetary values, jurisdictions, defined terms
- - **NLI Contradiction Detection**: Identifies conflicting clauses and missing critical provisions
- - **PDF / DOCX / TXT Support**: Upload any contract format

  ### UI/UX
- - **3-Panel Professional Layout**: Sidebar upload + Main analysis + Summary dashboard
- - **Document Viewer**: Text with inline entity highlights
- - **Clause Cards**: Expandable cards with risk badges and descriptions
- - **Export Reports**: JSON (structured data) and CSV (tabular) downloads
- - **Color-Coded Risk Badges**: Visual indicators for quick triage
-
- ## 🧠 Models Used
-
- | Component | Model |
- |-----------|-------|
- | Clause Classification | `Mokshith31/legalbert-contract-clause-classification` (LoRA on `nlpaueb/legal-bert-base-uncased`) |
- | Fallback Detection | Regex patterns covering 15+ clause types |
  | NER | Rule-based with 7 entity types (dates, money, parties, jurisdictions, defined terms) |
- | NLI | Heuristic contradiction detection across 5 conflict patterns |

- ## 📚 Datasets

  - [CUAD](https://huggingface.co/datasets/theatticusproject/cuad-qa): 510 contracts, 13K annotations, 41 clause categories
  - [LegalBench](https://huggingface.co/datasets/nguha/legalbench): 322 legal reasoning tasks
  - [LexGLUE](https://huggingface.co/datasets/coastalcph/lex_glue): Unfair Terms of Service classification

  ## 🚀 Usage

- 1. Upload a contract (PDF, DOCX, or TXT) or paste text directly
  2. Click **Analyze Contract**
- 3. View results across tabs: Document, Clauses, Entities, Contradictions
- 4. Download JSON/CSV reports

  ## ⚠️ Disclaimer

- *Not legal advice. ClauseGuard is an AI-powered analysis tool for informational purposes only. Always consult a qualified attorney for legal decisions.*

  ## 🔗 Links

- - [Space](https://huggingface.co/spaces/gaurv007/ClauseGuard)
- - [CUAD Paper](https://arxiv.org/abs/2103.06268)
- - [LegalBench](https://huggingface.co/datasets/nguha/legalbench)
  pinned: false
  ---

+ # 🛡️ ClauseGuard: World's Best Open-Source Legal Contract Analysis

+ **ClauseGuard** is the most comprehensive open-source AI-powered legal contract analysis tool. It analyzes contracts using state-of-the-art legal NLP models and provides actionable risk assessments.

+ ## ✨ Core Features

+ ### Analysis Engine
+ | Feature | Description |
+ |---------|-------------|
+ | **41 CUAD Clause Categories** | Full taxonomy: Document Name, Parties, Governing Law, Indemnification, Termination, Non-Compete, IP Ownership, Audit Rights, Force Majeure, and more |
+ | **4-Tier Risk Scoring** | Critical 🔴 / High 🟠 / Medium 🟡 / Low 🟢 with visual risk matrix |
+ | **Legal NER** | Extracts parties, dates, monetary values ($), jurisdictions, defined terms, and party roles |
+ | **NLI Contradiction Detection** | Identifies conflicting clauses (e.g., uncapped + capped liability) and missing critical provisions |
+ | **Obligation Tracker** | Categorizes action items: monetary 💰, compliance ⚖️, reporting 📊, delivery 📦, termination 🛑 |
+ | **Compliance Checker** | Validates against GDPR, CCPA, SOX, HIPAA, and FINRA requirements |
+ | **Contract Comparison** | Side-by-side diff between two contracts with alignment scoring |

+ ### Document Support
+ - **PDF** parsing via `pdfplumber`
+ - **DOCX/DOC** parsing via `python-docx`
+ - **TXT / Markdown** direct text input
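
The Document Support items above name `pdfplumber` and `python-docx`, but the README does not show how uploads are routed to them. The following is a minimal sketch, assuming a hypothetical helper `extract_text` (not taken from `app.py`) that dispatches on the file extension. As far as I know, `python-docx` reads only `.docx` files, so this sketch rejects legacy `.doc` input rather than attempting to parse it.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a contract file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # Imported lazily so that TXT-only use needs no extra packages.
        import pdfplumber

        with pdfplumber.open(path) as pdf:
            # extract_text() may return None for pages without a text layer.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    if suffix in {".txt", ".md"}:
        return Path(path).read_text(encoding="utf-8", errors="replace")
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Scanned PDFs without a text layer would come back empty under this approach; handling them would need OCR, which the README does not mention.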
 
 

  ### UI/UX
+ - **3-Panel Professional Layout**: Upload sidebar + Main analysis + Summary dashboard
+ - **Document Viewer**: Inline entity highlights (colored annotations)
+ - **Clause Cards**: Expandable risk-badged cards with confidence scores
+ - **Export Reports**: JSON (structured) and CSV (tabular) downloads
+ - **Color-Coded Risk Badges**: Instant visual triage
+
+ ## 🧠 Models & Architecture
+
+ | Component | Technology |
+ |-----------|------------|
+ | Clause Classification | `Mokshith31/legalbert-contract-clause-classification`: LoRA adapter on `nlpaueb/legal-bert-base-uncased`, fine-tuned on CUAD 41-class taxonomy |
  | NER | Rule-based with 7 entity types (dates, money, parties, jurisdictions, defined terms) |
+ | NLI | Heuristic contradiction detection with 5 conflict patterns + missing-clause detection |
+ | Compliance | Regulatory keyword matching across GDPR, CCPA, SOX, HIPAA, FINRA |
+ | Comparison | SequenceMatcher-based clause alignment with risk delta analysis |
+ | Obligations | Regex pattern matching across 5 obligation categories |
+
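
The Obligations row above describes regex pattern matching across five categories without listing the patterns. The sketch below illustrates the general technique only: the patterns, the `OBLIGATION_PATTERNS` table, and the `categorize_obligations` function are all hypothetical and are not taken from the ClauseGuard source. The category names are the five listed in the Obligation Tracker feature.

```python
import re

# Hypothetical patterns, one per obligation category named in the README.
OBLIGATION_PATTERNS = {
    "monetary": re.compile(r"\b(?:shall|must|agrees to)\s+(?:pay|reimburse|remit)\b", re.I),
    "compliance": re.compile(r"\b(?:shall|must)\s+comply\b", re.I),
    "reporting": re.compile(r"\b(?:shall|must)\s+(?:report|notify)\b", re.I),
    "delivery": re.compile(r"\b(?:shall|must)\s+(?:deliver|ship|supply)\b", re.I),
    "termination": re.compile(r"\b(?:may|shall)\s+terminate\b", re.I),
}


def categorize_obligations(text: str) -> list[tuple[str, str]]:
    """Return (category, sentence) pairs for sentences matching a pattern."""
    # Naive sentence split on '.' or ';' followed by whitespace.
    sentences = re.split(r"(?<=[.;])\s+", text.strip())
    found = []
    for sentence in sentences:
        for category, pattern in OBLIGATION_PATTERNS.items():
            if pattern.search(sentence):
                found.append((category, sentence))
    return found
```

A sentence can land in more than one category under this scheme, which seems reasonable for clauses such as a payment that is due on termination.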
+ ## 📊 Risk Scoring Methodology
+
+ Risk scores combine clause detection with weighted severity:
+ - **CRITICAL**: 40 pts (Uncapped Liability, Arbitration, IP Assignment, etc.)
+ - **HIGH**: 20 pts (Non-Compete, Exclusivity, Unilateral Change, etc.)
+ - **MEDIUM**: 10 pts (Governing Law, Jurisdiction, etc.)
+ - **LOW**: 3 pts (Document Name, Dates, etc.)

+ Final score normalized to 0-100 with letter grades:
+ - A (0-14): Low risk
+ - B (15-29): Moderate risk
+ - C (30-49): Elevated risk
+ - D (50-69): High risk
+ - F (70+): Critical risk
+
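
The point values and grade bands above can be turned into a small worked example. The README does not say how the raw sum is normalized to 0-100, so the sketch below assumes the simplest option, capping the sum at 100; the names `TIER_POINTS`, `risk_score`, and `letter_grade` are illustrative and not taken from `app.py`.

```python
# Points per severity tier, as listed in the README.
TIER_POINTS = {"CRITICAL": 40, "HIGH": 20, "MEDIUM": 10, "LOW": 3}

# Lower bound of each grade band, checked from highest to lowest.
GRADE_BANDS = [(70, "F"), (50, "D"), (30, "C"), (15, "B"), (0, "A")]


def risk_score(detected_tiers: list[str]) -> int:
    """Sum tier points for detected clauses and cap at 100 (assumed normalization)."""
    raw = sum(TIER_POINTS[tier] for tier in detected_tiers)
    return min(raw, 100)


def letter_grade(score: int) -> str:
    """Map a 0-100 score to the README's letter grades."""
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    raise ValueError(f"Score must be non-negative, got {score}")
```

Under these assumptions, a contract with one HIGH, one MEDIUM, and one LOW clause scores 20 + 10 + 3 = 33, which falls in band C (Elevated risk). Two CRITICAL clauses alone already reach 80, which is grade F.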
+ ## 📚 Datasets & Research

  - [CUAD](https://huggingface.co/datasets/theatticusproject/cuad-qa): 510 contracts, 13K annotations, 41 clause categories
  - [LegalBench](https://huggingface.co/datasets/nguha/legalbench): 322 legal reasoning tasks
  - [LexGLUE](https://huggingface.co/datasets/coastalcph/lex_glue): Unfair Terms of Service classification
+ - Paper: [CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review](https://arxiv.org/abs/2103.06268) (Hendrycks et al., 2021)

  ## 🚀 Usage

+ 1. **Upload** a contract (PDF, DOCX, or TXT) or paste text directly
  2. Click **Analyze Contract**
+ 3. View results across tabs:
+ - **Document**: Full text with inline entity highlights
+ - **Clauses**: Detected clauses with risk badges
+ - **Entities**: Extracted parties, dates, money, jurisdictions
+ - **Contradictions**: Conflicting clauses and missing provisions
+ - **Obligations**: Action items categorized by type
+ - **Compliance**: Regulatory framework checks
+ 4. **Export** JSON/CSV reports
+
+ ## 🔀 Compare Contracts
+
+ Switch to the **Compare Contracts** tab to:
+ - Upload or paste two contracts side-by-side
+ - See clause-level diffs (added, removed, modified)
+ - Get an alignment score and risk delta
+ - View raw JSON comparison data
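
The clause-level diff described above can be sketched with `difflib.SequenceMatcher` from the Python standard library, which the Models & Architecture table names as the basis for comparison. The input shape (a mapping from clause label to clause text), the `compare_clauses` function, and the way the alignment score is averaged are all assumptions made for illustration. The risk delta is left out because the README does not define it.

```python
from difflib import SequenceMatcher


def compare_clauses(old: dict[str, str], new: dict[str, str]) -> dict:
    """Diff two contracts given as {clause label: clause text} (hypothetical shape)."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    shared = sorted(old.keys() & new.keys())

    # Character-level similarity in [0, 1] for clauses present in both contracts.
    similarity = {
        label: SequenceMatcher(None, old[label], new[label]).ratio()
        for label in shared
    }
    modified = [label for label in shared if old[label] != new[label]]

    # Assumed alignment score: mean similarity over all labels,
    # counting added and removed clauses as 0.
    all_labels = old.keys() | new.keys()
    alignment = sum(similarity.values()) / len(all_labels) if all_labels else 1.0

    return {
        "added": added,
        "removed": removed,
        "modified": modified,
        "alignment": alignment,
    }
```

The returned dictionary is JSON-serializable, which would fit the raw JSON view listed above.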

  ## ⚠️ Disclaimer

+ *Not legal advice. ClauseGuard is an AI-powered analysis tool for informational purposes only. Always consult a qualified attorney for legal decisions. The tool may miss nuances and should be used as a preliminary screening aid, not a substitute for professional legal review.*

  ## 🔗 Links

+ - [ClauseGuard Space](https://huggingface.co/spaces/gaurv007/ClauseGuard)
+ - [Clause Classifier Model](https://huggingface.co/Mokshith31/legalbert-contract-clause-classification)
+ - [Legal-BERT Base](https://huggingface.co/nlpaueb/legal-bert-base-uncased)
+ - [CUAD Dataset](https://huggingface.co/datasets/theatticusproject/cuad-qa)
+ - [CUAD Paper (arXiv:2103.06268)](https://arxiv.org/abs/2103.06268)
+
+ ---
+
+ *Built with ❤️ using Gradio, Hugging Face Transformers, and Legal-BERT. Open source and free for all.*