pinned: false
---

# 🛡️ ClauseGuard: Open-Source Legal Contract Analysis

**ClauseGuard** is an open-source, AI-powered legal contract analysis tool. It classifies clauses with a Legal-BERT model fine-tuned on CUAD, extracts key entities, flags contradictions and missing provisions, and produces actionable risk assessments.

## ✨ Core Features

### Analysis Engine

| Feature | Description |
|---------|-------------|
| **41 CUAD Clause Categories** | Full CUAD taxonomy, e.g. Document Name, Parties, Governing Law, Termination for Convenience, Non-Compete, IP Ownership Assignment, Audit Rights, Uncapped Liability, and more |
| **4-Tier Risk Scoring** | Critical 🔴 / High 🟠 / Medium 🟡 / Low 🟢 with a visual risk matrix |
| **Legal NER** | Extracts parties, dates, monetary values ($), jurisdictions, defined terms, and party roles |
| **NLI Contradiction Detection** | Identifies conflicting clauses (e.g., uncapped and capped liability in the same contract) and missing critical provisions |
| **Obligation Tracker** | Categorizes action items: monetary 💰, compliance ⚖️, reporting, delivery 📦, termination |
| **Compliance Checker** | Flags provisions relevant to GDPR, CCPA, SOX, HIPAA, and FINRA using regulatory keyword matching |
| **Contract Comparison** | Side-by-side diff between two contracts with alignment scoring |
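
The README describes contradiction detection as heuristic: 5 conflict patterns plus a missing-clause check. The sketch below shows that idea in Python and assumes clause labels have already been detected. Only the uncapped-vs-capped liability pair comes from this README. `find_contradictions`, `REQUIRED_CLAUSES`, and the CUAD-style label spellings are hypothetical.

```python
from __future__ import annotations

from typing import Iterable

# Pairs of clause labels that contradict each other when both are present.
# Only this pair is documented in the README; the app's other 4 patterns are not listed.
CONFLICT_PAIRS = [
    ("Uncapped Liability", "Cap On Liability"),
]

# HYPOTHETICAL set of provisions treated as "critical" for the missing-clause check.
REQUIRED_CLAUSES = {"Governing Law", "Termination For Convenience"}


def find_contradictions(labels: Iterable[str]) -> dict[str, list]:
    """Return conflicting label pairs and required clauses that were not detected."""
    found = set(labels)
    conflicts = [(a, b) for a, b in CONFLICT_PAIRS if a in found and b in found]
    missing = sorted(REQUIRED_CLAUSES - found)
    return {"conflicts": conflicts, "missing": missing}


print(find_contradictions(["Uncapped Liability", "Cap On Liability", "Governing Law"]))
# {'conflicts': [('Uncapped Liability', 'Cap On Liability')], 'missing': ['Termination For Convenience']}
```

This rule-based approach needs no NLI model. It only catches conflicts that someone has written down as a pattern.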

### Document Support

- **PDF** parsing via `pdfplumber`
- **DOCX** parsing via `python-docx`
- **TXT / Markdown** direct text input
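
The loading step can be sketched as a dispatch on file extension. `extract_text` is a hypothetical helper. The library calls are the documented ones: `pdfplumber.open(...).pages` with `page.extract_text()`, and `docx.Document(...).paragraphs` from python-docx. The third-party imports are deferred so plain-text input works without them installed.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """HYPOTHETICAL helper: return plain text from a PDF, DOCX, TXT, or Markdown file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        import pdfplumber  # imported lazily: only needed for PDFs

        with pdfplumber.open(path) as pdf:
            # extract_text() may yield nothing for image-only (scanned) pages
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".docx":
        import docx  # the python-docx package is imported as `docx`

        document = docx.Document(path)
        # Note: .paragraphs covers body paragraphs only, not text inside tables
        return "\n".join(p.text for p in document.paragraphs)
    if suffix in {".txt", ".md"}:
        return Path(path).read_text(encoding="utf-8")
    raise ValueError(f"Unsupported file type: {suffix}")
```

Scanned PDFs would need OCR, which this sketch does not attempt.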

### UI/UX

- **3-Panel Professional Layout**: upload sidebar, main analysis panel, and summary dashboard
- **Document Viewer**: inline entity highlights (colored annotations)
- **Clause Cards**: expandable, risk-badged cards with confidence scores
- **Export Reports**: JSON (structured) and CSV (tabular) downloads
- **Color-Coded Risk Badges**: instant visual triage
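
The inline highlights in the Document Viewer pair rule-based NER with HTML markup. The sketch below illustrates both halves. Its regexes cover only 3 of the 7 entity types and are hypothetical, as are `ENTITY_PATTERNS`, `extract_entities`, and `highlight`. The app's actual rules are not documented here.

```python
import html
import re

# HYPOTHETICAL patterns for 3 of the 7 entity types.
ENTITY_PATTERNS = {
    "MONEY": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|September|"
        r"October|November|December) \d{1,2}, \d{4}\b"
    ),
    # A capitalized term inside double quotes, e.g. (the "Company")
    "DEFINED_TERM": re.compile(r'(?<=")[A-Z][A-Za-z ]*(?=")'),
}


def extract_entities(text):
    """Return non-overlapping (start, end, label) spans sorted by position."""
    spans = sorted(
        (m.start(), m.end(), label)
        for label, pattern in ENTITY_PATTERNS.items()
        for m in pattern.finditer(text)
    )
    kept, last_end = [], -1
    for start, end, label in spans:
        if start >= last_end:  # skip spans that overlap an earlier one
            kept.append((start, end, label))
            last_end = end
    return kept


def highlight(text, spans):
    """Wrap each entity in a <mark> tag and HTML-escape everything else."""
    parts, pos = [], 0
    for start, end, label in spans:
        parts.append(html.escape(text[pos:start]))
        parts.append(f'<mark class="{label.lower()}">{html.escape(text[start:end])}</mark>')
        pos = end
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


text = 'Acme Corp (the "Company") shall pay $10,000.00 by March 1, 2025.'
print(highlight(text, extract_entities(text)))
```

Escaping the non-entity text keeps contract content from being interpreted as HTML when it is rendered in the viewer.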

## Models & Architecture

| Component | Technology |
|-----------|------------|
| Clause Classification | `Mokshith31/legalbert-contract-clause-classification`: LoRA adapter on `nlpaueb/legal-bert-base-uncased`, fine-tuned on the 41-class CUAD taxonomy |
| NER | Rule-based, with 7 entity types, including dates, money, parties, jurisdictions, and defined terms |
| NLI | Heuristic contradiction detection with 5 conflict patterns, plus missing-clause detection |
| Compliance | Regulatory keyword matching across GDPR, CCPA, SOX, HIPAA, and FINRA |
| Comparison | SequenceMatcher-based clause alignment with risk-delta analysis |
| Obligations | Regex pattern matching across 5 obligation categories |
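
The obligation tracker is described only as regex matching across 5 categories. Here is a minimal sketch of how that can work: split the text into sentences, keep those with a binding modal ("shall", "must", ...), and tag them by category keywords. `OBLIGATION_PATTERNS`, `OBLIGATION_CUE`, and `track_obligations` are hypothetical. The keyword lists are illustrative assumptions, not the app's patterns.

```python
import re

# HYPOTHETICAL keyword patterns for the 5 obligation categories.
OBLIGATION_PATTERNS = {
    "monetary": re.compile(r"\b(?:pay|fee|invoice|reimburs)\w*", re.I),
    "compliance": re.compile(r"\b(?:comply|complian|in accordance with)\w*", re.I),
    "reporting": re.compile(r"\b(?:report|notif|notice)\w*", re.I),
    "delivery": re.compile(r"\b(?:deliver|provide|ship)\w*", re.I),
    "termination": re.compile(r"\bterminat\w*", re.I),
}
# Modal phrases that usually mark a binding obligation rather than a permission.
OBLIGATION_CUE = re.compile(r"\b(?:shall|must|agrees? to|is required to)\b", re.I)


def track_obligations(text):
    """Group obligation sentences by category (a sentence may land in several)."""
    results = {category: [] for category in OBLIGATION_PATTERNS}
    # Naive sentence split: break after '.' or ';' followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", text.strip()):
        if not OBLIGATION_CUE.search(sentence):
            continue  # e.g. "may" sentences grant rights, not obligations
        for category, pattern in OBLIGATION_PATTERNS.items():
            if pattern.search(sentence):
                results[category].append(sentence)
    return results


contract = (
    "Licensee shall pay all fees within 30 days. "
    "Licensor may update the software. "
    "Either party must notify the other before it may terminate this Agreement."
)
print(track_obligations(contract))
```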

## Risk Scoring Methodology

Risk scores combine clause detection with a severity weight for each tier:

- **CRITICAL**: 40 pts (Uncapped Liability, Arbitration, IP Assignment, etc.)
- **HIGH**: 20 pts (Non-Compete, Exclusivity, Unilateral Change, etc.)
- **MEDIUM**: 10 pts (Governing Law, Jurisdiction, etc.)
- **LOW**: 3 pts (Document Name, Dates, etc.)

The final score is normalized to 0-100 and mapped to a letter grade:

- A (0-14): Low risk
- B (15-29): Moderate risk
- C (30-49): Elevated risk
- D (50-69): High risk
- F (70+): Critical risk
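
The README does not say how raw points are normalized to 0-100. The sketch below assumes the simplest option: sum the tier weights and clamp at 100. `risk_score` and `grade` are hypothetical helpers. The weights and grade bands are the ones listed above.

```python
from __future__ import annotations

# Tier weights from the methodology above.
TIER_POINTS = {"CRITICAL": 40, "HIGH": 20, "MEDIUM": 10, "LOW": 3}

# Grade bands from the methodology above: (inclusive lower bound, grade, label),
# checked from the highest band down.
GRADE_BANDS = [
    (70, "F", "Critical risk"),
    (50, "D", "High risk"),
    (30, "C", "Elevated risk"),
    (15, "B", "Moderate risk"),
    (0, "A", "Low risk"),
]


def risk_score(tiers: list[str]) -> int:
    """Sum the weight of each detected clause's tier, clamped to 100.

    ASSUMPTION: simple clamping stands in for the app's undocumented normalization.
    """
    return min(100, sum(TIER_POINTS[tier] for tier in tiers))


def grade(score: float) -> tuple[str, str]:
    """Map a 0-100 score to its letter grade and label."""
    for lower, letter, label in GRADE_BANDS:
        if score >= lower:
            return letter, label
    raise ValueError(f"score must be non-negative, got {score}")


score = risk_score(["CRITICAL", "MEDIUM", "LOW"])  # 40 + 10 + 3 = 53
print(score, grade(score))  # 53 ('D', 'High risk')
```

Under this clamping assumption, two critical clauses (80 pts) already reach grade F. An implementation that normalizes against the maximum possible score would grade the same contract more leniently.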

## Datasets & Research

- [CUAD](https://huggingface.co/datasets/theatticusproject/cuad-qa): 510 contracts, 13K annotations, 41 clause categories
- [LegalBench](https://huggingface.co/datasets/nguha/legalbench): 162 legal reasoning tasks
- [LexGLUE](https://huggingface.co/datasets/coastalcph/lex_glue): Unfair Terms of Service classification
- Paper: [CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review](https://arxiv.org/abs/2103.06268) (Hendrycks et al., 2021)

## Usage

1. **Upload** a contract (PDF, DOCX, or TXT) or paste text directly
2. Click **Analyze Contract**
3. View results across tabs:
   - **Document**: Full text with inline entity highlights
   - **Clauses**: Detected clauses with risk badges
   - **Entities**: Extracted parties, dates, money, jurisdictions
   - **Contradictions**: Conflicting clauses and missing provisions
   - **Obligations**: Action items categorized by type
   - **Compliance**: Regulatory framework checks
4. **Export** JSON/CSV reports
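
The two export formats can be produced with the standard library alone. The report layout below is a hypothetical schema, not the app's documented output: a `clauses` list whose records carry `clause`, `risk`, and `confidence` keys.

```python
from __future__ import annotations

import csv
import io
import json

CSV_COLUMNS = ["clause", "risk", "confidence"]  # HYPOTHETICAL column set


def report_to_json(report: dict) -> str:
    """Serialize the full nested report; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(report, indent=2, ensure_ascii=False)


def clauses_to_csv(clauses: list[dict]) -> str:
    """Flatten clause records into CSV rows; keys outside CSV_COLUMNS are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(clauses)
    return buffer.getvalue()


report = {
    "clauses": [
        {"clause": "Governing Law", "risk": "MEDIUM", "confidence": 0.91,
         "text": "This Agreement is governed by the laws of Delaware."},
    ]
}
print(clauses_to_csv(report["clauses"]))
```

JSON keeps the nested structure intact, while CSV flattens to one row per clause. For that reason, the long clause text is left out of the tabular export in this sketch.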

## Compare Contracts

Switch to the **Compare Contracts** tab to:

- Upload or paste two contracts side-by-side
- See clause-level diffs (added, removed, modified)
- Get an alignment score and risk delta
- View raw JSON comparison data
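
The comparison is described as SequenceMatcher-based. Here is a sketch built on Python's `difflib.SequenceMatcher`, assuming each contract has already been reduced to a `{clause_label: clause_text}` mapping. The helper `compare_contracts`, the 0.9 "modified" threshold, the alignment formula, and the `points` risk map are all hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def compare_contracts(
    a: dict[str, str],
    b: dict[str, str],
    points: dict[str, int] | None = None,
    threshold: float = 0.9,
) -> dict:
    """HYPOTHETICAL helper: diff two {clause_label: clause_text} mappings.

    `points` maps a clause label to its risk weight; `threshold` is the
    similarity ratio below which a shared clause counts as modified.
    """
    points = points or {}
    added = sorted(b.keys() - a.keys())
    removed = sorted(a.keys() - b.keys())

    modified, similarities = [], []
    for label in sorted(a.keys() & b.keys()):
        # ratio() = 2*M/T: matched characters over the combined length
        ratio = SequenceMatcher(None, a[label], b[label]).ratio()
        similarities.append(ratio)
        if ratio < threshold:
            modified.append(label)

    # Alignment: similarity summed over shared clauses, divided by all labels,
    # so an added or removed clause counts as 0 similarity.
    total = len(a.keys() | b.keys())
    alignment = sum(similarities) / total if total else 1.0

    # Risk delta: weight gained from added clauses minus weight lost from removed ones.
    risk_delta = (sum(points.get(label, 0) for label in added)
                  - sum(points.get(label, 0) for label in removed))

    return {"added": added, "removed": removed, "modified": modified,
            "alignment": round(alignment, 3), "risk_delta": risk_delta}


a = {"Governing Law": "This Agreement is governed by the laws of Delaware.",
     "Cap On Liability": "Liability is capped at fees paid."}
b = {"Governing Law": "This Agreement is governed by the laws of Delaware.",
     "Uncapped Liability": "Nothing limits either party's liability."}
# 40 pts for Uncapped Liability is from the README; 10 pts for the cap is illustrative.
print(compare_contracts(a, b, points={"Uncapped Liability": 40, "Cap On Liability": 10}))
# {'added': ['Uncapped Liability'], 'removed': ['Cap On Liability'], 'modified': [], 'alignment': 0.333, 'risk_delta': 30}
```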

## ⚠️ Disclaimer

*Not legal advice. ClauseGuard is an AI-powered analysis tool for informational purposes only. Always consult a qualified attorney for legal decisions. The tool may miss nuances and should be used as a preliminary screening aid, not a substitute for professional legal review.*

## Links

- [ClauseGuard Space](https://huggingface.co/spaces/gaurv007/ClauseGuard)
- [Clause Classifier Model](https://huggingface.co/Mokshith31/legalbert-contract-clause-classification)
- [Legal-BERT Base](https://huggingface.co/nlpaueb/legal-bert-base-uncased)
- [CUAD Dataset](https://huggingface.co/datasets/theatticusproject/cuad-qa)
- [CUAD Paper (arXiv:2103.06268)](https://arxiv.org/abs/2103.06268)

---

*Built with ❤️ using Gradio, Hugging Face Transformers, and Legal-BERT. Open source and free for all.*