🔧 v4.2: Critical bug fixes + performance optimizations (7 bugs, 4 perf improvements)
#3
by gaurv007 - opened
ClauseGuard v4.2: Deep Audit Fixes
🔴 Critical Bug Fixes
- NLI contradiction detection was BROKEN: `pipeline("text-classification")` with dict input silently failed for the cross-encoder. Replaced it with `CrossEncoder.predict()` from sentence-transformers, which correctly accepts `(text_a, text_b)` tuples. Contradictions are now actually detected.
- BoundedCache race condition: compound operations on an `OrderedDict` are NOT atomic. Added a `threading.RLock` to prevent crashes under concurrent Gradio requests.
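For the NLI fix, here is a sketch of turning `CrossEncoder.predict()` output into a contradiction probability. It assumes an NLI cross-encoder whose output columns are contradiction, entailment, neutral (the order documented for the sentence-transformers `cross-encoder/nli-*` models); the model ClauseGuard actually loads is not named in this PR, so check its model card. `contradiction_scores` is a hypothetical helper, and `model` is duck-typed so any object with a CrossEncoder-style `predict` works:

```python
import math

# ASSUMPTION: label order of the sentence-transformers NLI cross-encoders
# (e.g. cross-encoder/nli-deberta-v3-base). Verify against your model card.
NLI_LABELS = ("contradiction", "entailment", "neutral")


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def contradiction_scores(model, pairs, labels=NLI_LABELS):
    """Return P(contradiction) for each (text_a, text_b) pair.

    `model` is anything with a CrossEncoder-style .predict(pairs) returning
    one row of logits per pair, for example:
        from sentence_transformers import CrossEncoder
        model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
    """
    idx = labels.index("contradiction")
    logits = model.predict(pairs)  # pairs: list of (premise, hypothesis)
    return [softmax(list(row))[idx] for row in logits]
```

Passing tuples directly is the point of the fix: the cross-encoder scores both clauses jointly in one forward pass instead of relying on a pipeline input format it may not handle.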
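For the cache fix, a minimal sketch of a lock-protected LRU cache. The class name matches the PR, but the method names and `maxsize` default are assumptions rather than ClauseGuard's actual code. What matters is that "check membership, reorder, evict" all happen under one lock:

```python
import threading
from collections import OrderedDict


class BoundedCache:
    """Fixed-capacity LRU cache that is safe to share across threads.

    Hypothetical sketch: the real ClauseGuard class may expose a different API.
    """

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = OrderedDict()
        # RLock (re-entrant): a locked method may call another locked method
        # on the same thread without deadlocking.
        self._lock = threading.RLock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            # "Check, then move" is a compound operation; without the lock,
            # another thread could evict `key` in between and raise KeyError.
            self._data.move_to_end(key)
            return self._data[key]

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            # Evict least-recently-used entries until back under capacity.
            while len(self._data) > self.maxsize:
                self._data.popitem(last=False)

    def __len__(self):
        with self._lock:
            return len(self._data)
```

A plain `threading.Lock` would also work here; `RLock` only matters if a locked method ends up calling another locked method.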
High-Severity Bug Fixes
- Extension risk formula mismatch: the local fallback used `(weighted/clauses)*100` (normalized), while the backend uses `100*(1-1/(1+w/30))` (diminishing returns), so the same contract got completely different scores. Fixed the fallback to match the backend.
- Extension `API_BASE` URL wrong: it pointed to a non-existent `-api` subdomain. Fixed to the correct Space URL.
- Missing regex labels: `Indemnification`, `Confidentiality`, `Force Majeure`, and `Penalties` had regex patterns but no entries in `RISK_MAP`/`DESC_MAP`. Added.
- Inconsistent model name: `compare.py` used `"all-MiniLM-L6-v2"` without the prefix, while `chatbot.py` used `"sentence-transformers/all-MiniLM-L6-v2"`, which could cause duplicate downloads. Fixed by adding the `sentence-transformers/` prefix.
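A Python sketch of the two scoring rules side by side (the extension itself is JavaScript, and both function names are hypothetical). Treating `w` as the sum of per-clause risk weights is an assumption. With the backend formula, w = 30 gives exactly 50, so 30 acts as the half-saturation point, and the score approaches 100 asymptotically instead of growing linearly:

```python
def backend_risk_score(weighted_sum):
    """Backend's diminishing-returns score: 100 * (1 - 1 / (1 + w / 30))."""
    return 100 * (1 - 1 / (1 + weighted_sum / 30))


def old_extension_score(weighted_sum, n_clauses):
    """The extension's former local fallback: (weighted / clauses) * 100."""
    return (weighted_sum / n_clauses) * 100 if n_clauses else 0.0


# The same inputs land on very different scales under the two rules.
for w, n in [(0, 10), (30, 10), (90, 10)]:
    print(w, n, backend_risk_score(w), old_extension_score(w, n))
```

Sharing one formula between the backend and the offline fallback is what keeps the extension's score consistent with the Space for the same contract.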
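One way to keep the missing-label bug from recurring is a startup or unit-test check that every regex label also has `RISK_MAP` and `DESC_MAP` entries. `CLAUSE_PATTERNS`, `missing_map_entries`, and every pattern and value below are hypothetical placeholders, not the project's own data:

```python
import re

# Hypothetical excerpts; the real dictionaries hold many more labels.
CLAUSE_PATTERNS = {
    "Indemnification": re.compile(r"\bindemnif(y|ies|ication)\b", re.IGNORECASE),
    "Confidentiality": re.compile(r"\bconfidential(ity)?\b", re.IGNORECASE),
}
RISK_MAP = {"Indemnification": "high"}                          # placeholder values
DESC_MAP = {"Indemnification": "placeholder description"}


def missing_map_entries(patterns, *maps):
    """Return labels that have a regex pattern but are absent from any map."""
    return sorted(label for label in patterns if any(label not in m for m in maps))


# Would flag "Confidentiality" here, the same class of gap this PR fixed.
print(missing_map_entries(CLAUSE_PATTERNS, RISK_MAP, DESC_MAP))
```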
⚡ Performance Improvements
- Pre-compiled ALL regex patterns: clause classification (45 label patterns), obligation extraction (25+ patterns), compliance negation (8 patterns), false-positive filters, time patterns, and party patterns are all compiled once at module level instead of on every call.
- API rate limiter memory fix: stale IPs are now cleaned up every 60 s regardless of dict size (previously cleanup only ran when the dict held more than 1000 entries).
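A sketch of the module-level compilation pattern. `OBLIGATION_PATTERNS` and `find_obligations` are hypothetical names, and the two patterns are illustrative rather than copied from `obligations.py`. Python's `re` module already caches recently compiled patterns, so the saving from precompiling is mostly the per-call cache lookup rather than a full recompile:

```python
import re

# Compiled once, when the module is imported, instead of on every call.
# Illustrative patterns only; not the project's real obligation patterns.
OBLIGATION_PATTERNS = [
    re.compile(r"\b(shall|must|is required to|agrees to)\b", re.IGNORECASE),
    re.compile(r"\bwithin\s+\d+\s+(days|business days|months)\b", re.IGNORECASE),
]


def find_obligations(sentence):
    """Return obligation phrases matched in `sentence`, in pattern order."""
    hits = []
    for pattern in OBLIGATION_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(sentence))
    return hits
```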
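A sketch of time-based cleanup for a per-IP limiter. The sliding-window design, class name, and defaults are assumptions; the only behavior taken from the PR is that stale IPs are swept every 60 s no matter how many are tracked. The injectable `clock` makes the sweep testable without sleeping:

```python
import time


class RateLimiter:
    """Sliding-window limiter keyed by client IP, with periodic cleanup.

    Hypothetical sketch: names and defaults are assumptions, not api/main.py.
    """

    def __init__(self, max_requests=30, window=60.0, cleanup_interval=60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.cleanup_interval = cleanup_interval
        self.clock = clock
        self.hits = {}  # ip -> list of request timestamps
        self._last_cleanup = clock()

    def _cleanup(self, now):
        # Drop timestamps outside the window, and IPs left with none.
        for ip in list(self.hits):
            recent = [t for t in self.hits[ip] if now - t < self.window]
            if recent:
                self.hits[ip] = recent
            else:
                del self.hits[ip]
        self._last_cleanup = now

    def allow(self, ip):
        now = self.clock()
        # Time-based trigger, not size-based: runs every cleanup_interval
        # seconds regardless of how many IPs are being tracked.
        if now - self._last_cleanup >= self.cleanup_interval:
            self._cleanup(now)
        recent = [t for t in self.hits.get(ip, []) if now - t < self.window]
        if len(recent) >= self.max_requests:
            self.hits[ip] = recent
            return False
        recent.append(now)
        self.hits[ip] = recent
        return True
```

With a size-based trigger, a few hundred one-off IPs would sit in memory indefinitely; the time-based sweep bounds memory by recent traffic instead.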
Security
- API CORS localhost: `localhost:3000` and `localhost:3001` origins now require an explicit `CORS_ALLOW_LOCALHOST=true` env var instead of always being allowed.
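A sketch of the opt-in allow-list. `BASE_ORIGINS` is a placeholder (the real production origins are not listed in this PR), and treating only the string `true` (case-insensitive) as enabled is an assumption about how the env var is parsed:

```python
import os

# PLACEHOLDER: the real production origin list in api/main.py is not shown here.
BASE_ORIGINS = ["https://example.com"]
LOCALHOST_ORIGINS = ["http://localhost:3000", "http://localhost:3001"]


def allowed_origins(env=None):
    """Build the CORS allow-list; localhost is opt-in via CORS_ALLOW_LOCALHOST."""
    env = os.environ if env is None else env
    origins = list(BASE_ORIGINS)
    # ASSUMPTION: only "true" (any case) enables localhost; "1" or "yes" do not.
    if env.get("CORS_ALLOW_LOCALHOST", "").strip().lower() == "true":
        origins += LOCALHOST_ORIGINS
    return origins
```

If `api/main.py` is a FastAPI app (an assumption), the list would be passed as `app.add_middleware(CORSMiddleware, allow_origins=allowed_origins())`, so dev origins only appear when someone sets the variable deliberately.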
Housekeeping
- Version updated to v4.2 across the `app.py` docstring, `README.md`, and the changelog.
gaurv007 changed pull request status to open
Files Changed (7 files)
| File | Changes |
|---|---|
| `app.py` | 🔴 NLI: `CrossEncoder` instead of broken pipeline · 🔴 BoundedCache: `threading.RLock` · Pre-compiled regex · Missing labels added |
| `obligations.py` | ⚡ Pre-compiled all obligation/false-positive/time/party patterns at module level |
| `compliance.py` | ⚡ Pre-compiled negation patterns at module level |
| `compare.py` | 🟡 Fixed inconsistent model name (added `sentence-transformers/` prefix) |
| `extension/background.js` | Fixed risk formula to match backend · Fixed `API_BASE` URL |
| `api/main.py` | CORS localhost requires env var · Rate limiter periodic cleanup |
| `README.md` | Version bump to v4.2 + changelog |
How to verify
- NLI fix: analyze a contract containing both "uncapped liability" and "cap on liability" clauses. It should now show a contradiction with an NLI confidence score (previously this was silent or heuristic-only).
- Thread safety: run two concurrent analyses. There should no longer be a potential `KeyError` on cache eviction.
- Regex perf: check the startup logs. Patterns compile once on import, not per clause.
Next recommended steps (not in this PR)
- ONNX export + INT8 quantization (2-4x inference speedup)
- Upgrade embedder from `all-MiniLM-L6-v2` to `BAAI/bge-small-en-v1.5` (+21% retrieval accuracy)
- Batch clause classification (single forward pass for all clauses)
- Gradio `concurrency_limit` on the analysis button
gaurv007 changed pull request status to merged