🔧 v4.2: Critical bug fixes + performance optimizations (7 bugs, 4 perf improvements)

#3
by gaurv007 - opened

ClauseGuard v4.2: Deep Audit Fixes

🔴 Critical Bug Fixes

  1. NLI Contradiction Detection was BROKEN: pipeline("text-classification") with dict input silently failed for the cross-encoder. Replaced with CrossEncoder.predict() from sentence-transformers, which accepts (text_a, text_b) tuples correctly. Contradiction detection now works.
  2. BoundedCache Race Condition: OrderedDict compound operations are NOT atomic. Added threading.RLock to prevent crashes under concurrent Gradio requests.
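
As an illustration of the BoundedCache fix, here is a minimal sketch of an OrderedDict-backed cache whose compound operations (lookup plus reorder, insert plus evict) run under a threading.RLock. The class body, method names, and default size are assumptions made for this example and are not copied from app.py.

```python
import threading
from collections import OrderedDict


class BoundedCache:
    """LRU-style cache with a fixed capacity (illustrative sketch only)."""

    def __init__(self, max_size=128):
        self._data = OrderedDict()
        self._max_size = max_size
        # Re-entrant lock: a method holding it may call another locked method.
        self._lock = threading.RLock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            # Membership check and reorder happen atomically under the lock.
            self._data.move_to_end(key)
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            # Evict oldest entries until the cache is back within capacity.
            while len(self._data) > self._max_size:
                self._data.popitem(last=False)

    def __len__(self):
        with self._lock:
            return len(self._data)
```

Without the lock, one thread can evict a key between another thread's membership check and its move_to_end call, which is the kind of KeyError the fix is meant to rule out.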

🟠 High-Severity Bug Fixes

  1. Extension Risk Formula Mismatch: Local fallback used (weighted/clauses)*100 (normalized), while the backend uses 100*(1-1/(1+w/30)) (diminishing returns), so the same contract got completely different scores. Fixed to match the backend.
  2. Extension API_BASE URL Wrong: Was pointing to a non-existent -api subdomain. Fixed to the correct Space URL.
  3. Missing Regex Labels: Indemnification, Confidentiality, Force Majeure, and Penalties had regex patterns but no entries in RISK_MAP/DESC_MAP. Added.
  4. Inconsistent Model Name: compare.py used "all-MiniLM-L6-v2" without the prefix while chatbot.py used "sentence-transformers/all-MiniLM-L6-v2". This could cause duplicate downloads.
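
To make the risk formula mismatch concrete, the sketch below evaluates both formulas from item 1 on the same input. The function names and the sample values (a weighted sum of 30 over 40 clauses) are hypothetical; only the two formulas come from this PR.

```python
def normalized_score(weighted, clause_count):
    """Old extension fallback: (weighted / clauses) * 100."""
    if clause_count == 0:
        return 0.0
    return (weighted / clause_count) * 100


def diminishing_returns_score(weighted):
    """Backend formula: 100 * (1 - 1 / (1 + w / 30))."""
    return 100 * (1 - 1 / (1 + weighted / 30))


# Hypothetical contract: weighted risk sum of 30 across 40 clauses.
print(normalized_score(30, 40))        # 75.0
print(diminishing_returns_score(30))   # 50.0
```

The backend formula depends only on the weighted sum and approaches 100 asymptotically, whereas the old fallback also depended on the clause count, which is why the two scores diverged.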

⚡ Performance Improvements

  1. Pre-compiled ALL regex patterns: Clause classification (45 label patterns), obligation extraction (25+ patterns), compliance negation (8 patterns), false positive filters, time patterns, and party patterns are all compiled once at module level instead of per-call.
  2. API Rate Limiter Memory Fix: Stale IPs are now cleaned up every 60s regardless of dict size (previously cleaned only when the dict held >1000 entries).
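
The rate limiter change in item 2 can be sketched as a time-based cleanup rather than a size-based one. Everything below (class name, window, request limit, injectable clock) is an assumption for illustration and does not reproduce api/main.py. It is also not thread-safe as written.

```python
import time

WINDOW_SECONDS = 60      # assumed sliding window
MAX_REQUESTS = 30        # assumed per-IP limit within the window
CLEANUP_INTERVAL = 60    # stale IPs are purged at most once per interval


class RateLimiter:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._hits = {}  # ip -> list of request timestamps
        self._last_cleanup = clock()

    def allow(self, ip):
        now = self._clock()
        # Cleanup is triggered by elapsed time, not by the size of the dict.
        if now - self._last_cleanup >= CLEANUP_INTERVAL:
            self._cleanup(now)
        recent = [t for t in self._hits.get(ip, []) if now - t < WINDOW_SECONDS]
        allowed = len(recent) < MAX_REQUESTS
        if allowed:
            recent.append(now)
        self._hits[ip] = recent
        return allowed

    def _cleanup(self, now):
        # Drop every IP whose most recent request fell out of the window.
        stale = [ip for ip, ts in self._hits.items()
                 if not ts or now - ts[-1] >= WINDOW_SECONDS]
        for ip in stale:
            del self._hits[ip]
        self._last_cleanup = now
```

With a size-based trigger, a service that never sees more than 1000 distinct IPs would keep every entry forever; the time-based trigger bounds how long an idle IP stays in memory.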

🔒 Security

  1. API CORS localhost: localhost:3000/3001 origins now require an explicit CORS_ALLOW_LOCALHOST=true env var instead of being always allowed.
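
One way to express this opt-in is a small helper that builds the origin list from the environment. The helper name, the http scheme on the localhost origins, and the example production origin are assumptions; only the CORS_ALLOW_LOCALHOST variable and the 3000/3001 ports come from this PR.

```python
import os


def allowed_origins(base_origins, env=None):
    """Return the CORS origin list, adding localhost only when opted in."""
    env = os.environ if env is None else env
    origins = list(base_origins)
    # Localhost dev origins are off by default and need an explicit flag.
    if env.get("CORS_ALLOW_LOCALHOST", "").strip().lower() == "true":
        origins += ["http://localhost:3000", "http://localhost:3001"]
    return origins


# Hypothetical production origin, for illustration only.
print(allowed_origins(["https://example.org"], env={}))
print(allowed_origins(["https://example.org"],
                      env={"CORS_ALLOW_LOCALHOST": "true"}))
```

The resulting list would then be passed to whatever CORS configuration the API already uses.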

📝 Housekeeping

  1. Version updated to v4.2 across app.py docstring, README.md, and changelog.
gaurv007 changed pull request status to open

Files Changed (7 files)

| File | Changes |
| --- | --- |
| app.py | 🔴 NLI: CrossEncoder instead of broken pipeline · 🔴 BoundedCache: threading.RLock · Pre-compiled regex · Missing labels added |
| obligations.py | ⚡ Pre-compiled all obligation/false-positive/time/party patterns at module level |
| compliance.py | ⚡ Pre-compiled negation patterns at module level |
| compare.py | 🟡 Fixed inconsistent model name (added sentence-transformers/ prefix) |
| extension/background.js | 🟠 Fixed risk formula to match backend · Fixed API_BASE URL |
| api/main.py | 🔒 CORS localhost requires env var · Rate limiter periodic cleanup |
| README.md | Version bump to v4.2 + changelog |

How to verify

  1. NLI fix: Analyze a contract containing both "uncapped liability" and "cap on liability" clauses → should now show a contradiction with an NLI confidence score (previously silent/heuristic-only)
  2. Thread safety: Run two concurrent analyses → no more potential KeyError on cache eviction
  3. Regex perf: Check startup logs to confirm that patterns compile once on import, not per-clause

Next recommended steps (not in this PR)

  • ONNX export + INT8 quantization (2-4x inference speedup)
  • Upgrade embedder from all-MiniLM-L6-v2 to BAAI/bge-small-en-v1.5 (+21% retrieval accuracy)
  • Batch clause classification (single forward pass for all clauses)
  • Gradio concurrency_limit on analysis button
gaurv007 changed pull request status to merged
