🛡️ v4.1: Fix all critical bugs, security issues, and performance problems

#2
by gaurv007 - opened

ClauseGuard v4.1 — Comprehensive Bug Fix & Performance PR

🔴 CRITICAL Bug Fixes

  1. XSS sanitization corrupting contract text — Removed text.replace(/</, "&lt;") from analyze route that permanently mutated contracts before analysis
  2. Unbounded memory leaks — _chunk_cache and _prediction_cache replaced with BoundedCache (LRU, max 500/2000 entries)
  3. Missing middleware.ts — Auth guard never executed; anyone could access dashboard without login
  4. NLI input format wrong — Changed from [SEP]-concatenated string to proper (text_a, text_b) dict format for cross-encoder
  5. Scan count race condition — Fixed table name mismatch (analysis_history vs analyses) and now uses profiles.analyses_this_month
  6. RAG sessions never expired — Added TTL-based expiry (1 hour) for RAG sessions
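
To illustrate fixes 2 and 6, here is a minimal sketch of a size-bounded LRU cache with an optional TTL. It is not the code from this PR: the class body, the `ttl_seconds` and `clock` parameters, and the method names are assumptions made for illustration. Only the name BoundedCache, the LRU policy, the 500/2000 caps, and the 1-hour TTL come from the description above.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class BoundedCache:
    """Illustrative LRU cache with a hard size cap and an optional per-entry TTL."""

    def __init__(
        self,
        max_size: int,
        ttl_seconds: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = OrderedDict()  # key -> (value, stored_at)

    def get(self, key: Hashable, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        value, stored_at = item
        # Drop the entry once it is older than the TTL.
        if self._ttl is not None and self._clock() - stored_at >= self._ttl:
            del self._data[key]
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self._clock())
        self._data.move_to_end(key)
        # Evict least recently used entries until the cap is respected.
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)


# Hypothetical wiring that mirrors the limits quoted above.
_chunk_cache = BoundedCache(max_size=500)
_prediction_cache = BoundedCache(max_size=2000)
rag_sessions = BoundedCache(max_size=1000, ttl_seconds=3600)  # max_size here is a guess
```

Expired entries in this sketch are only removed when they are read, so a periodic sweep would still be needed if sessions can be abandoned without ever being requested again.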

🟠 HIGH-Severity Fixes

  1. Hardcoded admin email removed — ankygaur9972@gmail.com removed from public schema.sql
  2. Rate limiter improved — Sliding window with proper X-Forwarded-For IP extraction
  3. Input size validation — Added 200KB max text limit to prevent DoS
  4. Duplicate model loading eliminated — SentenceTransformer singleton in compare.py
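
As a rough illustration of fix 2, the sketch below shows one way a sliding-window limiter keyed by client IP could look. The function and class names, the lower-cased header mapping, and the limit and window values are all hypothetical. It also assumes the service sits behind a trusted proxy, since X-Forwarded-For can otherwise be spoofed by the client.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Mapping


def client_ip(headers: Mapping[str, str], peer_addr: str) -> str:
    """Return the left-most X-Forwarded-For entry, else the socket peer address.

    Assumes header names are already lower-cased and the proxy is trusted.
    """
    forwarded = headers.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_addr


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `window_seconds` span."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


# Hypothetical usage: 10 requests per minute per client IP.
limiter = SlidingWindowLimiter(limit=10, window_seconds=60)
ip = client_ip({"x-forwarded-for": "203.0.113.7, 10.0.0.1"}, peer_addr="10.0.0.1")
allowed = limiter.allow(ip)
```

Unlike a fixed-window counter, this approach does not let a client send a double burst across a window boundary, at the cost of storing one timestamp per recent request.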

🟡 MEDIUM-Severity Fixes

  1. Train/inference alignment — Changed from sigmoid (multi-label) to softmax (matching single-label training)
  2. Classifier max_length raised — 256→512 tokens (was truncating legal clauses)
  3. Risk score formula fixed — Now uses absolute risk (diminishing returns), not normalized by document length
  4. Compliance negation detection improved — Wider window (200 chars), sentence-boundary aware
  5. Regex fallback coverage expanded — Added 20+ missing CUAD categories (Audit Rights, Insurance, Source Code Escrow, etc.)
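
Fix 1 matters because sigmoid scores each class independently, while a model trained with a single-label objective expects its logits to be normalized jointly. The pure-Python sketch below shows the difference on made-up logits; the function names and the example values are illustrative and are not taken from app.py.

```python
import math
from typing import List


def sigmoid_scores(logits: List[float]) -> List[float]:
    """Independent per-class probabilities (multi-label reading of the logits)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax_scores(logits: List[float]) -> List[float]:
    """One distribution over all classes (single-label reading of the logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three clause categories.
example_logits = [2.0, 1.0, -1.0]
sig = sigmoid_scores(example_logits)
soft = softmax_scores(example_logits)

# Sigmoid scores do not sum to 1, so a fixed threshold can fire on several
# classes at once; softmax scores sum to 1 and compete with each other.
print(round(sum(sig), 3), round(sum(soft), 3))
```

Both functions rank the classes in the same order, so the top prediction is unchanged. What changes is the confidence values, which affects any thresholding done downstream.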

⚡ Performance Fixes

  1. O(n²) comparison → O(n+m) — Pre-compute all embeddings once, use matrix multiplication
  2. Sequential NER → batched — Single pipeline call with batch_size=8 instead of per-chunk calls
  3. Gradio SSE polling improved — Exponential backoff (500ms→2s) instead of fixed 1s, increased timeout to 90s
  4. Loading skeleton added — loading.tsx for instant navigation feedback
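
The polling schedule in fix 3 can be sketched as a small delay generator. This is an illustration in Python rather than the web client's actual code: the doubling factor is an assumption, and only the 500ms starting delay, the 2s cap, and the 90s timeout come from the description above.

```python
from typing import Iterator


def backoff_delays(
    initial: float = 0.5,
    factor: float = 2.0,  # assumed growth factor, not stated in the PR
    cap: float = 2.0,
    timeout: float = 90.0,
) -> Iterator[float]:
    """Yield sleep intervals that grow from `initial` to `cap` until `timeout` is spent."""
    elapsed = 0.0
    delay = initial
    while elapsed < timeout:
        wait = min(delay, timeout - elapsed)  # never sleep past the overall timeout
        yield wait
        elapsed += wait
        delay = min(delay * factor, cap)


# The first few waits ramp up, then stay at the cap.
schedule = list(backoff_delays())
print(schedule[:4])
```

A caller would sleep for each yielded interval between polls and stop early as soon as the result arrives. When the generator is exhausted, the 90s budget has been used up.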

Files Changed

  • app.py — Core analysis engine (all ML fixes)
  • compare.py — O(n²)→O(n+m) comparison
  • compliance.py — Better negation detection
  • api/main.py — Rate limiter, RAG session TTL, input validation
  • web/middleware.ts — NEW: Auth guard (was missing entirely)
  • web/app/api/analyze/route.ts — XSS fix, scan count fix, input validation
  • web/app/api/chat/route.ts — Proper session handling documentation
  • web/app/dashboard-pages/analyze/loading.tsx — NEW: Loading skeleton
  • web/lib/supabase/schema.sql — Removed hardcoded admin email
gaurv007 changed pull request status to open
gaurv007 changed pull request status to merged
