tonysodano committed on
Commit de377ae · verified · 1 Parent(s): 6c8c167

Update README.md

Files changed (1)
  1. README.md +156 -822
README.md CHANGED
@@ -1,851 +1,185 @@
  ---
  title: Hallucination Detection For Legal LLM Input Output-CERT Vs HHEM
- emoji: 📚
- colorFrom: blue
- colorTo: red
  sdk: gradio
  sdk_version: 6.13.0
  app_file: app.py
  pinned: true
- short_description: Detection of LLM hallucinations in legal AI outputs.
  ---
- """
- Legal Hallucination Detection — Live LLM Generation + CERT + HHEM-2.1-Open
-
- Workflow:
- 1. User provides a question and (optionally) a source legal document.
- 2. A selected LLM generates an answer via the HF Inference API.
- 3. CERT (SGI or DGI) and HHEM-2.1-Open score the generated answer.
- 4. Both scores and a verdict are displayed alongside the generated text.
-
- SGI / DGI from arXiv:2512.13771 and arXiv:2602.13224.
- HHEM-2.1-Open: fine-tuned flan-T5 classifier (Vectara).
-
- Environment variable:
-     HF_TOKEN — required for gated models (Llama 3, etc.).
-     Set in Space Settings → Repository secrets.
-     Free-tier models work without a token.
-
- DISCLAIMER: This tool detects statistical patterns that correlate with
- hallucination. It does not verify case citations, confirm statute numbers,
- or validate contract terms against any legal database. A "Grounded" result
- means the response is semantically consistent with the source document —
- not that it is legally accurate. Do not use its output as legal advice.
- """
-
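The four-step workflow in the docstring above can be sketched as a tiny pipeline. This is not code from the Space — `run_pipeline` and the lambda stand-ins are hypothetical names invented here to illustrate the data flow (question + optional source doc → generated answer → two boolean verdicts → combined verdict):

```python
def run_pipeline(question, source_doc, generate, cert_ok, hhem_ok):
    answer = generate(question, source_doc)              # step 2: LLM drafts an answer
    cert = cert_ok(question, source_doc, answer)         # step 3a: CERT verdict (bool)
    hhem = hhem_ok(question, source_doc, answer)         # step 3b: HHEM verdict (bool)
    verdict = "grounded" if cert and hhem else "review"  # step 4: combined verdict
    return answer, verdict

# Stand-in callables, purely for illustration.
answer, verdict = run_pipeline(
    "Can my employer fire me without warning?",
    "Section 7 — Termination: ...",
    generate=lambda q, d: "Only for gross misconduct; otherwise 30 days notice.",
    cert_ok=lambda q, d, a: True,
    hhem_ok=lambda q, d, a: True,
)
```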
- import logging
- import os
- import time
-
- import numpy as np
- import gradio as gr
- from sentence_transformers import SentenceTransformer
- from huggingface_hub import InferenceClient
-
- logging.basicConfig(level=logging.INFO)
- logger = logging.getLogger(__name__)
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # MODELS AVAILABLE IN THE DROPDOWN
- #
- # Tier column is informational only — displayed in the UI.
- #   "free"   = no token required, accessible on free HF accounts
- #   "pro"    = requires HF Pro subscription or valid HF_TOKEN with access
- #   "nvidia" = NVIDIA NIM endpoint via HF Inference API (Pro)
- # ─────────────────────────────────────────────────────────────────────────────
-
- MODEL_CATALOG = [
-     # ── Free tier ─────────────────────────────────────────────────────────
-     {
-         "label": "Mistral 7B Instruct v0.3 [free]",
-         "id": "mistralai/Mistral-7B-Instruct-v0.3",
-         "tier": "free",
-     },
-     {
-         "label": "Zephyr 7B Beta [free]",
-         "id": "HuggingFaceH4/zephyr-7b-beta",
-         "tier": "free",
-     },
-     {
-         "label": "Qwen 2.5 7B Instruct [free]",
-         "id": "Qwen/Qwen2.5-7B-Instruct",
-         "tier": "free",
-     },
-     # ── Pro tier ──────────────────────────────────────────────────────────
-     {
-         "label": "Llama 3.1 8B Instruct [pro]",
-         "id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
-         "tier": "pro",
-     },
-     {
-         "label": "Llama 3.1 70B Instruct [pro]",
-         "id": "meta-llama/Meta-Llama-3.1-70B-Instruct",
-         "tier": "pro",
-     },
-     {
-         "label": "Mixtral 8x7B Instruct [pro]",
-         "id": "mistralai/Mixtral-8x7B-Instruct-v0.1",
-         "tier": "pro",
-     },
-     {
-         "label": "Qwen 2.5 72B Instruct [pro]",
-         "id": "Qwen/Qwen2.5-72B-Instruct",
-         "tier": "pro",
-     },
-     {
-         "label": "Mistral Large 2411 [pro]",
-         "id": "mistralai/Mistral-Large-Instruct-2411",
-         "tier": "pro",
-     },
-     # ── NVIDIA NIM (Pro) ──────────────────────────────────────────────────
-     {
-         "label": "NVIDIA Llama 3.1 Nemotron 70B [nvidia / pro]",
-         "id": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
-         "tier": "nvidia",
-     },
- ]
-
- MODEL_CHOICES = [m["label"] for m in MODEL_CATALOG]
- MODEL_ID_MAP = {m["label"]: m["id"] for m in MODEL_CATALOG}
- DEFAULT_MODEL = MODEL_CHOICES[0]
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # INFERENCE CLIENT — disabled, reserved for future HF Pro upgrade
- #
- # To re-enable live LLM generation:
- #   1. Upgrade to HF Pro at huggingface.co/pricing
- #   2. Create an HF token at huggingface.co/settings/tokens
- #      (Read scope + "Make calls to Inference Providers" permission)
- #   3. Add it to Space Settings → Repository secrets as HF_TOKEN
- #   4. Uncomment the four lines below
- #   5. In the UI section at the bottom, wire gen_btn to
- #      generate_and_evaluate_via_api() instead of generate_from_scenarios()
- # ─────────────────────────────────────────────────────────────────────────────
-
- # _HF_TOKEN = os.environ.get("HF_TOKEN")
- # _client = InferenceClient(token=_HF_TOKEN)
- # _client_nvidia = InferenceClient(provider="nvidia", token=_HF_TOKEN)
- # MODEL_TIER_MAP = {m["label"]: m["tier"] for m in MODEL_CATALOG}
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # SYSTEM PROMPTS
- # Two variants: with source doc (strict RAG) and without (general legal).
- # Both instruct the model to be precise and avoid adding outside information.
- # ─────────────────────────────────────────────────────────────────────────────
-
- _SYSTEM_WITH_CONTEXT = """You are a precise legal AI assistant.
- Answer the user's question using ONLY the provided legal document or contract excerpt.
- Do not add any information, clauses, obligations, rights, amounts, or legal rules
- that are not explicitly stated in the source document.
- If the document does not address the question, say so directly.
- Cite the relevant section when possible. Be concise."""
-
- _SYSTEM_NO_CONTEXT = """You are a precise legal AI assistant.
- Answer the user's question accurately based on established law.
- Cite specific statutes, rules, or legal standards where applicable.
- Be concise and accurate. Do not invent case names, statute numbers,
- regulatory requirements, or legal obligations that do not exist."""
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # EMBEDDING MODEL — shared by SGI and DGI
- # ─────────────────────────────────────────────────────────────────────────────
-
- logger.info("Loading embedding model (all-MiniLM-L6-v2)...")
- _encoder = SentenceTransformer("all-MiniLM-L6-v2")
- logger.info("Embedding model loaded.")
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # DGI REFERENCE DIRECTION — legal domain grounded pairs
- # Calibrated across 8 core legal domains. AUROC ~0.76 generic;
- # domain-specific calibration reaches 0.90+. See arXiv:2602.13224.
- # ─────────────────────────────────────────────────────────────────────────────
-
- _REFERENCE_PAIRS = [
-     (
-         "What is required to form a binding contract?",
-         "A binding contract requires offer, acceptance, consideration, "
-         "mutual assent, and legal capacity of the parties. Without all "
-         "elements, the agreement may be unenforceable.",
-     ),
-     (
-         "What must a plaintiff prove in a negligence claim?",
-         "A negligence plaintiff must establish duty, breach of that duty, "
-         "actual and proximate causation, and damages. Failure to prove any "
-         "element defeats the claim.",
-     ),
-     (
-         "What is the plain meaning rule in statutory interpretation?",
-         "The plain meaning rule requires courts to apply the ordinary "
-         "meaning of statutory text when the language is unambiguous, "
-         "without looking to legislative history or extrinsic sources.",
-     ),
-     (
-         "What is hearsay under the Federal Rules of Evidence?",
-         "Hearsay is an out-of-court statement offered to prove the truth "
-         "of the matter asserted. FRE 801 defines it, and FRE 802 makes "
-         "it generally inadmissible absent a recognized exception.",
-     ),
-     (
-         "When does the Fourth Amendment protect against government searches?",
-         "The Fourth Amendment protects against unreasonable searches and "
-         "seizures where the person has a reasonable expectation of privacy. "
-         "Warrantless searches are presumptively unconstitutional absent an "
-         "established exception such as exigent circumstances or consent.",
-     ),
-     (
-         "What rights does the CCPA grant California consumers?",
-         "The CCPA grants California consumers the right to know what "
-         "personal information is collected, the right to delete it, the "
-         "right to opt out of its sale, and the right to non-discrimination "
-         "for exercising those rights.",
-     ),
-     (
-         "What qualifies as a trade secret under the DTSA?",
-         "Under the Defend Trade Secrets Act, a trade secret is information "
-         "that derives independent economic value from not being generally "
-         "known, and for which the owner has taken reasonable measures to "
-         "maintain its secrecy.",
-     ),
-     (
-         "When is a liquidated damages clause enforceable?",
-         "A liquidated damages clause is enforceable when actual damages "
-         "would be difficult to estimate at the time of contracting and the "
-         "stipulated amount is a reasonable forecast of compensatory damages, "
-         "not a penalty.",
-     ),
- ]
-
- logger.info("Computing DGI reference direction from %d legal grounded pairs...", len(_REFERENCE_PAIRS))
-
- _all_texts = []
- for q, r in _REFERENCE_PAIRS:
-     _all_texts.extend([q, r])
-
- _all_embs = _encoder.encode(_all_texts, convert_to_numpy=True, normalize_embeddings=False)
-
- _displacements = []
- for i in range(len(_REFERENCE_PAIRS)):
-     q_emb = _all_embs[i * 2]
-     r_emb = _all_embs[i * 2 + 1]
-     delta = r_emb - q_emb
-     norm = np.linalg.norm(delta)
-     if norm > 1e-8:
-         _displacements.append(delta / norm)
-
- _mu = np.mean(_displacements, axis=0)
- _mu_norm = np.linalg.norm(_mu)
- _mu_hat = _mu / _mu_norm if _mu_norm > 1e-8 else _mu
-
- logger.info("DGI reference direction computed (dims=%d, concentration=%.4f).", _mu_hat.shape[0], float(_mu_norm))
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # HHEM-2.1-Open
- # ─────────────────────────────────────────────────────────────────────────────
-
- logger.info("Loading HHEM-2.1-Open...")
- from transformers import AutoModelForSequenceClassification
-
- _hhem = AutoModelForSequenceClassification.from_pretrained(
-     "vectara/hallucination_evaluation_model",
-     trust_remote_code=True,
- )
- logger.info("HHEM loaded.")
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # SGI — Semantic Grounding Index (arXiv:2512.13771)
- #   SGI = dist(response, question) / dist(response, context)
- # ─────────────────────────────────────────────────────────────────────────────
-
- SGI_FLAG_THRESHOLD = 0.95
- SGI_STRONG_PASS = 1.20
-
-
- def compute_sgi(question: str, context: str, response: str) -> dict:
-     embeddings = _encoder.encode(
-         [question, context, response],
-         convert_to_numpy=True,
-         normalize_embeddings=False,
-     )
-     q_emb, ctx_emb, resp_emb = embeddings
-
-     q_dist = float(np.linalg.norm(resp_emb - q_emb))
-     ctx_dist = float(np.linalg.norm(resp_emb - ctx_emb))
-
-     if ctx_dist < 1e-8:
-         return {"score": 10.0, "flag": False, "degenerate": True}
-     if q_dist < 1e-8:
-         return {"score": 0.0, "flag": True, "degenerate": True}
-
-     sgi = q_dist / ctx_dist
-     return {
-         "score": round(sgi, 4),
-         "flag": sgi < SGI_FLAG_THRESHOLD,
-         "q_dist": round(q_dist, 4),
-         "ctx_dist": round(ctx_dist, 4),
-         "degenerate": False,
-     }
-
-
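The SGI ratio can be sanity-checked on toy 2-D "embeddings" (the real code uses high-dimensional MiniLM sentence vectors; these coordinates are invented purely to make the geometry visible). A response that lands near the context scores above 1; one that lands near the question scores below the 0.95 flag threshold:

```python
import math

def sgi_ratio(resp, q, ctx):
    # Toy SGI: distance(response, question) / distance(response, context).
    return math.dist(resp, q) / math.dist(resp, ctx)

# Response embedding sits near the context → large SGI, not flagged.
grounded = sgi_ratio(resp=(0.9, 0.1), q=(0.0, 1.0), ctx=(1.0, 0.0))   # → 9.0
# Response embedding sits near the question → SGI below 0.95, flagged.
drifting = sgi_ratio(resp=(0.1, 0.9), q=(0.0, 1.0), ctx=(1.0, 0.0))   # → ~0.111
```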
- # ─────────────────────────────────────────────────────────────────────────────
- # DGI — Directional Grounding Index (arXiv:2602.13224)
- #   DGI = dot(normalize(phi(r) - phi(q)), mu_hat)
- # ─────────────────────────────────────────────────────────────────────────────
-
- DGI_FLAG_THRESHOLD = 0.30
-
-
- def compute_dgi(question: str, response: str) -> dict:
-     embeddings = _encoder.encode(
-         [question, response],
-         convert_to_numpy=True,
-         normalize_embeddings=False,
-     )
-     q_emb, r_emb = embeddings
-
-     delta = r_emb - q_emb
-     magnitude = float(np.linalg.norm(delta))
-
-     if magnitude < 1e-8:
-         return {"score": 0.0, "flag": True, "degenerate": True}
-
-     delta_hat = delta / magnitude
-     gamma = float(np.dot(delta_hat, _mu_hat))
-
-     if np.isnan(gamma):
-         return {"score": 0.0, "flag": True, "degenerate": True}
-
-     return {
-         "score": round(gamma, 4),
-         "flag": gamma < DGI_FLAG_THRESHOLD,
-         "magnitude": round(magnitude, 4),
-         "degenerate": False,
-     }
-
-
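The DGI dot product is just the cosine between the question-to-response displacement and the reference direction. A toy 2-D check (a stand-in `mu_hat` in place of the learned reference direction; coordinates invented for illustration):

```python
import math

def dgi(q, r, mu_hat):
    # Toy DGI: cosine between the (response - question) displacement and mu_hat.
    delta = [ri - qi for qi, ri in zip(q, r)]
    mag = math.sqrt(sum(d * d for d in delta))
    delta_hat = [d / mag for d in delta]
    return sum(dh * mh for dh, mh in zip(delta_hat, mu_hat))

mu_hat = (1.0, 0.0)  # stand-in reference direction
aligned = dgi((0.0, 0.0), (2.0, 0.0), mu_hat)     # displacement along mu_hat → 1.0
orthogonal = dgi((0.0, 0.0), (0.0, 2.0), mu_hat)  # 90° off → 0.0, under the 0.30 flag
```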
- # ─────────────────────────────────────────────────────────────────────────────
- # SCORING WRAPPERS
- # ─────────────────────────────────────────────────────────────────────────────
-
- def score_cert(question: str, response: str, context: str) -> dict:
-     start = time.perf_counter()
-     has_context = bool(context.strip())
-
-     if has_context:
-         result = compute_sgi(question, context, response)
-         method = "SGI"
-     else:
-         result = compute_dgi(question, response)
-         method = "DGI"
-
-     return {
-         "method": method,
-         "raw_score": result["score"],
-         "grounded": not result["flag"],
-         "threshold": SGI_FLAG_THRESHOLD if method == "SGI" else DGI_FLAG_THRESHOLD,
-         "elapsed_ms": round((time.perf_counter() - start) * 1000, 1),
-     }
-
-
- def score_hhem(question: str, response: str, context: str) -> dict:
-     has_context = bool(context.strip())
-     premise = f"{context.strip()}\n\n{question}".strip() if has_context else question
-     if len(premise) > 1800:
-         premise = premise[:1800]
-
-     start = time.perf_counter()
-     scores = _hhem.predict([(premise, response)])
-     raw_score = float(scores[0])
-
-     return {
-         "method": "HHEM-2.1-Open",
-         "raw_score": round(raw_score, 4),
-         "grounded": raw_score >= 0.5,
-         "elapsed_ms": round((time.perf_counter() - start) * 1000, 1),
-         "label": "consistent" if raw_score >= 0.5 else "hallucinated",
-     }
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # LLM GENERATION — calls HF Inference API
- # ─────────────────────────────────────────────────────────────────────────────
-
- def generate_via_api(question: str, context: str, model_label: str) -> tuple[str, str]:
-     """
-     Call the selected model via HF Inference API.
-     Returns (generated_text, error_message).
-     error_message is empty string on success.
-     """
-     if not question.strip():
-         return "", "Please enter a question before generating."
-
-     model_id = MODEL_ID_MAP.get(model_label)
-     if not model_id:
-         return "", f"Unknown model: {model_label}"
-
-     has_context = bool(context.strip())
-     system_prompt = _SYSTEM_WITH_CONTEXT if has_context else _SYSTEM_NO_CONTEXT
-
-     user_content = question.strip()
-     if has_context:
-         user_content = f"Source document:\n{context.strip()}\n\nQuestion: {question.strip()}"
-
-     messages = [
-         {"role": "system", "content": system_prompt},
-         {"role": "user", "content": user_content},
-     ]
-
-     try:
-         tier = MODEL_TIER_MAP.get(model_label, "free")
-         client = _client_nvidia if tier == "nvidia" else _client
-         logger.info("Calling model: %s (tier: %s)", model_id, tier)
-         start = time.perf_counter()
-         completion = client.chat_completion(
-             model=model_id,
-             messages=messages,
-             max_tokens=512,
-             temperature=0.1,
-         )
-         elapsed = round((time.perf_counter() - start) * 1000)
-         text = completion.choices[0].message.content.strip()
-         logger.info("Generation complete in %d ms (%d chars)", elapsed, len(text))
-         return text, ""
-
-     except Exception as exc:
-         logger.error("Generation failed: %s", exc)
-         err = str(exc)
-
-         # Surface actionable errors
-         if "401" in err or "unauthorized" in err.lower():
-             return "", (
-                 "❌ Authentication error — this model requires a valid HF_TOKEN. "
-                 "Set it in Space Settings → Repository secrets, or choose a free-tier model."
-             )
-         if "403" in err or "gated" in err.lower():
-             return "", (
-                 "❌ Access denied — this is a gated model. "
-                 "Request access on the model page, then add your HF_TOKEN to Space secrets."
-             )
-         if "429" in err or "rate" in err.lower():
-             return "", (
-                 "❌ Rate limit hit — try again in a moment, "
-                 "or upgrade to HF Pro for higher limits."
-             )
-         return "", f"❌ Generation failed: {err}"
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # GENERATE + EVALUATE — single button action
- # ─────────────────────────────────────────────────────────────────────────────
-
- def generate_and_evaluate_via_api(
-     question: str, context: str, model_label: str
- ) -> tuple[str, str, str, str]:
-     """
-     Generate an LLM answer, then score it.
-     Returns: (generated_response, cert_md, hhem_md, agreement_md)
-     """
-     generated, err = generate_via_api(question, context, model_label)
-
-     if err:
-         return "", err, "", ""
-
-     cert_md, hhem_md, agreement_md = evaluate_only(question, context, generated)
-     return generated, cert_md, hhem_md, agreement_md
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # SCENARIO LIBRARY — curated correct + hallucinated response pairs
- #
- # Used by generate_from_scenarios() — the active generate path.
- # No external API required. Each scenario has a correct response and a
- # hallucinated response. The radio toggle in the UI selects which to score.
- #
- # To add scenarios: copy a block, update question/context/correct/hallucinated.
- # ─────────────────────────────────────────────────────────────────────────────
-
- _SCENARIOS = [
-     {
-         "label": "NDA — what is protected",
-         "question": "What information is protected by this NDA?",
-         "context": (
-             "Section 2 — Confidential Information: Confidential Information "
-             "means all non-public technical, financial, and business information "
-             "disclosed by either party. It does not include information that is "
-             "already publicly available, independently developed by the receiving "
-             "party, or received from a third party without restriction."
-         ),
-         "correct": (
-             "Under Section 2, the NDA protects non-public technical, financial, "
-             "and business information shared by either party. It excludes "
-             "information that is already public, independently developed, or "
-             "received from a third party without restriction."
-         ),
-         "hallucinated": (
-             "The NDA protects all technical, financial, and business information "
-             "for a period of 5 years. After 5 years the information is no longer "
-             "confidential and can be freely disclosed."
-         ),
-     },
-     {
-         "label": "Employment — termination without notice",
-         "question": "Can my employer fire me without warning?",
-         "context": (
-             "Section 7 — Termination: Either party may terminate this agreement "
-             "with 30 days written notice. The Company may terminate immediately, "
-             "without notice, only in cases of gross misconduct, material breach, "
-             "or criminal conviction."
-         ),
-         "correct": (
-             "Under Section 7, your employer can only fire you without notice for "
-             "gross misconduct, material breach, or criminal conviction. All other "
-             "terminations require 30 days written notice."
-         ),
-         "hallucinated": (
-             "Yes, your employer can fire you immediately for any reason, but they "
-             "must pay 3 months severance whenever they terminate without notice, "
-             "as required under California Labor Code Section 1400."
-         ),
-     },
-     {
-         "label": "SaaS contract — liability cap",
-         "question": "How much can I recover if the software fails and causes me losses?",
-         "context": (
-             "Section 11 — Limitation of Liability: Each party's total liability "
-             "shall not exceed the fees paid by Customer in the 12 months preceding "
-             "the claim. Neither party is liable for indirect, incidental, special, "
-             "or consequential damages under any theory of liability."
-         ),
-         "correct": (
-             "Your maximum recovery is capped at whatever you paid for the software "
-             "in the 12 months before your claim. You cannot recover for indirect "
-             "or consequential losses — only direct damages up to that cap."
-         ),
-         "hallucinated": (
-             "Recovery is capped at 12 months of fees for most claims, but data "
-             "breaches and gross negligence are uncapped under standard software "
-             "contract law — you can recover full damages in those cases."
-         ),
-     },
-     {
-         "label": "At-will employment — California (no source doc)",
-         "question": "What does at-will employment mean in California?",
-         "context": "",
-         "correct": (
-             "At-will employment means either the employer or employee can end the "
-             "job at any time, for any legal reason or no reason, without owing "
-             "advance notice or severance. The main limits are anti-discrimination "
-             "laws — you cannot be fired for race, gender, disability, or other "
-             "protected characteristics."
-         ),
-         "hallucinated": (
-             "At-will employment means the employer can fire you at any time, but "
-             "California law requires a written explanation within 10 business days "
-             "and a minimum of 2 weeks severance under the California WARN Act "
-             "regardless of company size."
-         ),
-     },
-     {
-         "label": "Preliminary injunction standard (no source doc)",
-         "question": "What must I prove to get a preliminary injunction in federal court?",
-         "context": "",
-         "correct": (
-             "Under Winter v. Natural Resources Defense Council, you must show: "
-             "(1) likely success on the merits, (2) likely irreparable harm absent "
-             "relief, (3) that the balance of equities tips in your favor, and "
-             "(4) that an injunction serves the public interest. All four factors "
-             "must be satisfied."
-         ),
-         "hallucinated": (
-             "Under Johnson v. United States (2019), federal courts apply a "
-             "two-factor test: you need only show hardship and a colorable claim "
-             "on the merits. The public interest factor was eliminated by the "
-             "Supreme Court in 2018."
-         ),
-     },
- ]
-
- _SCENARIO_MAP = {s["label"]: s for s in _SCENARIOS}
- SCENARIO_LABELS = [s["label"] for s in _SCENARIOS]
-
-
- def generate_from_scenarios(
-     label: str, response_type: str
- ) -> tuple[str, str, str, str, str, str]:
-     """
-     Active generate path — no API required.
-     Selects the pre-written correct or hallucinated response for the chosen
-     scenario, fills the input boxes, and scores immediately.
-
-     response_type: "Correct answer" | "Hallucinated answer"
-
-     Returns: (question, context, response, cert_md, hhem_md, agreement_md)
-
-     # ── Future upgrade: live LLM generation ──────────────────────────────────
-     # When upgrading to HF Pro:
-     #   1. Set HF_TOKEN in Space Settings → Repository secrets
-     #   2. Uncomment the InferenceClient lines in the INFERENCE CLIENT block above
-     #   3. In the UI section below, swap gen_btn.click() to call
-     #      generate_and_evaluate_via_api() instead of generate_from_scenarios()
-     # ─────────────────────────────────────────────────────────────────────────
-     """
-     s = _SCENARIO_MAP.get(label)
-     if not s:
-         return "", "", "Select a scenario first.", "", "", ""
-
-     use_hallucinated = "Hallucinated" in response_type
-     response = s["hallucinated"] if use_hallucinated else s["correct"]
-     question = s["question"]
-     context = s["context"]
-
-     cert_md, hhem_md, agreement_md = evaluate_only(question, context, response)
-     return question, context, response, cert_md, hhem_md, agreement_md
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # EVALUATE ONLY — score a manually pasted response
- # ─────────────────────────────────────────────────────────────────────────────
-
- def evaluate_only(
-     question: str, context: str, response: str
- ) -> tuple[str, str, str]:
-     """Score a response that is already in the text box."""
-     if not question.strip():
-         return "Please enter a question.", "", ""
-     if not response.strip():
-         return "Please enter or generate an AI response to evaluate.", "", ""
-
-     cert = score_cert(question, response, context)
-     hhem = score_hhem(question, response, context)
-
-     cert_verdict = "🟢 Grounded" if cert["grounded"] else "🔴 Hallucination detected"
-     mode_note = (
-         "*Checked whether the response moved toward your source document or away from it.*"
-         if cert["method"] == "SGI"
-         else "*Checked whether the response follows verified legal reasoning patterns.*"
-     )
-     cert_md = f"""**{cert_verdict}**
- | | |
- |---|---|
- | Method | `{cert['method']}` |
- | Score | `{cert['raw_score']}` |
- | Threshold | `{cert['threshold']}` |
- | Latency | `{cert['elapsed_ms']} ms` |
- {mode_note}"""
-
-     hhem_verdict = "🟢 Grounded" if hhem["grounded"] else "🔴 Hallucination detected"
-     hhem_md = f"""**{hhem_verdict}**
- | | |
- |---|---|
- | Method | `{hhem['method']}` |
- | Score | `{hhem['raw_score']}` |
- | Label | `{hhem['label']}` |
- | Latency | `{hhem['elapsed_ms']} ms` |
- *Reads source + response and checks for contradiction.*"""
-
-     agree = cert["grounded"] == hhem["grounded"]
-     if agree and cert["grounded"]:
-         agreement_md = "🔵 **Both methods agree — response appears grounded.**"
-     elif agree and not cert["grounded"]:
-         agreement_md = "🔵 **Both methods agree — hallucination likely. Verify before use.**"
-     else:
-         agreement_md = """🟠 **Methods disagree — manual review recommended.**
-
- The geometry check says the response is in the right topic area.
- The classifier disagrees. This usually means the response *sounds* legally
- correct but gets a specific fact wrong: an invented clause, wrong dollar
- amount, fabricated case name, or statute that doesn't exist.
- Verify manually before relying on this answer."""
-
-     return cert_md, hhem_md, agreement_md
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # EXAMPLES — 8 plain-language legal scenarios
- # Odd rows = correct answers. Even rows = hallucinated versions.
- # ─────────────────────────────────────────────────────────────────────────────
-
- EXAMPLES = [
-     ["What information is protected by this NDA?",
-      "Section 2 — Confidential Information: 'Confidential Information' means all non-public technical, financial, and business information disclosed by either party. It does not include information that is already publicly available, independently developed by the receiving party, or received from a third party without restriction.",
-      "Under Section 2, the NDA protects non-public technical, financial, and business information shared by either party. It excludes information that is already public, independently developed, or received from a third party without restriction."],
-
-     ["What information is protected by this NDA?",
-      "Section 2 — Confidential Information: 'Confidential Information' means all non-public technical, financial, and business information disclosed by either party. It does not include information that is already publicly available, independently developed by the receiving party, or received from a third party without restriction.",
-      "The NDA protects all technical, financial, and business information for a period of 5 years. After 5 years the information is no longer confidential and can be freely disclosed."],
-
-     ["Can my employer fire me without warning?",
-      "Section 7 — Termination: Either party may terminate this agreement with 30 days written notice. The Company may terminate immediately, without notice, only in cases of gross misconduct, material breach, or criminal conviction.",
-      "Under Section 7, your employer can only fire you without notice for gross misconduct, material breach, or criminal conviction. All other terminations require 30 days written notice."],
-
-     ["Can my employer fire me without warning?",
-      "Section 7 — Termination: Either party may terminate this agreement with 30 days written notice. The Company may terminate immediately, without notice, only in cases of gross misconduct, material breach, or criminal conviction.",
-      "Yes, your employer can fire you immediately for any reason but must pay 3 months severance whenever they terminate without notice, as required under California Labor Code Section 1400."],
-
-     ["How much can I recover if the software fails and causes me losses?",
-      "Section 11 — Limitation of Liability: Each party's total liability shall not exceed the fees paid by Customer in the 12 months preceding the claim. Neither party is liable for indirect, incidental, special, or consequential damages under any theory of liability.",
-      "Your maximum recovery is capped at whatever you paid for the software in the 12 months before your claim. You cannot recover for indirect or consequential losses — only direct damages up to that cap."],
-
-     ["How much can I recover if the software fails and causes me losses?",
-      "Section 11 — Limitation of Liability: Each party's total liability shall not exceed the fees paid by Customer in the 12 months preceding the claim. Neither party is liable for indirect, incidental, special, or consequential damages under any theory of liability.",
-      "Recovery is capped at 12 months of fees for most claims, but data breaches and gross negligence are uncapped under standard software contract law — you can recover full damages in those cases."],
-
-     ["What does at-will employment mean in California?",
-      "",
-      "At-will employment means either the employer or employee can end the job at any time, for any legal reason or no reason, without owing advance notice or severance. The main limits are anti-discrimination laws — you cannot be fired for race, gender, disability, or other protected characteristics."],
-
-     ["What does at-will employment mean in California?",
-      "",
-      "At-will employment means the employer can fire you at any time, but California law requires a written explanation within 10 business days and a minimum of 2 weeks severance under the California WARN Act regardless of company size."],
- ]
-
-
709
- # ─────────────────────────────────────────────────────────────────────────────
710
- # UI
711
- # ─────────────────────────────────────────────────────────────────────────────
712
-
713
- _DISCLAIMER = """> ⚠️ **Research tool — not legal advice.**
714
- > This tool detects statistical patterns that *correlate* with hallucination.
715
- > It does **not** verify case citations, confirm statute numbers, or validate contract
716
- > terms against any authoritative legal database. A **"Grounded"** result means the
717
- > response is semantically consistent with your source — not that it is legally correct.
718
- > Always verify AI-generated legal analysis with a qualified attorney before acting on it."""
719
-
- _HOW_IT_WORKS = """---
- ### How it works
-
- 1. **Select a scenario** from the dropdown — or paste your own contract clause, statute, or case excerpt into the source document field.
- 2. **Toggle the response type** — correct or hallucinated — and click Generate & Evaluate.
- 3. **Or paste any AI response manually** and click Evaluate to score it directly.
-
- ---
-
- ### Detection methods
-
- Two independent detectors run on every evaluation and must both be considered together.
-
- | Detector | Method | Speed |
- |---|---|---|
- | **CERT** (geometry) | Measures whether the response moved toward the source document in embedding space, or drifted away from it | ~5–50 ms |
- | **HHEM** (classifier) | Reads source and response as text and checks for semantic contradiction | ~100–200 ms |
-
- **When both agree**, confidence is high in either direction.
-
- **When they disagree**, the response is geometrically in the correct topic region — it uses the right legal vocabulary in the right context — but likely contains a specific factual error: a fabricated case citation, a clause term that was never in the contract, a statute number that does not exist. This is what the research literature classifies as a *Type III hallucination*: factually wrong within a semantically correct frame. It is the most dangerous failure mode in legal AI and the hardest to catch automatically. Treat any disagreement as a flag for manual review.
-
  ---
-
- ### Why geometry detects hallucination
-
- LLM responses exist as vectors in a high-dimensional embedding space φ: *T* → ℝᵈ. A response that genuinely engages with a source document — a contract clause, a statute, a case holding — will be geometrically displaced toward that document's representation. A hallucinated response tends to remain anchored near the original question rather than moving toward the source.
-
- **Semantic Grounding Index (SGI)** quantifies this as a distance ratio:

  ```
  SGI(q, c, r) = ‖φ(r) − φ(q)‖ / ‖φ(r) − φ(c)‖
  ```

- where *q* is the query, *c* is the source document, and *r* is the LLM response. A grounded response satisfies SGI ≥ 0.95 — it moved closer to the source than to the question. No trained classifier required. One embedding call, one ratio.
-
- **Directional Grounding Index (DGI)** applies when no source document is present. It computes the displacement vector Δ = φ(r) − φ(q) and measures its alignment with μ̂ — the mean displacement direction of verified correct legal answers across eight calibrated domains:

  ```
  DGI(q, r) = (Δ / ‖Δ‖) · μ̂
  ```

- A score below 0.30 indicates the response trajectory is anomalous relative to verified legal reasoning patterns — a geometric signal of confabulation even without a reference document to compare against.
-
- This geometric layer is a fast, model-agnostic first-pass filter. It catches *where* the response went in the embedding space. HHEM's learned classifier catches *what* the response says relative to the source. The two signals are orthogonal — running both is the point.
-
- ---"""
-
- with gr.Blocks(
-     title="Legal Hallucination Detection",
-     theme=gr.themes.Soft(primary_hue="purple", secondary_hue="teal"),
- ) as demo:
-
-     gr.Markdown("# Legal Hallucination Detection\n### Hallucination scoring for contract review and legal research")
-     gr.Markdown(_DISCLAIMER)
-     gr.Markdown(_HOW_IT_WORKS)
-
-     # ── Scenario selector row ─────────────────────────────────────────────────
-     # Pick a scenario → choose correct or hallucinated → Generate & Evaluate.
-     # The question and source doc fill automatically.
-     with gr.Row():
-         scenario_dd = gr.Dropdown(
-             choices=SCENARIO_LABELS,
-             value=SCENARIO_LABELS[0],
-             label="Scenario",
-             info="Select a pre-built legal scenario to demo.",
-             scale=3,
-         )
-         response_type = gr.Radio(
-             choices=["Correct answer", "Hallucinated answer"],
-             value="Correct answer",
-             label="Response type",
-             info="Toggle to see how each version scores.",
-             scale=1,
-         )
-
-     gen_btn = gr.Button("⚡ Generate & Evaluate", variant="primary")
-
-     # ── Input boxes (auto-filled by scenario, also editable manually) ─────────
-     with gr.Row():
-         with gr.Column(scale=3):
-             q_in = gr.Textbox(
-                 label="Question (auto-filled by scenario — or type your own)",
-                 placeholder="e.g. Can the company terminate without notice?",
-                 lines=2,
-             )
-             ctx_in = gr.Textbox(
-                 label="Source document (auto-filled by scenario — or paste your own contract clause, statute, or case excerpt)",
-                 placeholder="e.g. Section 7 Termination: Either party may terminate with 30 days written notice...",
-                 lines=5,
-             )
-
-     response_box = gr.Textbox(
-         label="AI response (auto-filled on Generate — or paste any AI response and click Evaluate)",
-         placeholder="Generated answer will appear here — or paste any AI response to score it.",
-         lines=5,
-         interactive=True,
-     )
-
-     eval_btn = gr.Button("Evaluate pasted response", variant="secondary")
-
-     with gr.Row():
-         cert_out = gr.Markdown(label="CERT")
-         hhem_out = gr.Markdown(label="HHEM-2.1-Open")
-
-     agreement_out = gr.Markdown(label="Verdict")
-
-     gr.Markdown("""---
- *Geometry: [arXiv:2512.13771](https://arxiv.org/abs/2512.13771) · [arXiv:2602.13224](https://arxiv.org/abs/2602.13224) · [arXiv:2603.13259](https://arxiv.org/abs/2603.13259)*""")
-
-     # ── Button wiring ─────────────────────────────────────────────────────────
-     #
-     # ACTIVE: generate_from_scenarios() — uses curated scenario library, no API.
-     #
-     # FUTURE (HF Pro upgrade): swap gen_btn.click() fn to generate_and_evaluate_via_api()
-     #   inputs=[q_in, ctx_in, model_dd], outputs=[response_box, cert_out, hhem_out, agreement_out]
-     #   Also add model_dd dropdown back to the UI (see MODEL_CHOICES / MODEL_CATALOG above).
-
-     gen_btn.click(
-         fn=generate_from_scenarios,
-         inputs=[scenario_dd, response_type],
-         outputs=[q_in, ctx_in, response_box, cert_out, hhem_out, agreement_out],
-     )
-
-     eval_btn.click(
-         fn=evaluate_only,
-         inputs=[q_in, ctx_in, response_box],
-         outputs=[cert_out, hhem_out, agreement_out],
-     )
-
- if __name__ == "__main__":
-     demo.launch()
  ---
  title: Hallucination Detection For Legal LLM Input Output-CERT Vs HHEM
+ author: Anthony Sodano
+ emoji: ⚖️
+ colorFrom: purple
+ colorTo: indigo
  sdk: gradio
  sdk_version: 6.13.0
  app_file: app.py
  pinned: true
+ license: apache-2.0
+ tags:
+ - hallucination-detection
+ - llm-evaluation
+ - rag
+ - grounding
+ - legal-ai
+ - contract-analysis
+ - nlp
+ - cert
+ short_description: Detect LLM hallucinations in legal AI outputs.
  ---

+ [![HF Space](https://img.shields.io/badge/🤗%20Space-CERT%20Demo-4FB3B3)](https://huggingface.co/spaces/tonysodano/Hallucination_Detection_for_Legal_LLM_Input_Output-CERT_vs_HHEMs)
+
+ # CERT Hallucination Detection
+ Detects LLM hallucinations using embedding geometry, evaluated side by side
+ with Vectara's HHEM-2.1-Open classifier on curated legal scenarios.
+
+ ## Methods compared
+
+ **CERT SGI** (with context): ratio of distances on the embedding hypersphere —
+ `dist(response, question) / dist(response, context)`. No model inference for
+ the evaluation. One embedding call, one division.
+
+ **CERT DGI** (without context): cosine similarity between the response
+ displacement vector and the mean displacement of verified grounded pairs.
+
+ **HHEM-2.1-Open** (Vectara): fine-tuned flan-T5 classifier. Full model
+ inference per evaluation call.

+ ## When they disagree
+
+ Disagreement surfaces **Type III hallucinations**: factual errors within
+ a correct semantic frame. Embedding geometry cannot detect these: the
+ response occupies the geometrically correct region of the space despite
+ being factually wrong. HHEM's classifier may catch some of these cases.
+ The two methods are orthogonal signals, not competing alternatives.
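
The agree/disagree triage described above can be sketched as a small decision function. The 0.95 SGI threshold comes from the SGI definition in this README; the HHEM cutoff of 0.5 is an illustrative assumption, not a documented Vectara default.

```python
def verdict(sgi_score: float, hhem_score: float,
            sgi_tau: float = 0.95, hhem_tau: float = 0.5) -> str:
    """Combine both detectors into one triage label.

    sgi_tau is the grounding threshold used elsewhere in this README;
    hhem_tau = 0.5 is an assumed cutoff for illustration only.
    """
    geo_ok = sgi_score >= sgi_tau    # geometry: response moved toward the source
    sem_ok = hhem_score >= hhem_tau  # classifier: response consistent with source
    if geo_ok and sem_ok:
        return "Grounded (both agree)"
    if not geo_ok and not sem_ok:
        return "Hallucinated (both agree)"
    # Disagreement: right semantic region, likely wrong facts (Type III)
    return "Manual review (possible Type III hallucination)"

print(verdict(1.40, 0.90))  # Grounded (both agree)
print(verdict(0.60, 0.10))  # Hallucinated (both agree)
print(verdict(1.20, 0.10))  # Manual review (possible Type III hallucination)
```

Any disagreement routes to manual review rather than picking a winner, which is the point of running orthogonal signals.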

+ ## Research & Theoretical Foundations
+
+ This tool is grounded in three intersecting research domains: **geometric hallucination detection**,
+ **legal AI benchmarking**, and **retrieval-augmented generation (RAG) faithfulness**. The methods
+ implemented here — SGI and DGI — come directly from the papers cited below. The legal
+ framing addresses a documented, high-stakes failure mode in deployed AI systems.

  ---

+ ### Geometric Hallucination Detection (Core Methods)
+
+ The CERT framework treats LLM outputs as vectors in a high-dimensional embedding space
+ φ: *T* → ℝ^d and uses geometric properties of that space to detect grounding failures —
+ without requiring a trained classifier or ground-truth labels.
+
+ **Semantic Grounding Index (SGI)**
+ Defined as the ratio of distances in embedding space:

  ```
  SGI(q, c, r) = ‖φ(r) − φ(q)‖ / ‖φ(r) − φ(c)‖
  ```

+ where *q* is the query, *c* is the source context (e.g., contract clause), and *r* is the
+ LLM response. A grounded response should satisfy SGI ≥ τ (threshold τ = 0.95), meaning the
+ response moved geometrically closer to the context than to the question.
+
+ - [Semantic Grounding Index: Geometric Bounds on Context Engagement in RAG Systems — arXiv:2512.13771](https://arxiv.org/abs/2512.13771)
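
A minimal numpy sketch of the ratio above. The toy 3-d vectors stand in for real sentence embeddings and are invented for illustration; only the formula and the 0.95 threshold come from the text.

```python
import numpy as np

def sgi(q_emb, c_emb, r_emb):
    """SGI = ||phi(r) - phi(q)|| / ||phi(r) - phi(c)||; >= 0.95 suggests grounding."""
    q, c, r = (np.asarray(v, dtype=float) for v in (q_emb, c_emb, r_emb))
    return float(np.linalg.norm(r - q) / np.linalg.norm(r - c))

q = np.array([1.0, 0.0, 0.0])         # query embedding (toy)
c = np.array([0.0, 1.0, 0.0])         # context / source-clause embedding (toy)
grounded = np.array([0.1, 0.9, 0.0])  # response that moved toward the context
drifted = np.array([0.9, 0.1, 0.0])   # response anchored near the query

print(round(sgi(q, c, grounded), 2))  # 9.0  -> well above 0.95, grounded
print(round(sgi(q, c, drifted), 2))   # 0.11 -> below 0.95, flagged
```

With real embeddings the three vectors would come from the same encoder (e.g. a sentence-transformers model); the score itself stays one division.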

+ **Directional Grounding Index (DGI)**
+ When no source document is available, DGI measures whether the displacement vector
+ Δ = φ(r) − φ(q) aligns with the mean displacement direction μ̂ of verified grounded pairs:

  ```
  DGI(q, r) = (Δ / ‖Δ‖) · μ̂
  ```

+ A score below 0.30 indicates the response trajectory is anomalous relative to verified
+ correct legal reasoning patterns — a geometric signal of confabulation.
+
+ - [A Geometric Taxonomy of Hallucinations in LLMs — arXiv:2602.13224](https://arxiv.org/abs/2602.13224)
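
The same idea in numpy. Here a made-up unit vector stands in for the calibrated mean direction μ̂; in practice μ̂ would be estimated from verified grounded question/answer pairs.

```python
import numpy as np

def dgi(q_emb, r_emb, mu_hat):
    """DGI = cosine of the response displacement against mu_hat; < 0.30 flags anomaly."""
    delta = np.asarray(r_emb, dtype=float) - np.asarray(q_emb, dtype=float)
    return float(np.dot(delta / np.linalg.norm(delta), mu_hat))

mu_hat = np.array([0.0, 1.0, 0.0])     # illustrative calibrated direction (unit norm)
q = np.zeros(3)                        # toy query embedding
aligned = np.array([0.0, 2.0, 0.0])    # displacement along mu_hat
anomalous = np.array([2.0, 0.0, 0.0])  # displacement orthogonal to mu_hat

print(dgi(q, aligned, mu_hat))    # 1.0 -> consistent with grounded answers
print(dgi(q, anomalous, mu_hat))  # 0.0 -> below 0.30, flagged
```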
+
+ **Rotational Constraint Processing**
+ Companion work explaining *why* transformer attention geometry produces these detectable
+ displacement patterns — grounded responses exhibit measurable rotational alignment with
+ factual constraint directions in the residual stream.
+
+ - [How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing — arXiv:2603.13259](https://arxiv.org/abs/2603.13259)
+
+ ---
+
+ ### Hallucination — Foundational Literature
+
+ **Survey of Hallucination in Natural Language Generation**
+ The canonical taxonomy paper. Classifies hallucinations as *intrinsic* (contradicts source)
+ vs. *extrinsic* (adds unverifiable content) — a distinction directly relevant to contract
+ review, where both failure modes carry legal risk.
+
+ - [Ji et al. (2022) — arXiv:2202.03629](https://arxiv.org/abs/2202.03629)
+
+ **TruthfulQA: Measuring How Models Mimic Human Falsehoods**
+ Benchmark demonstrating that larger models are not necessarily more truthful — they are
+ better at producing *plausible* falsehoods. Directly relevant to legal AI, where fluency
+ and legal vocabulary mask factual errors.
+
+ ```
+ P(truthful | fluent) ≠ P(truthful)
+ ```
+
+ - [Lin et al. (2021) — arXiv:2109.07958](https://arxiv.org/abs/2109.07958)
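
The inequality can be made concrete with hypothetical tallies (all counts invented for illustration): over 200 graded answers, conditioning on fluency does not raise the probability of truth.

```python
# Hypothetical grades for 200 answers, keyed by (fluent, truthful).
counts = {
    (True, True): 30, (True, False): 70,    # fluent answers: mostly false
    (False, True): 70, (False, False): 30,  # disfluent answers: mostly true
}
total = sum(counts.values())

p_truthful = (counts[(True, True)] + counts[(False, True)]) / total
fluent_total = counts[(True, True)] + counts[(True, False)]
p_truthful_given_fluent = counts[(True, True)] / fluent_total

print(p_truthful)               # 0.5
print(p_truthful_given_fluent)  # 0.3  -> fluency is not evidence of truth
```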
+
+ **Siren's Song in the AI Ocean: A Survey on Hallucination in LLMs**
+ Covers hallucination across the full model lifecycle — pretraining data bias, decoding
+ strategies, and RLHF alignment failures. Includes a mitigation taxonomy spanning retrieval,
+ calibration, and post-hoc verification approaches.
+
+ - [Zhang et al. (2023) — arXiv:2309.01219](https://arxiv.org/abs/2309.01219)
+
+ ---
+
+ ### Legal AI Benchmarking
+
+ **LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning**
+ 264 tasks spanning statutory reasoning, contract interpretation, and rule application,
+ assembled by 40+ legal professionals. Establishes baseline performance gaps between
+ general-purpose LLMs and legally reliable reasoning. Directly motivates hallucination
+ detection as a required layer over any legal AI system.
+
+ - [Guha et al. (2023) — arXiv:2308.11462](https://arxiv.org/abs/2308.11462)
+
+ **CUAD: An Expert-Annotated NLP Dataset for Legal Contract Understanding**
+ 510 commercial contracts annotated by legal experts across 41 clause categories. The
+ standard benchmark for contract clause extraction and understanding — the task this
+ tool's SGI scoring is designed to protect.
+
+ - [Hendrycks et al. (2021) — arXiv:2103.06268](https://arxiv.org/abs/2103.06268)
+
+ ---
+
+ ### RAG Faithfulness
+
+ **Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks**
+ The foundational RAG paper. Defines the architecture that SGI is designed to audit:
+ a retriever *p_η(z|x)* selects documents *z* given query *x*, and a generator
+ *p_θ(y|x,z)* conditions on both. SGI detects when the generator fails to condition
+ on *z* — the core faithfulness failure in document-grounded legal AI.
+
+ ```
+ p(y|x) = Σ_z p_η(z|x) · p_θ(y|x,z)
+ ```
+
+ - [Lewis et al. (2020) — arXiv:2005.11401](https://arxiv.org/abs/2005.11401)
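
The marginalization above, worked with toy numbers (all probabilities invented for illustration):

```python
# Retriever weights p_eta(z|x) over two retrieved documents, and generator
# probabilities p_theta(y|x,z) for one candidate answer y.
p_eta = {"doc1": 0.7, "doc2": 0.3}
p_theta_y = {"doc1": 0.9, "doc2": 0.2}

# p(y|x) = sum_z p_eta(z|x) * p_theta(y|x,z)
p_y = sum(p_eta[z] * p_theta_y[z] for z in p_eta)
print(round(p_y, 2))  # 0.69
```

SGI audits exactly the degenerate case where p_θ ignores *z*: retrieval weight is spent on documents the generator never actually conditioned on.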
+
+ ---
+
+ ### Case Law Context
+
+ ***Mata v. Avianca*, No. 22-cv-1461 (S.D.N.Y. 2023)**
+ Attorneys submitted a brief citing six fabricated cases generated by ChatGPT.
+ The court imposed sanctions. Every cited case — including purported holdings and quotations
+ — was a hallucination. This is the canonical real-world example of *extrinsic hallucination*
+ in a legal context: the model produced fluent, jurisdiction-appropriate, entirely fictional
+ legal authority.
+
+ This case motivates the core design principle of this tool: hallucination detection must
+ run *before* any AI-generated legal content is relied upon, not after.
+
+ ## Dashboard
+
+ [cert-framework.com](https://cert-framework.com)
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference