tonysodano committed on
Commit de377ae · verified · 1 Parent(s): 6c8c167

Update README.md

Files changed (1)
  1. README.md +156 -822
README.md CHANGED
@@ -1,851 +1,185 @@
  ---
  title: Hallucination Detection For Legal LLM Input Output-CERT Vs HHEM
- emoji: 📚
- colorFrom: blue
- colorTo: red
  sdk: gradio
  sdk_version: 6.13.0
  app_file: app.py
  pinned: true
- short_description: Detection of LLM hallucinations in legal AI outputs.
  ---
- """
- Legal Hallucination Detection — Live LLM Generation + CERT + HHEM-2.1-Open
-
- Workflow:
- 1. User provides a question and (optionally) a source legal document.
- 2. A selected LLM generates an answer via the HF Inference API.
- 3. CERT (SGI or DGI) and HHEM-2.1-Open score the generated answer.
- 4. Both scores and a verdict are displayed alongside the generated text.
-
- SGI / DGI from arXiv:2512.13771 and arXiv:2602.13224.
- HHEM-2.1-Open: fine-tuned flan-T5 classifier (Vectara).
-
- Environment variable:
-     HF_TOKEN — required for gated models (Llama 3, etc.).
-     Set in Space Settings → Repository secrets.
-     Free-tier models work without a token.
-
- DISCLAIMER: This tool detects statistical patterns that correlate with
- hallucination. It does not verify case citations, confirm statute numbers,
- or validate contract terms against any legal database. A "Grounded" result
- means the response is semantically consistent with the source document —
- not that it is legally accurate. Do not use its output as legal advice.
- """
-
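The four-step workflow in the docstring above can be sketched as a tiny pipeline. This is not code from the Space — `run_pipeline` and the lambda stand-ins are hypothetical names invented here to illustrate the data flow (question + optional source doc → generated answer → two boolean verdicts → combined verdict):

```python
def run_pipeline(question, source_doc, generate, cert_ok, hhem_ok):
    answer = generate(question, source_doc)              # step 2: LLM drafts an answer
    cert = cert_ok(question, source_doc, answer)         # step 3a: CERT verdict (bool)
    hhem = hhem_ok(question, source_doc, answer)         # step 3b: HHEM verdict (bool)
    verdict = "grounded" if cert and hhem else "review"  # step 4: combined verdict
    return answer, verdict

# Stand-in callables, purely for illustration.
answer, verdict = run_pipeline(
    "Can my employer fire me without warning?",
    "Section 7 — Termination: ...",
    generate=lambda q, d: "Only for gross misconduct; otherwise 30 days notice.",
    cert_ok=lambda q, d, a: True,
    hhem_ok=lambda q, d, a: True,
)
```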
- import logging
- import os
- import time
-
- import numpy as np
- import gradio as gr
- from sentence_transformers import SentenceTransformer
- from huggingface_hub import InferenceClient
-
- logging.basicConfig(level=logging.INFO)
- logger = logging.getLogger(__name__)
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # MODELS AVAILABLE IN THE DROPDOWN
- #
- # Tier column is informational only — displayed in the UI.
- #   "free"   = no token required, accessible on free HF accounts
- #   "pro"    = requires HF Pro subscription or valid HF_TOKEN with access
- #   "nvidia" = NVIDIA NIM endpoint via HF Inference API (Pro)
- # ─────────────────────────────────────────────────────────────────────────────
-
- MODEL_CATALOG = [
-     # ── Free tier ─────────────────────────────────────────────────────────
-     {
-         "label": "Mistral 7B Instruct v0.3 [free]",
-         "id": "mistralai/Mistral-7B-Instruct-v0.3",
-         "tier": "free",
-     },
-     {
-         "label": "Zephyr 7B Beta [free]",
-         "id": "HuggingFaceH4/zephyr-7b-beta",
-         "tier": "free",
-     },
-     {
-         "label": "Qwen 2.5 7B Instruct [free]",
-         "id": "Qwen/Qwen2.5-7B-Instruct",
-         "tier": "free",
-     },
-     # ── Pro tier ──────────────────────────────────────────────────────────
-     {
-         "label": "Llama 3.1 8B Instruct [pro]",
-         "id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
-         "tier": "pro",
-     },
-     {
-         "label": "Llama 3.1 70B Instruct [pro]",
-         "id": "meta-llama/Meta-Llama-3.1-70B-Instruct",
-         "tier": "pro",
-     },
-     {
-         "label": "Mixtral 8x7B Instruct [pro]",
-         "id": "mistralai/Mixtral-8x7B-Instruct-v0.1",
-         "tier": "pro",
-     },
-     {
-         "label": "Qwen 2.5 72B Instruct [pro]",
-         "id": "Qwen/Qwen2.5-72B-Instruct",
-         "tier": "pro",
-     },
-     {
-         "label": "Mistral Large 2411 [pro]",
-         "id": "mistralai/Mistral-Large-Instruct-2411",
-         "tier": "pro",
-     },
-     # ── NVIDIA NIM (Pro) ──────────────────────────────────────────────────
-     {
-         "label": "NVIDIA Llama 3.1 Nemotron 70B [nvidia / pro]",
-         "id": "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
-         "tier": "nvidia",
-     },
- ]
-
- MODEL_CHOICES = [m["label"] for m in MODEL_CATALOG]
- MODEL_ID_MAP = {m["label"]: m["id"] for m in MODEL_CATALOG}
- DEFAULT_MODEL = MODEL_CHOICES[0]
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # INFERENCE CLIENT — disabled, reserved for future HF Pro upgrade
- #
- # To re-enable live LLM generation:
- #   1. Upgrade to HF Pro at huggingface.co/pricing
- #   2. Create an HF token at huggingface.co/settings/tokens
- #      (Read scope + "Make calls to Inference Providers" permission)
- #   3. Add it to Space Settings → Repository secrets as HF_TOKEN
- #   4. Uncomment the four lines below
- #   5. In the UI section at the bottom, wire gen_btn to
- #      generate_and_evaluate_via_api() instead of generate_from_scenarios()
- # ─────────────────────────────────────────────────────────────────────────────
-
- # _HF_TOKEN = os.environ.get("HF_TOKEN")
- # _client = InferenceClient(token=_HF_TOKEN)
- # _client_nvidia = InferenceClient(provider="nvidia", token=_HF_TOKEN)
- # MODEL_TIER_MAP = {m["label"]: m["tier"] for m in MODEL_CATALOG}
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # SYSTEM PROMPTS
- # Two variants: with source doc (strict RAG) and without (general legal).
- # Both instruct the model to be precise and avoid adding outside information.
- # ─────────────────────────────────────────────────────────────────────────────
-
- _SYSTEM_WITH_CONTEXT = """You are a precise legal AI assistant.
- Answer the user's question using ONLY the provided legal document or contract excerpt.
- Do not add any information, clauses, obligations, rights, amounts, or legal rules
- that are not explicitly stated in the source document.
- If the document does not address the question, say so directly.
- Cite the relevant section when possible. Be concise."""
-
- _SYSTEM_NO_CONTEXT = """You are a precise legal AI assistant.
- Answer the user's question accurately based on established law.
- Cite specific statutes, rules, or legal standards where applicable.
- Be concise and accurate. Do not invent case names, statute numbers,
- regulatory requirements, or legal obligations that do not exist."""
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # EMBEDDING MODEL — shared by SGI and DGI
- # ─────────────────────────────────────────────────────────────────────────────
-
- logger.info("Loading embedding model (all-MiniLM-L6-v2)...")
- _encoder = SentenceTransformer("all-MiniLM-L6-v2")
- logger.info("Embedding model loaded.")
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # DGI REFERENCE DIRECTION — legal domain grounded pairs
- # Calibrated across 8 core legal domains. AUROC ~0.76 generic;
- # domain-specific calibration reaches 0.90+. See arXiv:2602.13224.
- # ─────────────────────────────────────────────────────────────────────────────
-
- _REFERENCE_PAIRS = [
-     (
-         "What is required to form a binding contract?",
-         "A binding contract requires offer, acceptance, consideration, "
-         "mutual assent, and legal capacity of the parties. Without all "
-         "elements, the agreement may be unenforceable.",
-     ),
-     (
-         "What must a plaintiff prove in a negligence claim?",
-         "A negligence plaintiff must establish duty, breach of that duty, "
-         "actual and proximate causation, and damages. Failure to prove any "
-         "element defeats the claim.",
-     ),
-     (
-         "What is the plain meaning rule in statutory interpretation?",
-         "The plain meaning rule requires courts to apply the ordinary "
-         "meaning of statutory text when the language is unambiguous, "
-         "without looking to legislative history or extrinsic sources.",
-     ),
-     (
-         "What is hearsay under the Federal Rules of Evidence?",
-         "Hearsay is an out-of-court statement offered to prove the truth "
-         "of the matter asserted. FRE 801 defines it, and FRE 802 makes "
-         "it generally inadmissible absent a recognized exception.",
-     ),
-     (
-         "When does the Fourth Amendment protect against government searches?",
-         "The Fourth Amendment protects against unreasonable searches and "
-         "seizures where the person has a reasonable expectation of privacy. "
-         "Warrantless searches are presumptively unconstitutional absent an "
-         "established exception such as exigent circumstances or consent.",
-     ),
-     (
-         "What rights does the CCPA grant California consumers?",
-         "The CCPA grants California consumers the right to know what "
-         "personal information is collected, the right to delete it, the "
-         "right to opt out of its sale, and the right to non-discrimination "
-         "for exercising those rights.",
-     ),
-     (
-         "What qualifies as a trade secret under the DTSA?",
-         "Under the Defend Trade Secrets Act, a trade secret is information "
-         "that derives independent economic value from not being generally "
-         "known, and for which the owner has taken reasonable measures to "
-         "maintain its secrecy.",
-     ),
-     (
-         "When is a liquidated damages clause enforceable?",
-         "A liquidated damages clause is enforceable when actual damages "
-         "would be difficult to estimate at the time of contracting and the "
-         "stipulated amount is a reasonable forecast of compensatory damages, "
-         "not a penalty.",
-     ),
- ]
-
- logger.info("Computing DGI reference direction from %d legal grounded pairs...", len(_REFERENCE_PAIRS))
-
- _all_texts = []
- for q, r in _REFERENCE_PAIRS:
-     _all_texts.extend([q, r])
-
- _all_embs = _encoder.encode(_all_texts, convert_to_numpy=True, normalize_embeddings=False)
-
- _displacements = []
- for i in range(len(_REFERENCE_PAIRS)):
-     q_emb = _all_embs[i * 2]
-     r_emb = _all_embs[i * 2 + 1]
-     delta = r_emb - q_emb
-     norm = np.linalg.norm(delta)
-     if norm > 1e-8:
-         _displacements.append(delta / norm)
-
- _mu = np.mean(_displacements, axis=0)
- _mu_norm = np.linalg.norm(_mu)
- _mu_hat = _mu / _mu_norm if _mu_norm > 1e-8 else _mu
-
- logger.info("DGI reference direction computed (dims=%d, concentration=%.4f).", _mu_hat.shape[0], float(_mu_norm))
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # HHEM-2.1-Open
- # ─────────────────────────────────────────────────────────────────────────────
-
- logger.info("Loading HHEM-2.1-Open...")
- from transformers import AutoModelForSequenceClassification
-
- _hhem = AutoModelForSequenceClassification.from_pretrained(
-     "vectara/hallucination_evaluation_model",
-     trust_remote_code=True,
- )
- logger.info("HHEM loaded.")
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # SGI — Semantic Grounding Index (arXiv:2512.13771)
- #   SGI = dist(response, question) / dist(response, context)
- # ─────────────────────────────────────────────────────────────────────────────
-
- SGI_FLAG_THRESHOLD = 0.95
- SGI_STRONG_PASS = 1.20
-
-
- def compute_sgi(question: str, context: str, response: str) -> dict:
-     embeddings = _encoder.encode(
-         [question, context, response],
-         convert_to_numpy=True,
-         normalize_embeddings=False,
-     )
-     q_emb, ctx_emb, resp_emb = embeddings
-
-     q_dist = float(np.linalg.norm(resp_emb - q_emb))
-     ctx_dist = float(np.linalg.norm(resp_emb - ctx_emb))
-
-     if ctx_dist < 1e-8:
-         return {"score": 10.0, "flag": False, "degenerate": True}
-     if q_dist < 1e-8:
-         return {"score": 0.0, "flag": True, "degenerate": True}
-
-     sgi = q_dist / ctx_dist
-     return {
-         "score": round(sgi, 4),
-         "flag": sgi < SGI_FLAG_THRESHOLD,
-         "q_dist": round(q_dist, 4),
-         "ctx_dist": round(ctx_dist, 4),
-         "degenerate": False,
-     }
-
-
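The SGI ratio can be sanity-checked on toy 2-D "embeddings" (the real code uses high-dimensional MiniLM sentence vectors; these coordinates are invented purely to make the geometry visible). A response that lands near the context scores above 1; one that lands near the question scores below the 0.95 flag threshold:

```python
import math

def sgi_ratio(resp, q, ctx):
    # Toy SGI: distance(response, question) / distance(response, context).
    return math.dist(resp, q) / math.dist(resp, ctx)

# Response embedding sits near the context → large SGI, not flagged.
grounded = sgi_ratio(resp=(0.9, 0.1), q=(0.0, 1.0), ctx=(1.0, 0.0))   # → 9.0
# Response embedding sits near the question → SGI below 0.95, flagged.
drifting = sgi_ratio(resp=(0.1, 0.9), q=(0.0, 1.0), ctx=(1.0, 0.0))   # → ~0.111
```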
- # ─────────────────────────────────────────────────────────────────────────────
- # DGI — Directional Grounding Index (arXiv:2602.13224)
- #   DGI = dot(normalize(phi(r) - phi(q)), mu_hat)
- # ─────────────────────────────────────────────────────────────────────────────
-
- DGI_FLAG_THRESHOLD = 0.30
-
-
- def compute_dgi(question: str, response: str) -> dict:
-     embeddings = _encoder.encode(
-         [question, response],
-         convert_to_numpy=True,
-         normalize_embeddings=False,
-     )
-     q_emb, r_emb = embeddings
-
-     delta = r_emb - q_emb
-     magnitude = float(np.linalg.norm(delta))
-
-     if magnitude < 1e-8:
-         return {"score": 0.0, "flag": True, "degenerate": True}
-
-     delta_hat = delta / magnitude
-     gamma = float(np.dot(delta_hat, _mu_hat))
-
-     if np.isnan(gamma):
-         return {"score": 0.0, "flag": True, "degenerate": True}
-
-     return {
-         "score": round(gamma, 4),
-         "flag": gamma < DGI_FLAG_THRESHOLD,
-         "magnitude": round(magnitude, 4),
-         "degenerate": False,
-     }
-
-
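The DGI dot product is just the cosine between the question-to-response displacement and the reference direction. A toy 2-D check (a stand-in `mu_hat` in place of the learned reference direction; coordinates invented for illustration):

```python
import math

def dgi(q, r, mu_hat):
    # Toy DGI: cosine between the (response - question) displacement and mu_hat.
    delta = [ri - qi for qi, ri in zip(q, r)]
    mag = math.sqrt(sum(d * d for d in delta))
    delta_hat = [d / mag for d in delta]
    return sum(dh * mh for dh, mh in zip(delta_hat, mu_hat))

mu_hat = (1.0, 0.0)  # stand-in reference direction
aligned = dgi((0.0, 0.0), (2.0, 0.0), mu_hat)     # displacement along mu_hat → 1.0
orthogonal = dgi((0.0, 0.0), (0.0, 2.0), mu_hat)  # 90° off → 0.0, under the 0.30 flag
```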
- # ─────────────────────────────────────────────────────────────────────────────
- # SCORING WRAPPERS
- # ─────────────────────────────────────────────────────────────────────────────
-
- def score_cert(question: str, response: str, context: str) -> dict:
-     start = time.perf_counter()
-     has_context = bool(context.strip())
-
-     if has_context:
-         result = compute_sgi(question, context, response)
-         method = "SGI"
-     else:
-         result = compute_dgi(question, response)
-         method = "DGI"
-
-     return {
-         "method": method,
-         "raw_score": result["score"],
-         "grounded": not result["flag"],
-         "threshold": SGI_FLAG_THRESHOLD if method == "SGI" else DGI_FLAG_THRESHOLD,
-         "elapsed_ms": round((time.perf_counter() - start) * 1000, 1),
-     }
-
-
- def score_hhem(question: str, response: str, context: str) -> dict:
-     has_context = bool(context.strip())
-     premise = f"{context.strip()}\n\n{question}".strip() if has_context else question
-     if len(premise) > 1800:
-         premise = premise[:1800]
-
-     start = time.perf_counter()
-     scores = _hhem.predict([(premise, response)])
-     raw_score = float(scores[0])
-
-     return {
-         "method": "HHEM-2.1-Open",
-         "raw_score": round(raw_score, 4),
-         "grounded": raw_score >= 0.5,
-         "elapsed_ms": round((time.perf_counter() - start) * 1000, 1),
-         "label": "consistent" if raw_score >= 0.5 else "hallucinated",
-     }
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # LLM GENERATION — calls HF Inference API
- # ─────────────────────────────────────────────────────────────────────────────
-
- def generate_via_api(question: str, context: str, model_label: str) -> tuple[str, str]:
-     """
-     Call the selected model via HF Inference API.
-     Returns (generated_text, error_message).
-     error_message is empty string on success.
-     """
-     if not question.strip():
-         return "", "Please enter a question before generating."
-
-     model_id = MODEL_ID_MAP.get(model_label)
-     if not model_id:
-         return "", f"Unknown model: {model_label}"
-
-     has_context = bool(context.strip())
-     system_prompt = _SYSTEM_WITH_CONTEXT if has_context else _SYSTEM_NO_CONTEXT
-
-     user_content = question.strip()
-     if has_context:
-         user_content = f"Source document:\n{context.strip()}\n\nQuestion: {question.strip()}"
-
-     messages = [
-         {"role": "system", "content": system_prompt},
-         {"role": "user", "content": user_content},
-     ]
-
-     try:
-         tier = MODEL_TIER_MAP.get(model_label, "free")
-         client = _client_nvidia if tier == "nvidia" else _client
-         logger.info("Calling model: %s (tier: %s)", model_id, tier)
-         start = time.perf_counter()
-         completion = client.chat_completion(
-             model=model_id,
-             messages=messages,
-             max_tokens=512,
-             temperature=0.1,
-         )
-         elapsed = round((time.perf_counter() - start) * 1000)
-         text = completion.choices[0].message.content.strip()
-         logger.info("Generation complete in %d ms (%d chars)", elapsed, len(text))
-         return text, ""
-
-     except Exception as exc:
-         logger.error("Generation failed: %s", exc)
-         err = str(exc)
-
-         # Surface actionable errors
-         if "401" in err or "unauthorized" in err.lower():
-             return "", (
-                 "❌ Authentication error — this model requires a valid HF_TOKEN. "
-                 "Set it in Space Settings → Repository secrets, or choose a free-tier model."
-             )
-         if "403" in err or "gated" in err.lower():
-             return "", (
-                 "❌ Access denied — this is a gated model. "
-                 "Request access on the model page, then add your HF_TOKEN to Space secrets."
-             )
-         if "429" in err or "rate" in err.lower():
-             return "", (
-                 "❌ Rate limit hit — try again in a moment, "
-                 "or upgrade to HF Pro for higher limits."
-             )
-         return "", f"❌ Generation failed: {err}"
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # GENERATE + EVALUATE — single button action
- # ─────────────────────────────────────────────────────────────────────────────
-
- def generate_and_evaluate_via_api(
-     question: str, context: str, model_label: str
- ) -> tuple[str, str, str, str]:
-     """
-     Generate an LLM answer, then score it.
-     Returns: (generated_response, cert_md, hhem_md, agreement_md)
-     """
-     generated, err = generate_via_api(question, context, model_label)
-
-     if err:
-         return "", err, "", ""
-
-     cert_md, hhem_md, agreement_md = evaluate_only(question, context, generated)
-     return generated, cert_md, hhem_md, agreement_md
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # SCENARIO LIBRARY — curated correct + hallucinated response pairs
- #
- # Used by generate_from_scenarios() — the active generate path.
- # No external API required. Each scenario has a correct response and a
- # hallucinated response. The radio toggle in the UI selects which to score.
- #
- # To add scenarios: copy a block, update question/context/correct/hallucinated.
- # ─────────────────────────────────────────────────────────────────────────────
-
- _SCENARIOS = [
-     {
-         "label": "NDA — what is protected",
-         "question": "What information is protected by this NDA?",
-         "context": (
-             "Section 2 — Confidential Information: Confidential Information "
-             "means all non-public technical, financial, and business information "
-             "disclosed by either party. It does not include information that is "
-             "already publicly available, independently developed by the receiving "
-             "party, or received from a third party without restriction."
-         ),
-         "correct": (
-             "Under Section 2, the NDA protects non-public technical, financial, "
-             "and business information shared by either party. It excludes "
-             "information that is already public, independently developed, or "
-             "received from a third party without restriction."
-         ),
-         "hallucinated": (
-             "The NDA protects all technical, financial, and business information "
-             "for a period of 5 years. After 5 years the information is no longer "
-             "confidential and can be freely disclosed."
-         ),
-     },
-     {
-         "label": "Employment — termination without notice",
-         "question": "Can my employer fire me without warning?",
-         "context": (
-             "Section 7 — Termination: Either party may terminate this agreement "
-             "with 30 days written notice. The Company may terminate immediately, "
-             "without notice, only in cases of gross misconduct, material breach, "
-             "or criminal conviction."
-         ),
-         "correct": (
-             "Under Section 7, your employer can only fire you without notice for "
-             "gross misconduct, material breach, or criminal conviction. All other "
-             "terminations require 30 days written notice."
-         ),
-         "hallucinated": (
-             "Yes, your employer can fire you immediately for any reason, but they "
-             "must pay 3 months severance whenever they terminate without notice, "
-             "as required under California Labor Code Section 1400."
-         ),
-     },
-     {
-         "label": "SaaS contract — liability cap",
-         "question": "How much can I recover if the software fails and causes me losses?",
-         "context": (
-             "Section 11 — Limitation of Liability: Each party's total liability "
-             "shall not exceed the fees paid by Customer in the 12 months preceding "
-             "the claim. Neither party is liable for indirect, incidental, special, "
-             "or consequential damages under any theory of liability."
-         ),
-         "correct": (
-             "Your maximum recovery is capped at whatever you paid for the software "
-             "in the 12 months before your claim. You cannot recover for indirect "
-             "or consequential losses — only direct damages up to that cap."
-         ),
-         "hallucinated": (
-             "Recovery is capped at 12 months of fees for most claims, but data "
-             "breaches and gross negligence are uncapped under standard software "
-             "contract law — you can recover full damages in those cases."
-         ),
-     },
-     {
-         "label": "At-will employment — California (no source doc)",
-         "question": "What does at-will employment mean in California?",
-         "context": "",
-         "correct": (
-             "At-will employment means either the employer or employee can end the "
-             "job at any time, for any legal reason or no reason, without owing "
-             "advance notice or severance. The main limits are anti-discrimination "
-             "laws — you cannot be fired for race, gender, disability, or other "
-             "protected characteristics."
-         ),
-         "hallucinated": (
-             "At-will employment means the employer can fire you at any time, but "
-             "California law requires a written explanation within 10 business days "
-             "and a minimum of 2 weeks severance under the California WARN Act "
-             "regardless of company size."
-         ),
-     },
-     {
-         "label": "Preliminary injunction standard (no source doc)",
-         "question": "What must I prove to get a preliminary injunction in federal court?",
-         "context": "",
-         "correct": (
-             "Under Winter v. Natural Resources Defense Council, you must show: "
-             "(1) likely success on the merits, (2) likely irreparable harm absent "
-             "relief, (3) that the balance of equities tips in your favor, and "
-             "(4) that an injunction serves the public interest. All four factors "
-             "must be satisfied."
-         ),
-         "hallucinated": (
-             "Under Johnson v. United States (2019), federal courts apply a "
-             "two-factor test: you need only show hardship and a colorable claim "
-             "on the merits. The public interest factor was eliminated by the "
-             "Supreme Court in 2018."
-         ),
-     },
- ]
-
- _SCENARIO_MAP = {s["label"]: s for s in _SCENARIOS}
- SCENARIO_LABELS = [s["label"] for s in _SCENARIOS]
-
-
- def generate_from_scenarios(
-     label: str, response_type: str
- ) -> tuple[str, str, str, str, str, str]:
-     """
-     Active generate path — no API required.
-     Selects the pre-written correct or hallucinated response for the chosen
-     scenario, fills the input boxes, and scores immediately.
-
-     response_type: "Correct answer" | "Hallucinated answer"
-
-     Returns: (question, context, response, cert_md, hhem_md, agreement_md)
-
-     # ── Future upgrade: live LLM generation ──────────────────────────────────
-     # When upgrading to HF Pro:
-     #   1. Set HF_TOKEN in Space Settings → Repository secrets
-     #   2. Uncomment the InferenceClient lines in the INFERENCE CLIENT block above
-     #   3. In the UI section below, swap gen_btn.click() to call
-     #      generate_and_evaluate_via_api() instead of generate_from_scenarios()
-     # ─────────────────────────────────────────────────────────────────────────
-     """
-     s = _SCENARIO_MAP.get(label)
-     if not s:
-         return "", "", "Select a scenario first.", "", "", ""
-
-     use_hallucinated = "Hallucinated" in response_type
-     response = s["hallucinated"] if use_hallucinated else s["correct"]
-     question = s["question"]
-     context = s["context"]
-
-     cert_md, hhem_md, agreement_md = evaluate_only(question, context, response)
-     return question, context, response, cert_md, hhem_md, agreement_md
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # EVALUATE ONLY — score a manually pasted response
- # ─────────────────────────────────────────────────────────────────────────────
-
- def evaluate_only(
-     question: str, context: str, response: str
- ) -> tuple[str, str, str]:
-     """Score a response that is already in the text box."""
-     if not question.strip():
-         return "Please enter a question.", "", ""
-     if not response.strip():
-         return "Please enter or generate an AI response to evaluate.", "", ""
-
-     cert = score_cert(question, response, context)
-     hhem = score_hhem(question, response, context)
-
-     cert_verdict = "🟢 Grounded" if cert["grounded"] else "🔴 Hallucination detected"
-     mode_note = (
-         "*Checked whether the response moved toward your source document or away from it.*"
-         if cert["method"] == "SGI"
-         else "*Checked whether the response follows verified legal reasoning patterns.*"
-     )
-     cert_md = f"""**{cert_verdict}**
- | | |
- |---|---|
- | Method | `{cert['method']}` |
- | Score | `{cert['raw_score']}` |
- | Threshold | `{cert['threshold']}` |
- | Latency | `{cert['elapsed_ms']} ms` |
- {mode_note}"""
-
-     hhem_verdict = "🟢 Grounded" if hhem["grounded"] else "🔴 Hallucination detected"
-     hhem_md = f"""**{hhem_verdict}**
- | | |
- |---|---|
- | Method | `{hhem['method']}` |
- | Score | `{hhem['raw_score']}` |
- | Label | `{hhem['label']}` |
- | Latency | `{hhem['elapsed_ms']} ms` |
- *Reads source + response and checks for contradiction.*"""
-
-     agree = cert["grounded"] == hhem["grounded"]
-     if agree and cert["grounded"]:
-         agreement_md = "🔵 **Both methods agree — response appears grounded.**"
-     elif agree and not cert["grounded"]:
-         agreement_md = "🔵 **Both methods agree — hallucination likely. Verify before use.**"
-     else:
-         agreement_md = """🟠 **Methods disagree — manual review recommended.**
-
- The geometry check says the response is in the right topic area.
- The classifier disagrees. This usually means the response *sounds* legally
- correct but gets a specific fact wrong: an invented clause, wrong dollar
- amount, fabricated case name, or statute that doesn't exist.
- Verify manually before relying on this answer."""
-
-     return cert_md, hhem_md, agreement_md
-
-
- # ─────────────────────────────────────────────────────────────────────────────
- # EXAMPLES — 8 plain-language legal scenarios
- # Odd rows = correct answers. Even rows = hallucinated versions.
- # ─────────────────────────────────────────────────────────────────────────────
-
- EXAMPLES = [
-     ["What information is protected by this NDA?",
-      "Section 2 — Confidential Information: 'Confidential Information' means all non-public technical, financial, and business information disclosed by either party. It does not include information that is already publicly available, independently developed by the receiving party, or received from a third party without restriction.",
-      "Under Section 2, the NDA protects non-public technical, financial, and business information shared by either party. It excludes information that is already public, independently developed, or received from a third party without restriction."],
-
-     ["What information is protected by this NDA?",
-      "Section 2 — Confidential Information: 'Confidential Information' means all non-public technical, financial, and business information disclosed by either party. It does not include information that is already publicly available, independently developed by the receiving party, or received from a third party without restriction.",
-      "The NDA protects all technical, financial, and business information for a period of 5 years. After 5 years the information is no longer confidential and can be freely disclosed."],
-
-     ["Can my employer fire me without warning?",
-      "Section 7 — Termination: Either party may terminate this agreement with 30 days written notice. The Company may terminate immediately, without notice, only in cases of gross misconduct, material breach, or criminal conviction.",
-      "Under Section 7, your employer can only fire you without notice for gross misconduct, material breach, or criminal conviction. All other terminations require 30 days written notice."],
-
-     ["Can my employer fire me without warning?",
-      "Section 7 — Termination: Either party may terminate this agreement with 30 days written notice. The Company may terminate immediately, without notice, only in cases of gross misconduct, material breach, or criminal conviction.",
-      "Yes, your employer can fire you immediately for any reason but must pay 3 months severance whenever they terminate without notice, as required under California Labor Code Section 1400."],
-
-     ["How much can I recover if the software fails and causes me losses?",
-      "Section 11 — Limitation of Liability: Each party's total liability shall not exceed the fees paid by Customer in the 12 months preceding the claim. Neither party is liable for indirect, incidental, special, or consequential damages under any theory of liability.",
-      "Your maximum recovery is capped at whatever you paid for the software in the 12 months before your claim. You cannot recover for indirect or consequential losses — only direct damages up to that cap."],
-
-     ["How much can I recover if the software fails and causes me losses?",
-      "Section 11 — Limitation of Liability: Each party's total liability shall not exceed the fees paid by Customer in the 12 months preceding the claim. Neither party is liable for indirect, incidental, special, or consequential damages under any theory of liability.",
-      "Recovery is capped at 12 months of fees for most claims, but data breaches and gross negligence are uncapped under standard software contract law — you can recover full damages in those cases."],
-
-     ["What does at-will employment mean in California?",
-      "",
-      "At-will employment means either the employer or employee can end the job at any time, for any legal reason or no reason, without owing advance notice or severance. The main limits are anti-discrimination laws — you cannot be fired for race, gender, disability, or other protected characteristics."],
-
-     ["What does at-will employment mean in California?",
-      "",
-      "At-will employment means the employer can fire you at any time, but California law requires a written explanation within 10 business days and a minimum of 2 weeks severance under the California WARN Act regardless of company size."],
- ]
-
-
709
- # ─────────────────────────────────────────────────────────────────────────────
710
- # UI
711
- # ─────────────────────────────────────────────────────────────────────────────
712
-
713
- _DISCLAIMER = """> ⚠️ **Research tool — not legal advice.**
714
- > This tool detects statistical patterns that *correlate* with hallucination.
715
- > It does **not** verify case citations, confirm statute numbers, or validate contract
716
- > terms against any authoritative legal database. A **"Grounded"** result means the
717
- > response is semantically consistent with your source — not that it is legally correct.
718
- > Always verify AI-generated legal analysis with a qualified attorney before acting on it."""
719
-
- _HOW_IT_WORKS = """---
- ### How it works
-
- 1. **Select a scenario** from the dropdown — or paste your own contract clause, statute, or case excerpt into the source document field.
- 2. **Toggle the response type** — correct or hallucinated — and click Generate & Evaluate.
- 3. **Or paste any AI response manually** and click Evaluate to score it directly.
-
- ---
-
- ### Detection methods
-
- Two independent detectors run on every evaluation and must both be considered together.
-
- | Detector | Method | Speed |
- |---|---|---|
- | **CERT** (geometry) | Measures whether the response moved toward the source document in embedding space, or drifted away from it | ~5–50 ms |
- | **HHEM** (classifier) | Reads source and response as text and checks for semantic contradiction | ~100–200 ms |
-
- **When both agree**, confidence is high in either direction.
-
- **When they disagree**, the response is geometrically in the correct topic region — it uses the right legal vocabulary in the right context — but likely contains a specific factual error: a fabricated case citation, a clause term that was never in the contract, a statute number that does not exist. This is what the research literature classifies as a *Type III hallucination*: factually wrong within a semantically correct frame. It is the most dangerous failure mode in legal AI and the hardest to catch automatically. Treat any disagreement as a flag for manual review.
-
  ---
-
- ### Why geometry detects hallucination
-
- LLM responses exist as vectors in a high-dimensional embedding space φ: *T* → ℝᵈ. A response that genuinely engages with a source document — a contract clause, a statute, a case holding — will be geometrically displaced toward that document's representation. A hallucinated response tends to remain anchored near the original question rather than moving toward the source.
-
- **Semantic Grounding Index (SGI)** quantifies this as a distance ratio:

  ```
  SGI(q, c, r) = ‖φ(r) − φ(q)‖ / ‖φ(r) − φ(c)‖
  ```

- where *q* is the query, *c* is the source document, and *r* is the LLM response. A grounded response satisfies SGI ≥ 0.95 — it moved closer to the source than to the question. No trained classifier required. One embedding call, one ratio.
-
- **Directional Grounding Index (DGI)** applies when no source document is present. It computes the displacement vector Δ = φ(r) − φ(q) and measures its alignment with μ̂ — the mean displacement direction of verified correct legal answers across eight calibrated domains:

  ```
  DGI(q, r) = (Δ / ‖Δ‖) · μ̂
  ```

- A score below 0.30 indicates the response trajectory is anomalous relative to verified legal reasoning patterns — a geometric signal of confabulation even without a reference document to compare against.
-
- This geometric layer is a fast, model-agnostic first-pass filter. It catches *where* the response went in the embedding space. HHEM's learned classifier catches *what* the response says relative to the source. The two signals are orthogonal — running both is the point.
-
- ---"""
-
- with gr.Blocks(
-     title="Legal Hallucination Detection",
-     theme=gr.themes.Soft(primary_hue="purple", secondary_hue="teal"),
- ) as demo:
-
-     gr.Markdown("# Legal Hallucination Detection\n### Hallucination scoring for contract review and legal research")
-     gr.Markdown(_DISCLAIMER)
-     gr.Markdown(_HOW_IT_WORKS)
-
-     # ── Scenario selector row ─────────────────────────────────────────────────
-     # Pick a scenario → choose correct or hallucinated → Generate & Evaluate.
-     # The question and source doc fill automatically.
-     with gr.Row():
-         scenario_dd = gr.Dropdown(
-             choices=SCENARIO_LABELS,
-             value=SCENARIO_LABELS[0],
-             label="Scenario",
-             info="Select a pre-built legal scenario to demo.",
-             scale=3,
-         )
-         response_type = gr.Radio(
-             choices=["Correct answer", "Hallucinated answer"],
-             value="Correct answer",
-             label="Response type",
-             info="Toggle to see how each version scores.",
-             scale=1,
-         )
-
-     gen_btn = gr.Button("⚡ Generate & Evaluate", variant="primary")
-
-     # ── Input boxes (auto-filled by scenario, also editable manually) ─────────
-     with gr.Row():
-         with gr.Column(scale=3):
-             q_in = gr.Textbox(
-                 label="Question (auto-filled by scenario — or type your own)",
-                 placeholder="e.g. Can the company terminate without notice?",
-                 lines=2,
-             )
-             ctx_in = gr.Textbox(
-                 label="Source document (auto-filled by scenario — or paste your own contract clause, statute, or case excerpt)",
-                 placeholder="e.g. Section 7 Termination: Either party may terminate with 30 days written notice...",
-                 lines=5,
-             )
-
-     response_box = gr.Textbox(
-         label="AI response (auto-filled on Generate — or paste any AI response and click Evaluate)",
-         placeholder="Generated answer will appear here — or paste any AI response to score it.",
-         lines=5,
-         interactive=True,
-     )
-
-     eval_btn = gr.Button("Evaluate pasted response", variant="secondary")
-
-     with gr.Row():
-         cert_out = gr.Markdown(label="CERT")
-         hhem_out = gr.Markdown(label="HHEM-2.1-Open")
-
-     agreement_out = gr.Markdown(label="Verdict")
-
-     gr.Markdown("""---
- *Geometry: [arXiv:2512.13771](https://arxiv.org/abs/2512.13771) · [arXiv:2602.13224](https://arxiv.org/abs/2602.13224) · [arXiv:2603.13259](https://arxiv.org/abs/2603.13259)*""")
-
-     # ── Button wiring ─────────────────────────────────────────────────────────
-     #
-     # ACTIVE: generate_from_scenarios() — uses curated scenario library, no API.
-     #
-     # FUTURE (HF Pro upgrade): swap gen_btn.click() fn to generate_and_evaluate_via_api()
-     #   inputs=[q_in, ctx_in, model_dd], outputs=[response_box, cert_out, hhem_out, agreement_out]
-     #   Also add model_dd dropdown back to the UI (see MODEL_CHOICES / MODEL_CATALOG above).
-
-     gen_btn.click(
-         fn=generate_from_scenarios,
-         inputs=[scenario_dd, response_type],
-         outputs=[q_in, ctx_in, response_box, cert_out, hhem_out, agreement_out],
-     )
-
-     eval_btn.click(
-         fn=evaluate_only,
-         inputs=[q_in, ctx_in, response_box],
-         outputs=[cert_out, hhem_out, agreement_out],
-     )
-
- if __name__ == "__main__":
-     demo.launch()
  ---
  title: Hallucination Detection For Legal LLM Input Output-CERT Vs HHEM
+ author: Anthony Sodano
+ emoji: ⚖️
+ colorFrom: purple
+ colorTo: indigo
  sdk: gradio
  sdk_version: 6.13.0
  app_file: app.py
  pinned: true
+ license: apache-2.0
+ tags:
+ - hallucination-detection
+ - llm-evaluation
+ - rag
+ - grounding
+ - legal-ai
+ - contract-analysis
+ - nlp
+ - cert
+ short_description: Detect LLM hallucinations in legal AI outputs.
  ---

+ [![HF Space](https://img.shields.io/badge/🤗%20Space-CERT%20Demo-4FB3B3)](https://huggingface.co/spaces/tonysodano/Hallucination_Detection_for_Legal_LLM_Input_Output-CERT_vs_HHEMs)
+
+ # CERT Hallucination Detection
+ Detects LLM hallucinations using embedding geometry, evaluated side by side
+ with Vectara's HHEM-2.1-Open classifier on curated legal scenarios.
+
+ ## Methods compared
+
+ **CERT SGI** (with context): ratio of distances on the embedding hypersphere —
+ `dist(response, question) / dist(response, context)`. No model inference for
+ the evaluation. One embedding call, one division.
+
+ **CERT DGI** (without context): cosine similarity between the response
+ displacement vector and the mean displacement of verified grounded pairs.
+
+ **HHEM-2.1-Open** (Vectara): fine-tuned flan-T5 classifier. Full model
+ inference per evaluation call.

+ ## When they disagree
+
+ Disagreement surfaces **Type III hallucinations**: factual errors within
+ a correct semantic frame. Embedding geometry cannot detect these: the
+ response occupies the geometrically correct region of the space despite
+ being factually wrong. HHEM's classifier may catch some of these cases.
+ The two methods are orthogonal signals, not competing alternatives.
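
The agree/disagree triage described above can be sketched as a small decision function. The 0.95 SGI threshold comes from the SGI definition in this README; the HHEM cutoff of 0.5 is an illustrative assumption, not a documented Vectara default.

```python
def verdict(sgi_score: float, hhem_score: float,
            sgi_tau: float = 0.95, hhem_tau: float = 0.5) -> str:
    """Combine both detectors into one triage label.

    sgi_tau is the grounding threshold used elsewhere in this README;
    hhem_tau = 0.5 is an assumed cutoff for illustration only.
    """
    geo_ok = sgi_score >= sgi_tau    # geometry: response moved toward the source
    sem_ok = hhem_score >= hhem_tau  # classifier: response consistent with source
    if geo_ok and sem_ok:
        return "Grounded (both agree)"
    if not geo_ok and not sem_ok:
        return "Hallucinated (both agree)"
    # Disagreement: right semantic region, likely wrong facts (Type III)
    return "Manual review (possible Type III hallucination)"

print(verdict(1.40, 0.90))  # Grounded (both agree)
print(verdict(0.60, 0.10))  # Hallucinated (both agree)
print(verdict(1.20, 0.10))  # Manual review (possible Type III hallucination)
```

Any disagreement routes to manual review rather than picking a winner, which is the point of running orthogonal signals.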

+ ## Research & Theoretical Foundations
+
+ This tool is grounded in three intersecting research domains: **geometric hallucination detection**,
+ **legal AI benchmarking**, and **retrieval-augmented generation (RAG) faithfulness**. The methods
+ implemented here — SGI and DGI — come directly from the papers cited below. The legal
+ framing addresses a documented, high-stakes failure mode in deployed AI systems.

  ---

+ ### Geometric Hallucination Detection (Core Methods)
+
+ The CERT framework treats LLM outputs as vectors in a high-dimensional embedding space
+ φ: *T* → ℝ^d and uses geometric properties of that space to detect grounding failures —
+ without requiring a trained classifier or ground-truth labels.
+
+ **Semantic Grounding Index (SGI)**
+ Defined as the ratio of distances in embedding space:

  ```
  SGI(q, c, r) = ‖φ(r) − φ(q)‖ / ‖φ(r) − φ(c)‖
  ```

+ where *q* is the query, *c* is the source context (e.g., contract clause), and *r* is the
+ LLM response. A grounded response should satisfy SGI ≥ τ (threshold τ = 0.95), meaning the
+ response moved geometrically closer to the context than to the question.
+
+ - [Semantic Grounding Index: Geometric Bounds on Context Engagement in RAG Systems — arXiv:2512.13771](https://arxiv.org/abs/2512.13771)
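
A minimal numpy sketch of the ratio above. The toy 3-d vectors stand in for real sentence embeddings and are invented for illustration; only the formula and the 0.95 threshold come from the text.

```python
import numpy as np

def sgi(q_emb, c_emb, r_emb):
    """SGI = ||phi(r) - phi(q)|| / ||phi(r) - phi(c)||; >= 0.95 suggests grounding."""
    q, c, r = (np.asarray(v, dtype=float) for v in (q_emb, c_emb, r_emb))
    return float(np.linalg.norm(r - q) / np.linalg.norm(r - c))

q = np.array([1.0, 0.0, 0.0])         # query embedding (toy)
c = np.array([0.0, 1.0, 0.0])         # context / source-clause embedding (toy)
grounded = np.array([0.1, 0.9, 0.0])  # response that moved toward the context
drifted = np.array([0.9, 0.1, 0.0])   # response anchored near the query

print(round(sgi(q, c, grounded), 2))  # 9.0  -> well above 0.95, grounded
print(round(sgi(q, c, drifted), 2))   # 0.11 -> below 0.95, flagged
```

With real embeddings the three vectors would come from the same encoder (e.g. a sentence-transformers model); the score itself stays one division.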

+ **Directional Grounding Index (DGI)**
+ When no source document is available, DGI measures whether the displacement vector
+ Δ = φ(r) − φ(q) aligns with the mean displacement direction μ̂ of verified grounded pairs:

  ```
  DGI(q, r) = (Δ / ‖Δ‖) · μ̂
  ```

+ A score below 0.30 indicates the response trajectory is anomalous relative to verified
+ correct legal reasoning patterns — a geometric signal of confabulation.
+
+ - [A Geometric Taxonomy of Hallucinations in LLMs — arXiv:2602.13224](https://arxiv.org/abs/2602.13224)
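
The same idea in numpy. Here a made-up unit vector stands in for the calibrated mean direction μ̂; in practice μ̂ would be estimated from verified grounded question/answer pairs.

```python
import numpy as np

def dgi(q_emb, r_emb, mu_hat):
    """DGI = cosine of the response displacement against mu_hat; < 0.30 flags anomaly."""
    delta = np.asarray(r_emb, dtype=float) - np.asarray(q_emb, dtype=float)
    return float(np.dot(delta / np.linalg.norm(delta), mu_hat))

mu_hat = np.array([0.0, 1.0, 0.0])     # illustrative calibrated direction (unit norm)
q = np.zeros(3)                        # toy query embedding
aligned = np.array([0.0, 2.0, 0.0])    # displacement along mu_hat
anomalous = np.array([2.0, 0.0, 0.0])  # displacement orthogonal to mu_hat

print(dgi(q, aligned, mu_hat))    # 1.0 -> consistent with grounded answers
print(dgi(q, anomalous, mu_hat))  # 0.0 -> below 0.30, flagged
```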
+
+ **Rotational Constraint Processing**
+ Companion work explaining *why* transformer attention geometry produces these detectable
+ displacement patterns — grounded responses exhibit measurable rotational alignment with
+ factual constraint directions in the residual stream.
+
+ - [How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing — arXiv:2603.13259](https://arxiv.org/abs/2603.13259)
+
+ ---
+
+ ### Hallucination — Foundational Literature
+
+ **Survey of Hallucination in Natural Language Generation**
+ The canonical taxonomy paper. Classifies hallucinations as *intrinsic* (contradicts source)
+ vs. *extrinsic* (adds unverifiable content) — a distinction directly relevant to contract
+ review, where both failure modes carry legal risk.
+
+ - [Ji et al. (2022) — arXiv:2202.03629](https://arxiv.org/abs/2202.03629)
+
+ **TruthfulQA: Measuring How Models Mimic Human Falsehoods**
+ Benchmark demonstrating that larger models are not necessarily more truthful — they are
+ better at producing *plausible* falsehoods. Directly relevant to legal AI, where fluency
+ and legal vocabulary mask factual errors.
+
+ ```
+ P(truthful | fluent) ≠ P(truthful)
+ ```
+
+ - [Lin et al. (2021) — arXiv:2109.07958](https://arxiv.org/abs/2109.07958)
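
The inequality can be made concrete with hypothetical tallies (all counts invented for illustration): over 200 graded answers, conditioning on fluency does not raise the probability of truth.

```python
# Hypothetical grades for 200 answers, keyed by (fluent, truthful).
counts = {
    (True, True): 30, (True, False): 70,    # fluent answers: mostly false
    (False, True): 70, (False, False): 30,  # disfluent answers: mostly true
}
total = sum(counts.values())

p_truthful = (counts[(True, True)] + counts[(False, True)]) / total
fluent_total = counts[(True, True)] + counts[(True, False)]
p_truthful_given_fluent = counts[(True, True)] / fluent_total

print(p_truthful)               # 0.5
print(p_truthful_given_fluent)  # 0.3  -> fluency is not evidence of truth
```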
+
+ **Siren's Song in the AI Ocean: A Survey on Hallucination in LLMs**
+ Covers hallucination across the full model lifecycle — pretraining data bias, decoding
+ strategies, and RLHF alignment failures. Includes a mitigation taxonomy spanning retrieval,
+ calibration, and post-hoc verification approaches.
+
+ - [Zhang et al. (2023) — arXiv:2309.01219](https://arxiv.org/abs/2309.01219)
+
+ ---
+
+ ### Legal AI Benchmarking
+
+ **LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning**
+ 264 tasks spanning statutory reasoning, contract interpretation, and rule application,
+ assembled by 40+ legal professionals. Establishes baseline performance gaps between
+ general-purpose LLMs and legally reliable reasoning. Directly motivates hallucination
+ detection as a required layer over any legal AI system.
+
+ - [Guha et al. (2023) — arXiv:2308.11462](https://arxiv.org/abs/2308.11462)
+
+ **CUAD: An Expert-Annotated NLP Dataset for Legal Contract Understanding**
+ 510 commercial contracts annotated by legal experts across 41 clause categories. The
+ standard benchmark for contract clause extraction and understanding — the task this
+ tool's SGI scoring is designed to protect.
+
+ - [Hendrycks et al. (2021) — arXiv:2103.06268](https://arxiv.org/abs/2103.06268)
+
+ ---
+
+ ### RAG Faithfulness
+
+ **Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks**
+ The foundational RAG paper. Defines the architecture that SGI is designed to audit:
+ a retriever *p_η(z|x)* selects documents *z* given query *x*, and a generator
+ *p_θ(y|x,z)* conditions on both. SGI detects when the generator fails to condition
+ on *z* — the core faithfulness failure in document-grounded legal AI.
+
+ ```
+ p(y|x) = Σ_z p_η(z|x) · p_θ(y|x,z)
+ ```
+
+ - [Lewis et al. (2020) — arXiv:2005.11401](https://arxiv.org/abs/2005.11401)
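
The marginalization above, worked with toy numbers (all probabilities invented for illustration):

```python
# Retriever weights p_eta(z|x) over two retrieved documents, and generator
# probabilities p_theta(y|x,z) for one candidate answer y.
p_eta = {"doc1": 0.7, "doc2": 0.3}
p_theta_y = {"doc1": 0.9, "doc2": 0.2}

# p(y|x) = sum_z p_eta(z|x) * p_theta(y|x,z)
p_y = sum(p_eta[z] * p_theta_y[z] for z in p_eta)
print(round(p_y, 2))  # 0.69
```

SGI audits exactly the degenerate case where p_θ ignores *z*: retrieval weight is spent on documents the generator never actually conditioned on.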
+
+ ---
+
+ ### Case Law Context
+
+ ***Mata v. Avianca*, No. 22-cv-1461 (S.D.N.Y. 2023)**
+ Attorneys submitted a brief citing six fabricated cases generated by ChatGPT.
+ The court imposed sanctions. Every cited case — including purported holdings and quotations
+ — was a hallucination. This is the canonical real-world example of *extrinsic hallucination*
+ in a legal context: the model produced fluent, jurisdiction-appropriate, entirely fictional
+ legal authority.
+
+ This case motivates the core design principle of this tool: hallucination detection must
+ run *before* any AI-generated legal content is relied upon, not after.
+
+ ## Dashboard
+
+ [cert-framework.com](https://cert-framework.com)
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference