metadata
title: FactEval
emoji: π
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.29.0
python_version: '3.12'
app_file: app.py
pinned: false
license: mit
short_description: Find exactly which parts of your LLM output are hallucinated
FactEval
Find exactly which parts of your LLM output are hallucinated.
Debug hallucinations in RAG and LLM pipelines with claim-level verification.
Paste an LLM-generated answer and reference contexts. FactEval highlights supported, contradicted, and unverifiable claims with human-readable reasons and pipeline diagnostics.
How it works
- Claim Extraction: Breaks the answer into atomic claims (Qwen2.5-1.5B)
- Evidence Retrieval: Finds the most relevant sentences from your contexts (MiniLM + FAISS)
- NLI Verification: Checks each claim against evidence (DeBERTa-v3)
- Calibration: Produces trustworthy confidence scores (Isotonic Regression)
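The four stages above can be sketched end to end in plain Python. This is a toy illustration of the data flow, not FactEval's actual code: every function name here (`extract_claims`, `retrieve`, `verify`, `fit_isotonic`, `calibrate`, `check_answer`) is hypothetical, and each model is replaced by a standard-library stand-in (sentence splitting instead of Qwen2.5-1.5B, bag-of-words cosine similarity instead of MiniLM + FAISS, a crude overlap-and-negation heuristic instead of DeBERTa-v3 NLI). The 0.3 threshold is an arbitrary assumption. Only the calibration step is a real instance of its technique: isotonic regression fitted with the pool-adjacent-violators algorithm.

```python
import math
import re
from collections import Counter

def extract_claims(text):
    # Stand-in for LLM claim extraction (assumption): one claim per sentence.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def _bow(text):
    # Bag-of-words vector; a stand-in for a sentence embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(claim, contexts, k=1):
    # Stand-in for embedding search: rank every context sentence by similarity.
    sentences = [s for c in contexts for s in extract_claims(c)]
    query = _bow(claim)
    return sorted(sentences, key=lambda s: _cosine(query, _bow(s)), reverse=True)[:k]

def verify(claim, evidence, threshold=0.3):
    # Stand-in for an NLI model: low overlap means unverifiable,
    # a mismatch in negation means contradicted. Returns (label, raw_score).
    if not evidence:
        return "unverifiable", 0.0
    score = max(_cosine(_bow(claim), _bow(e)) for e in evidence)
    if score < threshold:
        return "unverifiable", score
    negated = any(("not" in _bow(e)) != ("not" in _bow(claim)) for e in evidence)
    return ("contradicted" if negated else "supported"), score

def fit_isotonic(scores, outcomes):
    # Pool-adjacent-violators: non-decreasing step fit of outcomes (0/1) on scores.
    blocks = []  # each block is [sum_of_outcomes, count, highest_score_in_block]
    for x, y in sorted(zip(scores, outcomes)):
        blocks.append([float(y), 1, x])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n, hi = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = hi
    return [(hi, s / n) for s, n, hi in blocks]

def calibrate(model, score):
    # Step-function lookup: value of the first block whose upper bound covers score.
    for hi, value in model:
        if score <= hi:
            return value
    return model[-1][1]

def check_answer(answer, contexts):
    # Run extraction, retrieval, and verification for every claim in the answer.
    results = []
    for claim in extract_claims(answer):
        evidence = retrieve(claim, contexts)
        label, score = verify(claim, evidence)
        results.append({"claim": claim, "label": label, "evidence": evidence, "score": score})
    return results

if __name__ == "__main__":
    answer = "Paris is the capital of France. The Eiffel Tower is not in Paris. Bananas grow on Mars."
    contexts = ["Paris is the capital of France. The Eiffel Tower is in Paris."]
    for row in check_answer(answer, contexts):
        print(row["label"], "|", row["claim"])
```

In a real pipeline the calibrator would be fitted on held-out raw scores paired with human correctness labels, then applied to each new claim's score so that the reported confidence tracks how often such claims are actually correct.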
GitHub Repository