---
title: Spot the AI Receipt
emoji: 🧾
colorFrom: blue
colorTo: indigo
sdk: gradio
python_version: "3.12"
app_file: app.py
pinned: true
short_description: Can you spot the AI-generated receipt?
license: cc-by-nc-sa-4.0
---

# Spot the AI Receipt 🧾

An interactive 2AFC (two-alternative forced choice) game built by **[Scam.AI](https://www.scam.ai)**.

Each round shows two receipts side by side:

- One is an **authentic** receipt from the public [CORD-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) dataset
- One is **fully AI-synthesized** (GPT-4o generates the text, GPT-Image-1 renders the image) from our [GPT4o-Receipt](https://huggingface.co/datasets/Scam-AI/gpt4o-receipt) benchmark
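
The round structure described above can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the `Round` class, `make_round`, `is_correct`, and the example file names are all hypothetical. It assumes each pool is a plain list of image paths and that the AI receipt is placed on a random side so position carries no signal.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Round:
    left: str      # receipt shown on the left (path or ID)
    right: str     # receipt shown on the right (path or ID)
    ai_side: str   # "left" or "right": where the AI receipt was placed


def make_round(real_pool, ai_pool, rng):
    """Draw one authentic and one AI receipt, then randomize their sides."""
    real = rng.choice(real_pool)
    fake = rng.choice(ai_pool)
    if rng.random() < 0.5:
        return Round(left=fake, right=real, ai_side="left")
    return Round(left=real, right=fake, ai_side="right")


def is_correct(round_, picked_side):
    """A pick is correct when it names the side holding the AI receipt."""
    return picked_side == round_.ai_side


# Hypothetical usage with placeholder file names.
rng = random.Random(0)
r = make_round(["cord_001.png"], ["gpt4o_001.png"], rng)
print(r.ai_side, is_correct(r, "left"))
```

Because every round contains exactly one AI receipt, a player guessing at random is right about half the time, which is the chance level to keep in mind when reading a score.
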

Pick the AI one. After 10 rounds you'll see your accuracy alongside the human and LLM baselines reported in our paper:

> *Zhang, Ren, et al. "GPT4o-Receipt: A Dataset and Human Study for AI-Generated Document Forensics" (arXiv:2603.11442)*

**Key finding:** humans rate AI receipts as visually distinct from real ones (1.87/5 gap), yet achieve only **F1 = 0.852** in binary detection, well below LLMs such as Claude Sonnet 4 (**F1 = 0.975**). The forensic signal lies in **arithmetic incoherence**, which humans rarely audit but LLMs verify trivially.
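
To make "arithmetic incoherence" concrete, here is a rough sketch of the kind of audit involved. It is not the detection method from the paper. The function name `check_receipt_arithmetic` and the field layout (line items as quantity and unit price, plus subtotal, tax, and total) are assumptions for illustration; real receipts often carry discounts, service charges, or rounding rules that this ignores.

```python
from decimal import Decimal


def check_receipt_arithmetic(items, subtotal, tax, total,
                             tolerance=Decimal("0.01")):
    """Return a list of inconsistencies; an empty list means the sums agree.

    items: iterable of (quantity, unit_price) pairs, prices as strings.
    subtotal, tax, total: printed amounts as strings.
    """
    problems = []
    subtotal, tax, total = Decimal(subtotal), Decimal(tax), Decimal(total)

    # Check 1: line items should add up to the printed subtotal.
    items_sum = sum((Decimal(qty) * Decimal(price) for qty, price in items),
                    Decimal("0"))
    if abs(items_sum - subtotal) > tolerance:
        problems.append(f"items sum to {items_sum}, subtotal says {subtotal}")

    # Check 2: subtotal plus tax should equal the printed total.
    if abs(subtotal + tax - total) > tolerance:
        problems.append(f"subtotal + tax is {subtotal + tax}, total says {total}")

    return problems


# Hypothetical receipts: the second has a total that does not add up.
items = [(2, "4.50"), (1, "3.25")]
print(check_receipt_arithmetic(items, "12.25", "1.23", "13.48"))
print(check_receipt_arithmetic(items, "12.25", "1.23", "14.48"))
```

`Decimal` is used instead of `float` so that currency amounts compare exactly. A receipt can look visually convincing and still fail checks like these, which is why a reader who never adds up the numbers misses the signal.
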

Production-grade detection: [scam.ai](https://www.scam.ai).