---
title: Spot the AI Receipt
emoji: 🧾
colorFrom: blue
colorTo: indigo
sdk: gradio
python_version: "3.12"
app_file: app.py
pinned: true
short_description: Can you spot the AI-generated receipt?
license: cc-by-nc-sa-4.0
---

# Spot the AI Receipt 🧾

An interactive 2AFC (two-alternative forced choice) game built by **[Scam.AI](https://www.scam.ai)**.

Each round shows two receipts side by side:
- One is an **authentic** receipt from the public [CORD-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) dataset
- One is **fully AI-synthesized** (GPT-4o generates the text, GPT-Image-1 renders the image) from our [GPT4o-Receipt](https://huggingface.co/datasets/Scam-AI/gpt4o-receipt) benchmark
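
The pairing above can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the `Round` class, the `make_round` helper, and the file names are all hypothetical. The one design point it shows is that the AI receipt is placed on a random side, so position carries no signal.

```python
import random
from dataclasses import dataclass


@dataclass
class Round:
    left: str      # image shown on the left
    right: str     # image shown on the right
    ai_side: str   # "left" or "right": where the AI receipt landed


def make_round(real_path: str, ai_path: str, rng: random.Random) -> Round:
    """Build one 2AFC round with the AI receipt on a randomly chosen side."""
    if rng.random() < 0.5:
        return Round(left=ai_path, right=real_path, ai_side="left")
    return Round(left=real_path, right=ai_path, ai_side="right")


# Hypothetical file names; a real app would sample from the two datasets.
rng = random.Random(0)
rounds = [make_round(f"real_{i}.png", f"ai_{i}.png", rng) for i in range(10)]
```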

Pick the AI one. After 10 rounds you'll see your accuracy vs. the human and LLM baselines reported in our paper:

> *Zhang, Ren, et al. — "GPT4o-Receipt: A Dataset and Human Study for AI-Generated Document Forensics" (arXiv:2603.11442)*
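
As a rough sketch of the end-of-game score, accuracy over a session is the fraction of rounds in which the player picked the side holding the AI receipt. The `accuracy` function and the sample picks below are illustrative assumptions, not taken from the app. In a 2AFC design, random guessing gives an expected accuracy of 50%, and accuracy here is not the same metric as the F1 scores quoted from the paper.

```python
def accuracy(picks: list[str], ai_sides: list[str]) -> float:
    """Fraction of rounds where the pick matches the AI receipt's side."""
    if len(picks) != len(ai_sides):
        raise ValueError("picks and ai_sides must have the same length")
    if not picks:
        return 0.0
    correct = sum(p == a for p, a in zip(picks, ai_sides))
    return correct / len(picks)


# Made-up 10-round session: the player is wrong in rounds 3 and 8.
ai_sides = ["left", "right", "right", "left", "left",
            "right", "left", "right", "right", "left"]
picks = ["left", "right", "left", "left", "left",
         "right", "left", "left", "right", "left"]
score = accuracy(picks, ai_sides)  # 8 of 10 correct
```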

**Key finding:** humans rate AI receipts as visually distinct from real ones (1.87/5 gap), yet achieve only **F1 = 0.852** on binary detection, well below LLMs such as Claude Sonnet 4 (**F1 = 0.975**). The forensic signal lies in **arithmetic incoherence**, which humans rarely audit but LLMs verify trivially.
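
To make "arithmetic incoherence" concrete, here is one way such an audit could look. It is a sketch under stated assumptions, not the method used in the paper or in production: the receipt layout (line items, subtotal, tax, total), the `arithmetic_issues` function, and the one-cent tolerance are all hypothetical.

```python
from decimal import Decimal

# One line item: (name, quantity, unit price, printed line total)
Item = tuple[str, int, Decimal, Decimal]


def arithmetic_issues(items: list[Item], subtotal: Decimal, tax: Decimal,
                      total: Decimal, tol: Decimal = Decimal("0.01")) -> list[str]:
    """Return a description of every sum on the receipt that does not add up."""
    issues = []
    for name, qty, unit_price, line_total in items:
        if abs(qty * unit_price - line_total) > tol:
            issues.append(f"{name}: {qty} x {unit_price} != {line_total}")
    items_sum = sum((line_total for _, _, _, line_total in items), Decimal("0"))
    if abs(items_sum - subtotal) > tol:
        issues.append(f"items sum {items_sum} != subtotal {subtotal}")
    if abs(subtotal + tax - total) > tol:
        issues.append(f"subtotal + tax {subtotal + tax} != total {total}")
    return issues


# Made-up receipt values for illustration only.
items = [("Latte", 2, Decimal("4.50"), Decimal("9.00")),
         ("Bagel", 1, Decimal("3.25"), Decimal("3.25"))]
coherent = arithmetic_issues(items, Decimal("12.25"), Decimal("1.23"), Decimal("13.48"))
incoherent = arithmetic_issues(items, Decimal("12.25"), Decimal("1.23"), Decimal("14.48"))
```

With these values, `coherent` is empty and `incoherent` flags the mismatched total. `Decimal` is used instead of `float` so that currency amounts compare without binary rounding error.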

Production-grade detection: [scam.ai](https://www.scam.ai).