	---
	title: Scam.AI
	emoji: 🛡️
	colorFrom: blue
	colorTo: indigo
	sdk: static
	pinned: false
	---

	# Scam.AI

	Detection systems for AI-driven fraud — deepfakes, document forgery, synthetic media, and adversarial attacks against identity verification.

	[![Website](https://img.shields.io/badge/scam.ai-Website-blue)](https://www.scam.ai)
	[![Research](https://img.shields.io/badge/Research-Publications-orange)](https://www.scam.ai/en/research)
	[![Datasets](https://img.shields.io/badge/Datasets-7%20open-green)](https://huggingface.co/Scam-AI)

	---

	## What We Do

	Scam.AI builds detection systems that protect identity-verification pipelines, financial-document workflows, and digital media ecosystems from AI-driven fraud. Our research portfolio spans deepfake detection, document forgery forensics, AI-generated image attribution, age-estimation robustness, and behavioral-biometric verification. The work is published at CVPR and as arXiv preprints, and the datasets are released here as open benchmarks for the community.

	---

	## 🔬 Research Areas

	| Area | Focus | Key Datasets |
	|------|-------|--------------|
	| 🎭 Deepfake Detection | Real-world faceswap detection beyond academic benchmarks | [RWFS](./datasets/Scam-AI/RWFS) |
	| 📄 Document Forgery | AI-inpainted receipts, forms, and financial documents | [AIForge-Doc-v2](./datasets/Scam-AI/AIForge-Doc-v2) · [AIForge-Doc-v1](./datasets/Scam-AI/AIForge-Doc-v1) · [gpt4o-receipt](./datasets/Scam-AI/gpt4o-receipt) |
	| 🖼️ AI-Generated Image Detection | Self-reported AI-generated images in the wild | [gpt-image-2](./datasets/Scam-AI/gpt-image-2) |
	| 🛡️ Age Estimation Robustness | Cosmetic adversarial attacks against age verification | [age-adversarial-attack](./datasets/Scam-AI/age-adversarial-attack) |
	| 👁️ Behavioral Biometrics | Gaze-based liveness for video interview verification | [synthetic-gaze-reading](./datasets/Scam-AI/synthetic-gaze-reading) |

	---

	## 📚 Featured Datasets

	All datasets are released for academic research and non-commercial use under CC-BY-NC-SA 4.0. Downloads are email-gated, with automatic approval.
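
Because downloads are gated, programmatic access needs an authenticated session. The following is a minimal sketch, assuming the `datasets` library is installed, that you have completed the access form on the dataset page, and that you have logged in with `huggingface-cli login`. The repo IDs are taken from the dataset links in this card. The helper name `load_scam_ai_dataset` is hypothetical, and no split or config names are assumed because this card does not list them. Whether `load_dataset` can parse a given repo also depends on how its files are laid out.

```python
# Repo IDs derived from the dataset links in this card.
DATASET_REPOS = [
    "Scam-AI/RWFS",
    "Scam-AI/AIForge-Doc-v2",
    "Scam-AI/AIForge-Doc-v1",
    "Scam-AI/gpt4o-receipt",
    "Scam-AI/gpt-image-2",
    "Scam-AI/age-adversarial-attack",
    "Scam-AI/synthetic-gaze-reading",
]


def load_scam_ai_dataset(repo_id: str):
    """Load one gated dataset (hypothetical helper, not an official API)."""
    if repo_id not in DATASET_REPOS:
        raise ValueError(f"unknown dataset repo: {repo_id}")
    # Imported lazily so this module imports even without `datasets` installed.
    from datasets import load_dataset
    # token=True reuses the token stored by `huggingface-cli login`.
    return load_dataset(repo_id, token=True)
```

Calling `load_scam_ai_dataset("Scam-AI/RWFS")` before access has been granted should fail with an authorization error from the Hub rather than return data.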

	### 🎭 Deepfake Detection
	- [RWFS (Real-World Faceswap Dataset)](./datasets/Scam-AI/RWFS): 847 deepfakes from 8 production faceswap tools (Pixlr, Magic Hour, Remaker, etc.) plus 900 authentic faces. It is the first dataset to reflect how deepfakes actually appear in the wild.
	> Ren et al., "Do Deepfake Detectors Work in Reality?" — arXiv:2502.10920
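
The class counts above fix a floor for any detector score reported on RWFS. As a rough sketch, assuming the full set (847 fakes, 900 authentic) is used as the evaluation set, the accuracy of a trivial classifier that always predicts the larger class works out as follows. The function name is illustrative only.

```python
N_FAKE = 847  # deepfakes, from the RWFS entry above
N_REAL = 900  # authentic faces, from the RWFS entry above


def majority_baseline_accuracy(n_fake: int, n_real: int) -> float:
    """Accuracy of always predicting the larger class."""
    total = n_fake + n_real
    return max(n_fake, n_real) / total


total = N_FAKE + N_REAL
baseline = majority_baseline_accuracy(N_FAKE, N_REAL)
print(f"{total} images, majority-class baseline accuracy = {baseline:.1%}")
# prints "1747 images, majority-class baseline accuracy = 51.5%"
```

Because the two classes are nearly balanced, plain accuracy is a reasonable headline metric here. A detector scoring close to 51.5% is doing no better than guessing.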

	### 📄 Document Forgery & Forensics
	- [AIForge-Doc v2](./datasets/Scam-AI/AIForge-Doc-v2): 3,066 GPT-Image-2 inpainted document forgeries, each paired with its authentic source and a pixel-precise tampering mask. DocTamper-compatible.
	- [AIForge-Doc v1](./datasets/Scam-AI/AIForge-Doc-v1): 4,061 forgeries generated with Gemini 2.5 / Ideogram v2. It follows the same pairing specification as v2, which enables cross-generator detector analysis.
	- [GPT4o-Receipt](./datasets/Scam-AI/gpt4o-receipt): 935 fully AI-synthesized receipts (GPT-4o + GPT-Image-1) across 159 merchant categories. Released alongside a companion human-vs-LLM forensic detection study.
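
Pixel-precise tampering masks make it possible to score a localization model per pixel. The sketch below shows one common way to do that, computing IoU and F1 between a predicted and a ground-truth binary mask. It assumes masks are already decoded into nested lists of 0/1 values. The on-disk mask format and the metrics used in the AIForge-Doc paper are not stated in this card, so treat this as an illustration rather than the benchmark's official protocol.

```python
from typing import Dict, List

Mask = List[List[int]]  # 1 = tampered pixel, 0 = authentic pixel


def mask_scores(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Pixel-level IoU and F1 between a predicted and a ground-truth mask."""
    if len(pred) != len(truth) or any(
        len(p) != len(t) for p, t in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # tampered pixel correctly flagged
            elif p and not t:
                fp += 1  # authentic pixel wrongly flagged
            elif t and not p:
                fn += 1  # tampered pixel missed
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # both masks empty: perfect agreement
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return {"iou": iou, "f1": f1}


# Toy 3x4 masks, invented for illustration only.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
scores = mask_scores(pred, truth)
print(scores)  # prints {'iou': 0.6, 'f1': 0.75}
```

Since v1 and v2 share the same pairing specification, the same scoring function can be run on both to compare how a detector degrades across generators.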

	### 🖼️ AI-Generated Image Detection
	- [GPT-Image-2 Twitter Dataset](./datasets/Scam-AI/gpt-image-2) — 10,217 confirmed GPT-Image-2 outputs scraped from Twitter/X in the first week post-launch. Multi-language: EN (40%), JA (33%), ZH (19%).

	### 🛡️ Identity Verification Robustness
	- [Age Adversarial Attack Dataset](./datasets/Scam-AI/age-adversarial-attack): 5,809 VLM-simulated cosmetic attacks (beard, gray hair, makeup, wrinkles) that demonstrate a 29–65% attack-conversion rate on production age estimators.
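	> Shen et al., "Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems," arXiv:2602.19539, CVPR 2026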
	- [Synthetic Eye Movement Dataset](./datasets/Scam-AI/synthetic-gaze-reading) — 12 hours of synthetic eye-movement video (144 sessions × 5 min) for script-reading detection in video interviews.
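
This card does not define the attack-conversion rate quoted for the age dataset. One plausible reading, used here purely as an assumption, is the share of faces that an estimator places under an age threshold before the cosmetic attack and at or above it afterwards. The sketch below computes that quantity. The threshold of 18, the function name, and the sample ages are all invented for illustration, and the paper may define the metric differently.

```python
from typing import Sequence


def attack_conversion_rate(
    before: Sequence[float],
    after: Sequence[float],
    threshold: float = 18.0,
) -> float:
    """Share of under-threshold faces estimated at/over threshold after attack.

    ASSUMED definition; the paper may define the metric differently.
    """
    if len(before) != len(after):
        raise ValueError("before/after estimates must be paired")
    # Only faces estimated as under-age before the attack can be "converted".
    eligible = [(b, a) for b, a in zip(before, after) if b < threshold]
    if not eligible:
        raise ValueError("no under-threshold samples to attack")
    converted = sum(1 for _, a in eligible if a >= threshold)
    return converted / len(eligible)


# Invented estimator outputs in years, for illustration only.
before = [15.2, 16.8, 17.1, 14.0, 21.5]
after = [19.4, 17.5, 18.0, 16.3, 24.0]
rate = attack_conversion_rate(before, after)
print(f"{rate:.0%}")  # prints "50%"
```

The fifth sample is excluded from the denominator because it was already estimated above the threshold before any attack was applied.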

	---

	## 📑 Publications

	13 papers across deepfake detection, AI-generated image detection, document forgery, age estimation, and interview technology. Browse the full list at [scam.ai/research](https://www.scam.ai/en/research).

	Selected work:
	- Do Deepfake Detectors Work in Reality? Ren, Patil, Zewde et al. (arXiv:2502.10920)
	- AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents. Wu, Zhou, Xu et al. (arXiv:2602.20569)
	- GPT-Image-2 in the Wild. Zewde, Ren, Shen et al. (arXiv:2604.25370)
	- Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems. Shen, Duong, An et al. (arXiv:2602.19539, CVPR 2026)

	---

	## 💼 For Enterprise

	The datasets above are released for the research community. For production needs we offer:

	- Detection APIs — Deepfake, document forgery, AI-image, and age-verification endpoints with latency and accuracy SLAs
	- On-premise deployment — Private cloud or air-gapped installations for regulated industries (banking, government, healthcare)
	- Commercial licensing — Use our datasets and models in commercial pipelines
	- Custom models — Trained on your domain, evaluated against the threat models we've published

	📧 sales@scam.ai · 🌐 [scam.ai](https://www.scam.ai)

	---

	## 🤝 Get Involved

	- ⭐ Follow this org to get notified of new dataset releases
	- 📥 Download any dataset (free for non-commercial research, just provide name + email)
	- 📝 Cite our papers if you publish work building on these resources
	- 🐛 Open a discussion on any dataset to report issues or share results

	---

	Building detection systems for an era when generative AI makes every digital artifact suspect.
