Fix relative dataset links to absolute URLs
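This commit replaces each relative `./datasets/Scam-AI/<name>` link in README.md with the matching `https://huggingface.co/datasets/Scam-AI/<name>` URL. A relative link resolves against wherever the README is rendered, so it works only if that location actually has a `datasets/` path. An absolute URL does not depend on that. Because the rewrite is mechanical, it can be scripted. The Python sketch below is illustrative only. The function and constant names are hypothetical, and it assumes every affected link is a Markdown inline link whose target starts with `./datasets/`.

```python
import re

# Hypothetical constants: the relative prefix used before this commit and the
# Hugging Face base URL that the links point to afterwards.
RELATIVE_PREFIX = "./datasets/"
ABSOLUTE_PREFIX = "https://huggingface.co/datasets/"

# Matches the target part of a Markdown inline link, e.g. "](./datasets/Scam-AI/RWFS)",
# capturing "Scam-AI/RWFS". The target stops at the first ")" or whitespace.
LINK_TARGET = re.compile(r"\]\(" + re.escape(RELATIVE_PREFIX) + r"([^)\s]+)\)")


def absolutize_links(markdown: str) -> str:
    """Rewrite ](./datasets/<org>/<name>) link targets to absolute Hub URLs."""
    return LINK_TARGET.sub(lambda m: "](" + ABSOLUTE_PREFIX + m.group(1) + ")", markdown)
```

For example, `absolutize_links(open("README.md").read())` returns the rewritten text. Links that are already absolute stay unchanged because they do not contain `](./datasets/`, and several links on one table row are all rewritten because each match stops at its own closing parenthesis.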
README.md
CHANGED
@@ -27,11 +27,11 @@ Scam.AI builds detection systems that protect identity-verification pipelines, f
| Area | Focus | Key Datasets |
|------|-------|--------------|
-
| **Deepfake Detection** | Real-world faceswap detection beyond academic benchmarks | [RWFS](./datasets/Scam-AI/RWFS) |
-
| **Document Forgery** | AI-inpainted receipts, forms, and financial documents | [AIForge-Doc-v2](./datasets/Scam-AI/AIForge-Doc-v2) · [AIForge-Doc-v1](./datasets/Scam-AI/AIForge-Doc-v1) · [gpt4o-receipt](./datasets/Scam-AI/gpt4o-receipt) |
-
| **🖼️ AI-Generated Image Detection** | Self-reported AI-generated images in the wild | [gpt-image-2](./datasets/Scam-AI/gpt-image-2) |
-
| **🛡️ Age Estimation Robustness** | Cosmetic adversarial attacks against age verification | [age-adversarial-attack](./datasets/Scam-AI/age-adversarial-attack) |
-
| **👁️ Behavioral Biometrics** | Gaze-based liveness for video interview verification | [synthetic-gaze-reading](./datasets/Scam-AI/synthetic-gaze-reading) |

---

@@ -40,21 +40,21 @@ Scam.AI builds detection systems that protect identity-verification pipelines, f
All datasets are released for **academic research and non-commercial use** under CC-BY-NC-SA 4.0. Email-gated download with automatic approval.

### Deepfake Detection
-
- **[RWFS – Real-World Faceswap Dataset](./datasets/Scam-AI/RWFS)** – 847 deepfakes from 8 production faceswap tools (Pixlr, Magic Hour, Remaker, etc) + 900 authentic faces. The first dataset reflecting how deepfakes actually appear in the wild.
> *Ren et al., "Do Deepfake Detectors Work in Reality?" – arXiv:2502.10920*

### Document Forgery & Forensics
-
- **[AIForge-Doc v2](./datasets/Scam-AI/AIForge-Doc-v2)** – 3,066 GPT-Image-2 inpainted document forgeries paired with authentic source + pixel-precise tampering masks. DocTamper-compatible.
-
- **[AIForge-Doc v1](./datasets/Scam-AI/AIForge-Doc-v1)** – 4,061 forgeries via Gemini 2.5 / Ideogram v2. Same-spec pairing with v2 enables cross-generator detector analysis.
-
- **[GPT4o-Receipt](./datasets/Scam-AI/gpt4o-receipt)** – 935 fully AI-synthesized receipts (GPT-4o + GPT-Image-1) across 159 merchant categories. Companion human-vs-LLM forensic detection study.

### 🖼️ AI-Generated Image Detection
-
- **[GPT-Image-2 Twitter Dataset](./datasets/Scam-AI/gpt-image-2)** – 10,217 confirmed GPT-Image-2 outputs scraped from Twitter/X in the first week post-launch. Multi-language: EN (40%), JA (33%), ZH (19%).

### 🛡️ Identity Verification Robustness
-
- **[Age Adversarial Attack Dataset](./datasets/Scam-AI/age-adversarial-attack)** – 5,809 VLM-simulated cosmetic attacks (beard, gray hair, makeup, wrinkles) demonstrating 29–65% attack-conversion rate on production age estimators.
> *Ren et al., CVPR 2026*
-
- **[Synthetic Eye Movement Dataset](./datasets/Scam-AI/synthetic-gaze-reading)** – 12 hours of synthetic eye-movement video (144 sessions × 5 min) for script-reading detection in video interviews.

---

| Area | Focus | Key Datasets |
|------|-------|--------------|
+
| **Deepfake Detection** | Real-world faceswap detection beyond academic benchmarks | [RWFS](https://huggingface.co/datasets/Scam-AI/RWFS) |
+
| **Document Forgery** | AI-inpainted receipts, forms, and financial documents | [AIForge-Doc-v2](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v2) · [AIForge-Doc-v1](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v1) · [gpt4o-receipt](https://huggingface.co/datasets/Scam-AI/gpt4o-receipt) |
+
| **🖼️ AI-Generated Image Detection** | Self-reported AI-generated images in the wild | [gpt-image-2](https://huggingface.co/datasets/Scam-AI/gpt-image-2) |
+
| **🛡️ Age Estimation Robustness** | Cosmetic adversarial attacks against age verification | [age-adversarial-attack](https://huggingface.co/datasets/Scam-AI/age-adversarial-attack) |
+
| **👁️ Behavioral Biometrics** | Gaze-based liveness for video interview verification | [synthetic-gaze-reading](https://huggingface.co/datasets/Scam-AI/synthetic-gaze-reading) |

---

All datasets are released for **academic research and non-commercial use** under CC-BY-NC-SA 4.0. Email-gated download with automatic approval.

### Deepfake Detection
+
- **[RWFS – Real-World Faceswap Dataset](https://huggingface.co/datasets/Scam-AI/RWFS)** – 847 deepfakes from 8 production faceswap tools (Pixlr, Magic Hour, Remaker, etc) + 900 authentic faces. The first dataset reflecting how deepfakes actually appear in the wild.
> *Ren et al., "Do Deepfake Detectors Work in Reality?" – arXiv:2502.10920*

### Document Forgery & Forensics
+
- **[AIForge-Doc v2](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v2)** – 3,066 GPT-Image-2 inpainted document forgeries paired with authentic source + pixel-precise tampering masks. DocTamper-compatible.
+
- **[AIForge-Doc v1](https://huggingface.co/datasets/Scam-AI/AIForge-Doc-v1)** – 4,061 forgeries via Gemini 2.5 / Ideogram v2. Same-spec pairing with v2 enables cross-generator detector analysis.
+
- **[GPT4o-Receipt](https://huggingface.co/datasets/Scam-AI/gpt4o-receipt)** – 935 fully AI-synthesized receipts (GPT-4o + GPT-Image-1) across 159 merchant categories. Companion human-vs-LLM forensic detection study.

### 🖼️ AI-Generated Image Detection
+
- **[GPT-Image-2 Twitter Dataset](https://huggingface.co/datasets/Scam-AI/gpt-image-2)** – 10,217 confirmed GPT-Image-2 outputs scraped from Twitter/X in the first week post-launch. Multi-language: EN (40%), JA (33%), ZH (19%).

### 🛡️ Identity Verification Robustness
+
- **[Age Adversarial Attack Dataset](https://huggingface.co/datasets/Scam-AI/age-adversarial-attack)** – 5,809 VLM-simulated cosmetic attacks (beard, gray hair, makeup, wrinkles) demonstrating 29–65% attack-conversion rate on production age estimators.
> *Ren et al., CVPR 2026*
+
- **[Synthetic Eye Movement Dataset](https://huggingface.co/datasets/Scam-AI/synthetic-gaze-reading)** – 12 hours of synthetic eye-movement video (144 sessions × 5 min) for script-reading detection in video interviews.

---
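The README says downloads are email-gated with automatic approval, and the links now point to Hugging Face Hub dataset repos. On the Hub, a gated repo is typically unlocked by requesting access on its page while logged in, then downloading with a token from that same account. A minimal sketch follows. It assumes the `huggingface_hub` package is installed and that access has already been granted. The helper names are hypothetical, and only the `Scam-AI/<name>` repo IDs come from the README.

```python
from typing import Optional

# Base URL used by the absolute links introduced in this commit.
HUB_DATASETS = "https://huggingface.co/datasets/"


def dataset_page(repo_id: str) -> str:
    """Return the Hub page where access to a gated dataset is requested (hypothetical helper)."""
    return HUB_DATASETS + repo_id


def download_gated(repo_id: str, token: Optional[str] = None) -> str:
    """Download a full snapshot of a gated dataset repo and return the local path.

    Hypothetical helper. Assumes access was already approved for the account that
    owns `token`; with token=None, huggingface_hub falls back to a stored login
    (e.g. from `huggingface-cli login`) or the HF_TOKEN environment variable.
    """
    # Imported lazily so dataset_page() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="dataset", token=token)
```

For example, open `dataset_page("Scam-AI/RWFS")` in a browser to request access, then call `download_gated("Scam-AI/RWFS")`. `snapshot_download` fetches the repo's raw files instead of going through `datasets.load_dataset`. That choice is deliberate, because the README does not say which file layout each dataset uses.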