---
title: KAMY Vision AI
emoji: 👁️
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---
# KAMY Vision AI
Multimodal forensic platform for deepfake detection. Analyzes images, audio, video, and text via a layered pipeline combining Vision Transformer ensembles with deterministic forensic signals.
Production: app.kamydev.com Β· API at oyabun-dev-kamyvision.hf.space Β· Docs at docs.kamydev.com
## Stack

- Backend: Python 3.10+, FastAPI, uvicorn, PyTorch, HuggingFace Transformers
- Frontend: React 18, TypeScript, Vite (deployed on Vercel)
- API hosting: HuggingFace Spaces (Docker)
- Docs: React + custom CSS (deployed on Vercel)
## Models

### Image ensemble (3 ViT models, weighted average)

| Model | Weight | Task |
|---|---|---|
| `Ateeqq/ai-vs-human-image-detector` | 45% | AI-generated vs human photo |
| `prithivMLmods/AI-vs-Deepfake-vs-Real` | 35% | 3 classes: AI / Deepfake / Real |
| `prithivMLmods/Deep-Fake-Detector-Model` | 20% | Facial deepfakes |
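The weighted average can be sketched in a few lines. The weights come from the table above; `ensemble_score` is an illustrative helper name, not a function from the codebase, and the per-model inputs are assumed to be "fake" probabilities in [0, 1].

```python
# Weights from the ensemble table; each model contributes its share
# of the final score.
ENSEMBLE_WEIGHTS = {
    "Ateeqq/ai-vs-human-image-detector": 0.45,
    "prithivMLmods/AI-vs-Deepfake-vs-Real": 0.35,
    "prithivMLmods/Deep-Fake-Detector-Model": 0.20,
}

def ensemble_score(model_scores: dict[str, float]) -> float:
    """Combine per-model fake probabilities into one weighted score."""
    return sum(ENSEMBLE_WEIGHTS[name] * score
               for name, score in model_scores.items())

# Example: two models lean "fake", one leans "real".
score = ensemble_score({
    "Ateeqq/ai-vs-human-image-detector": 0.90,
    "prithivMLmods/AI-vs-Deepfake-vs-Real": 0.80,
    "prithivMLmods/Deep-Fake-Detector-Model": 0.30,
})
# score is 0.745: the two "fake" votes dominate because they carry 80%
# of the total weight.
```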
### Forensic layers (no ML)

- EXIF: 19 AI generator signatures detected (Gemini, DALL-E, Firefly, Midjourney, Flux, SynthID, Canva AI, Stable Diffusion...)
- FFT: frequency spectrum analysis, GAN oversmoothing and periodic peak detection
- Texture: local variance per 16×16 patch, unnatural uniformity in skin/background
- Color: colorimetric entropy and HSV distribution, artificial saturation patterns
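The texture layer is the easiest to illustrate. This is a minimal sketch, not the project's implementation: it tiles a grayscale image into 16×16 patches and reports the fraction whose local variance falls below a threshold (the threshold value here is a guess for illustration).

```python
import numpy as np

def texture_uniformity(gray: np.ndarray, patch: int = 16,
                       var_threshold: float = 5.0) -> float:
    """Fraction of 16x16 patches with suspiciously low local variance.

    A high fraction suggests unnaturally smooth regions (skin,
    background) typical of generated images. The threshold is an
    illustrative assumption, not the engine's actual value.
    """
    h, w = gray.shape
    flat, total = 0, 0
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            total += 1
            if gray[y:y + patch, x:x + patch].var() < var_threshold:
                flat += 1
    return flat / max(total, 1)
```

A perfectly flat image scores 1.0 (every patch is uniform), while natural sensor noise pushes patch variance well above any reasonable threshold.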
### Fusion profiles

The engine selects a profile based on EXIF results, then adjusts weights:

| Profile | Trigger | EXIF weight |
|---|---|---|
| `EXIF_IA_DETECTE` | AI source found in metadata | 60% |
| `EXIF_FIABLE` | Real camera identified | 32% |
| `EXIF_ABSENT` | No metadata (stripped by social networks) | 0%, FFT + texture boosted |
| `STANDARD` | General case | 20% |
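Profile-based fusion amounts to picking a weight vector and taking a dot product with the layer scores. In the sketch below, only the `EXIF_IA_DETECTE` row is grounded (it matches the `weights_used` in the example response); the other rows honor the documented EXIF column but split the remaining weight arbitrarily for illustration, and `fuse` is a hypothetical helper name.

```python
# Profile -> layer weights. Only EXIF_IA_DETECTE mirrors documented
# values; the non-EXIF splits in the other rows are assumptions.
FUSION_PROFILES = {
    "EXIF_IA_DETECTE": {"ensemble": 0.20, "exif": 0.60, "fft": 0.08,
                        "texture": 0.07, "color": 0.05},
    "EXIF_FIABLE":     {"ensemble": 0.40, "exif": 0.32, "fft": 0.12,
                        "texture": 0.09, "color": 0.07},
    "EXIF_ABSENT":     {"ensemble": 0.50, "exif": 0.00, "fft": 0.25,
                        "texture": 0.15, "color": 0.10},
    "STANDARD":        {"ensemble": 0.50, "exif": 0.20, "fft": 0.12,
                        "texture": 0.10, "color": 0.08},
}

def fuse(layer_scores: dict[str, float], profile: str) -> float:
    """Weighted sum of layer scores under the selected profile."""
    weights = FUSION_PROFILES[profile]
    return sum(weights[layer] * layer_scores[layer] for layer in weights)
```

Each weight vector sums to 1, so the fused score stays a probability-like value in [0, 1].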
### Audio (pending)

`MelodyMachine/Deepfake-audio-detection-V2` (wav2vec2), pending ONNX conversion.
## API endpoints

Base URL (local): `http://localhost:8000`
Base URL (production): `https://oyabun-dev-kamyvision.hf.space`

| Method | Endpoint | Status | Description |
|---|---|---|---|
| GET | `/health` | Stable | API and model status |
| POST | `/analyze/image` | Stable | Full image analysis (3 ViT + 4 forensic layers) |
| POST | `/analyze/image/fast` | Stable | Fast image analysis (2 ViT + EXIF only) |
| POST | `/analyze/audio` | WIP | Synthetic voice detection |
| POST | `/analyze/video` | WIP | Frame-by-frame video analysis |
| POST | `/analyze/text` | WIP | AI-generated text detection |
```bash
# Health check
curl http://localhost:8000/health

# Full image analysis
curl -X POST http://localhost:8000/analyze/image \
  -F "file=@photo.jpg"

# Fast image analysis
curl -X POST http://localhost:8000/analyze/image/fast \
  -F "file=@photo.jpg"
```
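The same calls translate directly to Python. A minimal client sketch, assuming the third-party `requests` package; `analyze_image` and `summarize` are illustrative helper names, not part of the codebase.

```python
API_BASE = "http://localhost:8000"  # or https://oyabun-dev-kamyvision.hf.space

def analyze_image(path: str, fast: bool = False) -> dict:
    """POST an image file to the API and return the parsed JSON result."""
    import requests  # third-party; imported here so summarize() has no dependency
    endpoint = "/analyze/image/fast" if fast else "/analyze/image"
    with open(path, "rb") as f:
        resp = requests.post(f"{API_BASE}{endpoint}",
                             files={"file": f}, timeout=120)
    resp.raise_for_status()
    return resp.json()

def summarize(result: dict) -> str:
    """One-line summary built from the documented response fields."""
    return (f"{result['verdict']} "
            f"(fake={result['fake_prob']:.1%}, "
            f"confidence={result['confidence']})")
```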
### Response structure

```json
{
  "status": "success",
  "verdict": "DEEPFAKE",
  "fake_prob": 0.8731,
  "real_prob": 0.1269,
  "confidence": "high",
  "reason": "AI source detected in EXIF metadata (Google Gemini).",
  "fusion_profile": "EXIF_IA_DETECTE",
  "ai_source": "Google Gemini",
  "layer_scores": {
    "ensemble": 0.82,
    "exif": 0.97,
    "fft": 0.61,
    "texture": 0.55,
    "color": 0.70
  },
  "weights_used": {
    "ensemble": 0.20,
    "exif": 0.60,
    "fft": 0.08,
    "texture": 0.07,
    "color": 0.05
  },
  "models": [
    "Ateeqq/ai-vs-human-image-detector",
    "prithivMLmods/AI-vs-Deepfake-vs-Real",
    "prithivMLmods/Deep-Fake-Detector-Model"
  ]
}
```
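As a sanity check, the example's `fake_prob` is approximately the dot product of `layer_scores` and `weights_used`. This assumes a plain weighted sum; since the computed value differs slightly from the reported one, the engine presumably applies some additional calibration.

```python
# Values copied from the example response above.
layer_scores = {"ensemble": 0.82, "exif": 0.97, "fft": 0.61,
                "texture": 0.55, "color": 0.70}
weights_used = {"ensemble": 0.20, "exif": 0.60, "fft": 0.08,
                "texture": 0.07, "color": 0.05}

fused = sum(weights_used[k] * layer_scores[k] for k in weights_used)
# fused == 0.8683, close to the reported fake_prob of 0.8731; the small
# gap suggests a final adjustment beyond the plain weighted sum.
```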
## Getting started

### Prerequisites
| Tool | Version |
|---|---|
| Python | 3.10+ |
| Node.js | 18+ |
| Docker | 24+ (optional) |
### Backend

```bash
git clone https://github.com/oyabun-dev/deepfake_detection
cd deepfake_detection
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
Models (~2β4 GB) are downloaded and cached automatically on first startup.
### Frontend

```bash
cd frontend-react
npm install
npm run dev
```
### Docker (recommended)

```bash
docker compose up --build
```

- API: http://localhost:8000
- Frontend: http://localhost:3000
## Project structure

```
deepfake_detection/
├── app/
│   ├── main.py            → FastAPI application, CORS, routers
│   ├── core/
│   │   ├── config.py      → Constants (formats, thresholds, max size)
│   │   └── device.py      → Automatic CPU/GPU selection
│   ├── routers/
│   │   ├── image.py       → /analyze/image and /analyze/image/fast
│   │   ├── audio.py       → /analyze/audio (WIP)
│   │   ├── video.py       → /analyze/video (WIP)
│   │   └── text.py        → /analyze/text (WIP)
│   └── pipelines/
│       └── image.py       → Full pipeline: run() and run_fast()
├── frontend-react/        → React + Vite frontend
├── docs/                  → React documentation site
├── docker-compose.yml
├── docker-compose.prod.yml
├── Dockerfile
└── requirements.txt
```
## Deployment

### HuggingFace Spaces (API)

```bash
pip install huggingface_hub
huggingface-cli login
git remote add spaces https://huggingface.co/spaces/oyabun-dev/kamyvision
git push spaces main
```
### Vercel (frontend + docs)

Both the React frontend (`frontend-react/`) and the documentation site (`docs/`) are deployed on Vercel. See the Deployment docs for full configuration.
## Known limitations

The three ViT models were trained primarily on GAN-generated datasets, so performance degrades on outputs from recent diffusion models (Midjourney v6, Stable Diffusion XL, Flux.1). EXIF analysis partially compensates for images that retain their metadata.
## License

MIT