---
title: KAMY Vision AI
emoji: 🛡️
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---

# KAMY Vision AI

Multimodal forensic platform for deepfake detection. Analyzes images, audio, video, and text via a layered pipeline combining Vision Transformer ensembles with deterministic forensic signals.

**Production:** [app.kamydev.com](https://app.kamydev.com) · API at [oyabun-dev-kamyvision.hf.space](https://oyabun-dev-kamyvision.hf.space) · Docs at [docs.kamydev.com](https://docs.kamydev.com)

---

## Stack

- **Backend:** Python 3.10+, FastAPI, uvicorn, PyTorch, HuggingFace Transformers
- **Frontend:** React 18, TypeScript, Vite — deployed on Vercel
- **API hosting:** HuggingFace Spaces (Docker)
- **Docs:** React + custom CSS — deployed on Vercel

---

## Models

### Image ensemble (3 ViT models, weighted average)

| Model | Weight | Task |
|-------|--------|------|
| `Ateeqq/ai-vs-human-image-detector` | 45% | AI-generated vs human photo |
| `prithivMLmods/AI-vs-Deepfake-vs-Real` | 35% | 3 classes: AI / Deepfake / Real |
| `prithivMLmods/Deep-Fake-Detector-Model` | 20% | Facial deepfakes |

### Forensic layers (no ML)

- **EXIF** — 19 AI generator signatures detected (Gemini, DALL-E, Firefly, Midjourney, Flux, SynthID, Canva AI, Stable Diffusion...)
- **FFT** — frequency spectrum analysis, GAN oversmoothing and periodic peak detection
- **Texture** — local variance per 16×16 patch, unnatural uniformity in skin/background
- **Color** — colorimetric entropy and HSV distribution, artificial saturation patterns

### Fusion profiles

The engine selects a profile based on EXIF results, then adjusts weights:

| Profile | Trigger | EXIF weight |
|---------|---------|-------------|
| `EXIF_IA_DETECTE` | AI source found in metadata | 60% |
| `EXIF_FIABLE` | Real camera identified | 32% |
| `EXIF_ABSENT` | No metadata (stripped by social network) | 0%, FFT+texture boosted |
| `STANDARD` | General case | 20% |

### Audio (pending)

`MelodyMachine/Deepfake-audio-detection-V2` (wav2vec2) — pending ONNX conversion.

---

## API endpoints

Base URL (local): `http://localhost:8000`
Base URL (production): `https://oyabun-dev-kamyvision.hf.space`

| Method | Endpoint | Status | Description |
|--------|----------|--------|-------------|
| `GET` | `/health` | Stable | API and model status |
| `POST` | `/analyze/image` | Stable | Full image analysis (3 ViT + 4 forensic layers) |
| `POST` | `/analyze/image/fast` | Stable | Fast image analysis (2 ViT + EXIF only) |
| `POST` | `/analyze/audio` | WIP | Synthetic voice detection |
| `POST` | `/analyze/video` | WIP | Frame-by-frame video analysis |
| `POST` | `/analyze/text` | WIP | AI-generated text detection |

```bash
# Health check
curl http://localhost:8000/health

# Full image analysis
curl -X POST http://localhost:8000/analyze/image \
  -F "file=@photo.jpg"

# Fast image analysis
curl -X POST http://localhost:8000/analyze/image/fast \
  -F "file=@photo.jpg"
```

### Response structure

```json
{
  "status": "success",
  "verdict": "DEEPFAKE",
  "fake_prob": 0.8731,
  "real_prob": 0.1269,
  "confidence": "high",
  "reason": "AI source detected in EXIF metadata (Google Gemini).",
  "fusion_profile": "EXIF_IA_DETECTE",
  "ai_source": "Google Gemini",
  "layer_scores": {
    "ensemble": 0.82,
    "exif": 0.97,
    "fft": 0.61,
    "texture": 0.55,
    "color": 0.70
  },
  "weights_used": {
    "ensemble": 0.20,
    "exif": 0.60,
    "fft": 0.08,
    "texture": 0.07,
    "color": 0.05
  },
  "models": [
    "Ateeqq/ai-vs-human-image-detector",
    "prithivMLmods/AI-vs-Deepfake-vs-Real",
    "prithivMLmods/Deep-Fake-Detector-Model"
  ]
}
```

---

## Getting started

### Prerequisites

| Tool | Version |
|------|---------|
| Python | 3.10+ |
| Node.js | 18+ |
| Docker | 24+ (optional) |

### Backend

```bash
git clone https://github.com/oyabun-dev/deepfake_detection
cd deepfake_detection
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Models (~2–4 GB) are downloaded and cached automatically on first startup.

### Frontend

```bash
cd frontend-react
npm install
npm run dev
```

### Docker (recommended)

```bash
docker compose up --build
```

- API: `http://localhost:8000`
- Frontend: `http://localhost:3000`

---

## Project structure

```
deepfake_detection/
├── app/
│   ├── main.py — FastAPI application, CORS, routers
│   ├── core/
│   │   ├── config.py — Constants (formats, thresholds, max size)
│   │   └── device.py — Automatic CPU/GPU selection
│   ├── routers/
│   │   ├── image.py — /analyze/image and /analyze/image/fast
│   │   ├── audio.py — /analyze/audio (WIP)
│   │   ├── video.py — /analyze/video (WIP)
│   │   └── text.py — /analyze/text (WIP)
│   └── pipelines/
│       └── image.py — Full pipeline: run() and run_fast()
├── frontend-react/ — React + Vite frontend
├── docs/ — React documentation site
├── docker-compose.yml
├── docker-compose.prod.yml
├── Dockerfile
└── requirements.txt
```

---

## Deployment

### HuggingFace Spaces (API)

```bash
pip install huggingface_hub
huggingface-cli login
git remote add spaces https://huggingface.co/spaces/oyabun-dev/kamyvision
git push spaces main
```

### Vercel (frontend + docs)

Both the React frontend (`frontend-react/`) and the documentation (`docs/`) are deployed on Vercel. See the [Deployment docs](https://docs.kamydev.com/deploy) for full configuration.

---

## Known limitations

The 3 ViT models were primarily trained on GAN datasets. Performance is degraded on recent diffusion model outputs (Midjourney v6, Stable Diffusion XL, Flux.1). EXIF analysis partially compensates for images that retain their metadata.

---

## License

MIT
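
---

## Appendix: fusion weighting sketch

The fusion step described under "Fusion profiles" can be illustrated with a short example. This is a minimal sketch, not the project's actual code: it assumes the engine combines per-layer scores with a plain weighted average, using the `layer_scores` and `weights_used` values from the sample response above. The names `PROFILE_WEIGHTS` and `fuse_layers` are hypothetical.

```python
# Hypothetical sketch (not the project's actual implementation):
# combine per-layer fake-probability scores with profile-dependent
# weights via a plain weighted average.

# Weights for the EXIF_IA_DETECTE profile, taken from the
# "weights_used" field of the sample response.
PROFILE_WEIGHTS = {
    "EXIF_IA_DETECTE": {
        "ensemble": 0.20, "exif": 0.60, "fft": 0.08,
        "texture": 0.07, "color": 0.05,
    },
}

def fuse_layers(layer_scores: dict, profile: str) -> float:
    """Weighted average of layer scores under the given fusion profile."""
    weights = PROFILE_WEIGHTS[profile]
    return sum(weights[layer] * layer_scores[layer] for layer in weights)

# Layer scores from the sample response.
scores = {"ensemble": 0.82, "exif": 0.97, "fft": 0.61,
          "texture": 0.55, "color": 0.70}
print(round(fuse_layers(scores, "EXIF_IA_DETECTE"), 4))  # 0.8683
```

Note that the sample response reports `fake_prob: 0.8731`, slightly above this weighted average, so the real engine presumably applies additional adjustments beyond a bare weighted sum.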