---
title: KAMY Vision AI
emoji: 🛡️
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---

# KAMY Vision AI

Multimodal forensic platform for deepfake detection. Analyzes images, audio, video, and text via a layered pipeline combining Vision Transformer ensembles with deterministic forensic signals.

Production: app.kamydev.com · API: oyabun-dev-kamyvision.hf.space · Docs: docs.kamydev.com


## Stack

- **Backend:** Python 3.10+, FastAPI, uvicorn, PyTorch, Hugging Face Transformers
- **Frontend:** React 18, TypeScript, Vite (deployed on Vercel)
- **API hosting:** Hugging Face Spaces (Docker)
- **Docs:** React + custom CSS (deployed on Vercel)

## Models

### Image ensemble (3 ViT models, weighted average)

| Model | Weight | Task |
|---|---|---|
| Ateeqq/ai-vs-human-image-detector | 45% | AI-generated vs. human photo |
| prithivMLmods/AI-vs-Deepfake-vs-Real | 35% | 3 classes: AI / Deepfake / Real |
| prithivMLmods/Deep-Fake-Detector-Model | 20% | Facial deepfakes |
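The weighted average described above can be sketched as a plain dot product over the three models' "fake" probabilities. The model names and weights come from the table; the function itself is a minimal illustration, not the project's actual implementation:

```python
# Illustrative sketch of the 3-model weighted ensemble (not the project's code).
# Each model is assumed to yield a "fake" probability in [0, 1].
ENSEMBLE_WEIGHTS = {
    "Ateeqq/ai-vs-human-image-detector": 0.45,
    "prithivMLmods/AI-vs-Deepfake-vs-Real": 0.35,
    "prithivMLmods/Deep-Fake-Detector-Model": 0.20,
}

def ensemble_fake_prob(fake_probs: dict[str, float]) -> float:
    """Weighted average of per-model fake probabilities."""
    return sum(ENSEMBLE_WEIGHTS[name] * p for name, p in fake_probs.items())

score = ensemble_fake_prob({
    "Ateeqq/ai-vs-human-image-detector": 0.90,
    "prithivMLmods/AI-vs-Deepfake-vs-Real": 0.80,
    "prithivMLmods/Deep-Fake-Detector-Model": 0.70,
})
# 0.45*0.90 + 0.35*0.80 + 0.20*0.70 = 0.825
```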

### Forensic layers (no ML)

- **EXIF**: detects 19 AI generator signatures (Gemini, DALL-E, Firefly, Midjourney, Flux, SynthID, Canva AI, Stable Diffusion, ...)
- **FFT**: frequency-spectrum analysis; detects GAN oversmoothing and periodic peaks
- **Texture**: local variance per 16×16 patch; flags unnatural uniformity in skin and backgrounds
- **Color**: colorimetric entropy and HSV distribution; flags artificial saturation patterns
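As an illustration of the texture layer, a per-patch local-variance statistic over 16×16 tiles can be computed as below. This is a minimal sketch of the general technique; the project's actual scoring and thresholds are not shown:

```python
import numpy as np

def mean_patch_variance(gray: np.ndarray, patch: int = 16) -> float:
    """Mean of the per-tile pixel variance over non-overlapping
    patch x patch tiles. Unnaturally low values on skin or background
    regions are a hint of AI oversmoothing."""
    h, w = gray.shape
    h, w = h - h % patch, w - w % patch   # crop to a multiple of the patch size
    tiles = (
        gray[:h, :w]
        .reshape(h // patch, patch, w // patch, patch)
        .transpose(0, 2, 1, 3)            # -> (rows, cols, patch, patch)
        .reshape(-1, patch * patch)
    )
    return float(tiles.var(axis=1).mean())

flat = np.full((64, 64), 128.0)           # perfectly uniform image: variance 0
noisy = np.random.default_rng(0).normal(128.0, 20.0, (64, 64))
print(mean_patch_variance(flat))          # 0.0
print(mean_patch_variance(noisy) > mean_patch_variance(flat))  # True
```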

## Fusion profiles

The engine selects a profile based on EXIF results, then adjusts the layer weights:

| Profile | Trigger | EXIF weight |
|---|---|---|
| EXIF_IA_DETECTE | AI source found in metadata | 60% |
| EXIF_FIABLE | Real camera identified | 32% |
| EXIF_ABSENT | No metadata (stripped by social networks) | 0% (FFT + texture boosted) |
| STANDARD | General case | 20% |
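The profile selection can be sketched as a simple priority cascade. The profile names and EXIF weights come from the table above; the boolean trigger flags are illustrative assumptions about how the EXIF layer might report its findings:

```python
# Illustrative sketch of fusion-profile selection (flags are assumptions).
EXIF_WEIGHT = {
    "EXIF_IA_DETECTE": 0.60,   # AI source found in metadata
    "EXIF_FIABLE": 0.32,       # real camera identified
    "EXIF_ABSENT": 0.00,       # no metadata at all
    "STANDARD": 0.20,          # general case
}

def select_profile(ai_source_found: bool, camera_identified: bool,
                   has_metadata: bool) -> str:
    """Priority cascade: an explicit AI signature dominates everything."""
    if ai_source_found:
        return "EXIF_IA_DETECTE"
    if camera_identified:
        return "EXIF_FIABLE"
    if not has_metadata:
        return "EXIF_ABSENT"
    return "STANDARD"

profile = select_profile(ai_source_found=True, camera_identified=False,
                         has_metadata=True)
print(profile, EXIF_WEIGHT[profile])  # EXIF_IA_DETECTE 0.6
```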

### Audio (pending)

MelodyMachine/Deepfake-audio-detection-V2 (wav2vec2), pending ONNX conversion.


## API endpoints

- Base URL (local): `http://localhost:8000`
- Base URL (production): `https://oyabun-dev-kamyvision.hf.space`

| Method | Endpoint | Status | Description |
|---|---|---|---|
| GET | /health | Stable | API and model status |
| POST | /analyze/image | Stable | Full image analysis (3 ViT + 4 forensic layers) |
| POST | /analyze/image/fast | Stable | Fast image analysis (2 ViT + EXIF only) |
| POST | /analyze/audio | WIP | Synthetic voice detection |
| POST | /analyze/video | WIP | Frame-by-frame video analysis |
| POST | /analyze/text | WIP | AI-generated text detection |

```bash
# Health check
curl http://localhost:8000/health

# Full image analysis
curl -X POST http://localhost:8000/analyze/image \
  -F "file=@photo.jpg"

# Fast image analysis
curl -X POST http://localhost:8000/analyze/image/fast \
  -F "file=@photo.jpg"
```

## Response structure

```json
{
  "status": "success",
  "verdict": "DEEPFAKE",
  "fake_prob": 0.8731,
  "real_prob": 0.1269,
  "confidence": "high",
  "reason": "AI source detected in EXIF metadata (Google Gemini).",
  "fusion_profile": "EXIF_IA_DETECTE",
  "ai_source": "Google Gemini",
  "layer_scores": {
    "ensemble": 0.82,
    "exif": 0.97,
    "fft": 0.61,
    "texture": 0.55,
    "color": 0.70
  },
  "weights_used": {
    "ensemble": 0.20,
    "exif": 0.60,
    "fft": 0.08,
    "texture": 0.07,
    "color": 0.05
  },
  "models": [
    "Ateeqq/ai-vs-human-image-detector",
    "prithivMLmods/AI-vs-Deepfake-vs-Real",
    "prithivMLmods/Deep-Fake-Detector-Model"
  ]
}
```
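Assuming the fused score is a plain weighted sum of the layer scores, the example response is easy to sanity-check client-side. The dot product of `layer_scores` and `weights_used` lands within half a percentage point of the reported `fake_prob` (whether the remaining gap is calibration or rounding is not documented):

```python
# Sanity-check the example response, assuming a plain weighted-sum fusion.
layer_scores = {"ensemble": 0.82, "exif": 0.97, "fft": 0.61,
                "texture": 0.55, "color": 0.70}
weights_used = {"ensemble": 0.20, "exif": 0.60, "fft": 0.08,
                "texture": 0.07, "color": 0.05}

fused = sum(layer_scores[k] * weights_used[k] for k in layer_scores)
print(round(fused, 4))  # 0.8683 -- close to the reported fake_prob of 0.8731
```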

## Getting started

### Prerequisites

| Tool | Version |
|---|---|
| Python | 3.10+ |
| Node.js | 18+ |
| Docker | 24+ (optional) |

### Backend

```bash
git clone https://github.com/oyabun-dev/deepfake_detection
cd deepfake_detection

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Models (~2–4 GB) are downloaded and cached automatically on first startup.

### Frontend

```bash
cd frontend-react
npm install
npm run dev
```

### Docker (recommended)

```bash
docker compose up --build
```

- API: http://localhost:8000
- Frontend: http://localhost:3000

## Project structure

```
deepfake_detection/
├── app/
│   ├── main.py              # FastAPI application, CORS, routers
│   ├── core/
│   │   ├── config.py        # Constants (formats, thresholds, max size)
│   │   └── device.py        # Automatic CPU/GPU selection
│   ├── routers/
│   │   ├── image.py         # /analyze/image and /analyze/image/fast
│   │   ├── audio.py         # /analyze/audio (WIP)
│   │   ├── video.py         # /analyze/video (WIP)
│   │   └── text.py          # /analyze/text (WIP)
│   └── pipelines/
│       └── image.py         # Full pipeline: run() and run_fast()
├── frontend-react/          # React + Vite frontend
├── docs/                    # React documentation site
├── docker-compose.yml
├── docker-compose.prod.yml
├── Dockerfile
└── requirements.txt
```

## Deployment

### Hugging Face Spaces (API)

```bash
pip install huggingface_hub
huggingface-cli login
git remote add spaces https://huggingface.co/spaces/oyabun-dev/kamyvision
git push spaces main
```

### Vercel (frontend + docs)

Both the React frontend (`frontend-react/`) and the documentation site (`docs/`) are deployed on Vercel. See the Deployment docs for full configuration.


## Known limitations

The three ViT models were trained primarily on GAN-generated datasets, so performance degrades on outputs from recent diffusion models (Midjourney v6, Stable Diffusion XL, Flux.1). EXIF analysis partially compensates for images that retain their metadata.


## License

MIT