---
title: KAMY Vision AI
emoji: 🛡️
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---

# KAMY Vision AI

Multimodal forensic platform for deepfake detection. Analyzes images, audio, video, and text via a layered pipeline combining Vision Transformer ensembles with deterministic forensic signals.

Production: app.kamydev.com · API: oyabun-dev-kamyvision.hf.space · Docs: docs.kamydev.com


## Stack

- **Backend:** Python 3.10+, FastAPI, uvicorn, PyTorch, Hugging Face Transformers
- **Frontend:** React 18, TypeScript, Vite (deployed on Vercel)
- **API hosting:** Hugging Face Spaces (Docker)
- **Docs:** React + custom CSS (deployed on Vercel)

## Models

### Image ensemble (3 ViT models, weighted average)

| Model | Weight | Task |
|---|---|---|
| Ateeqq/ai-vs-human-image-detector | 45% | AI-generated vs. human photo |
| prithivMLmods/AI-vs-Deepfake-vs-Real | 35% | 3 classes: AI / Deepfake / Real |
| prithivMLmods/Deep-Fake-Detector-Model | 20% | Facial deepfakes |
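The weighted average described above can be sketched as a plain dot product over the three models' "fake" probabilities. The model names and weights come from the table; the function itself is a minimal illustration, not the project's actual implementation:

```python
# Illustrative sketch of the 3-model weighted ensemble (not the project's code).
# Each model is assumed to yield a "fake" probability in [0, 1].
ENSEMBLE_WEIGHTS = {
    "Ateeqq/ai-vs-human-image-detector": 0.45,
    "prithivMLmods/AI-vs-Deepfake-vs-Real": 0.35,
    "prithivMLmods/Deep-Fake-Detector-Model": 0.20,
}

def ensemble_fake_prob(fake_probs: dict[str, float]) -> float:
    """Weighted average of per-model fake probabilities."""
    return sum(ENSEMBLE_WEIGHTS[name] * p for name, p in fake_probs.items())

score = ensemble_fake_prob({
    "Ateeqq/ai-vs-human-image-detector": 0.90,
    "prithivMLmods/AI-vs-Deepfake-vs-Real": 0.80,
    "prithivMLmods/Deep-Fake-Detector-Model": 0.70,
})
# 0.45*0.90 + 0.35*0.80 + 0.20*0.70 = 0.825
```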

### Forensic layers (no ML)

- **EXIF**: detects 19 AI generator signatures (Gemini, DALL-E, Firefly, Midjourney, Flux, SynthID, Canva AI, Stable Diffusion, ...)
- **FFT**: frequency-spectrum analysis; detects GAN oversmoothing and periodic peaks
- **Texture**: local variance per 16×16 patch; flags unnatural uniformity in skin and backgrounds
- **Color**: colorimetric entropy and HSV distribution; flags artificial saturation patterns
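As an illustration of the texture layer, a per-patch local-variance statistic over 16×16 tiles can be computed as below. This is a minimal sketch of the general technique; the project's actual scoring and thresholds are not shown:

```python
import numpy as np

def mean_patch_variance(gray: np.ndarray, patch: int = 16) -> float:
    """Mean of the per-tile pixel variance over non-overlapping
    patch x patch tiles. Unnaturally low values on skin or background
    regions are a hint of AI oversmoothing."""
    h, w = gray.shape
    h, w = h - h % patch, w - w % patch   # crop to a multiple of the patch size
    tiles = (
        gray[:h, :w]
        .reshape(h // patch, patch, w // patch, patch)
        .transpose(0, 2, 1, 3)            # -> (rows, cols, patch, patch)
        .reshape(-1, patch * patch)
    )
    return float(tiles.var(axis=1).mean())

flat = np.full((64, 64), 128.0)           # perfectly uniform image: variance 0
noisy = np.random.default_rng(0).normal(128.0, 20.0, (64, 64))
print(mean_patch_variance(flat))          # 0.0
print(mean_patch_variance(noisy) > mean_patch_variance(flat))  # True
```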

## Fusion profiles

The engine selects a profile based on EXIF results, then adjusts the layer weights:

| Profile | Trigger | EXIF weight |
|---|---|---|
| EXIF_IA_DETECTE | AI source found in metadata | 60% |
| EXIF_FIABLE | Real camera identified | 32% |
| EXIF_ABSENT | No metadata (stripped by social networks) | 0% (FFT + texture boosted) |
| STANDARD | General case | 20% |
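The profile selection can be sketched as a simple priority cascade. The profile names and EXIF weights come from the table above; the boolean trigger flags are illustrative assumptions about how the EXIF layer might report its findings:

```python
# Illustrative sketch of fusion-profile selection (flags are assumptions).
EXIF_WEIGHT = {
    "EXIF_IA_DETECTE": 0.60,   # AI source found in metadata
    "EXIF_FIABLE": 0.32,       # real camera identified
    "EXIF_ABSENT": 0.00,       # no metadata at all
    "STANDARD": 0.20,          # general case
}

def select_profile(ai_source_found: bool, camera_identified: bool,
                   has_metadata: bool) -> str:
    """Priority cascade: an explicit AI signature dominates everything."""
    if ai_source_found:
        return "EXIF_IA_DETECTE"
    if camera_identified:
        return "EXIF_FIABLE"
    if not has_metadata:
        return "EXIF_ABSENT"
    return "STANDARD"

profile = select_profile(ai_source_found=True, camera_identified=False,
                         has_metadata=True)
print(profile, EXIF_WEIGHT[profile])  # EXIF_IA_DETECTE 0.6
```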

### Audio (pending)

MelodyMachine/Deepfake-audio-detection-V2 (wav2vec2), pending ONNX conversion.


## API endpoints

- Base URL (local): `http://localhost:8000`
- Base URL (production): `https://oyabun-dev-kamyvision.hf.space`

| Method | Endpoint | Status | Description |
|---|---|---|---|
| GET | /health | Stable | API and model status |
| POST | /analyze/image | Stable | Full image analysis (3 ViT + 4 forensic layers) |
| POST | /analyze/image/fast | Stable | Fast image analysis (2 ViT + EXIF only) |
| POST | /analyze/audio | WIP | Synthetic voice detection |
| POST | /analyze/video | WIP | Frame-by-frame video analysis |
| POST | /analyze/text | WIP | AI-generated text detection |

```bash
# Health check
curl http://localhost:8000/health

# Full image analysis
curl -X POST http://localhost:8000/analyze/image \
  -F "file=@photo.jpg"

# Fast image analysis
curl -X POST http://localhost:8000/analyze/image/fast \
  -F "file=@photo.jpg"
```

## Response structure

```json
{
  "status": "success",
  "verdict": "DEEPFAKE",
  "fake_prob": 0.8731,
  "real_prob": 0.1269,
  "confidence": "high",
  "reason": "AI source detected in EXIF metadata (Google Gemini).",
  "fusion_profile": "EXIF_IA_DETECTE",
  "ai_source": "Google Gemini",
  "layer_scores": {
    "ensemble": 0.82,
    "exif": 0.97,
    "fft": 0.61,
    "texture": 0.55,
    "color": 0.70
  },
  "weights_used": {
    "ensemble": 0.20,
    "exif": 0.60,
    "fft": 0.08,
    "texture": 0.07,
    "color": 0.05
  },
  "models": [
    "Ateeqq/ai-vs-human-image-detector",
    "prithivMLmods/AI-vs-Deepfake-vs-Real",
    "prithivMLmods/Deep-Fake-Detector-Model"
  ]
}
```
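Assuming the fused score is a plain weighted sum of the layer scores, the example response is easy to sanity-check client-side. The dot product of `layer_scores` and `weights_used` lands within half a percentage point of the reported `fake_prob` (whether the remaining gap is calibration or rounding is not documented):

```python
# Sanity-check the example response, assuming a plain weighted-sum fusion.
layer_scores = {"ensemble": 0.82, "exif": 0.97, "fft": 0.61,
                "texture": 0.55, "color": 0.70}
weights_used = {"ensemble": 0.20, "exif": 0.60, "fft": 0.08,
                "texture": 0.07, "color": 0.05}

fused = sum(layer_scores[k] * weights_used[k] for k in layer_scores)
print(round(fused, 4))  # 0.8683 -- close to the reported fake_prob of 0.8731
```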

## Getting started

### Prerequisites

| Tool | Version |
|---|---|
| Python | 3.10+ |
| Node.js | 18+ |
| Docker | 24+ (optional) |

### Backend

```bash
git clone https://github.com/oyabun-dev/deepfake_detection
cd deepfake_detection

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Models (~2–4 GB) are downloaded and cached automatically on first startup.

### Frontend

```bash
cd frontend-react
npm install
npm run dev
```

### Docker (recommended)

```bash
docker compose up --build
```

- API: http://localhost:8000
- Frontend: http://localhost:3000

## Project structure

```
deepfake_detection/
├── app/
│   ├── main.py              # FastAPI application, CORS, routers
│   ├── core/
│   │   ├── config.py        # Constants (formats, thresholds, max size)
│   │   └── device.py        # Automatic CPU/GPU selection
│   ├── routers/
│   │   ├── image.py         # /analyze/image and /analyze/image/fast
│   │   ├── audio.py         # /analyze/audio (WIP)
│   │   ├── video.py         # /analyze/video (WIP)
│   │   └── text.py          # /analyze/text (WIP)
│   └── pipelines/
│       └── image.py         # Full pipeline: run() and run_fast()
├── frontend-react/          # React + Vite frontend
├── docs/                    # React documentation site
├── docker-compose.yml
├── docker-compose.prod.yml
├── Dockerfile
└── requirements.txt
```

## Deployment

### Hugging Face Spaces (API)

```bash
pip install huggingface_hub
huggingface-cli login
git remote add spaces https://huggingface.co/spaces/oyabun-dev/kamyvision
git push spaces main
```

### Vercel (frontend + docs)

Both the React frontend (`frontend-react/`) and the documentation site (`docs/`) are deployed on Vercel. See the Deployment docs for full configuration.


## Known limitations

The three ViT models were trained primarily on GAN-generated datasets, so performance degrades on outputs from recent diffusion models (Midjourney v6, Stable Diffusion XL, Flux.1). EXIF analysis partially compensates for images that retain their metadata.


## License

MIT