---
title: KAMY Vision AI
emoji: 🛡️
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---

# KAMY Vision AI

Multimodal forensic platform for deepfake detection. Analyzes images, audio, video, and text via a layered pipeline combining Vision Transformer ensembles with deterministic forensic signals.

**Production:** [app.kamydev.com](https://app.kamydev.com) · API at [oyabun-dev-kamyvision.hf.space](https://oyabun-dev-kamyvision.hf.space) · Docs at [docs.kamydev.com](https://docs.kamydev.com)

---

## Stack

- **Backend:** Python 3.10+, FastAPI, uvicorn, PyTorch, HuggingFace Transformers
- **Frontend:** React 18, TypeScript, Vite — deployed on Vercel
- **API hosting:** HuggingFace Spaces (Docker)
- **Docs:** React + custom CSS — deployed on Vercel

---

## Models

### Image ensemble (3 ViT models, weighted average)

| Model | Weight | Task |
|-------|--------|------|
| `Ateeqq/ai-vs-human-image-detector` | 45% | AI-generated vs human photo |
| `prithivMLmods/AI-vs-Deepfake-vs-Real` | 35% | 3 classes: AI / Deepfake / Real |
| `prithivMLmods/Deep-Fake-Detector-Model` | 20% | Facial deepfakes |

### Forensic layers (no ML)

- **EXIF** — 19 AI generator signatures detected (Gemini, DALL-E, Firefly, Midjourney, Flux, SynthID, Canva AI, Stable Diffusion...)
- **FFT** — frequency spectrum analysis, GAN oversmoothing and periodic peak detection
- **Texture** — local variance per 16×16 patch, unnatural uniformity in skin/background
- **Color** — colorimetric entropy and HSV distribution, artificial saturation patterns

### Fusion profiles

The engine selects a profile based on EXIF results, then adjusts weights:

| Profile | Trigger | EXIF weight |
|---------|---------|-------------|
| `EXIF_IA_DETECTE` | AI source found in metadata | 60% |
| `EXIF_FIABLE` | Real camera identified | 32% |
| `EXIF_ABSENT` | No metadata (stripped by social network) | 0%, FFT+texture boosted |
| `STANDARD` | General case | 20% |

### Audio (pending)

`MelodyMachine/Deepfake-audio-detection-V2` (wav2vec2) — pending ONNX conversion.

---

## API endpoints

Base URL (local): `http://localhost:8000`
Base URL (production): `https://oyabun-dev-kamyvision.hf.space`

| Method | Endpoint | Status | Description |
|--------|----------|--------|-------------|
| `GET` | `/health` | Stable | API and model status |
| `POST` | `/analyze/image` | Stable | Full image analysis (3 ViT + 4 forensic layers) |
| `POST` | `/analyze/image/fast` | Stable | Fast image analysis (2 ViT + EXIF only) |
| `POST` | `/analyze/audio` | WIP | Synthetic voice detection |
| `POST` | `/analyze/video` | WIP | Frame-by-frame video analysis |
| `POST` | `/analyze/text` | WIP | AI-generated text detection |

```bash
# Health check
curl http://localhost:8000/health

# Full image analysis
curl -X POST http://localhost:8000/analyze/image \
  -F "file=@photo.jpg"

# Fast image analysis
curl -X POST http://localhost:8000/analyze/image/fast \
  -F "file=@photo.jpg"
```

### Response structure

```json
{
  "status": "success",
  "verdict": "DEEPFAKE",
  "fake_prob": 0.8731,
  "real_prob": 0.1269,
  "confidence": "high",
  "reason": "AI source detected in EXIF metadata (Google Gemini).",
  "fusion_profile": "EXIF_IA_DETECTE",
  "ai_source": "Google Gemini",
  "layer_scores": {
    "ensemble": 0.82,
    "exif": 0.97,
    "fft": 0.61,
    "texture": 0.55,
    "color": 0.70
  },
  "weights_used": {
    "ensemble": 0.20,
    "exif": 0.60,
    "fft": 0.08,
    "texture": 0.07,
    "color": 0.05
  },
  "models": [
    "Ateeqq/ai-vs-human-image-detector",
    "prithivMLmods/AI-vs-Deepfake-vs-Real",
    "prithivMLmods/Deep-Fake-Detector-Model"
  ]
}
```

---

## Getting started

### Prerequisites

| Tool | Version |
|------|---------|
| Python | 3.10+ |
| Node.js | 18+ |
| Docker | 24+ (optional) |

### Backend

```bash
git clone https://github.com/oyabun-dev/deepfake_detection
cd deepfake_detection
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Models (~2–4 GB) are downloaded and cached automatically on first startup.

### Frontend

```bash
cd frontend-react
npm install
npm run dev
```

### Docker (recommended)

```bash
docker compose up --build
```

- API: `http://localhost:8000`
- Frontend: `http://localhost:3000`

---

## Project structure

```
deepfake_detection/
├── app/
│   ├── main.py — FastAPI application, CORS, routers
│   ├── core/
│   │   ├── config.py — Constants (formats, thresholds, max size)
│   │   └── device.py — Automatic CPU/GPU selection
│   ├── routers/
│   │   ├── image.py — /analyze/image and /analyze/image/fast
│   │   ├── audio.py — /analyze/audio (WIP)
│   │   ├── video.py — /analyze/video (WIP)
│   │   └── text.py — /analyze/text (WIP)
│   └── pipelines/
│       └── image.py — Full pipeline: run() and run_fast()
├── frontend-react/ — React + Vite frontend
├── docs/ — React documentation site
├── docker-compose.yml
├── docker-compose.prod.yml
├── Dockerfile
└── requirements.txt
```

---

## Deployment

### HuggingFace Spaces (API)

```bash
pip install huggingface_hub
huggingface-cli login
git remote add spaces https://huggingface.co/spaces/oyabun-dev/kamyvision
git push spaces main
```

### Vercel (frontend + docs)

Both the React frontend (`frontend-react/`) and the documentation (`docs/`) are deployed on Vercel. See the [Deployment docs](https://docs.kamydev.com/deploy) for full configuration.

---

## Known limitations

The 3 ViT models were primarily trained on GAN datasets. Performance is degraded on recent diffusion model outputs (Midjourney v6, Stable Diffusion XL, Flux.1). EXIF analysis partially compensates for images that retain their metadata.

---

## License

MIT
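
---

## Appendix: fusion weighting sketch

The fusion step described under "Fusion profiles" can be illustrated with a short example. This is a minimal sketch, not the project's actual code: it assumes the engine combines per-layer scores with a plain weighted average, using the `layer_scores` and `weights_used` values from the sample response above. The names `PROFILE_WEIGHTS` and `fuse_layers` are hypothetical.

```python
# Hypothetical sketch (not the project's actual implementation):
# combine per-layer fake-probability scores with profile-dependent
# weights via a plain weighted average.

# Weights for the EXIF_IA_DETECTE profile, taken from the
# "weights_used" field of the sample response.
PROFILE_WEIGHTS = {
    "EXIF_IA_DETECTE": {
        "ensemble": 0.20, "exif": 0.60, "fft": 0.08,
        "texture": 0.07, "color": 0.05,
    },
}

def fuse_layers(layer_scores: dict, profile: str) -> float:
    """Weighted average of layer scores under the given fusion profile."""
    weights = PROFILE_WEIGHTS[profile]
    return sum(weights[layer] * layer_scores[layer] for layer in weights)

# Layer scores from the sample response.
scores = {"ensemble": 0.82, "exif": 0.97, "fft": 0.61,
          "texture": 0.55, "color": 0.70}
print(round(fuse_layers(scores, "EXIF_IA_DETECTE"), 4))  # 0.8683
```

Note that the sample response reports `fake_prob: 0.8731`, slightly above this weighted average, so the real engine presumably applies additional adjustments beyond a bare weighted sum.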