---
license: mit
tags:
  - face-swap
  - face-detection
  - face-parsing
  - face-mask
  - person-detection
  - tensorrt
  - morphstream
---

# MorphStream Models

Models and TensorRT engine cache for real-time face processing, used by the [MorphStream](https://morphstream.ai) GPU Worker.

**Private repository** — requires an access token for downloads.

## Structure

```
/
├── models/                        # ONNX models (active)
│   ├── buffalo_l/
│   │   ├── det_10g.onnx           # SCRFD face detection (16 MB)
│   │   └── w600k_r50.onnx         # ArcFace recognition (166 MB)
│   ├── fan_68_5.onnx              # 5→68 landmark refinement (1 MB)
│   ├── 2dfan4.onnx                # 2DFAN4 68-point landmarks (93 MB)
│   ├── inswapper_128.onnx         # InSwapper FP32 (529 MB)
│   ├── inswapper_128_fp16.onnx    # InSwapper FP16 — default (265 MB)
│   ├── hyperswap_1a_256.onnx      # HyperSwap variant A (384 MB)
│   ├── hyperswap_1b_256.onnx      # HyperSwap variant B (384 MB)
│   ├── hyperswap_1c_256.onnx      # HyperSwap variant C (384 MB)
│   ├── xseg_1.onnx                # XSeg occlusion mask 1 (67 MB)
│   ├── xseg_2.onnx                # XSeg occlusion mask 2 (67 MB)
│   ├── xseg_3.onnx                # XSeg occlusion mask 3 (67 MB)
│   ├── bisenet_resnet_34.onnx     # BiSeNet face parsing (89 MB)
│   ├── bisenet_resnet_18.onnx     # BiSeNet face parsing (51 MB)
│   └── yolov8n.onnx               # Person detection (12 MB)
├── deploy/                        # Hot-deploy code archives
│   ├── develop/app_code.tar.zst   # develop branch
│   └── latest/app_code.tar.zst    # production (main)
├── archives/                      # Baked archives for Docker image
│   ├── models-core-masks.tar.zst  # Core+mask+yolov8n models (~584 MB)
│   └── trt-cache-sm89.tar.zst     # TRT engines for sm89 (~2.7 GB)
├── trt_cache/sm89/                # TRT engine cache (per GPU arch)
│   └── trt10.14_ort1.24/          # ORT 1.24 + TRT 10.14
│       ├── manifest.json
│       ├── *.engine               # Compiled TRT engines
│       ├── *.profile              # TRT optimization profiles
│       └── *.timing               # Kernel autotuning cache
└── gfpgan/                        # Face restoration (not used in real-time)
```

## Models

### Face Swap

| Model | Size | Input | TRT FP16 | Notes |
|-------|------|-------|----------|-------|
| `inswapper_128_fp16.onnx` | 265 MB | 128px | No (FP32 TRT) | **Default** preset |
| `inswapper_128.onnx` | 529 MB | 128px | No (FP32 TRT) | Standard quality |
| `hyperswap_1a_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality A |
| `hyperswap_1b_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality B |
| `hyperswap_1c_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality C |

Swap models are compiled with `trt_fp16_enable=False` — FP16 causes pixel artifacts.

### Face Detection & Recognition (core)

| Model | GPU Worker Class | Size | Input | TRT FP16 |
|-------|------------------|------|-------|----------|
| `buffalo_l/det_10g.onnx` | `DirectSCRFD` | 16 MB | 320px | Yes |
| `buffalo_l/w600k_r50.onnx` | `DirectArcFace` | 166 MB | 112px | Yes |
| `fan_68_5.onnx` | `DirectFan685` | 1 MB | (1,5,2) coords | Yes |
| `2dfan4.onnx` | `Landmark68Detector` | 93 MB | 256px | Yes |

### Face Masks

| Model | Type | Size | Input | TRT FP16 |
|-------|------|------|-------|----------|
| `xseg_1/2/3.onnx` | Occlusion | 67 MB each | 256px NHWC | No (FP32) |
| `bisenet_resnet_34.onnx` | Region parsing | 89 MB | 512px NCHW | No (FP32) |
| `bisenet_resnet_18.onnx` | Region parsing | 51 MB | 512px NCHW | No (FP32) |

### Person Detection

| Model | Size | Input | TRT FP16 |
|-------|------|-------|----------|
| `yolov8n.onnx` | 12 MB | 640px | Yes |

## Docker Baking

Models are split into two groups:

- **Baked** (in the Docker image): core + masks + yolov8n (10 models, ~630 MB) via `archives/models-core-masks.tar.zst`
- **Per-stream download**: swap models (5 models), downloaded on demand by `ModelDownloadService`

```bash
# Rebuild models archive
bash scripts/pack_models.sh --upload
```

## TensorRT Engine Cache

Pre-compiled TRT engines eliminate cold-start compilation (~180-300s of compilation replaced by a ~10-30s download).
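The worker locates its engines by composing a cache key from the GPU architecture and library versions (the format is detailed in the next section). As a minimal sketch, assuming a helper named `trt_cache_key` (illustrative, not the actual GPU Worker code):

```bash
# Sketch only: compose the TRT cache lookup path {gpu_arch}/trt{trt}_ort{ort}.
trt_cache_key() {
  gpu_arch="$1"; trt_version="$2"; ort_version="$3"
  printf '%s/trt%s_ort%s\n' "$gpu_arch" "$trt_version" "$ort_version"
}

trt_cache_key sm89 10.14 1.24   # -> sm89/trt10.14_ort1.24
```

The same string is used both as the `trt_cache/` subdirectory on HF and as the `cache_key` field in `manifest.json`.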
### Cache Key

Format: `{gpu_arch}/trt{trt_version}_ort{ort_version}`

Example: `sm89/trt10.14_ort1.24` (RTX 4090, ORT 1.24, TRT 10.14)

### manifest.json (format v2)

```json
{
  "cache_key": "sm89/trt10.14_ort1.24",
  "format_version": 2,
  "gpu_arch": "sm89",
  "trt_version": "10.14",
  "ort_version": "1.24",
  "engine_files": {
    "TensorrtExecutionProvider_TRTKernel_*.engine": {
      "group": "core",
      "onnx_model": "det_10g"
    }
  }
}
```

Engine groups: `core`, `masks`, `inswapper_128`, `inswapper_128_fp16`, `hyperswap_1a/1b/1c_256`, `yolov8n`, `shared` (`.timing`).

### Lifecycle

1. **Download** — at boot, the GPU Worker downloads engines matching the cache key from HF
2. **Compile** — if no cache exists, ORT compiles TRT engines from ONNX on first load
3. **Upload** — after compilation, engines are uploaded to HF with a manifest merge (preserves other groups)
4. **Selective recompile** — the admin UI selects model groups for recompile; the manifest merges new engines with existing HF entries
5. **Cleanup** — manifest-driven: stale engines (not in the manifest) are auto-deleted from HF during upload

### Rebuild TRT Archive

```bash
# From a local HF repo clone
bash scripts/pack_trt_cache.sh                        # auto-detect latest version
bash scripts/pack_trt_cache.sh sm89 trt10.14_ort1.24  # explicit
bash scripts/pack_trt_cache.sh --upload               # pack + upload to HF
```

## Hot Deploy

Code updates without a Docker rebuild:

```bash
bash scripts/deploy_code.sh                       # deploy to develop
DEPLOY_TAGS="latest" bash scripts/deploy_code.sh  # deploy to production
```

Uploaded to `deploy/{tag}/app_code.tar.zst`. The GPU Worker downloads it at boot via `entrypoint.sh`.

## License

MIT License
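
The manifest-driven cleanup described in the lifecycle above amounts to a set difference: any engine file present on HF but absent from the manifest is stale. A minimal sketch with placeholder file names (not the actual cleanup code):

```bash
# Placeholder inputs: sorted lists of engine file names.
printf '%s\n' a.engine b.engine          > manifest_engines.txt  # listed in manifest.json
printf '%s\n' a.engine b.engine c.engine > remote_engines.txt    # currently on HF
# Lines unique to the second file are the stale engines to delete.
comm -13 manifest_engines.txt remote_engines.txt   # -> c.engine
```

Because the manifest merge preserves entries from other groups, a selective recompile only ever marks its own group's old engines as stale.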