---
license: mit
tags:
  - face-swap
  - face-detection
  - face-parsing
  - face-mask
  - person-detection
  - tensorrt
  - morphstream
---

# MorphStream Models

Models and TensorRT engine cache for real-time face processing, used by the [MorphStream](https://morphstream.ai) GPU Worker.

**Private repository** — requires an access token for downloads.

## Structure

```
/
├── models/                        # ONNX models (active)
│   ├── buffalo_l/
│   │   ├── det_10g.onnx           # SCRFD face detection (16 MB)
│   │   └── w600k_r50.onnx         # ArcFace recognition (166 MB)
│   ├── fan_68_5.onnx              # 5→68 landmark refinement (1 MB)
│   ├── 2dfan4.onnx                # 2DFAN4 68-point landmarks (93 MB)
│   ├── inswapper_128.onnx         # InSwapper FP32 (529 MB)
│   ├── inswapper_128_fp16.onnx    # InSwapper FP16 — default (265 MB)
│   ├── hyperswap_1a_256.onnx      # HyperSwap variant A (384 MB)
│   ├── hyperswap_1b_256.onnx      # HyperSwap variant B (384 MB)
│   ├── hyperswap_1c_256.onnx      # HyperSwap variant C (384 MB)
│   ├── xseg_1.onnx                # XSeg occlusion mask 1 (67 MB)
│   ├── xseg_2.onnx                # XSeg occlusion mask 2 (67 MB)
│   ├── xseg_3.onnx                # XSeg occlusion mask 3 (67 MB)
│   ├── bisenet_resnet_34.onnx     # BiSeNet face parsing (89 MB)
│   ├── bisenet_resnet_18.onnx     # BiSeNet face parsing (51 MB)
│   └── yolov8n.onnx               # Person detection (12 MB)
├── deploy/                        # Hot-deploy code archives
│   ├── develop/app_code.tar.zst   # develop branch
│   └── latest/app_code.tar.zst    # production (main)
├── archives/                      # Baked archives for Docker image
│   ├── models-core-masks.tar.zst  # Core+mask+yolov8n models (~584 MB)
│   └── trt-cache-sm89.tar.zst     # TRT engines for sm89 (~2.7 GB)
├── trt_cache/sm89/                # TRT engine cache (per GPU arch)
│   └── trt10.14_ort1.24/          # ORT 1.24 + TRT 10.14
│       ├── manifest.json
│       ├── *.engine               # Compiled TRT engines
│       ├── *.profile              # TRT optimization profiles
│       └── *.timing               # Kernel autotuning cache
└── gfpgan/                        # Face restoration (not used in real-time)
```

## Models

### Face Swap

| Model | Size | Input | TRT FP16 | Notes |
|-------|------|-------|----------|-------|
| `inswapper_128_fp16.onnx` | 265 MB | 128px | No (FP32 TRT) | **Default** preset |
| `inswapper_128.onnx` | 529 MB | 128px | No (FP32 TRT) | Standard quality |
| `hyperswap_1a_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality A |
| `hyperswap_1b_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality B |
| `hyperswap_1c_256.onnx` | 384 MB | 256px | No (FP32 TRT) | High quality C |

Swap models are compiled with `trt_fp16_enable=False` — FP16 causes pixel artifacts.

### Face Detection & Recognition (core)

| Model | GPU Worker Class | Size | Input | TRT FP16 |
|-------|------------------|------|-------|----------|
| `buffalo_l/det_10g.onnx` | `DirectSCRFD` | 16 MB | 320px | Yes |
| `buffalo_l/w600k_r50.onnx` | `DirectArcFace` | 166 MB | 112px | Yes |
| `fan_68_5.onnx` | `DirectFan685` | 1 MB | (1,5,2) coords | Yes |
| `2dfan4.onnx` | `Landmark68Detector` | 93 MB | 256px | Yes |

### Face Masks

| Model | Type | Size | Input | TRT FP16 |
|-------|------|------|-------|----------|
| `xseg_1/2/3.onnx` | Occlusion | 67 MB each | 256px NHWC | No (FP32) |
| `bisenet_resnet_34.onnx` | Region parsing | 89 MB | 512px NCHW | No (FP32) |
| `bisenet_resnet_18.onnx` | Region parsing | 51 MB | 512px NCHW | No (FP32) |

### Person Detection

| Model | Size | Input | TRT FP16 |
|-------|------|-------|----------|
| `yolov8n.onnx` | 12 MB | 640px | Yes |

## Docker Baking

Models are split into two groups:

- **Baked** (in the Docker image): core + masks + yolov8n (10 models, ~630 MB) via `archives/models-core-masks.tar.zst`
- **Per-stream download**: swap models (5 models), downloaded on demand by `ModelDownloadService`

```bash
# Rebuild models archive
bash scripts/pack_models.sh --upload
```

## TensorRT Engine Cache

Pre-compiled TRT engines eliminate cold-start compilation (~180-300s of compilation replaced by a ~10-30s download).
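The worker locates its engines by composing a cache key from the GPU architecture and library versions (the format is detailed in the next section). As a minimal sketch, assuming a helper named `trt_cache_key` (illustrative, not the actual GPU Worker code):

```bash
# Sketch only: compose the TRT cache lookup path {gpu_arch}/trt{trt}_ort{ort}.
trt_cache_key() {
  gpu_arch="$1"; trt_version="$2"; ort_version="$3"
  printf '%s/trt%s_ort%s\n' "$gpu_arch" "$trt_version" "$ort_version"
}

trt_cache_key sm89 10.14 1.24   # -> sm89/trt10.14_ort1.24
```

The same string is used both as the `trt_cache/` subdirectory on HF and as the `cache_key` field in `manifest.json`.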
### Cache Key

Format: `{gpu_arch}/trt{trt_version}_ort{ort_version}`

Example: `sm89/trt10.14_ort1.24` (RTX 4090, ORT 1.24, TRT 10.14)

### manifest.json (format v2)

```json
{
  "cache_key": "sm89/trt10.14_ort1.24",
  "format_version": 2,
  "gpu_arch": "sm89",
  "trt_version": "10.14",
  "ort_version": "1.24",
  "engine_files": {
    "TensorrtExecutionProvider_TRTKernel_*.engine": {
      "group": "core",
      "onnx_model": "det_10g"
    }
  }
}
```

Engine groups: `core`, `masks`, `inswapper_128`, `inswapper_128_fp16`, `hyperswap_1a/1b/1c_256`, `yolov8n`, `shared` (`.timing`).

### Lifecycle

1. **Download** — at boot, the GPU Worker downloads engines matching the cache key from HF
2. **Compile** — if no cache exists, ORT compiles TRT engines from ONNX on first load
3. **Upload** — after compilation, engines are uploaded to HF with a manifest merge (preserves other groups)
4. **Selective recompile** — the admin UI selects model groups for recompile; the manifest merges new engines with existing HF entries
5. **Cleanup** — manifest-driven: stale engines (not in the manifest) are auto-deleted from HF during upload

### Rebuild TRT Archive

```bash
# From a local HF repo clone
bash scripts/pack_trt_cache.sh                        # auto-detect latest version
bash scripts/pack_trt_cache.sh sm89 trt10.14_ort1.24  # explicit
bash scripts/pack_trt_cache.sh --upload               # pack + upload to HF
```

## Hot Deploy

Code updates without a Docker rebuild:

```bash
bash scripts/deploy_code.sh                       # deploy to develop
DEPLOY_TAGS="latest" bash scripts/deploy_code.sh  # deploy to production
```

Uploaded to `deploy/{tag}/app_code.tar.zst`. The GPU Worker downloads it at boot via `entrypoint.sh`.

## License

MIT License
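
The manifest-driven cleanup described in the lifecycle above amounts to a set difference: any engine file present on HF but absent from the manifest is stale. A minimal sketch with placeholder file names (not the actual cleanup code):

```bash
# Placeholder inputs: sorted lists of engine file names.
printf '%s\n' a.engine b.engine          > manifest_engines.txt  # listed in manifest.json
printf '%s\n' a.engine b.engine c.engine > remote_engines.txt    # currently on HF
# Lines unique to the second file are the stale engines to delete.
comm -13 manifest_engines.txt remote_engines.txt   # -> c.engine
```

Because the manifest merge preserves entries from other groups, a selective recompile only ever marks its own group's old engines as stale.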