---
title: C-MET CPU
emoji: 🎭
colorFrom: green
colorTo: red
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
tags:
- talking-face
- emotion-transfer
- cpu
- onnx
- mcp-server
short_description: C-MET Emotion Transfer - CPU (ONNX + JIT)
models:
- coldhyuk/C-MET
---
C-MET: Cross-Modal Emotion Transfer (CPU)
Talking-face emotion editing from text, audio, or video prompts. CPU-optimized port of C-MET.
Real benchmarks on HF cpu-basic (2 vCPU, 7s video, 174 frames)
| Stage | Time | Method |
|---|---|---|
| audio2lip | 21.5s | ONNX FP32 |
| compute_alpha_D | 137.5s | PyTorch (StyleGAN QR op blocks ONNX export) |
| connector | 0.4s | ONNX FP32 |
| render (174 frames) | 438.7s (2.5s/frame) | PyTorch eager |
| video_prep + encode | 4.5s | ffmpeg |
| Total | ~10 min | |
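As a quick check on the table, the stage times can be summed and expressed as shares of the total. The snippet below is a small sketch that uses only the figures reported above; the variable names are illustrative and not taken from the app.

```python
# Stage timings in seconds, copied from the benchmark table above.
stage_seconds = {
    "audio2lip": 21.5,
    "compute_alpha_D": 137.5,
    "connector": 0.4,
    "render": 438.7,
    "video_prep + encode": 4.5,
}
frames = 174  # frame count reported for the 7s test video

total = sum(stage_seconds.values())
per_frame = stage_seconds["render"] / frames

print(f"total: {total:.1f}s ({total / 60:.1f} min)")
print(f"render per frame: {per_frame:.2f}s")

# List stages from slowest to fastest with their share of the total.
for name, seconds in sorted(stage_seconds.items(), key=lambda kv: -kv[1]):
    print(f"{name:>20}: {seconds / total:6.1%}")
```

Rendering accounts for roughly 73% of the run and compute_alpha_D for about 23%, so the two PyTorch stages dominate the total.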
ONNX artifacts are built on first startup (~1 min) and cached in ./artifacts/.
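The build-once, reuse-afterwards behaviour can be sketched as below. This is not the app's actual code: `ensure_artifact` and the `build` callback are hypothetical names, and `build` stands in for whatever export step writes the ONNX file.

```python
from pathlib import Path
from typing import Callable

ARTIFACT_DIR = Path("./artifacts")  # cache directory named in the text above


def ensure_artifact(name: str, build: Callable[[Path], None],
                    cache_dir: Path = ARTIFACT_DIR) -> Path:
    """Return the cached file, building it only if it is missing.

    Hypothetical helper for illustration; `build` receives the path
    it should write to.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary name first so an interrupted build
        # does not leave a half-written file that looks like a cache hit.
        tmp = target.with_name(target.name + ".tmp")
        build(tmp)
        tmp.replace(target)
    return target
```

With this shape, the first call pays the build cost and later calls return immediately as long as the cache directory survives between runs.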
API -- Python Client
from gradio_client import Client, handle_file

client = Client("WeReCooking/C-MET-CPU")
result = client.predict(
    source_image=handle_file("face.png"),
    audio=handle_file("voice.wav"),
    pose=handle_file("pose.mp4"),
    emotion="happy",  # angry contempt disgusted fear happy sad surprised
                      # charismatic desirous empathetic envious romantic sarcastic
    intensity=3,
    seed=42,
    api_name="/infer",
)
output_path, timings_str = result
print(timings_str)
MCP
{
  "mcpServers": {
    "cmet": {"url": "https://werecooking-c-met-cpu.hf.space/gradio_api/mcp/"}
  }
}
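If the config is generated rather than written by hand, the URL can be derived from the Space id. The sketch below assumes the subdomain is the lowercased id with non-alphanumeric characters turned into hyphens, which matches the URL shown above but is not verified for every possible Space name; `space_mcp_url` is a hypothetical helper.

```python
import json
import re


def space_mcp_url(space_id: str) -> str:
    """Build the MCP endpoint URL for a Space id such as 'owner/name'.

    Assumption: the subdomain is the lowercased id with every run of
    non-alphanumeric characters replaced by a single hyphen.
    """
    subdomain = re.sub(r"[^a-z0-9]+", "-", space_id.lower()).strip("-")
    return f"https://{subdomain}.hf.space/gradio_api/mcp/"


# Same structure as the config above, with the URL computed from the id.
config = {"mcpServers": {"cmet": {"url": space_mcp_url("WeReCooking/C-MET-CPU")}}}
print(json.dumps(config, indent=2))
```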
CLI
python app.py infer \
    -s asset/identity/ChatGPT_man3_crop.png \
    -a asset/audio/W009_038.wav \
    -v asset/video/W009_038.mp4 \
    -e happy \
    -o output.mp4
All 13 emotions: angry contempt disgusted fear happy sad surprised charismatic desirous empathetic envious romantic sarcastic
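Because a full run takes minutes, it can be worth rejecting a mistyped emotion before submitting a job. The following is a minimal sketch; `normalize_emotion` is a hypothetical helper, and it assumes the app expects the lowercase labels exactly as listed above.

```python
# The 13 emotion labels listed above.
EMOTIONS = (
    "angry", "contempt", "disgusted", "fear", "happy", "sad", "surprised",
    "charismatic", "desirous", "empathetic", "envious", "romantic", "sarcastic",
)


def normalize_emotion(value: str) -> str:
    """Return the lowercase label, or raise ValueError if it is not listed."""
    emotion = value.strip().lower()
    if emotion not in EMOTIONS:
        raise ValueError(
            f"unknown emotion {value!r}; expected one of: {', '.join(EMOTIONS)}"
        )
    return emotion
```

The returned string can then be passed as the `emotion` argument of the client call or as the `-e` value on the command line.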
Web UI
- Upload a source face image (256x256 crop)
- Upload the driving audio (.wav)
- Upload the pose-driving video (25 fps, face crop)
- Pick a target emotion
- Click Generate and wait; the result is the output video with a timing breakdown
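Before uploading, the source image size can be checked locally. The sketch below reads the width and height from a PNG header using only the standard library. It assumes the source face is a PNG (as in the examples here); other formats need a different check, and whether the app rejects other sizes is not stated. `png_size` and `is_expected_crop` are hypothetical helpers.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then big-endian 32-bit width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", header[16:24])


def is_expected_crop(path: str, size: int = 256) -> bool:
    """Check that the PNG at `path` is a size x size square."""
    with open(path, "rb") as f:
        return png_size(f.read(24)) == (size, size)
```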
Credits
Based on C-MET (CVPR 2026). Original ZeroGPU demo: coldhyuk/C-MET.