# VAD-to-Blendshape: Emotion-Driven Facial Animation

A lightweight PyTorch MLP that maps continuous VAD (Valence-Arousal-Dominance) emotion values to 52 ARKit blendshape coefficients for real-time facial expression generation.

- Input: 3-dim VAD vector in [-1, 1]
- Output: 52-dim ARKit blendshape weights in [0, 1]
- Model size: 279K parameters
- Dataset: Emo3D (16.7K images × 2-4 blendshape variants = 40K+ training pairs)
## Quick Start

### 1. Install dependencies

```bash
pip install torch numpy huggingface_hub
```
### 2. Download the model

```python
from huggingface_hub import hf_hub_download

checkpoint = hf_hub_download(
    repo_id="karie666666/vad-to-blendshape",
    filename="best_model.pt",
)
metadata = hf_hub_download(
    repo_id="karie666666/vad-to-blendshape",
    filename="model_metadata.json",
)
```
### 3. Inference

```bash
# Direct VAD input
python inference.py --checkpoint best_model.pt --vad 0.8 0.6 0.5

# By emotion name and intensity
python inference.py --checkpoint best_model.pt --emotion happiness --intensity 0.9

# Mixed emotions
python inference.py --checkpoint best_model.pt --emotion sadness+fear --intensity 0.8
```
Python API:

```python
import numpy as np
from inference import load_model, predict, emotion_to_vad

model, meta = load_model("best_model.pt")

# Direct VAD input
vad = np.array([0.8, 0.6, 0.5], dtype=np.float32)  # happiness
blendshape = predict(model, vad)
print(blendshape.shape)  # (52,)

# From an emotion name
vad = emotion_to_vad("anger", intensity=0.9)
blendshape = predict(model, vad)
```
## Model Architecture

```text
Linear(3, 256)   → LayerNorm → LeakyReLU → Dropout
Linear(256, 512) → LayerNorm → LeakyReLU → Dropout
Linear(512, 256) → LayerNorm → LeakyReLU → Dropout
Linear(256, 52)
```

Total params: 279,348
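The stack above can be sketched as a PyTorch module. The dropout probability and the way outputs are constrained to [0, 1] are assumptions (the card does not specify them), but the parameter count of this sketch matches the reported 279,348:

```python
import torch
import torch.nn as nn


class VADToBlendshape(nn.Module):
    """Sketch of the MLP described above; p_drop and the output clamp
    are assumptions, not documented choices."""

    def __init__(self, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 256), nn.LayerNorm(256), nn.LeakyReLU(), nn.Dropout(p_drop),
            nn.Linear(256, 512), nn.LayerNorm(512), nn.LeakyReLU(), nn.Dropout(p_drop),
            nn.Linear(512, 256), nn.LayerNorm(256), nn.LeakyReLU(), nn.Dropout(p_drop),
            nn.Linear(256, 52),
        )

    def forward(self, vad: torch.Tensor) -> torch.Tensor:
        # Clamp to the documented [0, 1] range (the released model may
        # instead use a sigmoid or training-time clipping).
        return self.net(vad).clamp(0.0, 1.0)


model = VADToBlendshape()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 279348
```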
## Training

- Loss: Smooth L1 (Huber) + L1 sparsity regularization
- Optimizer: AdamW, lr=1e-3, weight_decay=1e-4
- Scheduler: CosineAnnealingLR, 100 epochs
- Best val loss: 0.04507 (MSE ~0.018)
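A minimal sketch of this training setup, assuming the L1 sparsity term is applied to the predicted blendshape weights (the regularizer's target and its weight, 1e-3 here, are assumptions):

```python
import torch
import torch.nn as nn


def training_step(model, optimizer, vad_batch, blend_batch, sparsity_weight=1e-3):
    pred = model(vad_batch)
    # Smooth L1 (Huber) reconstruction loss plus an L1 sparsity penalty
    # on the predictions (most blendshape weights should stay near zero).
    loss = nn.functional.smooth_l1_loss(pred, blend_batch)
    loss = loss + sparsity_weight * pred.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


model = nn.Linear(3, 52)  # stand-in; the real model is the MLP above
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```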
## VAD Mapping (Basic Emotions)

| Emotion | Valence | Arousal | Dominance |
|---|---|---|---|
| neutral | 0.00 | 0.00 | 0.00 |
| happiness | 0.80 | 0.60 | 0.50 |
| surprise | 0.30 | 0.90 | 0.20 |
| sadness | -0.80 | -0.40 | -0.30 |
| anger | -0.70 | 0.80 | 0.70 |
| disgust | -0.60 | 0.30 | 0.40 |
| fear | -0.70 | 0.80 | -0.30 |
| contempt | -0.40 | 0.30 | 0.80 |

Mixed emotions are supported via the `emotion1+emotion2` syntax.
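For illustration, the sketch below resolves `sadness+fear`-style names by averaging the table's anchor vectors and scaling by intensity; how `inference.py` actually combines emotions is an assumption here:

```python
import numpy as np

# VAD anchors from the table above.
EMOTION_VAD = {
    "neutral": (0.00, 0.00, 0.00),
    "happiness": (0.80, 0.60, 0.50),
    "surprise": (0.30, 0.90, 0.20),
    "sadness": (-0.80, -0.40, -0.30),
    "anger": (-0.70, 0.80, 0.70),
    "disgust": (-0.60, 0.30, 0.40),
    "fear": (-0.70, 0.80, -0.30),
    "contempt": (-0.40, 0.30, 0.80),
}


def emotion_to_vad(name: str, intensity: float = 1.0) -> np.ndarray:
    """Average the VAD anchors of '+'-joined emotions, then scale by intensity."""
    parts = [np.array(EMOTION_VAD[p], dtype=np.float32) for p in name.split("+")]
    return intensity * np.mean(parts, axis=0)
```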
## ARKit Blendshape Output

The model outputs 52 blendshape weights in the standard ARKit order:

```text
 0: browDownLeft        26: mouthClose
 1: browDownRight       27: mouthDimpleLeft
 2: browInnerUp         28: mouthDimpleRight
 3: browOuterUpLeft     29: mouthFrownLeft
 4: browOuterUpRight    30: mouthFrownRight
 5: cheekPuff           31: mouthFunnel
 6: cheekSquintLeft     32: mouthLeft
 7: cheekSquintRight    33: mouthLowerDownLeft
 8: eyeBlinkLeft        34: mouthLowerDownRight
 9: eyeBlinkRight       35: mouthPressLeft
10: eyeLookDownLeft     36: mouthPressRight
11: eyeLookDownRight    37: mouthPucker
12: eyeLookInLeft       38: mouthRight
13: eyeLookInRight      39: mouthRollLower
14: eyeLookOutLeft      40: mouthRollUpper
15: eyeLookOutRight     41: mouthShrugLower
16: eyeLookUpLeft       42: mouthShrugUpper
17: eyeLookUpRight      43: mouthSmileLeft
18: eyeSquintLeft       44: mouthSmileRight
19: eyeSquintRight      45: mouthStretchLeft
20: eyeWideLeft         46: mouthStretchRight
21: eyeWideRight        47: mouthUpperUpLeft
22: jawForward          48: mouthUpperUpRight
23: jawLeft             49: noseSneerLeft
24: jawOpen             50: noseSneerRight
25: jawRight            51: tongueOut
```
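Downstream renderers usually want named weights rather than a raw vector; a small helper pairing the indices above with their names (`to_named_weights` is a hypothetical helper, not part of the repo's API):

```python
import numpy as np

# Index-to-name table transcribed from the list above.
ARKIT_NAMES = [
    "browDownLeft", "browDownRight", "browInnerUp", "browOuterUpLeft",
    "browOuterUpRight", "cheekPuff", "cheekSquintLeft", "cheekSquintRight",
    "eyeBlinkLeft", "eyeBlinkRight", "eyeLookDownLeft", "eyeLookDownRight",
    "eyeLookInLeft", "eyeLookInRight", "eyeLookOutLeft", "eyeLookOutRight",
    "eyeLookUpLeft", "eyeLookUpRight", "eyeSquintLeft", "eyeSquintRight",
    "eyeWideLeft", "eyeWideRight", "jawForward", "jawLeft", "jawOpen",
    "jawRight", "mouthClose", "mouthDimpleLeft", "mouthDimpleRight",
    "mouthFrownLeft", "mouthFrownRight", "mouthFunnel", "mouthLeft",
    "mouthLowerDownLeft", "mouthLowerDownRight", "mouthPressLeft",
    "mouthPressRight", "mouthPucker", "mouthRight", "mouthRollLower",
    "mouthRollUpper", "mouthShrugLower", "mouthShrugUpper", "mouthSmileLeft",
    "mouthSmileRight", "mouthStretchLeft", "mouthStretchRight",
    "mouthUpperUpLeft", "mouthUpperUpRight", "noseSneerLeft",
    "noseSneerRight", "tongueOut",
]


def to_named_weights(blendshape: np.ndarray) -> dict:
    """Pair a 52-dim model output with its ARKit blendshape names."""
    assert blendshape.shape == (52,)
    return dict(zip(ARKIT_NAMES, blendshape.tolist()))
```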
## License

MIT
## Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern