VAD-to-Blendshape: Emotion-Driven Facial Animation

A lightweight PyTorch MLP model that maps continuous VAD (Valence-Arousal-Dominance) emotional values to 52 ARKit blendshape coefficients for real-time facial expression generation.

  • Input: 3-dim VAD vector, each component in [-1, 1]
  • Output: 52-dim ARKit blendshape weights, each in [0, 1]
  • Model size: 279K parameters
  • Dataset: Emo3D (16.7K images × 2-4 blendshape variants = 40K+ training pairs)

Quick Start

1. Install dependencies

pip install torch numpy huggingface_hub

2. Download model

from huggingface_hub import hf_hub_download

checkpoint = hf_hub_download(
    repo_id="karie666666/vad-to-blendshape",
    filename="best_model.pt"
)
metadata = hf_hub_download(
    repo_id="karie666666/vad-to-blendshape",
    filename="model_metadata.json"
)

3. Inference

python inference.py --checkpoint best_model.pt --vad 0.8 0.6 0.5
python inference.py --checkpoint best_model.pt --emotion happiness --intensity 0.9
python inference.py --checkpoint best_model.pt --emotion sadness+fear --intensity 0.8

Python API:

from inference import load_model, predict, emotion_to_vad
import numpy as np

# Pass a local path, or the path returned by hf_hub_download in step 2
model, meta = load_model("best_model.pt")

# Direct VAD
vad = np.array([0.8, 0.6, 0.5], dtype=np.float32)  # happiness
blendshape = predict(model, vad)
print(blendshape.shape)  # (52,)

# From emotion name
vad = emotion_to_vad("anger", intensity=0.9)
blendshape = predict(model, vad)
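
Since the model expects each VAD component in [-1, 1], it may be worth clamping user-supplied values before calling predict. The following is a minimal sketch; clamp_vad is a hypothetical helper written for illustration and is not part of inference.py.

```python
def clamp_vad(vad, low=-1.0, high=1.0):
    """Clamp each of the three VAD components into [low, high].

    Hypothetical helper for illustration; not part of inference.py.
    """
    values = [float(v) for v in vad]
    if len(values) != 3:
        raise ValueError(f"expected 3 VAD values, got {len(values)}")
    return [min(max(v, low), high) for v in values]


print(clamp_vad([1.4, -0.2, -3.0]))  # [1.0, -0.2, -1.0]
```

The result can be wrapped in np.array(..., dtype=np.float32) before being passed to predict.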

Model Architecture

Linear(3, 256)   → LayerNorm → LeakyReLU → Dropout
Linear(256, 512) → LayerNorm → LeakyReLU → Dropout
Linear(512, 256) → LayerNorm → LeakyReLU → Dropout
Linear(256, 52)

Total params: 279,348
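
The parameter total can be checked by hand from the layer widths. The sketch below assumes every Linear layer has a bias and every LayerNorm has a learnable scale and shift (the PyTorch defaults); LeakyReLU and Dropout contribute no parameters.

```python
# Layer widths read off the architecture listing: 3 -> 256 -> 512 -> 256 -> 52.
widths = [3, 256, 512, 256, 52]

total = 0
for i, (fan_in, fan_out) in enumerate(zip(widths[:-1], widths[1:])):
    linear = fan_in * fan_out + fan_out            # weight matrix + bias
    is_hidden = i < len(widths) - 2                # the final Linear has no LayerNorm
    layer_norm = 2 * fan_out if is_hidden else 0   # scale + shift (assumed affine)
    total += linear + layer_norm

print(total)  # 279348
```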


Training

  • Loss: Smooth L1 (Huber) + L1 sparsity regularization
  • Optimizer: AdamW, lr=1e-3, weight_decay=1e-4
  • Scheduler: CosineAnnealingLR, 100 epochs
  • Best val loss: 0.04507 (MSE ~0.018)
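
As a rough illustration of the objective and schedule listed above, here is a dependency-free sketch. Several details are assumptions rather than facts from the training code: the sparsity term is taken to be the mean absolute value of the predicted weights, sparsity_weight=0.01 is a placeholder, beta=1.0 follows the PyTorch SmoothL1Loss default, and the cosine schedule uses T_max=100 with eta_min=0.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss: quadratic below beta, linear above it."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def combined_loss(pred, target, sparsity_weight=0.01, beta=1.0):
    """Smooth L1 plus an L1 penalty on the predictions.

    Assumption: the penalty acts on the predicted blendshape weights and
    sparsity_weight is a placeholder value, not the one used in training.
    """
    sparsity = sum(abs(p) for p in pred) / len(pred)
    return smooth_l1(pred, target, beta) + sparsity_weight * sparsity


def cosine_lr(epoch, base_lr=1e-3, t_max=100, eta_min=0.0):
    """Closed-form cosine annealing, matching CosineAnnealingLR's formula."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


print(combined_loss([0.5, 0.0], [0.0, 0.0]))
print([cosine_lr(e) for e in (0, 50, 100)])
```

With these assumptions the learning rate starts at 1e-3, reaches half of that at epoch 50, and decays to zero at epoch 100.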

VAD Mapping (Basic Emotions)

Emotion     Valence  Arousal  Dominance
neutral        0.00     0.00       0.00
happiness      0.80     0.60       0.50
surprise       0.30     0.90       0.20
sadness       -0.80    -0.40      -0.30
anger         -0.70     0.80       0.70
disgust       -0.60     0.30       0.40
fear          -0.70     0.80      -0.30
contempt      -0.40     0.30       0.80

Mixed emotions are supported via the emotion1+emotion2 syntax (for example, sadness+fear).
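
One plausible way to turn an emotion name into a VAD vector is a table lookup followed by scaling. The sketch below is an assumption about how this could work, not the code in inference.py: it averages the VAD vectors of the emotions joined by "+" and multiplies the result by the intensity. The function name emotion_to_vad_sketch is hypothetical.

```python
# VAD values copied from the table above.
BASIC_VAD = {
    "neutral":   (0.00, 0.00, 0.00),
    "happiness": (0.80, 0.60, 0.50),
    "surprise":  (0.30, 0.90, 0.20),
    "sadness":   (-0.80, -0.40, -0.30),
    "anger":     (-0.70, 0.80, 0.70),
    "disgust":   (-0.60, 0.30, 0.40),
    "fear":      (-0.70, 0.80, -0.30),
    "contempt":  (-0.40, 0.30, 0.80),
}


def emotion_to_vad_sketch(emotion, intensity=1.0):
    """Map 'name' or 'name1+name2' to a VAD list.

    Assumption: mixed emotions are an unweighted mean and intensity is a
    linear scale factor. The real emotion_to_vad may behave differently.
    """
    names = [n.strip().lower() for n in emotion.split("+")]
    vectors = [BASIC_VAD[n] for n in names]  # KeyError on unknown names
    mean = [sum(axis) / len(vectors) for axis in zip(*vectors)]
    return [intensity * v for v in mean]


print(emotion_to_vad_sketch("happiness", intensity=0.9))
print(emotion_to_vad_sketch("sadness+fear"))
```

Note that anger and fear share the same valence and arousal in this table, so dominance is the only axis that separates them.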


ARKit Blendshape Output

The model outputs 52 blendshape weights in the standard ARKit order:

0:  browDownLeft          26: mouthClose
1:  browDownRight         27: mouthDimpleLeft
2:  browInnerUp           28: mouthDimpleRight
3:  browOuterUpLeft       29: mouthFrownLeft
4:  browOuterUpRight      30: mouthFrownRight
5:  cheekPuff             31: mouthFunnel
6:  cheekSquintLeft       32: mouthLeft
7:  cheekSquintRight      33: mouthLowerDownLeft
8:  eyeBlinkLeft          34: mouthLowerDownRight
9:  eyeBlinkRight         35: mouthPressLeft
10: eyeLookDownLeft       36: mouthPressRight
11: eyeLookDownRight      37: mouthPucker
12: eyeLookInLeft         38: mouthRight
13: eyeLookInRight        39: mouthRollLower
14: eyeLookOutLeft        40: mouthRollUpper
15: eyeLookOutRight       41: mouthShrugLower
16: eyeLookUpLeft         42: mouthShrugUpper
17: eyeLookUpRight        43: mouthSmileLeft
18: eyeSquintLeft         44: mouthSmileRight
19: eyeSquintRight        45: mouthStretchLeft
20: eyeWideLeft           46: mouthStretchRight
21: eyeWideRight          47: mouthUpperUpLeft
22: jawForward            48: mouthUpperUpRight
23: jawLeft               49: noseSneerLeft
24: jawOpen               50: noseSneerRight
25: jawRight              51: tongueOut
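
To consume the output by name rather than by index, the 52-dim vector can be zipped with the names listed above. This is a small sketch; name_weights and top_active are hypothetical helpers, not functions from inference.py.

```python
# Blendshape names in the index order listed above.
ARKIT_NAMES = [
    "browDownLeft", "browDownRight", "browInnerUp", "browOuterUpLeft",
    "browOuterUpRight", "cheekPuff", "cheekSquintLeft", "cheekSquintRight",
    "eyeBlinkLeft", "eyeBlinkRight", "eyeLookDownLeft", "eyeLookDownRight",
    "eyeLookInLeft", "eyeLookInRight", "eyeLookOutLeft", "eyeLookOutRight",
    "eyeLookUpLeft", "eyeLookUpRight", "eyeSquintLeft", "eyeSquintRight",
    "eyeWideLeft", "eyeWideRight", "jawForward", "jawLeft", "jawOpen",
    "jawRight", "mouthClose", "mouthDimpleLeft", "mouthDimpleRight",
    "mouthFrownLeft", "mouthFrownRight", "mouthFunnel", "mouthLeft",
    "mouthLowerDownLeft", "mouthLowerDownRight", "mouthPressLeft",
    "mouthPressRight", "mouthPucker", "mouthRight", "mouthRollLower",
    "mouthRollUpper", "mouthShrugLower", "mouthShrugUpper", "mouthSmileLeft",
    "mouthSmileRight", "mouthStretchLeft", "mouthStretchRight",
    "mouthUpperUpLeft", "mouthUpperUpRight", "noseSneerLeft",
    "noseSneerRight", "tongueOut",
]


def name_weights(weights):
    """Pair each of the 52 weights with its blendshape name."""
    if len(weights) != len(ARKIT_NAMES):
        raise ValueError(f"expected {len(ARKIT_NAMES)} weights, got {len(weights)}")
    return dict(zip(ARKIT_NAMES, (float(w) for w in weights)))


def top_active(weights, k=5):
    """Return the k largest (name, weight) pairs, strongest first."""
    named = name_weights(weights)
    return sorted(named.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hand-made example vector standing in for a model prediction.
demo = [0.0] * 52
demo[43], demo[44], demo[24] = 0.8, 0.7, 0.2
print(top_active(demo, k=3))
```

Both helpers accept any sequence of length 52, so the NumPy array returned by predict can be passed in directly.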

License

MIT

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

