" π ViT Micro Facial Expression Recognition Model
This repository contains a Vision Transformer (ViT)-based facial emotion recognition model, iteratively fine-tuned for micro- and macro-facial expression classification.

The model is adapted from mo-thecreator/vit-Facial-Expression-Recognition, a ViT pre-trained and fine-tuned for facial emotion recognition, and further optimized on a combined dataset of micro-expressions and FER-2013 facial expression data.
## Model Details

- Base model: mo-thecreator/vit-Facial-Expression-Recognition
- Architecture: Vision Transformer (ViT)
- Task: facial emotion classification
- Output classes: 7 emotion categories
- Final model name: vit-micro-facial-expressions
## Emotion Classes

Each facial image is classified into one of the following seven emotion categories:

| Label | Emotion  |
|-------|----------|
| 0     | Angry    |
| 1     | Disgust  |
| 2     | Fear     |
| 3     | Happy    |
| 4     | Sad      |
| 5     | Surprise |
| 6     | Neutral  |

## Dataset
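For convenience, the label-to-emotion mapping above can be expressed as a small Python dictionary (an illustrative helper, not part of the model's API):

```python
# Integer class IDs mapped to emotion names, as listed in the table above.
ID2EMOTION = {
    0: "Angry",
    1: "Disgust",
    2: "Fear",
    3: "Happy",
    4: "Sad",
    5: "Surprise",
    6: "Neutral",
}

def label_to_emotion(label_id: int) -> str:
    """Return the emotion name for a predicted class index."""
    return ID2EMOTION[label_id]

print(label_to_emotion(3))  # Happy
```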
The model was trained on a combined dataset consisting of:

- Micro Facial Expressions dataset
- FER-2013 dataset

### Dataset Statistics

- Training samples: 28,709 images
- Public test samples: 3,589 images

The Micro Facial Expressions dataset is publicly available on Hugging Face: https://huggingface.co/datasets/LaurenGurgiolo/Micro_Facial_Expressions
## Training Methodology
The base ViT model was iteratively fine-tuned on the combined dataset.
Training focused on improving sensitivity to subtle micro-expressions while maintaining robustness on standard facial expressions.
Iterative fine-tuning enabled progressive refinement of feature representations across emotion classes.
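To make the idea of iterative fine-tuning concrete, here is a toy sketch: a tiny softmax classifier stands in for the ViT, and each "round" resumes from the previous round's weights at a lower learning rate. The data, dimensions, and schedule are all invented for illustration and have nothing to do with the actual training run.

```python
import numpy as np

# Toy stand-in for iterative fine-tuning: each round resumes from the
# previous weights with a smaller learning rate. All numbers are invented.
rng = np.random.default_rng(0)
n, d, k = 60, 8, 7                      # samples, feature dim, 7 emotion classes
X = rng.normal(size=(n, d))
y = rng.integers(0, k, size=n)
W = np.zeros((d, k))                    # stand-in for the classification head

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_loss(W):
    p = softmax(X @ W)
    return float(-np.log(p[np.arange(n), y] + 1e-12).mean())

losses = [mean_loss(W)]                 # starts near ln(7) with zero weights
for lr in (0.3, 0.1, 0.02):             # successive rounds, smaller steps
    for _ in range(100):
        p = softmax(X @ W)
        p[np.arange(n), y] -= 1.0       # gradient of cross-entropy wrt logits
        W -= lr * (X.T @ p) / n         # gradient-descent update
    losses.append(mean_loss(W))

print(losses)                           # loss shrinks round over round
```

Each round refines the representation learned in the previous one, which is the intuition behind the progressive refinement described above.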
Vision Transformers were selected because empirical studies report that they can outperform convolutional neural networks (CNNs) in both classification accuracy and generalization on facial recognition tasks (Rodrigo et al., 2024).
## Performance

- Accuracy on the micro-expression test set: 88%
- Evaluation metric: classification accuracy
This performance indicates strong generalization for subtle facial expression recognition, particularly in micro-expression scenarios.
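Classification accuracy, the metric reported above, is simply the fraction of predictions that match the ground-truth labels. A minimal reference implementation (a hypothetical helper, not taken from this repository):

```python
def classification_accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return correct / len(labels)

# Toy example: 3 of 4 predictions match the labels -> 0.75
print(classification_accuracy([3, 3, 6, 0], [3, 4, 6, 0]))  # 0.75
```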
## Training Results Snapshot

*Figure 1: Micro-Expressions ViT model. Training and evaluation metrics illustrating convergence and performance improvements across epochs.*
## Usage Example

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
import torch
from PIL import Image

processor = AutoImageProcessor.from_pretrained("./vit-micro-facial-expressions")
model = AutoModelForImageClassification.from_pretrained("./vit-micro-facial-expressions")

# Load an image and preprocess it for the model
image = Image.open("face.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring emotion class (see the label table above)
predicted_class = outputs.logits.argmax(dim=-1).item()
print(predicted_class)
```
## Limitations

Performance may degrade on:

- Low-resolution or heavily occluded faces
- Extreme head poses or lighting conditions
Emotion labels are inherently subjective and dataset-dependent.
The model is optimized for facial images and may not generalize to non-face imagery.
## License & Attribution

- Base model: mo-thecreator/vit-Facial-Expression-Recognition
- Datasets: FER-2013 and Micro Facial Expressions dataset licenses apply
Please review the respective Hugging Face dataset and model licenses before commercial use.
## Acknowledgements

- Hugging Face for model hosting and datasets
- FER-2013 contributors
- Micro Facial Expressions dataset authors
- Prior research demonstrating ViT effectiveness in facial emotion recognition