---
title: Age Estimation Demo
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
python_version: 3.12
app_file: app.py
license: apache-2.0
tags:
  - age-estimation
  - gender-classification
  - face-analysis
  - vision-transformer
  - dinov3
  - coral-ordinal-regression
pipeline_tag: image-classification
---
# FaceAge-DINOv3

Age and gender estimation from face crops using a **DINOv3-ViT-L** backbone with CORAL ordinal regression.
## Performance (LAGENDA benchmark)

| Model | MAE ↓ | CS@5 ↑ | Gender Acc ↑ |
|-------|-------|--------|--------------|
| MiVOLO v2 [face+body] (paper) | 3.650 | 74.48% | 97.99% |
| MiVOLO v2 [face+body] (measured on the public model) | 3.859 | 76.5% | – |
| MiVOLO v2 [face-only] (measured on the public model) | 3.941 | 75.6% | – |
| **FaceAge-DINOv3 (face-only)** | **3.760** | – | – |
Trained on our own collection data.
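For reference, the two age metrics in the table can be computed as follows. This is a sketch using the standard definitions (MAE = mean absolute error; CS@k = fraction of samples with absolute error at most k years), not code from this repo:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between predicted and ground-truth ages."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(gt))))

def cs_at(pred, gt, k=5):
    """Cumulative Score: fraction of samples with |error| <= k years."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(gt)) <= k))

# Toy example with four predictions
pred = [23.0, 41.0, 67.0, 30.0]
gt   = [25,   40,   80,   29]
print(mae(pred, gt))       # 4.25  (errors 2, 1, 13, 1)
print(cs_at(pred, gt, 5))  # 0.75  (3 of 4 within 5 years)
```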
## Architecture

```
Face [B, 3, 224, 224]
        ↓
DINOv3-ViT-L/16 (307M params, pretrained on LVD-1.68B)
        ↓ pooler_output
[B, 1024]
        ↓ LayerNorm → Linear(1024→512) → GELU → Dropout
[B, 512]
 ├── age_head:    Linear(512, 100) → CORAL   → age ∈ [0, 100]
 └── gender_head: Linear(512, 2)   → softmax → {female, male}
```
**CORAL ordinal regression**: age = Σ σ(logit_k) for k = 0…99, where σ is the sigmoid. Exploits the ordinal structure of ages (25 < 26 < 27) for better calibration than standard cross-entropy.
## Usage

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("trungthanhtran/faceage-dino")
model = AutoModel.from_pretrained("trungthanhtran/faceage-dino",
                                  trust_remote_code=True)
model.eval()

# Input: 224×224 face crop (already cropped, no detection needed)
image = Image.open("face_crop.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

age = outputs.age_output.item()
gender = "male" if outputs.gender_class_idx.item() == 1 else "female"
conf = outputs.gender_probs[0, outputs.gender_class_idx.item()].item()
print(f"Age    : {age:.1f}")
print(f"Gender : {gender} (conf={conf:.2f})")
```
## ONNX (no PyTorch needed)

The model is also available as a single-file ONNX export for CPU deployment:

```bash
pip install onnxruntime numpy pillow
python infer_onnx.py --onnx faceage_dino_fp32.onnx --image face.jpg
```

The ONNX model is ~3–4× faster on CPU than the PyTorch model and requires no GPU.
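If you prefer to call `onnxruntime` directly instead of `infer_onnx.py`, the input must be preprocessed by hand. The sketch below assumes standard ImageNet normalization at 224×224, which is typical for DINOv3-style processors; verify the exact mean/std against the repo's processor config before relying on it:

```python
import numpy as np
from PIL import Image

# Assumed ImageNet normalization constants (check the processor config).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image: Image.Image) -> np.ndarray:
    """Resize a face crop to 224x224 and normalize to NCHW float32."""
    img = image.convert("RGB").resize((224, 224), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0   # HWC in [0, 1]
    x = (x - MEAN) / STD                            # channel-wise normalize
    return x.transpose(2, 0, 1)[None]               # -> [1, 3, 224, 224]

x = preprocess(Image.new("RGB", (640, 480), (128, 128, 128)))
print(x.shape)  # (1, 3, 224, 224)
```

The resulting array can then be fed to an `onnxruntime.InferenceSession` loaded from `faceage_dino_fp32.onnx`; the session's input/output names depend on how the model was exported.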
## Benchmark against MiVOLO v2

```bash
python infer_onnx.py \
    --onnx faceage_dino_fp32.onnx \
    --lagenda_dir data/lagenda \
    --annotation_csv lagenda_test.csv \
    --batch_size 256
```
## Training

Multi-phase fine-tuning of DINOv3-ViT-L:

| Phase | Backbone | LR | Data |
|-------|----------|------|------|
| 1 | Frozen (all 24 blocks) | 1e-3 | Our collection, 786k faces |
| 2 | Top 4 blocks unfrozen | 1e-4 | Same |
| 3 | All blocks unfrozen | 3e-5 | Same |
| 4 | All blocks | 3e-6 | Our collection, 4M faces, age reweighting |

Age-group reweighting (Phase 4): ages 36–50 ×2.0, 51–65 ×1.5, 66–100 ×3.0, to improve accuracy on older faces.
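The Phase-4 reweighting scheme maps directly to a per-sample loss weight. A minimal sketch, using only the multipliers stated above (ages outside the listed groups are assumed to keep weight 1.0):

```python
def age_weight(age: int) -> float:
    """Per-sample loss weight for Phase-4 age-group reweighting."""
    if 36 <= age <= 50:
        return 2.0
    if 51 <= age <= 65:
        return 1.5
    if 66 <= age <= 100:
        return 3.0
    return 1.0  # assumed default for ages outside the listed groups

print([age_weight(a) for a in (25, 40, 60, 80)])  # [1.0, 2.0, 1.5, 3.0]
```

In training, this weight would typically multiply the per-sample age loss before averaging over the batch.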
## Citation

If you use this model, please cite:

```bibtex
@misc{faceage-dino-2026,
  title  = {FaceAge-DINOv3: Age and Gender Estimation with DINOv3-ViT-L},
  author = {Trung Thanh Tran},
  year   = {2026},
  url    = {https://huggingface.co/trungthanhtran/faceage-dino}
}
```

Also cite the backbones and datasets used:

- DINOv3: Meta AI, "DINOv3: Scaling Up Vision Foundation Models", 2025
- LAGENDA: Bhuiyan et al., 2023
- MiVOLO: Kuprashevich & Tolstykh, arXiv:2307.04616