---
title: Age Estimation Demo
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
python_version: 3.12
app_file: app.py
license: apache-2.0
tags:
  - age-estimation
  - gender-classification
  - face-analysis
  - vision-transformer
  - dinov3
  - coral-ordinal-regression
pipeline_tag: image-classification
---
# FaceAge-DINOv3

Age and gender estimation from face crops using a **DINOv3-ViT-L** backbone with CORAL ordinal regression.
## Performance (LAGENDA benchmark)

| Model | MAE ↓ | CS@5 ↑ | Gender Acc ↑ |
|-------|-------|--------|--------------|
| MiVOLO v2 [face+body] (paper) | 3.650 | 74.48% | 97.99% |
| MiVOLO v2 [face+body] (measured on the public model) | 3.859 | 76.5% | – |
| MiVOLO v2 [face-only] (measured on the public model) | 3.941 | 75.6% | – |
| **FaceAge-DINOv3 (face-only)** | **3.760** | – | – |
Trained on our own collection data.
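For reference, the two age metrics in the table can be computed as follows. This is a sketch using the standard definitions (MAE = mean absolute error; CS@k = fraction of samples with absolute error at most k years), not code from this repo:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between predicted and ground-truth ages."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(gt))))

def cs_at(pred, gt, k=5):
    """Cumulative Score: fraction of samples with |error| <= k years."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(gt)) <= k))

# Toy example with four predictions
pred = [23.0, 41.0, 67.0, 30.0]
gt   = [25,   40,   80,   29]
print(mae(pred, gt))       # 4.25  (errors 2, 1, 13, 1)
print(cs_at(pred, gt, 5))  # 0.75  (3 of 4 within 5 years)
```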
## Architecture

```
Face [B, 3, 224, 224]
        ↓
DINOv3-ViT-L/16 (307M params, pretrained on LVD-1.68B)
        ↓ pooler_output
[B, 1024]
        ↓ LayerNorm → Linear(1024→512) → GELU → Dropout
[B, 512]
 ├── age_head:    Linear(512, 100) → CORAL   → age ∈ [0, 100]
 └── gender_head: Linear(512, 2)   → softmax → {female, male}
```
**CORAL ordinal regression**: age = Σ σ(logit_k) for k = 0…99, where σ is the sigmoid. Exploits the ordinal structure of ages (25 < 26 < 27) for better calibration than standard cross-entropy.
## Usage

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("trungthanhtran/faceage-dino")
model = AutoModel.from_pretrained("trungthanhtran/faceage-dino",
                                  trust_remote_code=True)
model.eval()

# Input: 224×224 face crop (already cropped, no detection needed)
image = Image.open("face_crop.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

age = outputs.age_output.item()
gender = "male" if outputs.gender_class_idx.item() == 1 else "female"
conf = outputs.gender_probs[0, outputs.gender_class_idx.item()].item()
print(f"Age    : {age:.1f}")
print(f"Gender : {gender} (conf={conf:.2f})")
```
## ONNX (no PyTorch needed)

The model is also available as a single-file ONNX export for CPU deployment:

```bash
pip install onnxruntime numpy pillow
python infer_onnx.py --onnx faceage_dino_fp32.onnx --image face.jpg
```

The ONNX model is ~3–4× faster on CPU than the PyTorch model and requires no GPU.
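If you prefer to call `onnxruntime` directly instead of `infer_onnx.py`, the input must be preprocessed by hand. The sketch below assumes standard ImageNet normalization at 224×224, which is typical for DINOv3-style processors; verify the exact mean/std against the repo's processor config before relying on it:

```python
import numpy as np
from PIL import Image

# Assumed ImageNet normalization constants (check the processor config).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image: Image.Image) -> np.ndarray:
    """Resize a face crop to 224x224 and normalize to NCHW float32."""
    img = image.convert("RGB").resize((224, 224), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0   # HWC in [0, 1]
    x = (x - MEAN) / STD                            # channel-wise normalize
    return x.transpose(2, 0, 1)[None]               # -> [1, 3, 224, 224]

x = preprocess(Image.new("RGB", (640, 480), (128, 128, 128)))
print(x.shape)  # (1, 3, 224, 224)
```

The resulting array can then be fed to an `onnxruntime.InferenceSession` loaded from `faceage_dino_fp32.onnx`; the session's input/output names depend on how the model was exported.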
## Benchmark against MiVOLO v2

```bash
python infer_onnx.py \
    --onnx faceage_dino_fp32.onnx \
    --lagenda_dir data/lagenda \
    --annotation_csv lagenda_test.csv \
    --batch_size 256
```
## Training

Multi-phase fine-tuning of DINOv3-ViT-L:

| Phase | Backbone | LR | Data |
|-------|----------|------|------|
| 1 | Frozen (all 24 blocks) | 1e-3 | Our collection, 786k faces |
| 2 | Top 4 blocks unfrozen | 1e-4 | Same |
| 3 | All blocks unfrozen | 3e-5 | Same |
| 4 | All blocks | 3e-6 | Our collection, 4M faces, age reweighting |

Age-group reweighting (Phase 4): ages 36–50 ×2.0, 51–65 ×1.5, 66–100 ×3.0, to improve accuracy on older faces.
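The Phase-4 reweighting scheme maps directly to a per-sample loss weight. A minimal sketch, using only the multipliers stated above (ages outside the listed groups are assumed to keep weight 1.0):

```python
def age_weight(age: int) -> float:
    """Per-sample loss weight for Phase-4 age-group reweighting."""
    if 36 <= age <= 50:
        return 2.0
    if 51 <= age <= 65:
        return 1.5
    if 66 <= age <= 100:
        return 3.0
    return 1.0  # assumed default for ages outside the listed groups

print([age_weight(a) for a in (25, 40, 60, 80)])  # [1.0, 2.0, 1.5, 3.0]
```

In training, this weight would typically multiply the per-sample age loss before averaging over the batch.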
## Citation

If you use this model, please cite:

```bibtex
@misc{faceage-dino-2026,
  title  = {FaceAge-DINOv3: Age and Gender Estimation with DINOv3-ViT-L},
  author = {Trung Thanh Tran},
  year   = {2026},
  url    = {https://huggingface.co/trungthanhtran/faceage-dino}
}
```

Also cite the backbones and datasets used:

- DINOv3: Meta AI, "DINOv3: Scaling Up Vision Foundation Models", 2025
- LAGENDA: Bhuiyan et al., 2023
- MiVOLO: Kuprashevich & Tolstykh, arXiv:2307.04616