# MeFEm: Medical Face Embedding Models

Vision Transformers pre-trained on face data for potential medical applications. Available in Small (MeFEm-S) and Base (MeFEm-B) sizes.
## Quick Start

```python
import torch
import timm

# Load model (MeFEm-S example)
model = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0,       # No classification head
    global_pool='token'  # Use CLS token (default)
)
model.load_state_dict(torch.load('mefem-s.pt', map_location='cpu'))
model.eval()

# Forward pass
x = torch.randn(1, 3, 224, 224)  # Your face image
embeddings = model(x)  # [1, 384] CLS token embeddings
```
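Embeddings like these are typically compared with cosine similarity for retrieval or matching. A minimal sketch, using random placeholder tensors in place of real model outputs (shapes follow the Quick Start above):

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings standing in for two model outputs ([1, 384] each);
# in practice these come from model(x) as in the Quick Start.
emb_a = torch.randn(1, 384)
emb_b = torch.randn(1, 384)

# Cosine similarity between the two CLS-token embeddings; range [-1, 1]
similarity = F.cosine_similarity(emb_a, emb_b, dim=-1)  # shape [1]
```

Higher similarity suggests the two crops are closer in the embedding space; any decision threshold should be calibrated on your own data.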
## Model Details

- **Architecture**: ViT-Small/16 (384-dim) or ViT-Base/16 (768-dim) with CLS token
- **Training**: Modified I-JEPA on ~6.5M face images
- **Input**: Face crops with 2× expanded bounding boxes, 224×224 resolution
- **Output**: CLS token embeddings (`global_pool='token'`) or all tokens (`global_pool=''`)
## Usage Tips

```python
# For all tokens (CLS + patches):
model = timm.create_model('vit_small_patch16_224', num_classes=0, global_pool='')
tokens = model(x)  # [1, 197, 384]

# For patch embeddings only:
tokens = model.forward_features(x)
patch_embeddings = tokens[:, 1:]  # [1, 196, 384]
```
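For dense downstream tasks, the 196 patch embeddings can be rearranged into the 14×14 spatial grid of a ViT/16 at 224×224 input. A minimal sketch, using a random placeholder in place of the `patch_embeddings` tensor from the snippet above:

```python
import torch

# Placeholder for the [1, 196, 384] patch embeddings from the snippet above
patch_embeddings = torch.randn(1, 196, 384)

# 224 / 16 = 14 patches per side, so 196 tokens form a 14x14 grid
grid = patch_embeddings.reshape(1, 14, 14, 384)

# Channels-first layout ([1, 384, 14, 14]) for use as a conv-style feature map
feature_map = grid.permute(0, 3, 1, 2).contiguous()
```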
## Training Data

Face images from FaceCaption-15M, AVSpeech, and SHFQ datasets (~6.5M total). Images were cropped with expanded (2×) face bounding boxes.
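To match the training distribution, inference crops should be prepared the same way: expand the detected face box 2× around its center, crop, then resize to 224×224. A minimal sketch of the expansion step (the function name and the `(x, y, w, h)` pixel box format are illustrative assumptions, not part of any released preprocessing code):

```python
def expand_bbox(x, y, w, h, img_w, img_h, factor=2.0):
    """Expand a face box (x, y, w, h) by `factor` around its center,
    clamped to the image bounds. Box format is an assumption."""
    cx, cy = x + w / 2, y + h / 2
    new_w, new_h = w * factor, h * factor
    x0 = max(0, int(cx - new_w / 2))
    y0 = max(0, int(cy - new_h / 2))
    x1 = min(img_w, int(cx + new_w / 2))
    y1 = min(img_h, int(cy + new_h / 2))
    return x0, y0, x1, y1

# Example: a 100x100 face box centered in a 640x480 frame
crop_box = expand_bbox(270, 190, 100, 100, 640, 480)
# The resulting crop would then be resized to 224x224 before the forward pass.
```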
## Notes

- Optimized for face images with loose cropping
- Intended for representation learning and transfer to medical tasks
- Results may vary for non-face or tightly cropped images
- Training details and evaluation metrics are in the [paper](https://arxiv.org/pdf/2602.14672)
## License

CC BY 4.0. Please cite the paper if you use these models:

```bibtex
@misc{borets2026mefemmedicalfaceembedding,
  title={MeFEm: Medical Face Embedding model},
  author={Yury Borets and Stepan Botman},
  year={2026},
  eprint={2602.14672},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.14672},
}
```