aslakey committed on
Commit 90b43b3 · verified · 1 Parent(s): 5050bda

Update README.md

  1. README.md +41 -3
README.md CHANGED
@@ -1,3 +1,41 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+ # Camera Level
+
+ This model predicts the cinematic camera level of an image, one of [ground, hip, shoulder, eye, aerial]. It uses a DINOv2-with-registers backbone (initialized from the `facebook/dinov2-with-registers-large` weights) and was trained on a diverse set of five thousand human-annotated images.
+
+ ## How to use:
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoImageProcessor, AutoModelForImageClassification
+
+ # Preprocessing comes from the base DINOv2 checkpoint; classifier weights come from this repo
+ image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-with-registers-large")
+ model = AutoModelForImageClassification.from_pretrained("aslakey/camera_level")
+ model.eval()
+
+ # Model labels: [ground, hip, shoulder, eye, aerial]
+ image = Image.open("cinematic_shot.jpg").convert("RGB")
+ inputs = image_processor(image, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Training was technically multi-label, but argmax over the logits also works for a single prediction
+ predicted_label = outputs.logits.argmax(-1).item()
+ print(model.config.id2label[predicted_label])
+ ```
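Since the snippet's comment notes that training was multi-label, one alternative to argmax is to score each label independently with a sigmoid and keep those above a cutoff. The sketch below is only an illustration: the `0.5` threshold, the helper names, and the label order in `ID2LABEL` are assumptions (the order is taken from the README comment; the authoritative mapping is `model.config.id2label`). It uses plain Python lists, so with the real model you would pass `outputs.logits[0].tolist()`.

```python
import math

# Hypothetical mapping for illustration; in practice read model.config.id2label.
ID2LABEL = {0: "ground", 1: "hip", 2: "shoulder", 3: "eye", 4: "aerial"}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def labels_above_threshold(logits, id2label=ID2LABEL, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score meets the threshold,
    sorted from most to least confident."""
    scores = [(id2label[i], sigmoid(v)) for i, v in enumerate(logits)]
    kept = [(label, p) for label, p in scores if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up logits for demonstration only (not real model output).
example_logits = [-2.0, 0.3, 1.5, -0.4, -3.0]
for label, prob in labels_above_threshold(example_logits):
    print(f"{label}: {prob:.2f}")
```

A thresholded readout can return zero or several labels for one image, so argmax remains the simpler choice when exactly one camera level is needed.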
+
+ ## Performance:
+
+ | Category | Precision | Recall |
+ |----------|-----------|--------|
+ | ground   | 65%       | 51%    |
+ | hip      | 69%       | 62%    |
+ | shoulder | 68%       | 74%    |
+ | eye      | 51%       | 39%    |
+ | aerial   | 89%       | 76%    |
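
To compare categories with a single number, precision and recall can be combined into an F1 score (their harmonic mean). The sketch below derives F1 from the rounded percentages in the table; these are derived values, not figures reported by the author, and the `REPORTED` dictionary and `f1_score` helper are illustrative names.

```python
# Precision and recall as fractions, copied from the table above.
REPORTED = {
    "ground": (0.65, 0.51),
    "hip": (0.69, 0.62),
    "shoulder": (0.68, 0.74),
    "eye": (0.51, 0.39),
    "aerial": (0.89, 0.76),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for category, (precision, recall) in REPORTED.items():
    print(f"{category}: F1 = {f1_score(precision, recall):.1%}")
```

Because the inputs are rounded to whole percentages, the resulting F1 values are approximate.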