aslakey/camera_level


aslakey committed on Jun 30, 2025 · Commit 90b43b3 (verified) · Parent: 5050bda

Update README.md

Files changed (1)

README.md +41 -3

README.md CHANGED

@@ -1,3 +1,41 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# Camera Level
+This model predicts an image's cinematic camera level: ground, hip, shoulder, eye, or aerial. It uses a DINOv2-with-registers backbone, initialized from the `facebook/dinov2-with-registers-large` weights, and was trained on a diverse set of five thousand human-annotated images.
+## How to use:
+```python
+import torch
+from PIL import Image
+from transformers import AutoImageProcessor
+from transformers import AutoModelForImageClassification
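+# Preprocessing (resizing and normalization) uses the base backbone's processor config
+image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-with-registers-large")
+model = AutoModelForImageClassification.from_pretrained("aslakey/camera_level")
+model.eval()  # inference mode: disables dropout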
+# Model labels: [ground, hip, shoulder, eye, aerial]
+image = Image.open("cinematic_shot.jpg").convert("RGB")  # ensure 3-channel input
+inputs = image_processor(image, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+# Training was multi-label, but argmax over the logits still picks the single most likely level
+predicted_label = outputs.logits.argmax(-1).item()
+print(model.config.id2label[predicted_label])
+```
+## Performance:
+| Camera level | Precision | Recall |
+|--------------|-----------|--------|
+| ground       | 65%       | 51%    |
+| hip          | 69%       | 62%    |
+| shoulder     | 68%       | 74%    |
+| eye          | 51%       | 39%    |
+| aerial       | 89%       | 76%    |
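The README's usage snippet returns only the single highest-scoring camera level, although its comment notes that training was multi-label. Below is a minimal sketch of multi-label scoring. It assumes the logits are independent per-label scores, so each one passes through a sigmoid rather than a softmax. It also assumes the label order matches the model card's `[ground, hip, shoulder, eye, aerial]`. The helper `camera_levels_above_threshold`, the 0.5 cutoff, and the plain-list input are illustrative choices, not part of the model's API. The code is plain Python, so it can take `outputs.logits[0].tolist()` directly.

```python
import math

# Label order as listed in the model card (assumption: index i matches the
# checkpoint's id2label[i]; prefer model.config.id2label when it is loaded).
LABELS = ["ground", "hip", "shoulder", "eye", "aerial"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def camera_levels_above_threshold(logits, labels=LABELS, threshold=0.5):
    """Return (label, probability) pairs whose independent sigmoid score
    clears `threshold`, sorted from most to least likely.

    `logits` is one row of model outputs, e.g. outputs.logits[0].tolist().
    The 0.5 default threshold is an assumption, not a value from the card.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for one image, for illustration only.
example_logits = [-2.0, 0.3, 1.5, -0.4, -3.1]
for label, prob in camera_levels_above_threshold(example_logits):
    print(f"{label}: {prob:.2f}")  # shoulder (~0.82), then hip (~0.57)
```

To use the checkpoint's own label mapping rather than the hard-coded list, pass `labels=[model.config.id2label[i] for i in range(len(model.config.id2label))]`. An image may then match zero, one, or several levels, depending on the threshold.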
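The performance table reports precision and recall per camera level, but the card does not say which evaluation split or decision rule produced them. For reference, the sketch below applies the standard per-class definitions to multi-label predictions. It assumes each image's ground truth and prediction are both sets of label names. `per_class_precision_recall` is a hypothetical helper, and the example data is made up rather than drawn from the model's evaluation.

```python
def per_class_precision_recall(y_true, y_pred, labels):
    """Per-label precision and recall for multi-label predictions.

    y_true and y_pred are parallel lists of sets of label names, one set per
    image. Returns {label: (precision, recall)}; a ratio whose denominator
    is zero is reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    results = {}
    for label in labels:
        tp = fp = fn = 0
        for truth, pred in zip(y_true, y_pred):
            if label in pred and label in truth:
                tp += 1  # predicted and correct
            elif label in pred:
                fp += 1  # predicted but absent from the ground truth
            elif label in truth:
                fn += 1  # present in the ground truth but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[label] = (precision, recall)
    return results


# Toy data for illustration only (not the model's evaluation set).
labels = ["ground", "hip", "shoulder", "eye", "aerial"]
y_true = [{"eye"}, {"eye", "shoulder"}, {"aerial"}, {"ground"}]
y_pred = [{"eye"}, {"shoulder"}, {"aerial"}, {"hip"}]
for label, (p, r) in per_class_precision_recall(y_true, y_pred, labels).items():
    print(f"{label:<9} precision={p:.0%} recall={r:.0%}")
```

In this toy example, "eye" gets 100% precision and 50% recall: its one prediction is correct, but one of its two true images is missed. A low-recall row such as eye (39%) in the table reflects that kind of miss.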