bpiyush committed · verified
Commit 3bf1f98 · Parent(s): 59e585c

Upload README.md with huggingface_hub

Files changed (1): README.md (+52 −0)
README.md CHANGED
@@ -81,6 +81,58 @@ See the script at [demo_usage.py](demo_usage.py) for a quick start. You can run
 ```sh
 python demo_usage.py
 ```
+The output should look something like this:
+
+```sh
+============================================================
+TARA Model Demo
+============================================================
+
+[1/6] Loading model...
+[ MODEL ] Loading TARA from /work/piyush/pretrained_checkpoints/TARA/ [..............]
+### do_image_padding is set as False, images will be resized directly!
+The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
+Loading checkpoint shards: 100%|████████████████████████████████████████| 3/3 [00:03<00:00, 1.05s/it]
+✓ Model loaded successfully!
+Number of parameters: 7.063B
+----------------------------------------------------------------------------------------------------
+
+[2/6] Testing video encoding and captioning ...
+✓ Video encoded successfully!
+Video shape: torch.Size([1, 16, 3, 240, 426])
+Video embedding shape: torch.Size([4096])
+Video caption: A hand is seen folding a white paper on a gray carpeted floor. The paper is opened flat on the surface, and then the hand folds it in half vertically, creating a crease in the middle. The hand continues to fold the paper further, resulting in a smaller, more compact size. The background remains a consistent gray carpet throughout the video.
+----------------------------------------------------------------------------------------------------
+
+[3/6] Testing text encoding...
+✓ Text encoded successfully!
+Text: ['someone is folding a paper', 'cutting a paper', 'someone is unfolding a paper']
+Text embedding shape: torch.Size([3, 4096])
+
+[4/6] Computing video-text similarities...
+✓ Similarities computed!
+'someone is folding a paper': 0.5039
+'cutting a paper': 0.3022
+'someone is unfolding a paper': 0.3877
+----------------------------------------------------------------------------------------------------
+
+[5/6] Testing negation example...
+Image embedding shape: torch.Size([2, 4096])
+Text query: ['an image of a cat but there is no dog in it']
+Text-Image similarity: tensor([[0.2585, 0.1449]])
+- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+Text query: ['an image of a cat and a dog together']
+Text-Image similarity: tensor([[0.2815, 0.4399]])
+----------------------------------------------------------------------------------------------------
+
+[6/6] Testing composed video retrieval...
+Source-Target similarity with edit: 0.6476313471794128
+
+============================================================
+Demo completed successfully! 🎉
+============================================================
+```
+
 
 OR use the snippet below:
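The similarity values printed in steps [4/6] and [5/6] of the demo output are, in CLIP-style models such as this one, typically cosine similarities between L2-normalized embeddings. A minimal stdlib-only sketch of that computation (illustrative only: `cosine_similarity` and the toy 3-d vectors below are not part of TARA's API, whose real embeddings are 4096-d):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d stand-ins for the 4096-d video/text embeddings in the demo.
video_emb = [0.2, 0.9, 0.4]
text_embs = {
    "someone is folding a paper": [0.25, 0.85, 0.45],
    "cutting a paper": [0.9, 0.1, 0.3],
}

# As in step [4/6], the caption closest in meaning scores highest.
for caption, emb in text_embs.items():
    print(f"{caption!r}: {cosine_similarity(video_emb, emb):.4f}")
```

Because both vectors are normalized, the score is bounded in [-1, 1] regardless of embedding magnitude, which is why the demo's scores are directly comparable across captions.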