ohgnues committed e4fdf04 (1 parent: 98c97d9)

Update README.md

README.md: +81 -1

# Usage

[In this Repo](https://github.com/oh-gnues-iohc/multi-modal-retrieval)

# multi-modal-retrieval

This repository contains code for multi-modal image-text retrieval.

It implements a multi-modal bi-encoder that uses ResNet to represent images and BERT to represent text.
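
In a bi-encoder the text and image towers run independently, so image embeddings can be computed once and reused for every text query. The sketch below shows only the retrieval step, with plain Python lists standing in for embeddings. It assumes cosine similarity as the score and assumes both towers output vectors of the same size. This README does not say which similarity the model uses or how the BERT and ResNet output sizes are aligned (e.g. by a projection layer). `cosine` and `rank_images` are hypothetical helpers, not part of this repo.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(
    text_embedding: Sequence[float], image_embeddings: Sequence[Sequence[float]]
) -> List[int]:
    """Return image indices ordered from most to least similar to the text query."""
    # Assumption: both towers share one output dimension, otherwise scores are undefined.
    if any(len(img) != len(text_embedding) for img in image_embeddings):
        raise ValueError("text and image embeddings must have the same dimension")
    scores = [cosine(text_embedding, img) for img in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Image embeddings can be precomputed offline and reused for every query.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
print(rank_images(query, gallery))  # [1, 2, 0]
```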

# Data

## Sample Data

Pretraining used the [poloclub/diffusiondb](https://huggingface.co/datasets/poloclub/diffusiondb) dataset from Hugging Face.

I used 50k randomly sampled image-prompt pairs from it for this project.

To use a different dataset, follow the steps below.

## Data Format

Only the images and their corresponding text are required; no other columns are needed. The text can be prompts or captions for the images.

Specify the image and text column names in the training command:

```bash
python3 train.py --text_column_name text --image_column_name img
```
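
To make the column requirement concrete, here is a minimal sketch of what the two column-name flags select: from each record, only the image field and the text field are kept, and extra fields (such as the `seed` below) are dropped. `select_pairs` is a hypothetical helper and the records are toy data. This is not the loading code in `train.py`, which this README does not show.

```python
from typing import Any, Dict, Iterable, List, Tuple


def select_pairs(
    records: Iterable[Dict[str, Any]],
    image_column_name: str,
    text_column_name: str,
) -> List[Tuple[Any, Any]]:
    """Keep only the (image, text) pair from each record; other columns are dropped."""
    pairs = []
    for i, row in enumerate(records):
        # Both named columns must exist; everything else is irrelevant.
        missing = [c for c in (image_column_name, text_column_name) if c not in row]
        if missing:
            raise KeyError(f"record {i} is missing column(s): {missing}")
        pairs.append((row[image_column_name], row[text_column_name]))
    return pairs


# Column names match the example command: --image_column_name img --text_column_name text
rows = [
    {"img": "cat.png", "text": "a cat on a sofa", "seed": 42},
    {"img": "dog.png", "text": "a dog in the snow", "seed": 7},
]
print(select_pairs(rows, image_column_name="img", text_column_name="text"))
# [('cat.png', 'a cat on a sofa'), ('dog.png', 'a dog in the snow')]
```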

# Pretrained models

Pretrained models can be downloaded from [Hugging Face](https://huggingface.co/ohgnues/ImageTextRetrieval). Alternatively, pass the model name `ohgnues/ImageTextRetrieval` to the training command:

```bash
python3 train.py --pretrained_model_name_or_path ohgnues/ImageTextRetrieval
```

The `ohgnues/ImageTextRetrieval` model was trained for 10 epochs on a Tesla P100 GPU.

# Usage

## Train

```bash
python3 train.py --name 2m_random_50k --cache_dir /data/.cache --max_length 100 --num_train_epochs 10
```

For the full list of training options, consult the argument dataclass in `train.py` or the official Hugging Face documentation.
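
Since the options are defined in a dataclass, the pattern probably resembles the one below, where each dataclass field becomes a `--flag`. This is a stdlib-only sketch, not the actual `train.py`. `HypotheticalTrainArgs` and `parse_args` are illustrative names. The fields are limited to the flags shown in this README, and the `None` defaults are placeholders rather than the script's real defaults. Hugging Face's `transformers.HfArgumentParser` builds a parser from dataclasses in a similar way, and `train.py` may well use it.

```python
import argparse
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class HypotheticalTrainArgs:
    # Field names mirror flags used in this README; defaults are placeholders.
    name: Optional[str] = None
    cache_dir: Optional[str] = None
    max_length: Optional[int] = None
    num_train_epochs: Optional[int] = None
    text_column_name: Optional[str] = None
    image_column_name: Optional[str] = None
    pretrained_model_name_or_path: Optional[str] = None


# Optional[...] annotations are not callable, so map numeric flags explicitly.
_FLAG_TYPES = {"max_length": int, "num_train_epochs": int}


def parse_args(argv: Optional[List[str]] = None) -> HypotheticalTrainArgs:
    """Turn every dataclass field into a --flag and parse argv into the dataclass."""
    parser = argparse.ArgumentParser()
    for f in fields(HypotheticalTrainArgs):
        parser.add_argument(f"--{f.name}", type=_FLAG_TYPES.get(f.name, str), default=f.default)
    return HypotheticalTrainArgs(**vars(parser.parse_args(argv)))


args = parse_args(
    ["--name", "2m_random_50k", "--cache_dir", "/data/.cache",
     "--max_length", "100", "--num_train_epochs", "10"]
)
print(args.max_length, args.num_train_epochs)  # 100 10
```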

## Encode

`encode` embeds a batch with one tower at a time: pass `model_name="text"` with tokenized inputs, or `model_name="image"` with `pixel_values`.

```python
from typing import Literal, Optional

import torch


# Method of the bi-encoder model class, which holds a BERT `text_encoder`
# and a ResNet `image_encoder`.
def encode(
    self,
    model_name: Literal["text", "image"],
    input_ids: Optional[torch.Tensor] = None,
    attention_mask: Optional[torch.Tensor] = None,
    token_type_ids: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.Tensor] = None,
    head_mask: Optional[torch.Tensor] = None,
    inputs_embeds: Optional[torch.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
    pixel_values: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    if model_name == "text":
        # Text tower: the first ([CLS]) token of the last hidden layer is the
        # text embedding, shape (batch, hidden_size).
        return self.text_encoder(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        ).last_hidden_state[:, 0, :]

    elif model_name == "image":
        # Image tower: the pooled output has shape (batch, channels, 1, 1);
        # dropping the 1x1 spatial dims gives (batch, channels).
        return self.image_encoder(
            pixel_values=pixel_values,
            output_hidden_states=output_hidden_states,
        ).pooler_output[:, :, 0, 0]

    raise ValueError(f"model_name must be 'text' or 'image', got {model_name!r}")
```
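
The two indexing expressions at the end of each branch do the pooling. The plain-Python stand-ins below reproduce them on nested lists so the shape changes are visible. They assume `image_encoder` is a Hugging Face `ResNetModel`, whose `pooler_output` keeps a 1x1 spatial grid after global average pooling; the `[:, :, 0, 0]` index in `encode` is consistent with that. `cls_pool` and `squeeze_spatial` are illustrative names only.

```python
def cls_pool(last_hidden_state):
    """Equivalent of last_hidden_state[:, 0, :]:
    (batch, seq_len, hidden) -> (batch, hidden), keeping only the first
    ([CLS]) token vector of each sequence."""
    return [sequence[0] for sequence in last_hidden_state]


def squeeze_spatial(pooler_output):
    """Equivalent of pooler_output[:, :, 0, 0]:
    (batch, channels, 1, 1) -> (batch, channels), dropping the 1x1 grid."""
    return [[channel[0][0] for channel in image] for image in pooler_output]


# Toy batch: 2 sequences, 3 tokens each, hidden size 2.
text_states = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8], [0.9, 1.0], [1.1, 1.2]],
]
# Toy batch: 2 images, 3 channels, 1x1 spatial grid.
image_pooled = [
    [[[1.0]], [[2.0]], [[3.0]]],
    [[[4.0]], [[5.0]], [[6.0]]],
]

print(cls_pool(text_states))          # [[0.1, 0.2], [0.7, 0.8]]
print(squeeze_spatial(image_pooled))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```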