ayh015 committed on
Commit 271210d · 1 Parent(s): 73df34b

Update README file

Files changed (1):
  1. README.md +191 -2

README.md CHANGED
### D. Multi-stage HICO pipeline
The repository now supports a 3-stage HICO workflow:

1. Long description generation
2. Description refinement
3. Description examination / checking

Each stage writes per-rank JSON files first, then merges them into one JSON file for the next stage.

#### Stage 1. Generate long descriptions
This is the original HICO annotation stage. It uses `Conversation` in `data/convsersation.py`.

Run:
```
bash scripts/annotate_hico.sh
```

This creates per-rank files such as:
```
outputs/labels_0.json
outputs/labels_1.json
```

Merge them with:
```
python3 tools/merge_json_outputs.py \
    --input-dir outputs \
    --pattern "labels_*.json" \
    --output-path outputs/merged_labels.json
```
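The merge step simply concatenates the per-rank lists. A minimal sketch of what `tools/merge_json_outputs.py` does, assuming each `labels_*.json` file contains a JSON list of annotation dicts (an illustration, not the repository's actual implementation):

```python
import glob
import json
import os

def merge_json_outputs(input_dir, pattern, output_path):
    """Concatenate per-rank JSON lists into a single list.

    Illustrative sketch: assumes every file matching `pattern` holds a
    JSON list, matching the per-rank `labels_*.json` layout above.
    """
    merged = []
    # Sort so rank 0's entries come first, then rank 1's, and so on.
    for path in sorted(glob.glob(os.path.join(input_dir, pattern))):
        with open(path) as f:
            merged.extend(json.load(f))
    with open(output_path, "w") as f:
        json.dump(merged, f, indent=2)
    return merged
```

The same helper shape covers the Stage 2 and Stage 3 merges; only the directory and filename pattern change.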
#### Stage 2. Refine generated descriptions
This stage reads the merged JSON from Stage 1 and adds a `refined_description` field. It uses `Conversation_For_Clean_Descrption` in `data/convsersation.py`.

Modify `data_path`, `model_path`, `annotation_path`, and `output_dir` in `scripts/refine_hico.sh`, then run:
```
bash scripts/refine_hico.sh
```

This creates files such as:
```
outputs/refine/refine_labels_0.json
```

Merge them with:
```
python3 tools/merge_json_outputs.py \
    --input-dir outputs/refine \
    --pattern "refine_labels_*.json" \
    --output-path outputs/merged_refine.json
```

#### Stage 3. Examine / check generated descriptions
This stage reads the merged JSON from Stage 2 and adds an `examiner_result` field. It uses `Conversation_examiner` in `data/convsersation.py`.

Modify `data_path`, `model_path`, `annotation_path`, and `output_dir` in `scripts/examine_hico.sh`, then run:
```
bash scripts/examine_hico.sh
```

This creates files such as:
```
outputs/examiner/examiner_labels_0.json
```

Merge them with:
```
python3 tools/merge_json_outputs.py \
    --input-dir outputs/examiner \
    --pattern "examiner_labels_*.json" \
    --output-path outputs/merged_examine.json
```

#### One-shot pipeline
To run all three stages end-to-end, use:
```
bash scripts/pipeline_hico.sh
```

Before running it, edit the following variables in `scripts/pipeline_hico.sh`:

- `DATA_PATH`
- `LONG_MODEL_PATH`
- `REFINE_MODEL_PATH`
- `EXAMINE_MODEL_PATH`
- `LONG_GPU_IDS`
- `REFINE_GPU_IDS`
- `EXAMINE_GPU_IDS`
- `LONG_NPROC`
- `REFINE_NPROC`
- `EXAMINE_NPROC`

The pipeline produces:

- `outputs/pipeline/merged_long.json`
- `outputs/pipeline/merged_refine.json`
- `outputs/pipeline/merged_examine.json`

### E. Using different VLM backends
The HICO scripts are no longer hardcoded to Qwen. The model loading logic is centralized in `tools/vlm_backend.py`, so you can use different VLM families for long-description generation, refinement, and examination.

The following scripts support backend selection:

- `tools/annotate_hico.py`
- `tools/refine_hico.py`
- `tools/examine_hico.py`
- `tools/clean_initial_annotation.py`

Each of them accepts:

- `--model-path`
- `--model-backend`
- `--torch-dtype`

Example:
```
torchrun --nnodes=1 --nproc_per_node=1 tools/annotate_hico.py \
    --model-path /path/to/model \
    --model-backend auto \
    --torch-dtype bfloat16 \
    --data-path ../datasets/HICO-Det \
    --output-dir outputs/test \
    --max-samples 5
```

You may also force a backend explicitly, for example:
```
--model-backend qwen3_vl
--model-backend qwen3_vl_moe
--model-backend llava
--model-backend deepseek_vl
--model-backend hf_vision2seq
--model-backend hf_causal_vlm
```

#### Where to customize for a new model
To adapt the repository to a new model family, the main file to edit is:

- `tools/vlm_backend.py`

This file controls:

- backend detection: `infer_model_backend(...)`
- model/processor loading: `load_model_and_processor(...)`
- prompt/image packaging: `build_batch_tensors(...)`
- output decoding: `decode_generated_text(...)`

In most cases, you do not need to change the HICO task scripts themselves.
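For orientation, backend detection of this kind is typically simple substring matching on the model path or name. A hypothetical sketch (the real `infer_model_backend(...)` in `tools/vlm_backend.py` may use config files or different rules):

```python
def infer_model_backend(model_path: str) -> str:
    """Guess a backend name from the model path.

    Hypothetical sketch only; the actual detection logic lives in
    tools/vlm_backend.py and may differ.
    """
    name = model_path.lower()
    if "qwen3-vl" in name and "moe" in name:
        return "qwen3_vl_moe"
    if "qwen3-vl" in name:
        return "qwen3_vl"
    if "llava" in name:
        return "llava"
    if "deepseek-vl" in name:
        return "deepseek_vl"
    # Fall back to a generic Hugging Face vision-to-sequence backend.
    return "hf_vision2seq"
```

Note that order matters here: the MoE check must come before the plain `qwen3-vl` check, since an MoE model path matches both substrings.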
#### How to add a new model backend
There are three common situations:

1. The model already works with the Hugging Face `AutoProcessor` and `AutoModelForVision2Seq` or `AutoModelForCausalLM` classes. In that case, you may only need to run with:
   ```
   --model-backend auto
   ```
   or explicitly:
   ```
   --model-backend hf_vision2seq
   ```
   or:
   ```
   --model-backend hf_causal_vlm
   ```

2. The model needs custom backend detection. Add a rule inside `infer_model_backend(...)` in `tools/vlm_backend.py`.

3. The model needs a custom class or a custom multimodal input format. Add a new branch inside:
   - `load_model_and_processor(...)`
   - `build_batch_tensors(...)`
   - `decode_generated_text(...)` (if needed)
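For situation 3, the new branch boils down to a dispatch on the backend name. A hypothetical sketch with placeholder loaders (`my_new_vlm`, `MyNewVLM`, and `MyNewProcessor` are stand-ins for your own model family, not real classes in this repository):

```python
def load_model_and_processor(model_path: str, backend: str):
    """Dispatch model/processor loading on the backend name.

    Hypothetical sketch; the placeholder strings stand in for real
    model and processor objects loaded from `model_path`.
    """
    if backend == "my_new_vlm":
        # Custom class / custom multimodal input format goes here.
        model = f"MyNewVLM({model_path})"            # placeholder
        processor = f"MyNewProcessor({model_path})"  # placeholder
        return model, processor
    if backend in ("auto", "hf_vision2seq", "hf_causal_vlm"):
        # Generic Hugging Face path (placeholders stand in for
        # AutoModelForVision2Seq / AutoProcessor loading).
        return f"AutoModel({model_path})", f"AutoProcessor({model_path})"
    raise ValueError(f"unknown backend: {backend}")
```

`build_batch_tensors(...)` and `decode_generated_text(...)` would gain matching branches keyed on the same backend name.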
#### Rule of thumb

- If you want to change task behavior or prompting, edit `data/convsersation.py`.
- If you want to support a new model family, edit `tools/vlm_backend.py`.
- If you want to add a new stage, add a new script under `tools/`.

### F. Annotation format
A list of dicts that contain the following keys:
```
{
    ...
}
```

After refinement and examination, extra fields may appear in the JSON:
```
{
    'refined_description': "A refined 2-3 sentence version aligned with the target HOI label.",
    'examiner_result': "Verdict: PASS or FAIL ..."
}
```
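A downstream filter on these fields can then be a simple PASS check; a sketch assuming `examiner_result` begins with a `Verdict:` line, as in the example above (hypothetical helper, not part of the repository):

```python
def keep_passing(annotations):
    """Return only entries whose examiner verdict is PASS.

    Sketch: assumes `examiner_result` starts with a line such as
    'Verdict: PASS' or 'Verdict: FAIL'. Entries without the field
    are dropped.
    """
    kept = []
    for ann in annotations:
        verdict = ann.get("examiner_result", "")
        first_line = verdict.splitlines()[0] if verdict else ""
        if "PASS" in first_line.upper():
            kept.append(ann)
    return kept
```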
## Annotate COCO
1. Download COCO dataset.

...

```
...
    'human_bbox': [126, 258, 150, 305],
    'description': "The person is riding a bicycle, supported by visible evidence of their body interacting with the bike.\n\n- The right hand is holding the right handlebar.\n- The left hand is holding the left handlebar.\n- The right hip is positioned over the seat, indicating the person is sitting on the bicycle.\n- The right foot is on the right pedal.\n- The left foot is on the left pedal."
}
```