Update README file
```
bash scripts/annotate_hico.sh
```

### D. Multi-stage HICO pipeline

The repository now supports a three-stage HICO workflow:

1. Long description generation
2. Description refinement
3. Description examination / checking

Each stage writes per-rank JSON files first, then merges them into one JSON file for the next stage.

#### Stage 1. Generate long descriptions

This is the original HICO annotation stage. It uses `Conversation` in `data/convsersation.py`.

Run:

```
bash scripts/annotate_hico.sh
```

This creates per-rank files such as:

```
outputs/labels_0.json
outputs/labels_1.json
```
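The `_0`, `_1` suffixes are the distributed ranks: each torchrun worker writes its own shard. Conceptually the write step looks like the sketch below (illustrative only; the actual writing logic lives in `tools/annotate_hico.py`, and the function name here is hypothetical):

```python
import json
import os

def save_rank_shard(results, output_dir):
    """Write this worker's annotations to <output_dir>/labels_<rank>.json."""
    rank = int(os.environ.get("RANK", "0"))  # torchrun sets RANK for each worker
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"labels_{rank}.json")
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    return path
```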
Merge them with:

```
python3 tools/merge_json_outputs.py \
    --input-dir outputs \
    --pattern "labels_*.json" \
    --output-path outputs/merged_labels.json
```
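Conceptually, the merge concatenates the per-rank lists in rank order. A minimal sketch of that behavior (assuming each shard holds a JSON list of dicts; the real implementation is `tools/merge_json_outputs.py` and may differ):

```python
import glob
import json
import os

def merge_json_outputs(input_dir, pattern, output_path):
    """Concatenate per-rank JSON lists into a single file (illustrative)."""
    merged = []
    for path in sorted(glob.glob(os.path.join(input_dir, pattern))):
        with open(path) as f:
            merged.extend(json.load(f))  # each shard is assumed to be a list
    with open(output_path, "w") as f:
        json.dump(merged, f, indent=2)
    return merged
```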
#### Stage 2. Refine generated descriptions

This stage reads a merged JSON from Stage 1 and adds a `refined_description` field. It uses `Conversation_For_Clean_Descrption` in `data/convsersation.py`.

Modify `data_path`, `model_path`, `annotation_path`, and `output_dir` in `scripts/refine_hico.sh`, then run:

```
bash scripts/refine_hico.sh
```

This creates files such as:

```
outputs/refine/refine_labels_0.json
```

Merge them with:

```
python3 tools/merge_json_outputs.py \
    --input-dir outputs/refine \
    --pattern "refine_labels_*.json" \
    --output-path outputs/merged_refine.json
```

#### Stage 3. Examine / check generated descriptions

This stage reads a merged JSON from Stage 2 and adds an `examiner_result` field. It uses `Conversation_examiner` in `data/convsersation.py`.

Modify `data_path`, `model_path`, `annotation_path`, and `output_dir` in `scripts/examine_hico.sh`, then run:

```
bash scripts/examine_hico.sh
```

This creates files such as:

```
outputs/examiner/examiner_labels_0.json
```

Merge them with:

```
python3 tools/merge_json_outputs.py \
    --input-dir outputs/examiner \
    --pattern "examiner_labels_*.json" \
    --output-path outputs/merged_examine.json
```
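In terms of data flow, each of Stages 2 and 3 reads the previous merged JSON and adds one field per record. Schematically (the placeholder logic below stands in for the actual model calls and is purely illustrative):

```python
def refine_stage(records):
    """Stage 2 sketch: attach a `refined_description` to each record."""
    return [dict(r, refined_description="refined: " + r["description"]) for r in records]

def examine_stage(records):
    """Stage 3 sketch: attach an `examiner_result` verdict to each record."""
    return [dict(r, examiner_result="Verdict: PASS") for r in records]
```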
#### One-shot pipeline

If you want to run all three stages end to end, use:

```
bash scripts/pipeline_hico.sh
```

Before running it, edit the following variables in `scripts/pipeline_hico.sh`:

- `DATA_PATH`
- `LONG_MODEL_PATH`
- `REFINE_MODEL_PATH`
- `EXAMINE_MODEL_PATH`
- `LONG_GPU_IDS`
- `REFINE_GPU_IDS`
- `EXAMINE_GPU_IDS`
- `LONG_NPROC`
- `REFINE_NPROC`
- `EXAMINE_NPROC`

The pipeline produces:

- `outputs/pipeline/merged_long.json`
- `outputs/pipeline/merged_refine.json`
- `outputs/pipeline/merged_examine.json`
### E. Using different VLM backends

The HICO scripts are no longer hardcoded to Qwen. The model loading logic is centralized in `tools/vlm_backend.py`, so you can use different VLM families for long-description generation, refinement, and examination.

The following scripts support backend selection:

- `tools/annotate_hico.py`
- `tools/refine_hico.py`
- `tools/examine_hico.py`
- `tools/clean_initial_annotation.py`

Each of them accepts:

- `--model-path`
- `--model-backend`
- `--torch-dtype`

Example:

```
torchrun --nnodes=1 --nproc_per_node=1 tools/annotate_hico.py \
    --model-path /path/to/model \
    --model-backend auto \
    --torch-dtype bfloat16 \
    --data-path ../datasets/HICO-Det \
    --output-dir outputs/test \
    --max-samples 5
```

You may also force a backend explicitly, for example:

```
--model-backend qwen3_vl
--model-backend qwen3_vl_moe
--model-backend llava
--model-backend deepseek_vl
--model-backend hf_vision2seq
--model-backend hf_causal_vlm
```

#### Where to customize for a new model

If you want to adapt the repository to a new model family, the main file to edit is:

- `tools/vlm_backend.py`

This file controls:

- backend detection: `infer_model_backend(...)`
- model/processor loading: `load_model_and_processor(...)`
- prompt/image packaging: `build_batch_tensors(...)`
- output decoding: `decode_generated_text(...)`

In most cases, you do not need to change the HICO task scripts themselves.
#### How to add a new model backend

There are three common situations:

1. The model already works with Hugging Face `AutoProcessor` and `AutoModelForVision2Seq` or `AutoModelForCausalLM`. In that case, you may only need to run with:

   ```
   --model-backend auto
   ```

   or explicitly:

   ```
   --model-backend hf_vision2seq
   ```

   or:

   ```
   --model-backend hf_causal_vlm
   ```

2. The model needs custom backend detection. Add a rule inside `infer_model_backend(...)` in `tools/vlm_backend.py`.

3. The model needs a custom class or a custom multimodal input format. Add a new branch inside:
   - `load_model_and_processor(...)`
   - `build_batch_tensors(...)`
   - `decode_generated_text(...)`, if needed
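For situation 2, a name-based rule is usually enough. The snippet below shows the general shape of such a rule; the substring checks are hypothetical examples, not the actual logic in `tools/vlm_backend.py`:

```python
def infer_model_backend(model_path: str) -> str:
    """Map a checkpoint path to a backend name (hypothetical rules)."""
    name = model_path.lower()
    if "qwen3-vl" in name:
        return "qwen3_vl_moe" if "moe" in name else "qwen3_vl"
    if "llava" in name:
        return "llava"
    if "deepseek-vl" in name:
        return "deepseek_vl"
    return "hf_vision2seq"  # generic Hugging Face fallback
```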
#### Rule of thumb

- If you want to change task behavior or prompting, edit `data/convsersation.py`.
- If you want to support a new model family, edit `tools/vlm_backend.py`.
- If you want to add a new stage, add a new script under `tools/`.
### F. Annotation format

A list of dicts with the following keys:

```
{
    ...
}
```

After refinement and examination, extra fields may appear in the JSON:

```
{
    'refined_description': "A refined 2-3 sentence version aligned with the target HOI label.",
    'examiner_result': "Verdict: PASS or FAIL ..."
}
```
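Downstream, the examiner verdict can be used to filter the merged annotations. A sketch, assuming the verdict string begins with `Verdict: PASS` exactly as in the example above (the real format may differ):

```python
def keep_passed(annotations):
    """Keep only records whose examiner verdict is PASS (illustrative)."""
    return [
        a for a in annotations
        if a.get("examiner_result", "").startswith("Verdict: PASS")
    ]
```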
## Annotate COCO

1. Download COCO dataset.

...

```
{
    ...
    'human_bbox': [126, 258, 150, 305],
    'description': "The person is riding a bicycle, supported by visible evidence of their body interacting with the bike.\n\n- The right hand is holding the right handlebar.\n- The left hand is holding the left handlebar.\n- The right hip is positioned over the seat, indicating the person is sitting on the bicycle.\n- The right foot is on the right pedal.\n- The left foot is on the left pedal."
}
```
|