Update README file
```
bash scripts/annotate_hico.sh
```

### D. Multi-stage HICO pipeline

The repository now supports a three-stage HICO workflow:

1. Long description generation
2. Description refinement
3. Description examination / checking

Each stage writes per-rank JSON files first, then merges them into one JSON file for the next stage.

#### Stage 1. Generate long descriptions

This is the original HICO annotation stage. It uses `Conversation` in `data/convsersation.py`.

Run:

```
bash scripts/annotate_hico.sh
```

This creates per-rank files such as:

```
outputs/labels_0.json
outputs/labels_1.json
```
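The `_0`, `_1` suffixes are the distributed ranks: each torchrun worker writes its own shard. Conceptually the write step looks like the sketch below (illustrative only; the actual writing logic lives in `tools/annotate_hico.py`, and the function name here is hypothetical):

```python
import json
import os

def save_rank_shard(results, output_dir):
    """Write this worker's annotations to <output_dir>/labels_<rank>.json."""
    rank = int(os.environ.get("RANK", "0"))  # torchrun sets RANK for each worker
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"labels_{rank}.json")
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    return path
```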
Merge them with:

```
python3 tools/merge_json_outputs.py \
    --input-dir outputs \
    --pattern "labels_*.json" \
    --output-path outputs/merged_labels.json
```
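Conceptually, the merge concatenates the per-rank lists in rank order. A minimal sketch of that behavior (assuming each shard holds a JSON list of dicts; the real implementation is `tools/merge_json_outputs.py` and may differ):

```python
import glob
import json
import os

def merge_json_outputs(input_dir, pattern, output_path):
    """Concatenate per-rank JSON lists into a single file (illustrative)."""
    merged = []
    for path in sorted(glob.glob(os.path.join(input_dir, pattern))):
        with open(path) as f:
            merged.extend(json.load(f))  # each shard is assumed to be a list
    with open(output_path, "w") as f:
        json.dump(merged, f, indent=2)
    return merged
```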
#### Stage 2. Refine generated descriptions

This stage reads a merged JSON from Stage 1 and adds a `refined_description` field. It uses `Conversation_For_Clean_Descrption` in `data/convsersation.py`.

Modify `data_path`, `model_path`, `annotation_path`, and `output_dir` in `scripts/refine_hico.sh`, then run:

```
bash scripts/refine_hico.sh
```

This creates files such as:

```
outputs/refine/refine_labels_0.json
```

Merge them with:

```
python3 tools/merge_json_outputs.py \
    --input-dir outputs/refine \
    --pattern "refine_labels_*.json" \
    --output-path outputs/merged_refine.json
```

#### Stage 3. Examine / check generated descriptions

This stage reads a merged JSON from Stage 2 and adds an `examiner_result` field. It uses `Conversation_examiner` in `data/convsersation.py`.

Modify `data_path`, `model_path`, `annotation_path`, and `output_dir` in `scripts/examine_hico.sh`, then run:

```
bash scripts/examine_hico.sh
```

This creates files such as:

```
outputs/examiner/examiner_labels_0.json
```

Merge them with:

```
python3 tools/merge_json_outputs.py \
    --input-dir outputs/examiner \
    --pattern "examiner_labels_*.json" \
    --output-path outputs/merged_examine.json
```
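In terms of data flow, each of Stages 2 and 3 reads the previous merged JSON and adds one field per record. Schematically (the placeholder logic below stands in for the actual model calls and is purely illustrative):

```python
def refine_stage(records):
    """Stage 2 sketch: attach a `refined_description` to each record."""
    return [dict(r, refined_description="refined: " + r["description"]) for r in records]

def examine_stage(records):
    """Stage 3 sketch: attach an `examiner_result` verdict to each record."""
    return [dict(r, examiner_result="Verdict: PASS") for r in records]
```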
#### One-shot pipeline

If you want to run all three stages end to end, use:

```
bash scripts/pipeline_hico.sh
```

Before running it, edit the following variables in `scripts/pipeline_hico.sh`:

- `DATA_PATH`
- `LONG_MODEL_PATH`
- `REFINE_MODEL_PATH`
- `EXAMINE_MODEL_PATH`
- `LONG_GPU_IDS`
- `REFINE_GPU_IDS`
- `EXAMINE_GPU_IDS`
- `LONG_NPROC`
- `REFINE_NPROC`
- `EXAMINE_NPROC`

The pipeline produces:

- `outputs/pipeline/merged_long.json`
- `outputs/pipeline/merged_refine.json`
- `outputs/pipeline/merged_examine.json`
### E. Using different VLM backends

The HICO scripts are no longer hardcoded to Qwen. The model loading logic is centralized in `tools/vlm_backend.py`, so you can use different VLM families for long-description generation, refinement, and examination.

The following scripts support backend selection:

- `tools/annotate_hico.py`
- `tools/refine_hico.py`
- `tools/examine_hico.py`
- `tools/clean_initial_annotation.py`

Each of them accepts:

- `--model-path`
- `--model-backend`
- `--torch-dtype`

Example:

```
torchrun --nnodes=1 --nproc_per_node=1 tools/annotate_hico.py \
    --model-path /path/to/model \
    --model-backend auto \
    --torch-dtype bfloat16 \
    --data-path ../datasets/HICO-Det \
    --output-dir outputs/test \
    --max-samples 5
```

You may also force a backend explicitly, for example:

```
--model-backend qwen3_vl
--model-backend qwen3_vl_moe
--model-backend llava
--model-backend deepseek_vl
--model-backend hf_vision2seq
--model-backend hf_causal_vlm
```

#### Where to customize for a new model

If you want to adapt the repository to a new model family, the main file to edit is:

- `tools/vlm_backend.py`

This file controls:

- backend detection: `infer_model_backend(...)`
- model/processor loading: `load_model_and_processor(...)`
- prompt/image packaging: `build_batch_tensors(...)`
- output decoding: `decode_generated_text(...)`

In most cases, you do not need to change the HICO task scripts themselves.
#### How to add a new model backend

There are three common situations:

1. The model already works with Hugging Face `AutoProcessor` and `AutoModelForVision2Seq` or `AutoModelForCausalLM`. In that case, you may only need to run with:

   ```
   --model-backend auto
   ```

   or explicitly:

   ```
   --model-backend hf_vision2seq
   ```

   or:

   ```
   --model-backend hf_causal_vlm
   ```

2. The model needs custom backend detection. Add a rule inside `infer_model_backend(...)` in `tools/vlm_backend.py`.

3. The model needs a custom class or a custom multimodal input format. Add a new branch inside:
   - `load_model_and_processor(...)`
   - `build_batch_tensors(...)`
   - `decode_generated_text(...)`, if needed
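For situation 2, a name-based rule is usually enough. The snippet below shows the general shape of such a rule; the substring checks are hypothetical examples, not the actual logic in `tools/vlm_backend.py`:

```python
def infer_model_backend(model_path: str) -> str:
    """Map a checkpoint path to a backend name (hypothetical rules)."""
    name = model_path.lower()
    if "qwen3-vl" in name:
        return "qwen3_vl_moe" if "moe" in name else "qwen3_vl"
    if "llava" in name:
        return "llava"
    if "deepseek-vl" in name:
        return "deepseek_vl"
    return "hf_vision2seq"  # generic Hugging Face fallback
```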
#### Rule of thumb

- If you want to change task behavior or prompting, edit `data/convsersation.py`.
- If you want to support a new model family, edit `tools/vlm_backend.py`.
- If you want to add a new stage, add a new script under `tools/`.
### F. Annotation format

A list of dicts with the following keys:

```
{
    ...
}
```

After refinement and examination, extra fields may appear in the JSON:

```
{
    'refined_description': "A refined 2-3 sentence version aligned with the target HOI label.",
    'examiner_result': "Verdict: PASS or FAIL ..."
}
```
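Downstream, the examiner verdict can be used to filter the merged annotations. A sketch, assuming the verdict string begins with `Verdict: PASS` exactly as in the example above (the real format may differ):

```python
def keep_passed(annotations):
    """Keep only records whose examiner verdict is PASS (illustrative)."""
    return [
        a for a in annotations
        if a.get("examiner_result", "").startswith("Verdict: PASS")
    ]
```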
## Annotate COCO

1. Download COCO dataset.

...

```
{
    ...
    'human_bbox': [126, 258, 150, 305],
    'description': "The person is riding a bicycle, supported by visible evidence of their body interacting with the bike.\n\n- The right hand is holding the right handlebar.\n- The left hand is holding the left handlebar.\n- The right hip is positioned over the seat, indicating the person is sitting on the bicycle.\n- The right foot is on the right pedal.\n- The left foot is on the left pedal."
}
```
|