KITE-7B-Instruct
KITE-7B-Instruct is a fine-tuned version of Qwen2.5-VL-7B-Instruct for VLM-based robot failure analysis, released as part of the KITE paper (ICRA 2026).
This checkpoint contains the full merged weights (base + LoRA adapter), ready for direct inference with no additional merge step.
Model Details
| Base model | Qwen/Qwen2.5-VL-7B-Instruct |
| Parameters | ~7B |
| Fine-tuning | QLoRA (4-bit NF4) on RoboFAC textual + multimodal tasks |
| Architecture | Qwen2.5-VL (vision-language, conditional generation) |
| License | Apache 2.0 (same as base model) |
Usage
from transformers import AutoProcessor, AutoModelForVision2Seq
model_id = "m80hz/KITE-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto", trust_remote_code=True)
Or serve it with vLLM for OpenAI-compatible inference:
python -m vllm.entrypoints.openai.api_server --model m80hz/KITE-7B-Instruct
Then use the KITE pipeline to run failure analysis:
python -m kite.cli \
--model_name m80hz/KITE-7B-Instruct \
--model_url http://127.0.0.1:8000/v1 \
--dataset_folder ./datasets/robofac/simulation_data \
--test_file ./datasets/robofac/test_qa_sim/test_detect_identify_locate.json \
--out_dir ./outputs/kite_run
Usage
@inproceedings{hosseinzadeh2025kite,
title = {KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis},
author = {Hosseinzadeh, Mehdi and Wong, King Hang and Dayoub, Feras},
booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
year = {2026}
}
- Downloads last month
- 12