MEDVQA-GI-2026-Task2-Explainability

Models fine-tuned for ImageCLEFmed MEDVQA-GI 2026 — Subtask 2: Multimodal Explainability & Safety-Aware Reasoning.

Results

| Model | Token-F1 | BERTScore-F1 | Safety-Score | Halluc-Rate | ECE |
|---|---|---|---|---|---|
| MedGemma-KvasirFT | 0.4891 | 0.5602 | 0.7150 | 0.2880 | 0.8537 |
| Qwen2.5-VL-KvasirFT | 0.4967 | 0.5817 | 0.8511 | 0.3680 | 0.6838 |
| Qwen2.5-VL-TransfFT | 0.4954 | 0.5793 | 0.8401 | 0.3780 | 0.7016 |
| Qwen2.5-VL-Base | 0.4292 | 0.5174 | 0.8175 | 0.4560 | 0.7241 |
| Hybrid | 0.4456 | 0.5342 | 0.8309 | 0.3280 | 0.7359 |
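
For readers unfamiliar with the first column, the sketch below shows one common way a token-level F1 is computed between a generated answer and a reference answer. It is an illustration only: the function name `token_f1`, the lowercase whitespace tokenization, and the handling of empty strings are assumptions, and the official MEDVQA-GI scorer may normalize or tokenize text differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (hypothetical helper, not the official scorer)."""
    # Assumption: lowercase + whitespace split; the task's scorer may differ.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A corpus-level score in this style would typically be the mean of `token_f1` over all question-answer pairs.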

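Best model: Qwen2.5-VL-KvasirFT, which has the highest Token-F1, BERTScore-F1, and Safety-Score and the lowest ECE in the table. MedGemma-KvasirFT has the lowest Halluc-Rate (0.2880).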
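
The per-metric ranking behind this choice can be reproduced from the table with a short script. This is a minimal sketch: the numbers are copied from the results above, while the names `RESULTS`, `HIGHER_IS_BETTER`, and `best_per_metric` are hypothetical, and the metric directions (higher is better for the F1 and safety scores, lower is better for Halluc-Rate and ECE) are assumptions based on how these metrics are usually read.

```python
# Scores copied from the results table in this model card.
RESULTS = {
    "MedGemma-KvasirFT":   {"Token-F1": 0.4891, "BERTScore-F1": 0.5602, "Safety-Score": 0.7150, "Halluc-Rate": 0.2880, "ECE": 0.8537},
    "Qwen2.5-VL-KvasirFT": {"Token-F1": 0.4967, "BERTScore-F1": 0.5817, "Safety-Score": 0.8511, "Halluc-Rate": 0.3680, "ECE": 0.6838},
    "Qwen2.5-VL-TransfFT": {"Token-F1": 0.4954, "BERTScore-F1": 0.5793, "Safety-Score": 0.8401, "Halluc-Rate": 0.3780, "ECE": 0.7016},
    "Qwen2.5-VL-Base":     {"Token-F1": 0.4292, "BERTScore-F1": 0.5174, "Safety-Score": 0.8175, "Halluc-Rate": 0.4560, "ECE": 0.7241},
    "Hybrid":              {"Token-F1": 0.4456, "BERTScore-F1": 0.5342, "Safety-Score": 0.8309, "Halluc-Rate": 0.3280, "ECE": 0.7359},
}

# Assumption: direction of each metric (True = higher is better).
HIGHER_IS_BETTER = {
    "Token-F1": True,
    "BERTScore-F1": True,
    "Safety-Score": True,
    "Halluc-Rate": False,
    "ECE": False,
}


def best_per_metric(results, higher_is_better):
    """Return the name of the best-scoring model for each metric."""
    best = {}
    for metric, higher in higher_is_better.items():
        pick = max if higher else min
        best[metric] = pick(results, key=lambda name: results[name][metric])
    return best


if __name__ == "__main__":
    for metric, model in best_per_metric(RESULTS, HIGHER_IS_BETTER).items():
        print(f"{metric}: {model}")
```

Under these assumed directions, Qwen2.5-VL-KvasirFT leads on four of the five metrics, and MedGemma-KvasirFT leads on Halluc-Rate.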

