MEDVQA-GI-2026-Task2-Explainability
Models fine-tuned for ImageCLEFmed MEDVQA-GI 2026 — Subtask 2: Multimodal Explainability & Safety-Aware Reasoning.
Results
| Model | Token-F1 | BERTScore-F1 | Safety-Score | Halluc-Rate | ECE |
|---|---|---|---|---|---|
| MedGemma-KvasirFT | 0.4891 | 0.5602 | 0.7150 | 0.2880 | 0.8537 |
| Qwen2.5-VL-KvasirFT | 0.4967 | 0.5817 | 0.8511 | 0.3680 | 0.6838 |
| Qwen2.5-VL-TransfFT | 0.4954 | 0.5793 | 0.8401 | 0.3780 | 0.7016 |
| Qwen2.5-VL-Base | 0.4292 | 0.5174 | 0.8175 | 0.4560 | 0.7241 |
| Hybrid | 0.4456 | 0.5342 | 0.8309 | 0.3280 | 0.7359 |
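Best model: Qwen2.5-VL-KvasirFT (highest Token-F1, BERTScore-F1, and Safety-Score, and the lowest ECE in the table; MedGemma-KvasirFT has the lowest Halluc-Rate).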
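
The Token-F1 column is not defined in this card. As a rough illustration only, the sketch below shows one common way to compute a token-level F1 between a generated answer and a reference answer (lowercased whitespace tokens, multiset overlap, as in SQuAD-style evaluation). The function name `token_f1` and the tokenization are assumptions; the official task scorer may normalize text differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two strings (hypothetical helper, not the official scorer)."""
    # Assumption: lowercase + whitespace split; the real scorer may strip punctuation etc.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = token_f1("a polyp is visible", "a polyp is present")
    print(f"{score:.2f}")
```

A corpus-level figure like those in the table would typically be the mean of this score over all question-answer pairs, though the card does not state how the aggregation was done.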