Commit 60c2d3e
Parent(s): 98d1a50
Update medical vqa score (#65)
- Update medical vqa score (c9595b053ff1a456a4be89da3a85673a20389343)
Co-authored-by: Tianyi Tang <StevenTang@users.noreply.huggingface.co>
README.md CHANGED
|
@@ -867,7 +867,7 @@ For more details, please refer to our blog post [Qwen3.5](https://qwen.ai/blog?i
|
|
| 867 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.9</td>
|
| 868 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.4</td>
|
| 869 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.3</td>
|
| 870 |
-
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">
|
| 871 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.6</td>
|
| 872 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td>
|
| 873 |
</tr>
|
|
@@ -876,7 +876,7 @@ For more details, please refer to our blog post [Qwen3.5](https://qwen.ai/blog?i
|
|
| 876 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.9</td>
|
| 877 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.9</td>
|
| 878 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.3</td>
|
| 879 |
-
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">
|
| 880 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.3</td>
|
| 881 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.2</td>
|
| 882 |
</tr>
|
|
@@ -897,6 +897,7 @@ For more details, please refer to our blog post [Qwen3.5](https://qwen.ai/blog?i
|
|
| 897 |
* BabyVision: our model’s score is reported with CI (Code Interpreter) enabled; without CI, the result is 43.3.<br>
|
| 898 |
* V*: our model’s score is reported with CI (Code Interpreter) enabled; without CI, the result is 91.1.<br>
|
| 899 |
* Empty cells (--) indicate scores not yet available or not applicable.<br>
|
|
|
|
| 900 |
</p>
|
| 901 |
|
| 902 |
</div>
|
|
|
|
| 867 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.9</td>
|
| 868 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.4</td>
|
| 869 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.3</td>
|
| 870 |
+
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.5</td>
|
| 871 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.6</td>
|
| 872 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td>
|
| 873 |
</tr>
|
|
|
|
| 876 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.9</td>
|
| 877 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.9</td>
|
| 878 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.3</td>
|
| 879 |
+
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">56.1</td>
|
| 880 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.3</td>
|
| 881 |
<td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.2</td>
|
| 882 |
</tr>
|
|
|
|
| 897 |
* BabyVision: our model’s score is reported with CI (Code Interpreter) enabled; without CI, the result is 43.3.<br>
|
| 898 |
* V*: our model’s score is reported with CI (Code Interpreter) enabled; without CI, the result is 91.1.<br>
|
| 899 |
* Empty cells (--) indicate scores not yet available or not applicable.<br>
|
| 900 |
+
* Upon review, we found inconsistencies in the evaluation setup of the historical version Qwen3-VL-235B-A22B on SLAKE and PMC-VQA. The corresponding comparative scores were corrected on March 15, 2026.<br>
|
| 901 |
</p>
|
| 902 |
|
| 903 |
</div>
|