hzhwcmhf StevenTang committed on
Commit 60c2d3e · 1 Parent(s): 98d1a50

Update medical vqa score (#65)

- Update medical vqa score (c9595b053ff1a456a4be89da3a85673a20389343)


Co-authored-by: Tianyi Tang <StevenTang@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -867,7 +867,7 @@ For more details, please refer to our blog post [Qwen3.5](https://qwen.ai/blog?i
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.9</td>
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">76.4</td>
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.3</td>
- <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">54.7</td>
+ <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">72.5</td>
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">81.6</td>
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">79.9</td>
  </tr>
@@ -876,7 +876,7 @@ For more details, please refer to our blog post [Qwen3.5](https://qwen.ai/blog?i
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">58.9</td>
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">59.9</td>
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">62.3</td>
- <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">41.2</td>
+ <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">56.1</td>
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">63.3</td>
  <td style="padding:7px 12px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.15)">64.2</td>
  </tr>
@@ -897,6 +897,7 @@ For more details, please refer to our blog post [Qwen3.5](https://qwen.ai/blog?i
  * BabyVision: our model’s score is reported with CI (Code Interpreter) enabled; without CI, the result is 43.3.<br>
  * V*: our model’s score is reported with CI (Code Interpreter) enabled; without CI, the result is 91.1.<br>
  * Empty cells (--) indicate scores not yet available or not applicable.<br>
+ * Upon review, we found inconsistencies in the evaluation setup of the historical version Qwen3-VL-235B-A22B on SLAKE and PMC-VQA. The corresponding comparative scores were corrected on March 15, 2026.<br>
  </p>
  
  </div>