IMISLab committed · Commit 6515639 · verified · 1 Parent(s): 2333250

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -40,10 +40,13 @@ For information regarding the model training, validation and evaluation, as well
 
 ## Evaluation
 
-| Acc (%) | DemosQA | GPCR | INCLUDE | Greek ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA |
+For the evaluation we utilize the accuracy metric for the multiple-choice datasets, while for the open-ended Cultura QA we utilize BERTScore F1%.
+We also utilize the instruct versions of the abbreviated models below.
+
+| | DemosQA | GPCR | INCLUDE | Greek ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA |
 | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
 | **Open-Weights Models** | | | | | | | | | |
-| **Maistros 8B (Ours)** | 50.83 | **64.42** | **58.70** | **67.25** | **49.54** | **73.33** | 53.37 | **78.17** | **71.99** |
+| **Maistros 8B ** | 50.83 | **64.42** | **58.70** | **67.25** | **49.54** | **73.33** | 53.37 | **78.17** | **71.99** |
 | Ministral 3 8B | **51.67** | 59.62 | 54.17 | 63.25 | 47.92 | 65.33 | 52.51 | 76.23 | 71.03 |
 | Krikri 8B | 49.50 | 54.81 | 50.54 | 63.08 | 45.37 | 64.44 | **54.83** | 71.04 | 71.31 |
 | Plutus 8B | 45.67 | 50.00 | 48.37 | 62.92 | 39.35 | 57.33 | 34.52 | 70.38 | 67.44 |
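
The added README lines state that the multiple-choice columns report accuracy. As a rough illustration only, accuracy in percent could be computed as below. The function name `accuracy_percent`, the exact-match comparison of answer labels, and the sample answers are assumptions made for this sketch; the repository's actual evaluation code is not shown in this commit.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the share of predictions that exactly match the gold labels, in percent."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty dataset")
    # Count positions where the predicted option equals the reference option.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical answers for a four-question multiple-choice set (not real data).
predicted = ["A", "C", "B", "D"]
reference = ["A", "C", "D", "D"]
print(f"{accuracy_percent(predicted, reference):.2f}")  # prints 75.00
```

The BERTScore F1 used for the open-ended CulturaQA column is a different, embedding-based metric and is not covered by this sketch.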