---
license: apache-2.0
datasets:
- IMISLab/CulturaQA
language:
- el
metrics:
- accuracy
- bertscore
base_model:
- mistralai/Ministral-3-8B-Instruct-2512-BF16
pipeline_tag: text-generation
tags:
- greek
- nlp
- genai
- LLM
- QA
- chat
- maistros
---
# Maistros-8B-Instruct-4bit: A Greek Large Language Model adapted through Knowledge Distillation from Large Reasoning Models

‼️This is the quantized version (4-bit) of the full [Maistros model](https://huggingface.co/IMISLab/Maistros-8B-Instruct).‼️

We introduce Maistros-8B-Instruct, a Greek-adapted LLM based on `mistralai/Ministral-3-8B-Instruct-2512-BF16`, fine-tuned with Low-Rank Adaptation (LoRA) on [CulturaQA](https://huggingface.co/datasets/IMISLab/CulturaQA).
For details on model training, validation, and evaluation, as well as the model's limitations, see the [arXiv preprint]().

<div align="center">
<img src="Maistros-Greek.png" width="70%" alt="Maistros Greek logo"/>
</div>

## Model Information

- 256k context length (approx. 150,000 Greek words).
- We extend the training of `Ministral-3-8B-Instruct-2512-BF16` with Greek linguistic and cultural knowledge from the training split of [CulturaQA](https://huggingface.co/datasets/IMISLab/CulturaQA).
- We use LoRA fine-tuning to mitigate catastrophic forgetting and retain the base model's capabilities.
- We merge the adapted LoRA weights into the base model to produce Maistros-8B-Instruct, a specialized Greek LLM (see the sketch after this list).
- Maistros-8B-Instruct achieves state-of-the-art performance on most Greek QA datasets when compared to other open-weight models.

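The merge step can be reproduced with the `peft` library roughly as follows. This is a minimal sketch, not the exact training code: the adapter path `maistros-lora-adapter` is a hypothetical placeholder.

```python
from peft import PeftModel
from transformers import Mistral3ForConditionalGeneration

# Load the base model and attach the trained LoRA adapter
# (the adapter path below is a hypothetical placeholder).
base = Mistral3ForConditionalGeneration.from_pretrained('mistralai/Ministral-3-8B-Instruct-2512-BF16')
model = PeftModel.from_pretrained(base, 'maistros-lora-adapter')

# Fold the low-rank updates into the base weights and drop the adapter
# modules, yielding a single merged checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained('Maistros-8B-Instruct')
```
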
## Evaluation

For the evaluation, we use the accuracy metric for the multiple-choice datasets, while for the open-ended CulturaQA we use BERTScore F1 (%).
All models are evaluated in their instruct versions; the model names below are abbreviated.

| | DemosQA | GPCR | INCLUDE | Greek ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Open-Weights Models** | | | | | | | | | |
| **Maistros 8B** | 50.83 | **64.42** | **58.70** | **67.25** | **49.54** | **73.33** | 53.37 | **78.17** | **71.99** |
| Ministral 3 8B | **51.67** | 59.62 | 54.17 | 63.25 | 47.92 | 65.33 | 52.51 | 76.23 | 71.03 |
| Krikri 8B | 49.50 | 54.81 | 50.54 | 63.08 | 45.37 | 64.44 | **54.83** | 71.04 | 71.31 |
| Plutus 8B | 45.67 | 50.00 | 48.37 | 62.92 | 39.35 | 57.33 | 34.52 | 70.38 | 67.44 |
| EuroLLM v2 9B | 41.50 | 53.85 | 39.13 | 46.08 | 31.71 | 42.67 | 36.72 | 58.17 | 70.33 |
| Gemma 3n E4B | 47.17 | 60.10 | 50.00 | 57.75 | 43.75 | 53.78 | 46.76 | 71.39 | 69.10 |
| Qwen 3 8B | 48.83 | 31.73 | 49.28 | 54.58 | 36.64 | 63.56 | 42.72 | 67.57 | 68.73 |
| **Proprietary Models** | | | | | | | | | |
| Gemini 3 flash | **55.67** | **88.46** | **88.77** | **94.75** | **92.82** | **89.78** | **88.62** | **95.03** | 73.97 |
| GPT-5 mini | 53.00 | 77.40 | 74.46 | 78.92 | 78.01 | 76.89 | 75.89 | 87.49 | **75.09** |

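For the open-ended answers, BERTScore F1 can be computed with the `evaluate` library roughly as follows. This is an illustrative sketch: the example texts are made up, and the exact scoring model and settings behind the table above are not specified here.

```python
import evaluate

# Hypothetical model predictions and gold references for open-ended QA.
predictions = ['Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.']
references = ['Πρωτεύουσα της Ελλάδας είναι η Αθήνα.']

# BERTScore matches prediction and reference token embeddings;
# lang = 'el' selects a default multilingual scoring model for Greek.
bertscore = evaluate.load('bertscore')
scores = bertscore.compute(predictions = predictions, references = references, lang = 'el')
print(f"BERTScore F1: {100 * sum(scores['f1']) / len(scores['f1']):.2f}%")
```
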
## How to load and run the model

Use the following code to run the model locally, or host it with [vLLM](https://vllm.ai/).

```python
from transformers import AutoTokenizer, Mistral3ForConditionalGeneration, set_seed

# Set the model path, the device, and a random seed for reproducibility.
model_path = 'IMISLab/Maistros-8B-Instruct'
device = 'cuda'
max_output_tokens = 1024  # Maximum number of generated tokens; adjust as needed.
set_seed(42)

# Load the model tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True)

# Causal language models predict tokens from left to right and use the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

# Load the model onto the device and set it to evaluation mode.
model = Mistral3ForConditionalGeneration.from_pretrained(model_path, device_map = device, trust_remote_code = True)
model.eval()

# Set the system, instruction and user prompts.
system_prompt = ''
instruction_prompt = ''
user_prompt = ''

# Define the message template.
messages = [
    {'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]},
    {'role': 'user', 'content': [{'type': 'text', 'text': '\n\n'.join((instruction_prompt, user_prompt))}]}
]

# Apply the tokenizer's chat template.
tokenized = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = 'pt',
    return_dict = True
)

# Send the tokenized inputs to the device and record the prompt length.
tokenized = {k: v.to(device) for k, v in tokenized.items()}
input_len = len(tokenized['input_ids'][0])

# Generate the model output.
output = model.generate(
    **tokenized,
    max_new_tokens = max_output_tokens,
    do_sample = False,  # Equivalent to temperature = 0.0
    temperature = None,
    top_p = None,
    top_k = None
)

# Decode only the assistant part of the output and print it.
decoded_output = tokenizer.decode(output[0][input_len:], skip_special_tokens = True)
print(decoded_output)
```
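
Alternatively, the model can be served through vLLM's offline Python API. A minimal sketch, assuming your vLLM version supports this model architecture; the prompt and token limit are illustrative:

```python
from vllm import LLM, SamplingParams

# Load the model with vLLM (assumes the architecture is supported).
llm = LLM(model = 'IMISLab/Maistros-8B-Instruct')

# Greedy decoding, mirroring the transformers example above.
params = SamplingParams(temperature = 0.0, max_tokens = 1024)

# vLLM applies the model's chat template to the messages.
messages = [{'role': 'user', 'content': 'Ποια είναι η πρωτεύουσα της Ελλάδας;'}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```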
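
If you prefer to quantize the full-precision model on the fly rather than use this pre-quantized checkpoint, here is a sketch with `bitsandbytes` 4-bit loading; the NF4 settings below are illustrative assumptions, not necessarily the scheme used to produce this repository's weights:

```python
import torch
from transformers import BitsAndBytesConfig, Mistral3ForConditionalGeneration

# Assumption: bitsandbytes NF4 quantization; the actual quantization scheme
# of this checkpoint may differ (see the preprint for details).
quant_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = 'nf4',
    bnb_4bit_compute_dtype = torch.bfloat16
)
model = Mistral3ForConditionalGeneration.from_pretrained(
    'IMISLab/Maistros-8B-Instruct',
    quantization_config = quant_config,
    device_map = 'cuda'
)
```
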
## Contact

If you have any questions or feedback about the model, please e-mail one of the following authors:
```
giarelis@ceid.upatras.gr
cmastrokostas@ac.upatras.gr
karacap@upatras.gr
```

## Citation

```
TBA
```