---
license: apache-2.0
datasets:
- IMISLab/CulturaQA
language:
- el
metrics:
- accuracy
- bertscore
base_model:
- mistralai/Ministral-3-8B-Instruct-2512-BF16
pipeline_tag: text-generation
tags:
- greek
- nlp
- genai
- LLM
- QA
- chat
- maistros
---
# Maistros-8B-Instruct-4bit: A Greek Large Language Model adapted through Knowledge Distillation from Large Reasoning Models
‼️This is the quantized version (4-bit) of the full [Maistros model](https://huggingface.co/IMISLab/Maistros-8B-Instruct).‼️
We introduce Maistros-8B-Instruct, a Greek-adapted LLM based on `mistralai/Ministral-3-8B-Instruct-2512-BF16` and fine-tuned with Low-Rank Adaptation (LoRA) on [CulturaQA](https://huggingface.co/datasets/IMISLab/CulturaQA).
For information on the model's training, validation and evaluation, as well as its limitations, see the [arXiv preprint](https://arxiv.org/abs/2605.01870).
<div align="center">
<img src="Maistros-Greek.png" width="70%" alt="Maistros Greek logo"/>
</div>
## Model Information
- 256k context length (approx. 150,000 Greek words).
- We extend the training of `Ministral-3-8B-Instruct-2512-BF16` with Greek linguistic and cultural knowledge from the training part of [CulturaQA](https://huggingface.co/datasets/IMISLab/CulturaQA).
- We use LoRA fine-tuning to mitigate catastrophic forgetting and retain the base model's capabilities.
- We merge the adapted weights from LoRA fine-tuning into the base model to produce Maistros-8B-Instruct, a specialized Greek LLM (see the sketch after this list).
- Maistros-8B-Instruct achieves state-of-the-art performance on most Greek QA datasets when compared to other open-weight models.
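As a rough illustration of the adaptation recipe described above (not the exact training script), merging LoRA adapter weights into the base checkpoint with the `peft` library could look like the sketch below; the adapter path and output directory are placeholders, not published artifacts.
```python
# Minimal sketch: fold LoRA adapters into the base model with `peft`.
# The adapter path and output directory below are placeholders.
from transformers import Mistral3ForConditionalGeneration
from peft import PeftModel

# Load the base model that was adapted.
base = Mistral3ForConditionalGeneration.from_pretrained('mistralai/Ministral-3-8B-Instruct-2512-BF16')
# Attach the LoRA adapters produced by fine-tuning on CulturaQA.
adapted = PeftModel.from_pretrained(base, 'path/to/lora-adapters')
# Merge the low-rank updates into the base weights to obtain a standalone checkpoint.
merged = adapted.merge_and_unload()
merged.save_pretrained('Maistros-8B-Instruct-merged')
```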
## Evaluation
For the evaluation we use accuracy for the multiple-choice datasets, and BERTScore F1 (%) for the open-ended CulturaQA (a rough sketch of both metrics follows the table).
We evaluate the instruct versions of the models abbreviated in the table below.
| | DemosQA | GPCR | INCLUDE | Greek ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Open-Weights Models** | | | | | | | | | |
| **Maistros 8B** | 50.83 | **64.42** | **58.70** | **67.25** | **49.54** | **73.33** | 53.37 | **78.17** | **71.99** |
| Ministral 3 8B | **51.67** | 59.62 | 54.17 | 63.25 | 47.92 | 65.33 | 52.51 | 76.23 | 71.03 |
| Krikri 8B | 49.50 | 54.81 | 50.54 | 63.08 | 45.37 | 64.44 | **54.83** | 71.04 | 71.31 |
| Plutus 8B | 45.67 | 50.00 | 48.37 | 62.92 | 39.35 | 57.33 | 34.52 | 70.38 | 67.44 |
| EuroLLM v2 9B | 41.50 | 53.85 | 39.13 | 46.08 | 31.71 | 42.67 | 36.72 | 58.17 | 70.33 |
| Gemma 3n E4B | 47.17 | 60.10 | 50.00 | 57.75 | 43.75 | 53.78 | 46.76 | 71.39 | 69.10 |
| Qwen 3 8B | 48.83 | 31.73 | 49.28 | 54.58 | 36.64 | 63.56 | 42.72 | 67.57 | 68.73 |
| **Proprietary Models** | | | | | | | | | |
| **Gemini 3 flash** | **55.67** | **88.46** | **88.77** | **94.75** | **92.82** | **89.78** | **88.62** | **95.03** | 73.97 |
| GPT-5 mini | 53.00 | 77.40 | 74.46 | 78.92 | 78.01 | 76.89 | 75.89 | 87.49 | **75.09** |
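As a hedged sketch of how these two metrics could be computed with the Hugging Face `evaluate` library (not the exact evaluation harness behind the table above), using toy predictions and references:
```python
# Toy sketch of the two metrics with the `evaluate` library;
# the predictions and references below are illustrative only.
import evaluate

# Accuracy for the multiple-choice datasets (predicted vs. gold option indices).
accuracy = evaluate.load('accuracy')
acc = accuracy.compute(predictions = [0, 2, 1], references = [0, 2, 3])
print(acc['accuracy'])

# BERTScore F1 for the open-ended CulturaQA answers (Greek, hence lang = 'el').
bertscore = evaluate.load('bertscore')
bs = bertscore.compute(
    predictions = ['Η Ακρόπολη είναι ένας αρχαίος βράχος στην Αθήνα.'],
    references = ['Η Ακρόπολη των Αθηνών είναι ένας βραχώδης λόφος στο κέντρο της Αθήνας.'],
    lang = 'el'
)
print(sum(bs['f1']) / len(bs['f1']))
```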
## How to load and run the model
Use the following code to run the model locally, or host it with [vLLM](https://vllm.ai/) (a minimal client sketch for vLLM hosting follows the example below).
```python
from transformers import AutoTokenizer, Mistral3ForConditionalGeneration, set_seed
# Set the model path, device and a random seed for reproducibility.
model_path = 'IMISLab/Maistros-8B-Instruct-4bit'
device = 'cuda'
set_seed(42)
# Loading the model tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True)
# Causal Language Models predict tokens from left to right and use EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'
# Load the model from the path to the device and set it in evaluation mode.
model = Mistral3ForConditionalGeneration.from_pretrained(model_path, device_map = device, trust_remote_code = True)
model.eval()
# Set the system, instruction and user prompts.
system_prompt = 'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.\nΈχεις δημιουργηθεί από το IMIS Lab του Πανεπιστημιού Πατρών.'
instruction_prompt = 'Παρακαλώ απάντησε στην παρακάτω ερώτηση.'
user_prompt = 'Τι είναι η Ακρόπολη των Αθηνών;'
# Defining the message template.
messages = [
{'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]},
{'role': 'user', 'content': [{'type': 'text', 'text': '\n\n'.join((instruction_prompt, user_prompt))}]}
]
# Applying the tokenizer chat template.
tokenized = tokenizer.apply_chat_template(
messages,
add_generation_prompt = True,
return_tensors = 'pt',
return_dict = True
)
# Sending the tokenized instances to the device.
tokenized = {k: v.to(device) for k, v in tokenized.items()}
input_len = len(tokenized['input_ids'][0])
# Generating the model output.
output = model.generate(
**tokenized,
max_new_tokens = 1024,
do_sample = False, # Equivalent to temperature = 0.0
temperature = None,
top_p = None,
top_k = None
)
# Decoding the assistant part of the output and printing it.
decoded_output = tokenizer.decode(output[0][input_len:], skip_special_tokens = True)
print(decoded_output)
```
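If you host the model with vLLM instead (for example via `vllm serve IMISLab/Maistros-8B-Instruct-4bit`, assuming the 4-bit checkpoint is supported by your vLLM build), a minimal OpenAI-compatible client could look like the sketch below; the endpoint URL, API key and served model name are assumptions based on vLLM's defaults.
```python
# Minimal sketch of querying a vLLM OpenAI-compatible server; the endpoint,
# API key and served model name are assumptions based on vLLM defaults.
from openai import OpenAI

client = OpenAI(base_url = 'http://localhost:8000/v1', api_key = 'EMPTY')
response = client.chat.completions.create(
    model = 'IMISLab/Maistros-8B-Instruct-4bit',
    messages = [
        {'role': 'system', 'content': 'Είσαι ο Μαΐστρος, ένα μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.'},
        {'role': 'user', 'content': 'Τι είναι η Ακρόπολη των Αθηνών;'}
    ],
    temperature = 0.0,
    max_tokens = 1024
)
print(response.choices[0].message.content)
```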
## Contact
If you have any questions or feedback about the model, please e-mail one of the following authors:
```
giarelis@ceid.upatras.gr
cmastrokostas@ac.upatras.gr
karacap@upatras.gr
```
## Citation
```
@misc{giarelis2026maistrosgreeklargelanguage,
title = {Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models},
author = {Nikolaos Giarelis and Charalampos Mastrokostas and Nikos Karacapilidis},
year = {2026},
eprint = {2605.01870},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2605.01870},
}
``` |