| --- |
| license: apache-2.0 |
| datasets: |
| - IMISLab/CulturaQA |
| language: |
| - el |
| metrics: |
| - accuracy |
| - bertscore |
| base_model: |
| - mistralai/Ministral-3-8B-Instruct-2512-BF16 |
| pipeline_tag: text-generation |
| tags: |
| - greek |
| - nlp |
| - genai |
| - LLM |
| - QA |
| - chat |
| - maistros |
| --- |
| # Maistros-8B-Instruct-4bit: A Greek Large Language Model adapted through Knowledge Distillation from Large Reasoning Models |
|
|
| ‼️This is the quantized version (4-bit) of the full [Maistros model](https://huggingface.co/IMISLab/Maistros-8B-Instruct).‼️ |
|
|
| We introduce Maistros-8B-Instruct, a Greek-adapted LLM based on `mistralai/Ministral-3-8B-Instruct-2512-BF16` and fine-tuned using Low-Rank Adaptation (LoRA) on [CulturaQA](https://huggingface.co/datasets/IMISLab/CulturaQA). |
| For information on the model's training, validation, and evaluation, as well as its limitations, see the [arXiv preprint](https://arxiv.org/abs/2605.01870). |
|
|
| <div align="center"> |
| <img src="Maistros-Greek.png" width="70%" alt="Maistros Greek logo"/> |
| </div> |
|
|
| ## Model Information |
|
|
| - 256k context length (approx. 150,000 Greek words). |
| - We extend the training of `Ministral-3-8B-Instruct-2512-BF16` with Greek linguistic and cultural knowledge from the training part of [CulturaQA](https://huggingface.co/datasets/IMISLab/CulturaQA). |
| - We use LoRA fine-tuning to mitigate catastrophic forgetting and retain the base model's capabilities. |
| - We merge the adapted weights from LoRA fine-tuning into the base model to produce Maistros-8B-Instruct, a specialized Greek LLM. |
| - Maistros-8B-Instruct achieves the best score among the compared open-weight models on seven of the nine Greek QA datasets reported in the evaluation table. |
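
The merge step in the list above can be illustrated with a small sketch. In standard LoRA, a frozen weight matrix `W` is updated as `W + (alpha / r) * B @ A`, where `A` and `B` are the low-rank adapter matrices. The shapes, rank, and scaling factor below are hypothetical toy values chosen for readability; this card does not state the actual hyperparameters, and the real merge was presumably done with a fine-tuning library rather than hand-written code.

```python
# Minimal sketch of merging a LoRA update into a frozen weight matrix.
# All shapes and values are illustrative, not the model's real settings.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

# Hypothetical 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (rank, in_features)
B = [[0.5], [0.0]]    # shape (out_features, rank)

merged = merge_lora(W, A, B, alpha = 2.0, rank = 1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, because their contribution is already contained in the merged weights.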
|
|
| ## Evaluation |
|
|
| We report accuracy (%) on the multiple-choice datasets and BERTScore F1 (%) on the open-ended CulturaQA. |
| All models listed below by abbreviated name were evaluated in their instruct versions. |
|
|
| | Model | DemosQA | GPCR | INCLUDE | Greek ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA | |
| | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | |
| | **Open-Weights Models** | | | | | | | | | | |
| | **Maistros 8B** | 50.83 | **64.42** | **58.70** | **67.25** | **49.54** | **73.33** | 53.37 | **78.17** | **71.99** | |
| | Ministral 3 8B | **51.67** | 59.62 | 54.17 | 63.25 | 47.92 | 65.33 | 52.51 | 76.23 | 71.03 | |
| | Krikri 8B | 49.50 | 54.81 | 50.54 | 63.08 | 45.37 | 64.44 | **54.83** | 71.04 | 71.31 | |
| | Plutus 8B | 45.67 | 50.00 | 48.37 | 62.92 | 39.35 | 57.33 | 34.52 | 70.38 | 67.44 | |
| | EuroLLM v2 9B | 41.50 | 53.85 | 39.13 | 46.08 | 31.71 | 42.67 | 36.72 | 58.17 | 70.33 | |
| | Gemma 3n E4B | 47.17 | 60.10 | 50.00 | 57.75 | 43.75 | 53.78 | 46.76 | 71.39 | 69.10 | |
| | Qwen 3 8B | 48.83 | 31.73 | 49.28 | 54.58 | 36.64 | 63.56 | 42.72 | 67.57 | 68.73 | |
| | **Proprietary Models** | | | | | | | | | | |
| **Gemini 3 Flash** | **55.67** | **88.46** | **88.77** | **94.75** | **92.82** | **89.78** | **88.62** | **95.03** | 73.97 | |
| | GPT-5 mini | 53.00 | 77.40 | 74.46 | 78.92 | 78.01 | 76.89 | 75.89 | 87.49 | **75.09** | |
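
For the multiple-choice columns, accuracy is the percentage of questions for which the predicted option matches the gold option. The sketch below shows that computation on hypothetical predictions; the function name and the sample data are illustrative and are not taken from the authors' evaluation code. BERTScore F1, used for CulturaQA, needs an embedding model and is not reproduced here.

```python
def accuracy_percent(predictions, references):
    """Return the percentage of predictions that exactly match the references."""
    if len(predictions) != len(references):
        raise ValueError('predictions and references must have the same length')
    if not references:
        raise ValueError('cannot compute accuracy on an empty dataset')
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# Hypothetical model answers and gold labels for four questions.
predicted = ['A', 'C', 'B', 'D']
gold = ['A', 'C', 'D', 'D']

score = accuracy_percent(predicted, gold)
print(f'{score:.2f}')
```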
|
|
| ## How to load and run the model |
| Use the following code to run the model locally. Alternatively, you can host the model using [vLLM](https://vllm.ai/). |
|
|
| ```python |
| from transformers import AutoTokenizer, Mistral3ForConditionalGeneration, set_seed |
| |
| # Set the model path, device and a random seed for reproducibility. |
| model_path = 'IMISLab/Maistros-8B-Instruct-4bit' |
| device = 'cuda' |
| set_seed(42) |
| |
| # Loading the model tokenizer. |
| tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True) |
| |
| # Causal Language Models predict tokens from left to right and use EOS token for padding. |
| tokenizer.pad_token = tokenizer.eos_token |
| tokenizer.padding_side = 'right' |
| |
| # Load the model from the path to the device and set it in evaluation mode. |
| model = Mistral3ForConditionalGeneration.from_pretrained(model_path, device_map = device, trust_remote_code = True) |
| model.eval() |
| |
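| # Set the system, instruction and user prompts (in Greek). |
| # System: "You are Maistros, a highly advanced Artificial Intelligence model for the Greek language. |
| # You were created by the IMIS Lab of the University of Patras." |
| system_prompt = 'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.\nΈχεις δημιουργηθεί απο το IMIS Lab του Πανεπιστημιού Πατρών.' |
| # Instruction: "Please answer the following question." |
| instruction_prompt = 'Παρακαλώ απάντησε στην παρακάτω ερώτηση.' |
| # User: "What is the Acropolis of Athens?" |
| user_prompt = 'Τι είναι η Ακρόπολη των Αθηνών;' |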
| |
| # Defining the message template. |
| messages = [ |
| {'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]}, |
| {'role': 'user', 'content': [{'type': 'text', 'text': '\n\n'.join((instruction_prompt, user_prompt))}]} |
| ] |
| |
| # Applying the tokenizer chat template. |
| tokenized = tokenizer.apply_chat_template( |
| messages, |
| add_generation_prompt = True, |
| return_tensors = 'pt', |
| return_dict = True |
| ) |
| |
| # Sending the tokenized instances to the device. |
| tokenized = {k: v.to(device) for k, v in tokenized.items()} |
| input_len = len(tokenized['input_ids'][0]) |
| |
| # Generating the model output. |
| output = model.generate( |
| **tokenized, |
| max_new_tokens = 1024, |
| do_sample = False, # Equivalent to temperature = 0.0 |
| temperature = None, |
| top_p = None, |
| top_k = None |
| ) |
| |
| # Decoding the assistant part of the output and printing it. |
| decoded_output = tokenizer.decode(output[0][input_len:], skip_special_tokens = True) |
| print(decoded_output) |
| ``` |
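
For the hosted option, vLLM exposes an OpenAI-compatible HTTP API, so a client only needs to send a chat completion request. The sketch below assumes a server has already been started locally (for example with `vllm serve IMISLab/Maistros-8B-Instruct-4bit`) and is listening on vLLM's default port 8000. The URL, the helper names, and whether this particular 4-bit checkpoint loads in vLLM without extra options are assumptions that this card does not confirm.

```python
import json
import urllib.request

# Assumptions: a local vLLM OpenAI-compatible server on the default port,
# serving the model under its repository name.
API_URL = 'http://localhost:8000/v1/chat/completions'
MODEL_NAME = 'IMISLab/Maistros-8B-Instruct-4bit'

def build_payload(system_prompt, instruction_prompt, user_prompt, max_tokens = 1024):
    """Build a chat completion request body that mirrors the local example."""
    return {
        'model': MODEL_NAME,
        'messages': [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': '\n\n'.join((instruction_prompt, user_prompt))}
        ],
        'max_tokens': max_tokens,
        'temperature': 0.0  # Greedy decoding, as in the local example.
    }

def query_server(payload, url = API_URL):
    """Send the request to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data = json.dumps(payload).encode('utf-8'),
        headers = {'Content-Type': 'application/json'}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode('utf-8'))
    return body['choices'][0]['message']['content']

payload = build_payload(
    'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.',
    'Παρακαλώ απάντησε στην παρακάτω ερώτηση.',
    'Τι είναι η Ακρόπολη των Αθηνών;'
)
# Uncomment once the server is running:
# print(query_server(payload))
```

Building the payload separately from sending it makes it possible to inspect the request without a running server.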
|
|
| ## Contact |
|
|
| If you have any questions or feedback about the model, please e-mail one of the following authors: |
| ``` |
| giarelis@ceid.upatras.gr |
| cmastrokostas@ac.upatras.gr |
| karacap@upatras.gr |
| ``` |
| ## Citation |
|
|
| ``` |
| @misc{ |
| giarelis2026maistrosgreeklargelanguage, |
| title = {Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models}, |
| author = {Nikolaos Giarelis and Charalampos Mastrokostas and Nikos Karacapilidis}, |
| year = {2026}, |
| eprint = {2605.01870}, |
| archivePrefix = {arXiv}, |
| primaryClass = {cs.CL}, |
| url = {https://arxiv.org/abs/2605.01870}, |
| } |
| ``` |