	---
	license: apache-2.0
	datasets:
	- IMISLab/CulturaQA
	language:
	- el
	metrics:
	- accuracy
	- bertscore
	base_model:
	- mistralai/Ministral-3-8B-Instruct-2512-BF16
	pipeline_tag: text-generation
	tags:
	- greek
	- nlp
	- genai
	- LLM
	- QA
	- chat
	- maistros
	---
	# Maistros-8B-Instruct-4bit: A Greek Large Language Model adapted through Knowledge Distillation from Large Reasoning Models

	‼️This is the quantized version (4-bit) of the full [Maistros model](https://huggingface.co/IMISLab/Maistros-8B-Instruct).‼️

	We introduce Maistros-8B-Instruct, a Greek-adapted LLM based on `mistralai/Ministral-3-8B-Instruct-2512-BF16` and fine-tuned with Low-Rank Adaptation (LoRA) on [CulturaQA](https://huggingface.co/datasets/IMISLab/CulturaQA).
	For details on model training, validation and evaluation, as well as the model's limitations, see the [arXiv preprint](https://arxiv.org/abs/2605.01870).

	<div align="center">
	<img src="Maistros-Greek.png" width="70%" alt="Maistros Greek logo"/>
	</div>

	## Model Information

	- 256k context length (approx. 150,000 Greek words).
	- We extend the training of `Ministral-3-8B-Instruct-2512-BF16` with Greek linguistic and cultural knowledge from the training part of [CulturaQA](https://huggingface.co/datasets/IMISLab/CulturaQA).
	- We use LoRA fine-tuning to mitigate catastrophic forgetting and retain the base model's capabilities.
	- We merge the adapted weights from LoRA fine-tuning into the base model to produce Maistros-8B-Instruct, a specialized Greek LLM.
	- Maistros-8B-Instruct achieves the best results among the compared open-weight models on most Greek QA datasets (7 of the 9 reported in the Evaluation section).
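
The merge step in the list above can be illustrated with a minimal sketch, assuming the standard LoRA formulation in which the merged weight is `W + (alpha / rank) * (B @ A)`. The matrix sizes, values, `alpha` and `rank` below are made up for the example; this README does not state the hyperparameters or tooling actually used. In practice a library would perform this step (for instance `merge_and_unload()` in PEFT).

```python
# Illustrative LoRA merge on a tiny weight matrix (pure Python, no ML libraries).
# All shapes, values and the alpha/rank scaling are assumptions for this example.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x rank) @ (rank x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Frozen base weight W (2 x 3) and a rank-1 adapter: B is 2 x 1, A is 1 x 3.
base_weight = [[1.0, 0.0, 2.0],
               [0.0, 1.0, 0.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, 0.0, 1.0]]

merged_weight = merge_lora(base_weight, lora_a, lora_b, alpha = 2.0, rank = 1)
print(merged_weight)  # [[2.0, 0.0, 4.0], [2.0, 1.0, 4.0]]
```

After merging, inference uses a single weight matrix per layer, so the merged model has the same architecture and inference cost as the base model.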

	## Evaluation

	We report accuracy (%) on the multiple-choice datasets and BERTScore F1 (%) on the open-ended CulturaQA dataset.
	All models listed below by abbreviated name are evaluated in their instruct versions.

	| Model | DemosQA | GPCR | INCLUDE | Greek ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA |
	| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
	| Open-Weights Models | | | | | | | | | |
	| Maistros 8B | 50.83 | 64.42 | 58.70 | 67.25 | 49.54 | 73.33 | 53.37 | 78.17 | 71.99 |
	| Ministral 3 8B | 51.67 | 59.62 | 54.17 | 63.25 | 47.92 | 65.33 | 52.51 | 76.23 | 71.03 |
	| Krikri 8B | 49.50 | 54.81 | 50.54 | 63.08 | 45.37 | 64.44 | 54.83 | 71.04 | 71.31 |
	| Plutus 8B | 45.67 | 50.00 | 48.37 | 62.92 | 39.35 | 57.33 | 34.52 | 70.38 | 67.44 |
	| EuroLLM v2 9B | 41.50 | 53.85 | 39.13 | 46.08 | 31.71 | 42.67 | 36.72 | 58.17 | 70.33 |
	| Gemma 3n E4B | 47.17 | 60.10 | 50.00 | 57.75 | 43.75 | 53.78 | 46.76 | 71.39 | 69.10 |
	| Qwen 3 8B | 48.83 | 31.73 | 49.28 | 54.58 | 36.64 | 63.56 | 42.72 | 67.57 | 68.73 |
	| Proprietary Models | | | | | | | | | |
	| Gemini 3 flash | 55.67 | 88.46 | 88.77 | 94.75 | 92.82 | 89.78 | 88.62 | 95.03 | 73.97 |
	| GPT-5 mini | 53.00 | 77.40 | 74.46 | 78.92 | 78.01 | 76.89 | 75.89 | 87.49 | 75.09 |
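
To make the two metrics concrete, here is a minimal, simplified sketch of how they can be computed. It is not the evaluation code behind the table. `accuracy_percent` and `greedy_match_f1` are hypothetical helper names. The second function only shows the greedy cosine matching at the core of BERTScore, applied to token vectors supplied by the caller. A real BERTScore computation takes contextual embeddings from a pretrained encoder and may add IDF weighting or baseline rescaling, none of which is specified in this README.

```python
import math

def accuracy_percent(predictions, gold_answers):
    """Share of multiple-choice predictions that match the gold label, in percent."""
    if not gold_answers or len(predictions) != len(gold_answers):
        raise ValueError('predictions and gold_answers must be non-empty and equally long')
    correct = sum(p == g for p, g in zip(predictions, gold_answers))
    return 100.0 * correct / len(gold_answers)

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def greedy_match_f1(candidate_vecs, reference_vecs):
    """BERTScore-style F1: every token is matched to its most similar counterpart."""
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy inputs (made up): four multiple-choice answers and 2-dimensional "token embeddings".
accuracy = accuracy_percent(['A', 'C', 'B', 'D'], ['A', 'B', 'B', 'D'])
f1 = greedy_match_f1([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
print(accuracy)      # 75.0
print(round(f1, 4))  # 0.6667
```

In the toy example the candidate has one token that matches the reference exactly and one unrelated token, so precision is 0.5 and recall is 1.0.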

	## How to load and run the model
	Use the following code to run the model locally. Alternatively, you can host the model using [vLLM](https://vllm.ai/).

	```python
	from transformers import AutoTokenizer, Mistral3ForConditionalGeneration, set_seed

	# Set the model path, device and a random seed for reproducibility.
	model_path = 'IMISLab/Maistros-8B-Instruct-4bit'
	device = 'cuda'
	set_seed(42)

	# Loading the model tokenizer.
	tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True)

	# Causal Language Models predict tokens from left to right and use EOS token for padding.
	tokenizer.pad_token = tokenizer.eos_token
	tokenizer.padding_side = 'right'

	# Load the model from the path to the device and set it in evaluation mode.
	model = Mistral3ForConditionalGeneration.from_pretrained(model_path, device_map = device, trust_remote_code = True)
	model.eval()

	# Set the system, instruction and user prompts.
	system_prompt = 'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.\nΈχεις δημιουργηθεί απο το IMIS Lab του Πανεπιστημιού Πατρών.'
	instruction_prompt = 'Παρακαλώ απάντησε στην παρακάτω ερώτηση.'
	user_prompt = 'Τι είναι η Ακρόπολη των Αθηνών;'

	# Defining the message template.
	messages = [
	    {'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]},
	    {'role': 'user', 'content': [{'type': 'text', 'text': '\n\n'.join((instruction_prompt, user_prompt))}]}
	]

	# Applying the tokenizer chat template.
	tokenized = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt = True,
	    return_tensors = 'pt',
	    return_dict = True
	)

	# Sending the tokenized instances to the device.
	tokenized = {k: v.to(device) for k, v in tokenized.items()}
	input_len = len(tokenized['input_ids'][0])

	# Generating the model output.
	output = model.generate(
	    **tokenized,
	    max_new_tokens = 1024,
	    do_sample = False, # Greedy decoding, equivalent to temperature = 0.0
	    temperature = None,
	    top_p = None,
	    top_k = None
	)

	# Decoding the assistant part of the output and printing it.
	decoded_output = tokenizer.decode(output[0][input_len:], skip_special_tokens = True)
	print(decoded_output)
	```
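
For the vLLM hosting option mentioned at the start of this section, the following is a minimal client sketch using only the Python standard library. It assumes a vLLM OpenAI-compatible server is already running at `http://localhost:8000` (for example started with `vllm serve`) and serving this checkpoint under its repository name. Whether this 4-bit checkpoint loads in vLLM without additional flags is not covered by this README and has not been verified here. `build_chat_payload` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions: server address and served model name; adjust both to your deployment.
BASE_URL = 'http://localhost:8000/v1'
MODEL_NAME = 'IMISLab/Maistros-8B-Instruct-4bit'

def build_chat_payload(system_prompt, user_prompt, max_tokens = 1024):
    """Build an OpenAI-style chat completion request body."""
    return {
        'model': MODEL_NAME,
        'messages': [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': user_prompt},
        ],
        'max_tokens': max_tokens,
        'temperature': 0.0,  # Greedy decoding, mirroring do_sample = False above.
    }

def chat(payload, base_url = BASE_URL):
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f'{base_url}/chat/completions',
        data = json.dumps(payload).encode('utf-8'),
        headers = {'Content-Type': 'application/json'},
        method = 'POST',
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body['choices'][0]['message']['content']

payload = build_chat_payload(
    'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.',
    'Παρακαλώ απάντησε στην παρακάτω ερώτηση.\n\nΤι είναι η Ακρόπολη των Αθηνών;'
)

# Uncomment once a server is reachable at BASE_URL:
# print(chat(payload))
```

The server applies the model's chat template itself, so the client sends plain role/content messages instead of pre-tokenized inputs.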

	## Contact

	If you have any questions or feedback about the model, please e-mail one of the following authors:
	```
	giarelis@ceid.upatras.gr
	cmastrokostas@ac.upatras.gr
	karacap@upatras.gr
	```

	## Citation

	```
	@misc{giarelis2026maistrosgreeklargelanguage,
	    title = {Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models},
	    author = {Nikolaos Giarelis and Charalampos Mastrokostas and Nikos Karacapilidis},
	    year = {2026},
	    eprint = {2605.01870},
	    archivePrefix = {arXiv},
	    primaryClass = {cs.CL},
	    url = {https://arxiv.org/abs/2605.01870},
	}
	```