# QLoRA Adapter for Dutch Definition Expansion (Aya-101)
This repository contains a QLoRA adapter fine-tuned on top of CohereLabs/aya-101 for the task of sense-preserving definitional expansion in Dutch.
This work was developed as part of the Master's thesis, "Transformer-based Expansion of Dutch Dictionary Definitions", submitted for the degree of Master of Science in Artificial Intelligence at KU Leuven.
## About the Thesis
The research investigates the potential of transformer-based models to automate a significant bottleneck in contemporary lexicography: the manual expansion of concise, core-meaning definitions into comprehensive, formally structured dictionary entries. The study focuses on Dutch, a task requiring not only semantic accuracy but also strict adherence to lexicographical style and structure.
The thesis empirically compares two primary methodologies: in-context learning via few-shot prompting and adaptation via parameter-efficient fine-tuning (specifically, QLoRA). This comparison was conducted across a range of powerful multilingual and Dutch-specific models, including mT5-xl, GEITje Ultra, Aya-101, and Aya-23, to determine the most effective strategy for this high-precision domain.
## This Model's Role and Performance
This fine-tuned Aya-101 model represents the fine-tuning approach explored in the thesis. While the few-shot prompted version of Aya-101 achieved the highest quantitative scores, its performance was inconsistent and unreliable, exhibiting a "hit-or-miss" pattern entirely dependent on the quality of its prompt examples. In contrast, the fine-tuned models proved far more robust, consistently learning and generalizing the required lexicographical patterns. Although the fine-tuned Aya-23 model ultimately emerged as the most reliable, this fine-tuned Aya-101 model supports the study's central finding: in high-precision domains, task-specific fine-tuning is essential to instill the discipline required for generating reliable, domain-appropriate output.
## How to Use
To use this adapter, first load the base model (CohereLabs/aya-101) with 4-bit quantization, then apply the adapter on top of it.
```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "CohereLabs/aya-101"
adapter_id = "RobbedoesHF/aya-101-dutch-definition-expansion-qlora"  # The repo ID of this adapter

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Apply the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
print("Model loaded successfully!")
```
## Prompting Format
This adapter was fine-tuned on a specific instruction format. For best results, your input should match this structure.
```python
# Define the lemma and short definition you want to expand
lemma = "ecoroman"
short_def = "roman over milieuproblematiek"

# Define the prompt components, matching the training script
system_prompt = "Je bent een expert-lexicograaf die definities schrijft voor een Nederlands woordenboek."
instruction = f"Breid de volgende korte definitie voor het woord '{lemma}' uit tot een volledige definitie: '{short_def}'"
prompt = f"{system_prompt}\n\n{instruction}"

# Tokenize the prompt and move it to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the output tokens
print("\nGenerating definition...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=384,  # Chosen based on the longest full definition's token length for this model
        num_beams=4,         # Beam width used for the thesis experiments
        early_stopping=True,
    )

# Decode the tokens into a string
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("\n--- Prompt ---")
print(prompt)
print("\n--- Model Output ---")
print(decoded_output)
```
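If you want to expand many entries, the prompt assembly above can be factored into a small helper. This is a sketch: `build_prompt` is a name introduced here, not part of the training script, but the strings it produces match the format shown above.

```python
def build_prompt(lemma: str, short_def: str) -> str:
    """Assemble the instruction prompt used during fine-tuning."""
    system_prompt = (
        "Je bent een expert-lexicograaf die definities schrijft "
        "voor een Nederlands woordenboek."
    )
    instruction = (
        f"Breid de volgende korte definitie voor het woord '{lemma}' "
        f"uit tot een volledige definitie: '{short_def}'"
    )
    return f"{system_prompt}\n\n{instruction}"
```

With `model` and `tokenizer` loaded as above, you can then loop over `(lemma, short_def)` pairs, tokenize each `build_prompt(...)` result, and call `model.generate` as in the previous snippet.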