Model Card for lora-gemma327b-xllora-pos

This model is a fine-tuned version of google/gemma-3-27b-it. It has been trained using TRL.

This repository contains the XL-LoRA positive adapter used in the paper:

Bootstrapping Embeddings for Low Resource Languages

The adapter is designed for synthetic triplet generation in multilingual embedding training pipelines.

The adapter is not merged into the base model; it must be applied to Gemma 3 27B during inference.

Model Details

| Property | Value |
|---|---|
| Base model | Gemma 3 27B |
| Method | XL-LoRA |
| Adapter type | LoRA |
| Purpose | Synthetic positive generation |

The adapter is part of the XL-LoRA methodology for generating multilingual contrastive training data.
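To illustrate how such contrastive data is consumed downstream, here is a minimal sketch of a triplet margin loss over sentence embeddings. This is a generic illustration in NumPy, not code from the paper or repository; the embedding vectors and margin value are toy assumptions.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet margin loss over embeddings: pull the anchor
    toward the positive and push it at least `margin` (in cosine
    distance) away from the hard negative."""
    def cos_dist(a, b):
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b)
        return 1.0 - float(np.dot(a, b))

    return max(0.0, cos_dist(anchor, positive) - cos_dist(anchor, negative) + margin)

# Toy embeddings: the positive lies close to the anchor, the negative does not,
# so the margin is already satisfied and the loss is zero.
anchor = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])
negative = np.array([0.0, 1.0, 0.0])
loss = triplet_margin_loss(anchor, positive, negative)
```

A hard negative that sits closer to the anchor than the positive would instead produce a positive loss, which is what drives the embedding model to separate them.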

Intended Use

This adapter is used to generate synthetic training data for multilingual sentence embedding models.

Specifically, the two adapters in this methodology generate:

| Adapter | Purpose |
|---|---|
| xllora-pos | Generate positive examples |
| xllora-neg | Generate hard negative examples |

These examples are then used to construct triplet datasets:

(anchor, positive, hard_negative)

for training sentence embedding models.
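Assembling generated outputs into triplet rows can be sketched as follows. The field names and data structure here are illustrative assumptions, not the repository's actual schema.

```python
def build_triplets(anchors, positives, negatives):
    """Zip anchor sentences with model-generated positives and hard
    negatives into (anchor, positive, hard_negative) training rows."""
    assert len(anchors) == len(positives) == len(negatives)
    return [
        {"anchor": a, "positive": p, "hard_negative": n}
        for a, p, n in zip(anchors, positives, negatives)
    ]

# Toy example: a paraphrase as the positive (as the xllora-pos adapter
# would produce) and a topically similar distractor as the hard negative.
rows = build_triplets(
    ["The cat sat on the mat."],
    ["A cat was sitting on the mat."],
    ["The dog slept under the table."],
)
```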

Usage

The adapter must be loaded together with the Gemma 3 27B base model using the PEFT library.

Example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "google/gemma-3-27b-it"
adapter_model = "mbasoz/lora-gemma327b-xllora-pos"

# Load the base model and tokenizer, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_model)
```

Data Synthesis

Synthetic triplets are generated using the script:

  • src/generate_answers_mgpu_orch.py

from the official code repository:

https://github.com/mbasoz/xllora-embedding

Example scripts for generating data:

  • Negative generation: scripts/data_synthesis_neg.sh
  • Positive generation: scripts/data_synthesis_pos.sh

These scripts demonstrate how the adapters are used to generate multilingual triplet data.

Training procedure

The XL-LoRA adapters were trained using:

  • src/lora_training.py

Example training commands are provided in:

  • Negative adapter training: scripts/xllora_train_negative.sh
  • Positive adapter training: scripts/xllora_train_positive.sh

The adapters were trained with supervised fine-tuning (SFT).

Related Resources

  • https://github.com/mbasoz/xllora-embedding
  • https://github.com/mbasoz/xllora-embedding/blob/main/data/mixed_parallel_xnli_14l_opusmt_10k_fin_pos.csv

Framework versions

  • PEFT: 0.15.2
  • TRL: 0.19.0
  • Transformers: 4.53.1
  • PyTorch: 2.6.0+cu126
  • Datasets: 3.1.0
  • Tokenizers: 0.21.2

Citations

If you use these adapters in your research, please cite the following paper:

@article{basoz2026bootstrappingembeddings,
  title={Bootstrapping Embeddings for Low Resource Languages},
  author={Merve Basoz and Andrew Horne and Mattia Opper},
  year={2026},
  eprint={2603.01732},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.01732},
  note={Accepted to the LoResLM Workshop at EACL 2026}
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

License

This model is released under the MIT License.
