Model Card for lora-gemma327b-xllora-pos

This model is a fine-tuned version of google/gemma-3-27b-it. It has been trained using TRL.

This repository contains the XL-LoRA positive adapter used in the paper:

Bootstrapping Embeddings for Low Resource Languages

The adapter is designed for synthetic triplet generation in multilingual embedding training pipelines.

The adapter is not merged into the base model; it must be applied to Gemma 3 27B during inference.

Model Details

| Property | Value |
|---|---|
| Base model | Gemma 3 27B |
| Method | XL-LoRA |
| Adapter type | LoRA |
| Purpose | Synthetic positive generation |

The adapter is part of the XL-LoRA methodology for generating multilingual contrastive training data.
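To illustrate how such contrastive data is consumed downstream, here is a minimal sketch of a triplet margin loss over sentence embeddings. This is a generic illustration in NumPy, not code from the paper or repository; the embedding vectors and margin value are toy assumptions.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet margin loss over embeddings: pull the anchor
    toward the positive and push it at least `margin` (in cosine
    distance) away from the hard negative."""
    def cos_dist(a, b):
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b)
        return 1.0 - float(np.dot(a, b))

    return max(0.0, cos_dist(anchor, positive) - cos_dist(anchor, negative) + margin)

# Toy embeddings: the positive lies close to the anchor, the negative does not,
# so the margin is already satisfied and the loss is zero.
anchor = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])
negative = np.array([0.0, 1.0, 0.0])
loss = triplet_margin_loss(anchor, positive, negative)
```

A hard negative that sits closer to the anchor than the positive would instead produce a positive loss, which is what drives the embedding model to separate them.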

Intended Use

This adapter is used to generate synthetic training data for multilingual sentence embedding models.

Specifically, the two adapters in this methodology generate:

| Adapter | Purpose |
|---|---|
| xllora-pos | Generate positive examples |
| xllora-neg | Generate hard negative examples |

These examples are then used to construct triplet datasets:

(anchor, positive, hard_negative)

for training sentence embedding models.
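Assembling generated outputs into triplet rows can be sketched as follows. The field names and data structure here are illustrative assumptions, not the repository's actual schema.

```python
def build_triplets(anchors, positives, negatives):
    """Zip anchor sentences with model-generated positives and hard
    negatives into (anchor, positive, hard_negative) training rows."""
    assert len(anchors) == len(positives) == len(negatives)
    return [
        {"anchor": a, "positive": p, "hard_negative": n}
        for a, p, n in zip(anchors, positives, negatives)
    ]

# Toy example: a paraphrase as the positive (as the xllora-pos adapter
# would produce) and a topically similar distractor as the hard negative.
rows = build_triplets(
    ["The cat sat on the mat."],
    ["A cat was sitting on the mat."],
    ["The dog slept under the table."],
)
```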

Usage

The adapter must be loaded together with the Gemma 3 27B base model using the PEFT library.

Example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "google/gemma-3-27b-it"
adapter_model = "mbasoz/lora-gemma327b-xllora-pos"

# Load the base model and tokenizer, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_model)
```

Data Synthesis

Synthetic triplets are generated using the script:

  • src/generate_answers_mgpu_orch.py

from the official code repository:

https://github.com/mbasoz/xllora-embedding

Example scripts for generating data:

  • Negative generation: scripts/data_synthesis_neg.sh
  • Positive generation: scripts/data_synthesis_pos.sh

These scripts demonstrate how the adapters are used to generate multilingual triplet data.

Training procedure

The XL-LoRA adapters were trained using:

  • src/lora_training.py

Example training commands are provided in:

  • Negative adapter training: scripts/xllora_train_negative.sh
  • Positive adapter training: scripts/xllora_train_positive.sh

The adapters were trained with supervised fine-tuning (SFT).

Related Resources

  • https://github.com/mbasoz/xllora-embedding
  • https://github.com/mbasoz/xllora-embedding/blob/main/data/mixed_parallel_xnli_14l_opusmt_10k_fin_pos.csv

Framework versions

  • PEFT: 0.15.2
  • TRL: 0.19.0
  • Transformers: 4.53.1
  • PyTorch: 2.6.0+cu126
  • Datasets: 3.1.0
  • Tokenizers: 0.21.2

Citations

If you use these adapters in your research, please cite the following paper:

@article{basoz2026bootstrappingembeddings,
  title={Bootstrapping Embeddings for Low Resource Languages},
  author={Merve Basoz and Andrew Horne and Mattia Opper},
  year={2026},
  eprint={2603.01732},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.01732},
  note={Accepted to the LoResLM Workshop at EACL 2026}
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

License

This model is released under the MIT License.
