# lapa-instruct-bidi-grpo

Bidirectional English↔Ukrainian LoRA adapter for `lapa-llm/lapa-v0.1.2-instruct` (Gemma-3-12B).

Created by linearly combining two direction-specific GRPO adapters (one trained for en→uk, one for uk→en).

## Results (FLORES+ devtest, professional translator prompt, greedy decoding)

| Direction | Baseline | en→uk specialist | uk→en specialist | This model |
|-----------|----------|------------------|------------------|------------|
| en→uk | 33.51 | 34.00 (+0.49) | 33.42 | 33.88 (+0.37) |
| uk→en | 42.00 | 42.00 | 42.61 (+0.61) | 42.61 (+0.61) |

A single adapter improves both directions, retaining 79% of the en→uk specialist's gain and 100% of the uk→en specialist's gain.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16 and attach the merged LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "lapa-llm/lapa-v0.1.2-instruct", device_map="auto", torch_dtype="bfloat16"
)
model = PeftModel.from_pretrained(base, "iamthewalrus67/lapa-instruct-bidi-grpo")

tokenizer = AutoTokenizer.from_pretrained("lapa-llm/lapa-v0.1.2-instruct")

# English to Ukrainian
prompt = "You are a professional translator. You give only the translated text and nothing else. Translate the following text into Ukrainian:\nThe weather is beautiful today."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt", tokenize=True
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```
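For the reverse direction, the same chat template and generation call apply; only the instruction changes. A sketch of the uk→en prompt, mirroring the example above (the Ukrainian sample sentence here is illustrative):

```python
# uk→en: same professional-translator framing, target language swapped
prompt = (
    "You are a professional translator. You give only the translated text "
    "and nothing else. Translate the following text into English:\n"
    "Сьогодні чудова погода."
)
```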

## How It Was Made

Linear combination of the two LoRA adapters using PEFT's `add_weighted_adapter(combination_type="linear", weights=[1.0, 1.0])`. This sums the LoRA deltas, which works because the two adapters' weight updates are largely orthogonal (different tasks, different data).
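The arithmetic behind this can be sketched with toy matrices (a minimal NumPy illustration of summing low-rank deltas, not PEFT's internal implementation; all shapes and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy hidden size and LoRA rank

W = rng.standard_normal((d, d))  # frozen base weight

# Two direction-specific LoRA adapters: each delta is a rank-r update B @ A
A1, B1 = rng.standard_normal((r, d)), rng.standard_normal((d, r))
A2, B2 = rng.standard_normal((r, d)), rng.standard_normal((d, r))

delta1 = B1 @ A1  # en→uk specialist's weight update
delta2 = B2 @ A2  # uk→en specialist's weight update

# With weights=[1.0, 1.0], the combined adapter's delta is the weighted sum
w1, w2 = 1.0, 1.0
combined = w1 * delta1 + w2 * delta2

# Merging the summed delta gives the same weights as applying both in turn
W_merged = W + combined
W_sequential = (W + w1 * delta1) + w2 * delta2
assert np.allclose(W_merged, W_sequential)
```

Note that the combined delta has rank at most 2r, so the merged adapter stays cheap to store and apply relative to a full fine-tune.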
