# lapa-instruct-bidi-grpo

Bidirectional English↔Ukrainian LoRA adapter for `lapa-llm/lapa-v0.1.2-instruct` (Gemma-3-12B).

Created by linearly combining two direction-specific GRPO adapters (one trained for en→uk, one for uk→en).

## Results (FLORES+ devtest, professional translator prompt, greedy decoding)

| Direction | Baseline | en→uk specialist | uk→en specialist | This model |
|-----------|----------|------------------|------------------|------------|
| en→uk | 33.51 | 34.00 (+0.49) | 33.42 | 33.88 (+0.37) |
| uk→en | 42.00 | 42.00 | 42.61 (+0.61) | 42.61 (+0.61) |

A single adapter improves both directions, retaining 79% of the en→uk specialist's gain and 100% of the uk→en specialist's gain.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16 and attach the merged LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "lapa-llm/lapa-v0.1.2-instruct", device_map="auto", torch_dtype="bfloat16"
)
model = PeftModel.from_pretrained(base, "iamthewalrus67/lapa-instruct-bidi-grpo")

tokenizer = AutoTokenizer.from_pretrained("lapa-llm/lapa-v0.1.2-instruct")

# English to Ukrainian
prompt = "You are a professional translator. You give only the translated text and nothing else. Translate the following text into Ukrainian:\nThe weather is beautiful today."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt", tokenize=True
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```
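For the reverse direction, the same chat template and generation call apply; only the instruction changes. A sketch of the uk→en prompt, mirroring the example above (the Ukrainian sample sentence here is illustrative):

```python
# uk→en: same professional-translator framing, target language swapped
prompt = (
    "You are a professional translator. You give only the translated text "
    "and nothing else. Translate the following text into English:\n"
    "Сьогодні чудова погода."
)
```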

## How It Was Made

Linear combination of the two LoRA adapters using PEFT's `add_weighted_adapter(combination_type="linear", weights=[1.0, 1.0])`. This sums the LoRA deltas, which works because the two adapters' weight updates are largely orthogonal (different tasks, different data).
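The arithmetic behind this can be sketched with toy matrices (a minimal NumPy illustration of summing low-rank deltas, not PEFT's internal implementation; all shapes and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy hidden size and LoRA rank

W = rng.standard_normal((d, d))  # frozen base weight

# Two direction-specific LoRA adapters: each delta is a rank-r update B @ A
A1, B1 = rng.standard_normal((r, d)), rng.standard_normal((d, r))
A2, B2 = rng.standard_normal((r, d)), rng.standard_normal((d, r))

delta1 = B1 @ A1  # en→uk specialist's weight update
delta2 = B2 @ A2  # uk→en specialist's weight update

# With weights=[1.0, 1.0], the combined adapter's delta is the weighted sum
w1, w2 = 1.0, 1.0
combined = w1 * delta1 + w2 * delta2

# Merging the summed delta gives the same weights as applying both in turn
W_merged = W + combined
W_sequential = (W + w1 * delta1) + w2 * delta2
assert np.allclose(W_merged, W_sequential)
```

Note that the combined delta has rank at most 2r, so the merged adapter stays cheap to store and apply relative to a full fine-tune.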
