# lapa-instruct-bidi-grpo

Bidirectional English-Ukrainian LoRA adapter for `lapa-llm/lapa-v0.1.2-instruct` (Gemma-3-12B).
Created by linearly combining two direction-specific GRPO adapters:

- `lapa-instruct-en-uk-grpo` (en→uk, step 200)
- `lapa-instruct-uk-en-grpo` (uk→en, step 500)
## Results (FLORES+ devtest, professional translator prompt, greedy decoding)
| Direction | Baseline | en→uk specialist | uk→en specialist | This model |
|---|---|---|---|---|
| en→uk | 33.51 | 34.00 (+0.49) | 33.42 | 33.88 (+0.37) |
| uk→en | 42.00 | 42.00 | 42.61 (+0.61) | 42.61 (+0.61) |
A single adapter improves both directions, retaining 79% of the en→uk specialist's gain and 100% of the uk→en specialist's gain.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "lapa-llm/lapa-v0.1.2-instruct", device_map="auto", torch_dtype="bfloat16"
)
model = PeftModel.from_pretrained(base, "iamthewalrus67/lapa-instruct-bidi-grpo")
tokenizer = AutoTokenizer.from_pretrained("lapa-llm/lapa-v0.1.2-instruct")

# English to Ukrainian
prompt = (
    "You are a professional translator. You give only the translated text "
    "and nothing else. Translate the following text into Ukrainian:\n"
    "The weather is beautiful today."
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt", tokenize=True,
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```
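The reverse direction reuses the same model, tokenizer, and generation call; only the instruction changes. The Ukrainian sentence below is an illustrative counterpart of the English example above, not a prompt taken from the evaluation set:

```python
# Ukrainian to English: swap only the instruction in the prompt,
# then run the same apply_chat_template / generate pipeline as above.
prompt = (
    "You are a professional translator. You give only the translated text "
    "and nothing else. Translate the following text into English:\n"
    "Сьогодні чудова погода."
)
```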
## How It Was Made
Linear combination of two LoRA adapters using PEFT's `add_weighted_adapter(combination_type='linear', weights=[1.0, 1.0])`. This sums the LoRA deltas, which works because the two adapters' weight updates are largely orthogonal (different tasks, different data).
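The idea of summing LoRA deltas can be illustrated with a small NumPy sketch. This uses toy random matrices, not the actual adapter weights, and models the combined update directly rather than reproducing PEFT's internals:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy hidden size and LoRA rank

W = rng.normal(size=(d, d))  # frozen base weight (stand-in)
A1, B1 = rng.normal(size=(r, d)), rng.normal(size=(d, r))  # "en→uk" adapter
A2, B2 = rng.normal(size=(r, d)), rng.normal(size=(d, r))  # "uk→en" adapter

# Each LoRA adapter contributes a low-rank delta B @ A on top of W.
delta1, delta2 = B1 @ A1, B2 @ A2

# With weights [1.0, 1.0], the merged adapter's effective update is the
# weighted sum of the two specialists' deltas.
merged_delta = 1.0 * delta1 + 1.0 * delta2

# Applying the merged adapter equals applying both specialists' deltas at once.
assert np.allclose(W + merged_delta, W + delta1 + delta2)
```

When the two deltas act on largely disjoint directions in weight space (different tasks, different data), adding them preserves most of each specialist's behavior, which matches the retained gains in the results table.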
## Model tree for iamthewalrus67/lapa-instruct-bidi-grpo

Base model lineage: `google/gemma-3-12b-pt` → finetuned as `lapa-llm/lapa-12b-pt` → finetuned as `lapa-llm/lapa-v0.1.2-instruct` → this adapter.