LMSYS-Arena-Predict-Human-Preferences-In-The-Wild
This is a LoRA (Low-Rank Adaptation) adapter checkpoint for the google/gemma-2b model, fine-tuned for the LMSYS Chatbot Arena Competition on Kaggle.
This model was trained to predict human preferences between two chatbot responses, a task also known as reward modeling or preference modeling. Given a prompt and two candidate responses (Response A and Response B), the model outputs a single score indicating which response humans are likely to prefer.
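Because the classification head emits a single logit (`num_labels=1` in the loading example below), that raw score can be read through a sigmoid as a Bradley-Terry-style preference probability. This interpretation is a common convention for pairwise reward models, not something the card states explicitly; a minimal sketch:

```python
import math

def preference_probability(score: float) -> float:
    """Map the model's raw logit to P(Response B preferred) via a sigmoid."""
    return 1.0 / (1.0 + math.exp(-score))

# A score of 0 means the model is indifferent between the two responses.
print(preference_probability(0.0))        # 0.5
print(preference_probability(2.0) > 0.5)  # True: Response B preferred
```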
```json
{
  "peft_type": "LORA",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "target_modules": ["o_proj", "v_proj", "q_proj", "k_proj"],
  "modules_to_save": ["classifier", "score"],
  "task_type": "SEQ_CLS"
}
```
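To see why this configuration is lightweight: LoRA replaces each adapted weight's update ΔW (a `d_out × d_in` matrix) with a low-rank product `B @ A`, so the trainable parameter count per layer drops from `d_out * d_in` to `r * (d_in + d_out)`. A back-of-the-envelope sketch (the 2048×2048 projection size is illustrative only, not taken from the card):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one LoRA-adapted linear layer:
    A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Illustrative numbers: a square 2048x2048 projection adapted with r=16.
full = 2048 * 2048            # full fine-tuning: 4,194,304 params
lora = lora_params(2048, 2048, 16)  # LoRA: 65,536 params
print(full // lora)           # 64 -> ~64x fewer trainable params per layer

# The adapter's effective update is scaled by lora_alpha / r = 32 / 16 = 2.
scaling = 32 / 16
```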
```bash
pip install transformers peft torch
```
```python
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load base model with a single-logit classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    "google/gemma-2b",
    num_labels=1,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load LoRA adapter and tokenizer
model = PeftModel.from_pretrained(base_model, "OldKingMeister/gemma-2b-lmsys-arena-final")
tokenizer = AutoTokenizer.from_pretrained("OldKingMeister/gemma-2b-lmsys-arena-final")
model.eval()

# Prepare input - example comparing two responses
text = """Which response is better for the prompt: What is machine learning?
Response A: Machine learning is a subset of AI.
Response B: Machine learning enables systems to learn from experience."""

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    preference_score = outputs.logits.item()

# Score interpretation:
#   Positive: Response B is preferred
#   Negative: Response A is preferred
print(f"Preference score: {preference_score}")
```
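Pairwise judges of this kind can exhibit position bias (a tendency to favor whichever response appears first or second). A common mitigation, sketched below, is to score both presentation orders and average; `score_fn` here is a hypothetical helper wrapping the tokenize-and-forward steps above, and this trick is a general technique rather than something the card documents:

```python
def debiased_score(score_fn, prompt: str, resp_a: str, resp_b: str) -> float:
    """Average the model's score over both presentation orders.

    score_fn(prompt, first, second) is assumed to return a score that is
    positive when the *second* response is preferred. Swapping the responses
    should flip the sign, so averaging (s_ab - s_ba) / 2 cancels position bias.
    """
    s_ab = score_fn(prompt, resp_a, resp_b)
    s_ba = score_fn(prompt, resp_b, resp_a)
    return (s_ab - s_ba) / 2.0
```

A positive result still means Response B is preferred, matching the interpretation above.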
| Parameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Batch Size | 4 |
| Gradient Accumulation Steps | 4 |
| Epochs | 10 |
| Max Sequence Length | 512 |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
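The hyperparameters above translate roughly into `transformers` `TrainingArguments` as follows. This is a hedged reconstruction: `output_dir`, `fp16`, and the omission of warmup/scheduler settings are assumptions, not stated by the card. Note the effective batch size is 4 × 4 = 16.

```python
from transformers import TrainingArguments

# Hyperparameters mirrored from the table above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="gemma-2b-lmsys-arena",       # assumed, not from the card
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,           # effective batch size: 4 * 4 = 16
    num_train_epochs=10,
    fp16=True,                               # assumed to match the fp16 loading above
)
```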
```bibtex
@misc{lmsys-arena-2024,
  title        = {LMSYS Chatbot Arena Competition},
  howpublished = {https://www.kaggle.com/competitions/lmsys-chatbot-arena},
  year         = {2024}
}

@article{gemma2024,
  title  = {Gemma: Open Models Based on Gemini Research and Technology},
  author = {{Google}},
  year   = {2024}
}
```
License: Apache 2.0

Base model: google/gemma-2b