Gemma 2B - LMSYS Chatbot Arena LoRA Adapter

This is a LoRA (Low-Rank Adaptation) adapter checkpoint for the google/gemma-2b model, fine-tuned for the LMSYS Chatbot Arena Competition on Kaggle.

Model Description

This model was trained to predict human preferences between two chatbot responses, a task also known as reward modeling or preference modeling. Given two responses to the same prompt (Response A and Response B), the model outputs a scalar score indicating which response humans prefer.

  • Base Model: google/gemma-2b
  • Adapter Type: LoRA (Low-Rank Adaptation)
  • Task: Sequence Classification (preference prediction)
  • Competition: LMSYS Chatbot Arena
  • Framework: PEFT (Parameter-Efficient Fine-Tuning)

LoRA Configuration

{
  "peft_type": "LORA",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "target_modules": ["o_proj", "v_proj", "q_proj", "k_proj"],
  "modules_to_save": ["classifier", "score"],
  "task_type": "SEQ_CLS"
}
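
For reference, the same configuration can be constructed programmatically with peft's `LoraConfig` (a sketch mirroring the JSON above; field names follow the peft API, not this repository's training script):

```python
from peft import LoraConfig, TaskType

# Mirror of the adapter configuration above, expressed as a peft LoraConfig.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,          # sequence classification head
    r=16,                                # low-rank dimension
    lora_alpha=32,                       # scaling factor (alpha / r = 2.0)
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["classifier", "score"],  # head is trained fully, not via LoRA
)
```

Passing this config to `get_peft_model(base_model, lora_config)` reproduces the adapter shape used here.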

Usage

Installation

pip install transformers peft torch

Inference

from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "google/gemma-2b",
    num_labels=1,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "OldKingMeister/gemma-2b-lmsys-arena-final")
tokenizer = AutoTokenizer.from_pretrained("OldKingMeister/gemma-2b-lmsys-arena-final")

# Prepare input - example comparing two responses
text = """Which response is better for the prompt: What is machine learning?

Response A: Machine learning is a subset of AI.

Response B: Machine learning enables systems to learn from experience."""

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    preference_score = outputs.logits.item()

# Score interpretation:
# Positive: Response B is preferred
# Negative: Response A is preferred
print(f"Preference score: {preference_score}")
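
Because the classification head emits a single logit, the raw score can be squashed into a probability with a sigmoid. This mapping is an assumed convention for interpretation, not something defined by the checkpoint itself:

```python
import math

def preference_probability(score: float) -> float:
    """Map the scalar preference logit to an approximate P(Response B preferred)."""
    return 1.0 / (1.0 + math.exp(-score))

# At score 0 the model is indifferent between the two responses.
print(preference_probability(0.0))        # 0.5
print(preference_probability(2.0) > 0.5)  # True: Response B preferred
```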

Training Details

Dataset

The adapter was trained on the LMSYS Chatbot Arena competition dataset: pairwise human preference judgments between responses from two anonymous chatbots answering the same prompt.

Training Hyperparameters

Parameter                    Value
---------------------------  -----
Learning Rate                2e-4
Batch Size                   4
Gradient Accumulation Steps  4
Epochs                       10
Max Sequence Length          512
LoRA Rank (r)                16
LoRA Alpha                   32
LoRA Dropout                 0.1
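
Note that with a per-device batch size of 4 and 4 gradient accumulation steps, each optimizer step sees an effective batch of 16 examples. A small helper (illustrative, not from the training code) makes the relationship explicit:

```python
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Batch size seen by the optimizer at each update step."""
    return per_device_batch * grad_accum_steps * num_devices

print(effective_batch_size(4, 4))  # 16
```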

Hardware

  • GPU: NVIDIA A100 (40GB)
  • Training Time: ~6 hours
  • Mixed Precision: fp16

Model Architecture

  • Base Parameters: 2.5B (frozen)
  • Trainable Parameters: ~16M (LoRA adapters + classification head)
  • Total Checkpoint Size: ~44MB
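
The counts above imply that only a small fraction of the network is trained. The arithmetic below is illustrative, using the approximate figures from this card:

```python
base_params = 2_500_000_000    # frozen Gemma 2B backbone (approximate)
trainable_params = 16_000_000  # LoRA adapters + classification head (approximate)

fraction = trainable_params / base_params
print(f"Trainable fraction: {fraction:.2%}")  # Trainable fraction: 0.64%
```

Training and storing ~0.6% of the parameters is what keeps the checkpoint at ~44MB instead of several gigabytes.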

Citation

@misc{lmsys-arena-2024,
  title={LMSYS Chatbot Arena Competition},
  howpublished={https://www.kaggle.com/competitions/lmsys-chatbot-arena},
  year={2024}
}

@article{gemma2024,
  title={Gemma: Open Models Based on Gemini Research and Technology},
  author={{Gemma Team, Google DeepMind}},
  journal={arXiv preprint arXiv:2403.08295},
  year={2024}
}

License

Apache 2.0 (adapter weights). Note that the base google/gemma-2b model is distributed under Google's Gemma Terms of Use, which continue to apply when the adapter is used with the base model.
