Gemma 2B - LMSYS Chatbot Arena LoRA Adapter

This is a LoRA (Low-Rank Adaptation) adapter checkpoint for the google/gemma-2b model, fine-tuned for the LMSYS Chatbot Arena Competition on Kaggle.

Model Description

This model was trained to predict human preferences between two chatbot responses, a task also known as reward modeling or preference modeling. Given two responses to the same prompt (Response A and Response B), the model outputs a scalar score indicating which response humans prefer.

  • Base Model: google/gemma-2b
  • Adapter Type: LoRA (Low-Rank Adaptation)
  • Task: Sequence Classification (preference prediction)
  • Competition: LMSYS Chatbot Arena
  • Framework: PEFT (Parameter-Efficient Fine-Tuning)

LoRA Configuration

{
  "peft_type": "LORA",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "target_modules": ["o_proj", "v_proj", "q_proj", "k_proj"],
  "modules_to_save": ["classifier", "score"],
  "task_type": "SEQ_CLS"
}
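
For reference, the same configuration can be constructed programmatically with peft's `LoraConfig` (a sketch mirroring the JSON above; field names follow the peft API, not this repository's training script):

```python
from peft import LoraConfig, TaskType

# Mirror of the adapter configuration above, expressed as a peft LoraConfig.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,          # sequence classification head
    r=16,                                # low-rank dimension
    lora_alpha=32,                       # scaling factor (alpha / r = 2.0)
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["classifier", "score"],  # head is trained fully, not via LoRA
)
```

Passing this config to `get_peft_model(base_model, lora_config)` reproduces the adapter shape used here.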

Usage

Installation

pip install transformers peft torch

Inference

from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "google/gemma-2b",
    num_labels=1,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "OldKingMeister/gemma-2b-lmsys-arena-final")
tokenizer = AutoTokenizer.from_pretrained("OldKingMeister/gemma-2b-lmsys-arena-final")

# Prepare input - example comparing two responses
text = """Which response is better for the prompt: What is machine learning?

Response A: Machine learning is a subset of AI.

Response B: Machine learning enables systems to learn from experience."""

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    preference_score = outputs.logits.item()

# Score interpretation:
# Positive: Response B is preferred
# Negative: Response A is preferred
print(f"Preference score: {preference_score}")
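
Because the classification head emits a single logit, the raw score can be squashed into a probability with a sigmoid. This mapping is an assumed convention for interpretation, not something defined by the checkpoint itself:

```python
import math

def preference_probability(score: float) -> float:
    """Map the scalar preference logit to an approximate P(Response B preferred)."""
    return 1.0 / (1.0 + math.exp(-score))

# At score 0 the model is indifferent between the two responses.
print(preference_probability(0.0))        # 0.5
print(preference_probability(2.0) > 0.5)  # True: Response B preferred
```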

Training Details

Dataset

The adapter was trained on the LMSYS Chatbot Arena competition dataset: pairwise human preference judgments between responses from two anonymous chatbots answering the same prompt.

Training Hyperparameters

Parameter                    Value
---------------------------  -----
Learning Rate                2e-4
Batch Size                   4
Gradient Accumulation Steps  4
Epochs                       10
Max Sequence Length          512
LoRA Rank (r)                16
LoRA Alpha                   32
LoRA Dropout                 0.1
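
Note that with a per-device batch size of 4 and 4 gradient accumulation steps, each optimizer step sees an effective batch of 16 examples. A small helper (illustrative, not from the training code) makes the relationship explicit:

```python
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Batch size seen by the optimizer at each update step."""
    return per_device_batch * grad_accum_steps * num_devices

print(effective_batch_size(4, 4))  # 16
```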

Hardware

  • GPU: NVIDIA A100 (40GB)
  • Training Time: ~6 hours
  • Mixed Precision: fp16

Model Architecture

  • Base Parameters: 2.5B (frozen)
  • Trainable Parameters: ~16M (LoRA adapters + classification head)
  • Total Checkpoint Size: ~44MB
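
The counts above imply that only a small fraction of the network is trained. The arithmetic below is illustrative, using the approximate figures from this card:

```python
base_params = 2_500_000_000    # frozen Gemma 2B backbone (approximate)
trainable_params = 16_000_000  # LoRA adapters + classification head (approximate)

fraction = trainable_params / base_params
print(f"Trainable fraction: {fraction:.2%}")  # Trainable fraction: 0.64%
```

Training and storing ~0.6% of the parameters is what keeps the checkpoint at ~44MB instead of several gigabytes.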

Citation

@misc{lmsys-arena-2024,
  title={LMSYS Chatbot Arena Competition},
  howpublished={https://www.kaggle.com/competitions/lmsys-chatbot-arena},
  year={2024}
}

@article{gemma2024,
  title={Gemma: Open Models Based on Gemini Research and Technology},
  author={{Gemma Team, Google DeepMind}},
  journal={arXiv preprint arXiv:2403.08295},
  year={2024}
}

License

Apache 2.0 (adapter weights). Note that the base google/gemma-2b model is distributed under Google's Gemma Terms of Use, which continue to apply when the adapter is used with the base model.
