LLM Bilateral Bargaining
Collection
8 items • Updated
A Qwen3-8B model trained via reinforcement learning (GRPO) to play as the seller in bilateral bargaining negotiations.
This model was trained as part of the LLM Bilateral Bargaining project, which studies how LLM agents negotiate in structured buyer-seller bargaining games.
Training method: Group Relative Policy Optimization (GRPO) with a multi-component reward function covering parsing correctness, execution success, constraint compliance, and negotiation utility. Initialized from the SFT checkpoint.
Role: Seller agent — negotiates to sell items at the highest price while respecting a private minimum acceptable price.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
"yale-cadmy/qwen3-8B-bargaining-seller-rl",
torch_dtype=torch.bfloat16,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("yale-cadmy/qwen3-8B-bargaining-seller-rl")
CC-BY-NC-4.0. See the LLM Bilateral Bargaining repository for details.