
◈ Talk to the Model

Direct inference with the GRPO-finetuned negotiator (Qwen2.5-1.5B)

About this model

What it is

parlay-grpo-1-5b is a Qwen2.5-1.5B-Instruct model fine-tuned in two stages: first with supervised fine-tuning (SFT) on Gemini-generated negotiation transcripts, then with Group Relative Policy Optimization (GRPO) using the Parlay reward function — a mix of ZOPA (zone of possible agreement) progress, Theory-of-Mind accuracy, tactical card usage, and drift-adaptation bonuses.
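The reward mix described above can be pictured as a weighted sum of the four components. This is a purely illustrative sketch: the function name, weights, and term definitions are placeholders, not the published Parlay reward.

```python
def parlay_reward(zopa_progress: float,
                  tom_accuracy: float,
                  card_bonus: float,
                  drift_bonus: float,
                  weights: tuple = (0.4, 0.3, 0.2, 0.1)) -> float:
    """Hypothetical weighted mix of the four reward components named
    above. The actual Parlay weights and term definitions are not
    stated on this page; these numbers are placeholders."""
    w_zopa, w_tom, w_card, w_drift = weights
    return (w_zopa * zopa_progress
            + w_tom * tom_accuracy
            + w_card * card_bonus
            + w_drift * drift_bonus)
```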

What it outputs

Every response is a JSON object with three fields:
utterance — the natural-language text of the negotiation turn,
offer_amount — a numeric bid, or null for purely conversational turns,
tactical_move — an optional tactical card (anchor_high, batna_reveal, or silence).
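A minimal sketch of consuming a response in this schema; the sample values below are made up for illustration, not actual model output.

```python
import json

# Illustrative response in the documented three-field schema.
raw = ('{"utterance": "I can stretch to 4200, but that is my ceiling.", '
       '"offer_amount": 4200, "tactical_move": "anchor_high"}')

resp = json.loads(raw)

utterance = resp["utterance"]        # always present
offer = resp.get("offer_amount")     # None on conversational turns
move = resp.get("tactical_move")     # optional card

if offer is not None:
    # Pair the text with its numeric bid, as the UI does.
    print(f"{utterance}  [offer: {offer}]")
else:
    print(utterance)
```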

How to read the responses here

The utterance is displayed as the chat bubble. If the model includes an offer_amount, it appears as a gold chip below the text. You can expand "Raw model JSON output" to see the full structured response.

Backend

On a GPU Space the model runs locally (fast after the first load). On a CPU Space inference falls back to the Hugging Face Inference API — the first request may take 20–40 s while the model warms up; subsequent requests are faster.
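The GPU-vs-CPU fallback can be sketched as a small helper. Probing for nvidia-smi is a stand-in heuristic of my own, not the Space's actual detection logic.

```python
import shutil

def pick_backend() -> str:
    """Sketch of the fallback described above: run locally when a GPU
    is visible, otherwise defer to the Hugging Face Inference API.
    Checking for nvidia-smi is an assumed heuristic, not the real code."""
    return "local" if shutil.which("nvidia-smi") else "api"

backend = pick_backend()
print(f"running inference via: {backend}")
```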