lex-interviewer-nemotron-4b-grpo-v21

Nemotron-3-Nano-4B fine-tuned with GRPO to conduct Lex Fridman–style interviews.
Deployed as a WebGPU Q4 ONNX model for in-browser inference via transformers.js.


Checkpoint: GRPO v21

This is the best-performing checkpoint from a series of GRPO experiments on the Lex Fridman interviewer task.

Metric                            Value
Thinking-enabled functional eval  0.867 ± 0.231
on_topic                          84%
uses_guest                        80%
probing                           96%

Significantly outperforms the base Nemotron-3-Nano-4B model (0.760) and all prior fine-tuned checkpoints.


What this model does

Given a guest's statement, the model asks one focused, incisive follow-up question that:

  • uses the guest's specific vocabulary
  • probes the reasoning or implication behind what they said
  • ends with exactly one question mark

It uses Nemotron's extended thinking (enable_thinking: true) to reason before generating the question.
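The output contract above ("one focused question, exactly one question mark") can be sketched as a simple check. This is a hypothetical illustration, not the project's actual eval code:

```javascript
// Hypothetical sketch of the output contract (not the project's eval code):
// the reply should contain exactly one '?', and it should come at the end.
function looksLikeOneQuestion(text) {
  const trimmed = text.trim();
  const marks = (trimmed.match(/\?/g) || []).length;
  return marks === 1 && trimmed.endsWith('?');
}
```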


Why GRPO v21 succeeded

Measured across the v21, v22, v23, and v24 experiments:

GRPO_success ≈ P(at least one zero-reward completion per group)  (~0.25–0.35 in v21)
             × hard binary reward gate (clear 0.0 failures vs. 0.7+ passes)
             × starting below the reward optimum (headroom to climb)

GRPO learns from contrast, not from correctness. v21 hit the Goldilocks zone:

  • ~32% of training steps had at least one clipped/failed completion → high intra-group std
  • reward_v12's hard gate (fail = exactly 0.0, pass = 0.7+) maximized advantage magnitude
  • starting from sft-lora-v2-native left room to climb
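The "learns from contrast" point can be made concrete with group-relative advantage normalization, the core of GRPO. The sketch below is illustrative (not project code): a group mixing 0.0 failures with 0.7+ passes has high intra-group std and yields large-magnitude advantages, while a uniform all-pass group yields zero advantages and no learning signal.

```javascript
// Sketch (not project code): GRPO's group-relative advantage.
// Each completion's advantage is its reward, standardized within the group.
function groupAdvantages(rewards) {
  const mean = rewards.reduce((a, b) => a + b, 0) / rewards.length;
  const variance =
    rewards.reduce((a, r) => a + (r - mean) ** 2, 0) / rewards.length;
  const std = Math.sqrt(variance) || 1; // uniform group -> all-zero advantages
  return rewards.map(r => (r - mean) / std);
}

// One hard-gated failure among passes -> strong contrast signal:
console.log(groupAdvantages([0.0, 0.7, 0.8, 0.7]));
// All completions pass -> zero advantages, nothing to learn from:
console.log(groupAdvantages([0.7, 0.7, 0.7, 0.7]));
```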

Full analysis: docs/GRPO_V21_SUCCESS_ANALYSIS.md in bobber/lex-fridman-interviewer-project.


ONNX export details

Built using the LoRA-only patching strategy from the project retrospective:

  • Reference base: onnx-community/NVIDIA-Nemotron-3-Nano-4B-BF16-ONNX (Q4 format)
  • Patched layers: only the 50 LoRA target weight groups (q/k/v/o_proj, up/down/gate_proj)
  • Preserved from reference: all Mamba layers, embedding, lm_head (prevents WebGPU precision regression)
  • Quantization: asymmetric uint4 block quantization (MatMulNBits, block_size=32)

Scripts: scripts/merge_lora_v21.py, scripts/patch_q4_loraonly.py in the project repo.
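The quantization scheme above can be sketched in a few lines. This is an illustration of the idea of asymmetric uint4 block quantization with block_size=32 (each block stores a float scale and a uint4 zero-point), not the MatMulNBits kernel or the project's patch script:

```javascript
// Illustrative sketch of asymmetric uint4 block quantization (block_size = 32).
// Not the MatMulNBits kernel: real Q4 storage packs two uint4 values per byte.
function quantizeBlock(block) {
  const min = Math.min(...block);
  const max = Math.max(...block);
  const scale = (max - min) / 15 || 1;         // map the block onto 0..15
  const zeroPoint = Math.round(-min / scale);  // asymmetric offset for min != -max
  const q = block.map(v =>
    Math.max(0, Math.min(15, Math.round(v / scale) + zeroPoint))
  );
  return { q, scale, zeroPoint };
}

function dequantizeBlock({ q, scale, zeroPoint }) {
  return q.map(v => (v - zeroPoint) * scale);
}
```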


Usage (transformers.js)

import { pipeline } from '@huggingface/transformers';

const interviewer = await pipeline(
  'text-generation',
  'bobber/lex-interviewer-nemotron-4b-grpo-v21',
  { dtype: 'q4', device: 'webgpu' }
);

const messages = [
  { role: 'system', content: 'You are an expert podcast interviewer...\n\nGuest: Andrej Karpathy' },
  { role: 'user', content: 'What is your next question?' }
];

const result = await interviewer(messages, {
  max_new_tokens: 800,
  do_sample: true,
  temperature: 0.7,
  chat_template_kwargs: { enable_thinking: true }
});
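With enable_thinking set, the generated text typically contains the model's reasoning before the final question. Assuming the reasoning is wrapped in <think>…</think> tags (standard for Nemotron reasoning models, but verify against your model's actual output), the question can be extracted like this:

```javascript
// Assumption: reasoning appears inside <think>...</think> before the question.
// Strips all think blocks and returns the remaining text.
function extractQuestion(generated) {
  return generated.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
```

Pass it the assistant message's text content from the pipeline result; if no think tags are present, the text is returned unchanged.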

Live demo

bobber/lex-interviewer-chat — runs entirely in your browser via WebGPU.


Related

  • Project repo & docs: bobber/lex-fridman-interviewer-project
  • GRPO v21 success analysis: docs/GRPO_V21_SUCCESS_ANALYSIS.md
  • ONNX retrospective: docs/ONNX_RETROSPECTIVE.md