lex-interviewer-nemotron-4b-grpo-v21

Nemotron-3-Nano-4B fine-tuned with GRPO to conduct Lex Fridman–style interviews.
Deployed as a WebGPU Q4 ONNX model for in-browser inference via transformers.js.


Checkpoint: GRPO v21

This is the best-performing checkpoint from a series of GRPO experiments on the Lex Fridman interviewer task.

Metric                            Value
Thinking-enabled functional eval  0.867 ± 0.231
on_topic                          84%
uses_guest                        80%
probing                           96%

Significantly outperforms the base Nemotron-3-Nano-4B model (0.760) and all prior fine-tuned checkpoints.


What this model does

Given a guest's statement, the model asks one focused, incisive follow-up question that:

  • uses the guest's specific vocabulary
  • probes the reasoning or implication behind what they said
  • ends with exactly one question mark

It uses Nemotron's extended thinking (enable_thinking: true) to reason before generating the question.
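The output contract above ("one focused question, exactly one question mark") can be sketched as a simple check. This is a hypothetical illustration, not the project's actual eval code:

```javascript
// Hypothetical sketch of the output contract (not the project's eval code):
// the reply should contain exactly one '?', and it should come at the end.
function looksLikeOneQuestion(text) {
  const trimmed = text.trim();
  const marks = (trimmed.match(/\?/g) || []).length;
  return marks === 1 && trimmed.endsWith('?');
}
```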


Why GRPO v21 succeeded

Measured across the v21, v22, v23, and v24 experiments:

GRPO_success ≈ P(at least one zero-reward completion per group)  (~0.25–0.35 in v21)
             × hard binary reward gate (clear 0.0 failures vs. 0.7+ passes)
             × starting below the reward optimum (headroom to climb)

GRPO learns from contrast, not from correctness. v21 hit the Goldilocks zone:

  • ~32% of training steps had at least one clipped/failed completion → high intra-group std
  • reward_v12's hard gate (fail = exactly 0.0, pass = 0.7+) maximized advantage magnitude
  • starting from sft-lora-v2-native left room to climb
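The "learns from contrast" point can be made concrete with group-relative advantage normalization, the core of GRPO. The sketch below is illustrative (not project code): a group mixing 0.0 failures with 0.7+ passes has high intra-group std and yields large-magnitude advantages, while a uniform all-pass group yields zero advantages and no learning signal.

```javascript
// Sketch (not project code): GRPO's group-relative advantage.
// Each completion's advantage is its reward, standardized within the group.
function groupAdvantages(rewards) {
  const mean = rewards.reduce((a, b) => a + b, 0) / rewards.length;
  const variance =
    rewards.reduce((a, r) => a + (r - mean) ** 2, 0) / rewards.length;
  const std = Math.sqrt(variance) || 1; // uniform group -> all-zero advantages
  return rewards.map(r => (r - mean) / std);
}

// One hard-gated failure among passes -> strong contrast signal:
console.log(groupAdvantages([0.0, 0.7, 0.8, 0.7]));
// All completions pass -> zero advantages, nothing to learn from:
console.log(groupAdvantages([0.7, 0.7, 0.7, 0.7]));
```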

Full analysis: docs/GRPO_V21_SUCCESS_ANALYSIS.md in bobber/lex-fridman-interviewer-project.


ONNX export details

Built using the LoRA-only patching strategy from the project retrospective:

  • Reference base: onnx-community/NVIDIA-Nemotron-3-Nano-4B-BF16-ONNX (Q4 format)
  • Patched layers: only the 50 LoRA target weight groups (q/k/v/o_proj, up/down/gate_proj)
  • Preserved from reference: all Mamba layers, embedding, lm_head (prevents WebGPU precision regression)
  • Quantization: asymmetric uint4 block quantization (MatMulNBits, block_size=32)

Scripts: scripts/merge_lora_v21.py, scripts/patch_q4_loraonly.py in the project repo.
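The quantization scheme above can be sketched in a few lines. This is an illustration of the idea of asymmetric uint4 block quantization with block_size=32 (each block stores a float scale and a uint4 zero-point), not the MatMulNBits kernel or the project's patch script:

```javascript
// Illustrative sketch of asymmetric uint4 block quantization (block_size = 32).
// Not the MatMulNBits kernel: real Q4 storage packs two uint4 values per byte.
function quantizeBlock(block) {
  const min = Math.min(...block);
  const max = Math.max(...block);
  const scale = (max - min) / 15 || 1;         // map the block onto 0..15
  const zeroPoint = Math.round(-min / scale);  // asymmetric offset for min != -max
  const q = block.map(v =>
    Math.max(0, Math.min(15, Math.round(v / scale) + zeroPoint))
  );
  return { q, scale, zeroPoint };
}

function dequantizeBlock({ q, scale, zeroPoint }) {
  return q.map(v => (v - zeroPoint) * scale);
}
```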


Usage (transformers.js)

import { pipeline } from '@huggingface/transformers';

const interviewer = await pipeline(
  'text-generation',
  'bobber/lex-interviewer-nemotron-4b-grpo-v21',
  { dtype: 'q4', device: 'webgpu' }
);

const messages = [
  { role: 'system', content: 'You are an expert podcast interviewer...\n\nGuest: Andrej Karpathy' },
  { role: 'user', content: 'What is your next question?' }
];

const result = await interviewer(messages, {
  max_new_tokens: 800,
  do_sample: true,
  temperature: 0.7,
  chat_template_kwargs: { enable_thinking: true }
});
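With enable_thinking set, the generated text typically contains the model's reasoning before the final question. Assuming the reasoning is wrapped in <think>…</think> tags (standard for Nemotron reasoning models, but verify against your model's actual output), the question can be extracted like this:

```javascript
// Assumption: reasoning appears inside <think>...</think> before the question.
// Strips all think blocks and returns the remaining text.
function extractQuestion(generated) {
  return generated.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
}
```

Pass it the assistant message's text content from the pipeline result; if no think tags are present, the text is returned unchanged.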

Live demo

bobber/lex-interviewer-chat — runs entirely in your browser via WebGPU.


Related

  • Project repo & docs: bobber/lex-fridman-interviewer-project
  • GRPO v21 success analysis: docs/GRPO_V21_SUCCESS_ANALYSIS.md
  • ONNX retrospective: docs/ONNX_RETROSPECTIVE.md