# lex-interviewer-nemotron-4b-grpo-v21

Nemotron-3-Nano-4B fine-tuned with GRPO to conduct Lex Fridman–style interviews. Deployed as a WebGPU Q4 ONNX model for in-browser inference via transformers.js.
## Checkpoint: GRPO v21
This is the best-performing checkpoint from a series of GRPO experiments on the Lex Fridman interviewer task.
| Metric | Value |
|---|---|
| Thinking-enabled functional eval | 0.867 ± 0.231 |
| on_topic | 84% |
| uses_guest | 80% |
| probing | 96% |
Significantly outperforms the base Nemotron-3-Nano-4B model (0.760) and all prior fine-tuned checkpoints.
## What this model does
Given a guest's statement, the model asks one focused, incisive follow-up question that:
- uses the guest's specific vocabulary
- probes the reasoning or implication behind what they said
- ends with exactly one question mark
It uses Nemotron's extended thinking (`enable_thinking: true`) to reason before generating the question.
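The three output constraints above are mechanically checkable. A minimal sketch of such a check (a hypothetical helper for illustration, not the project's actual reward code):

```python
import re


def looks_like_followup(text: str, guest_terms: list[str]) -> bool:
    """Heuristic check mirroring the stated output constraints:
    - exactly one question mark, at the very end of the text
    - reuses at least one of the guest's own terms
    (Probing depth, of course, needs a learned judge, not a regex.)
    """
    if text.count("?") != 1 or not text.rstrip().endswith("?"):
        return False
    lowered = text.lower()
    return any(term.lower() in lowered for term in guest_terms)


# Passes: one trailing "?", reuses the guest's phrase "residual stream".
looks_like_followup(
    "How does the residual stream change your view of scaling?",
    ["residual stream", "scaling"],
)
```

A check like this is cheap enough to run on every sampled completion during training or evaluation.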
## Why GRPO v21 succeeded
Measured across the v21, v22, v23, and v24 experiments, success roughly factors as:

```text
GRPO_success = P(at least 1 zero per group)              (≈ 0.25–0.35)
             × hard binary reward gate (clear zeros vs. 0.7+ goods)
             × starting below the reward optimum
```
GRPO learns from contrast, not from correctness. v21 hit the Goldilocks zone:
- ~32% of training steps had at least one clipped/failed completion → high intra-group std
- reward_v12's hard gate (fail = exactly 0.0, pass = 0.7+) maximized advantage magnitude
- starting from `sft-lora-v2-native` left room to climb
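The contrast mechanism behind these points can be illustrated with group-relative advantages (a simplified sketch of GRPO's normalization, not the project's training code):

```python
import statistics


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style group-relative advantages: (r - mean) / std over one
    group of sampled completions. A hard binary gate (fail = 0.0,
    pass = 0.7+) means a single zero in the group inflates the std,
    giving every completion a large-magnitude learning signal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard the all-equal case
    return [(r - mean) / std for r in rewards]


# Group with one hard zero: large spread, strong positive/negative advantages.
mixed = group_advantages([0.9, 0.8, 0.0, 0.85])

# All-pass group: rewards barely differ, advantages are near zero,
# so the policy gets almost no gradient signal from it.
flat = group_advantages([0.8, 0.82, 0.79, 0.81])
```

This is why the ~32% of steps with at least one failed completion carried most of the learning: groups where everything scores 0.7+ produce near-zero advantages.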
Full analysis: `docs/GRPO_V21_SUCCESS_ANALYSIS.md` in `bobber/lex-fridman-interviewer-project`.
## ONNX export details
Built using the LoRA-only patching strategy from the project retrospective:
- Reference base: `onnx-community/NVIDIA-Nemotron-3-Nano-4B-BF16-ONNX` (Q4 format)
- Patched layers: only the 50 LoRA target weight groups (`q/k/v/o_proj`, `up/down/gate_proj`)
- Preserved from reference: all Mamba layers, embedding, lm_head (prevents WebGPU precision regression)
- Quantization: asymmetric uint4 block quantization (MatMulNBits, block_size=32)
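The quantization scheme can be sketched as per-block asymmetric uint4 with a scale and zero-point per block, mirroring the MatMulNBits layout described above (an illustrative sketch using NumPy, not the actual export script):

```python
import numpy as np


def quantize_block_u4(block: np.ndarray):
    """Asymmetric uint4 quantization of one weight block:
    map [min, max] onto the 16 levels [0, 15] via a per-block
    scale and integer zero-point."""
    lo, hi = float(block.min()), float(block.max())
    scale = (hi - lo) / 15.0 or 1.0  # guard constant blocks
    zero_point = round(-lo / scale)
    q = np.clip(np.round(block / scale) + zero_point, 0, 15).astype(np.uint8)
    return q, scale, zero_point


def dequantize_block_u4(q: np.ndarray, scale: float, zero_point: int):
    """Reconstruct float weights from uint4 codes."""
    return (q.astype(np.float32) - zero_point) * scale


# One block of 32 weights, as in block_size=32.
w = np.linspace(-1.0, 2.0, 32).astype(np.float32)
q, scale, zp = quantize_block_u4(w)
recon = dequantize_block_u4(q, scale, zp)  # max error bounded by the scale
```

The asymmetric zero-point lets each block use all 16 levels even when its weight range is not centered on zero, which is why this variant loses less accuracy than symmetric int4 on skewed blocks.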
Scripts: `scripts/merge_lora_v21.py` and `scripts/patch_q4_loraonly.py` in the project repo.
## Usage (transformers.js)
```js
import { pipeline } from '@huggingface/transformers';

const interviewer = await pipeline(
  'text-generation',
  'bobber/lex-interviewer-nemotron-4b-grpo-v21',
  { dtype: 'q4', device: 'webgpu' }
);

const messages = [
  { role: 'system', content: 'You are an expert podcast interviewer...\n\nGuest: Andrej Karpathy' },
  { role: 'user', content: 'What is your next question?' }
];

const result = await interviewer(messages, {
  max_new_tokens: 800,
  do_sample: true,
  temperature: 0.7,
  chat_template_kwargs: { enable_thinking: true }
});
```
## Live demo

`bobber/lex-interviewer-chat` runs entirely in your browser via WebGPU.
## Related

- Project repo & docs: `bobber/lex-fridman-interviewer-project`
- GRPO v21 success analysis: `docs/GRPO_V21_SUCCESS_ANALYSIS.md`
- ONNX retrospective: `docs/ONNX_RETROSPECTIVE.md`