VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published Feb 11 • 220
YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation Paper • 2601.08441 • Published Jan 13 • 8
Article CircleGuardBench: New Standard for Evaluating AI Moderation Models May 7, 2025 • 60
NeMo Gym Collection Collection of RL verifiable data for NeMo Gym • 22 items • Updated 5 days ago • 57
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning Paper • 2512.05591 • Published Dec 5, 2025 • 17
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 265
Holo2 Collection Holo2 - Cost-Efficient Models for Cross-Platform Computer-Use Agents • 4 items • Updated Feb 2 • 27
Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR Paper • 2511.01937 • Published Nov 2, 2025 • 16
Surfer 2: The Next Generation of Cross-Platform Computer Use Agents Paper • 2510.19949 • Published Oct 22, 2025 • 38
Reward Models 06-2025 Collection Nemotron reward models. For use in RLHF pipelines and LLM-as-a-Judge • 8 items • Updated 5 days ago • 24
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 5 days ago • 141