Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published Jan 30 • 60
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published Feb 15 • 56
Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models Paper • 2512.00590 • Published Nov 29, 2025 • 52
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published Nov 19, 2025 • 233
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver Paper • 2510.17699 • Published Oct 20, 2025 • 24
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 217
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 217 • 11
The Rogue Scalpel: Activation Steering Compromises LLM Safety Paper • 2509.22067 • Published Sep 26, 2025 • 28
TUN3D: Towards Real-World Scene Understanding from Unposed Images Paper • 2509.21388 • Published Sep 23, 2025 • 15
nablaNABLA: Neighborhood Adaptive Block-Level Attention Paper • 2507.13546 • Published Jul 17, 2025 • 126
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5, 2025 • 135
Risk-Averse Reinforcement Learning with Itakura-Saito Loss Paper • 2505.16925 • Published May 22, 2025 • 26
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation Paper • 2503.13358 • Published Mar 17, 2025 • 95
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators Paper • 2502.06394 • Published Feb 10, 2025 • 89
The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published Feb 3, 2025 • 113