Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Paper • 2603.01571 • Published Mar 2 • 33
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published Mar 2 • 63
Collaborative Performance Prediction for Large Language Models Paper • 2407.01300 • Published Jul 1, 2024 • 2
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Paper • 2410.05193 • Published Oct 7, 2024 • 14