Performance:

Citation

@inproceedings{
zhu2025weaktostrong,
title={Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model},
author={Wenhong Zhu and Zhiwei He and Xiaofeng Wang and Pengfei Liu and Rui Wang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=f7KxfUrRSb}
}

Downloads last month: 1

Safetensors

Model size

8B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including wh-zhu/OpenMath-nemotron-7B-WSPO

Weak-to-Strong

Collection

weak-to-strong trained models • 2 items • Updated Jun 10, 2025