wzhouad/Llama3-Instruct-8B-WPO-FP

Text Generation

text-generation-inference


Description

Llama3-Instruct-8B fine-tuned with off-policy WPO (Weighted Preference Optimization). Details are in the paper "WPO: Enhancing RLHF with Weighted Preference Optimization".
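The core idea behind WPO is to reweight off-policy preference pairs so that they behave more like on-policy data. The sketch below illustrates this for a single pair. It is a minimal sketch, not the authors' implementation. It makes three assumptions: each response is weighted by its length-normalized probability under the current policy (the paper's exact weighting, including any normalization it proposes, may differ), the per-pair loss is the standard DPO loss scaled by the product of both weights, and beta=0.1 is an illustrative default. All function names are hypothetical. The code uses plain Python floats in place of a training framework.

```python
import math


def seq_logprob(token_logprobs):
    """Sum of per-token log-probabilities log pi(y_t | x, y_<t)."""
    return sum(token_logprobs)


def length_normalized_weight(token_logprobs):
    """Hypothetical weight: geometric-mean token probability under the policy.
    In a real trainer this would be treated as a constant (no gradient)."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def wpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, scaled by the product of the
    (assumed) weights of the chosen and rejected responses."""
    margin = beta * (
        (seq_logprob(policy_chosen) - seq_logprob(ref_chosen))
        - (seq_logprob(policy_rejected) - seq_logprob(ref_rejected))
    )
    weight = length_normalized_weight(policy_chosen) * length_normalized_weight(policy_rejected)
    return -weight * log_sigmoid(margin)


if __name__ == "__main__":
    # Toy per-token log-probs for a chosen and a rejected response.
    chosen, rejected = [-0.1, -0.3], [-0.5, -0.7]
    print(wpo_loss(chosen, rejected, chosen, rejected))
```

Pairs that the current policy considers unlikely receive small weights. As a result, they contribute less to the loss than pairs resembling the model's own outputs.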

License

This model is licensed under the Zoom software license and is permitted for use only for noncommercial, educational, or academic research purposes.


Format: Safetensors

Model size: 8B params

Tensor type: F32
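Because the weights are stored as F32, each parameter occupies 4 bytes. The sketch below roughly estimates the memory needed just to hold the weights. It assumes the rounded "8B" figure from this card and decimal gigabytes, and it ignores activations, KV cache, and framework overhead. The helper name and the dtype table are illustrative.

```python
# Bytes per parameter for common weight dtypes (illustrative table).
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate weight-only memory in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    print(weight_memory_gb(8e9, "F32"))   # weights as stored on this card
    print(weight_memory_gb(8e9, "BF16"))  # if cast to a 16-bit dtype
```

At 8B parameters, F32 storage works out to roughly 32 GB. Casting to a 16-bit dtype halves that.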


Collection including wzhouad/Llama3-Instruct-8B-WPO-FP

WPO

Models and datasets from the paper "WPO: Enhancing RLHF with Weighted Preference Optimization" • 11 items • Updated Aug 22, 2024

Paper for wzhouad/Llama3-Instruct-8B-WPO-FP

WPO: Enhancing RLHF with Weighted Preference Optimization

Paper • 2406.11827 • Published Jun 17, 2024