Cross-lingual Transfer of Reward Models
This model is part of the collection of synthetic preference data and trained reward models released with the paper "Cross-lingual Transfer of Reward Models in Multilingual Alignment".
It is a reward model fine-tuned from meta-llama/Llama-3.2-3B-Instruct on the iqwiki-kor/MP-86k preference dataset.
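The card does not spell out the training objective; a common choice for reward models trained on preference pairs like MP-86k is the Bradley-Terry pairwise loss, which pushes the score of the chosen response above that of the rejected one. A minimal sketch (scalar rewards stand in for the model's outputs):

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Small loss when the chosen response already scores higher...
print(round(bradley_terry_loss(2.0, 0.0), 4))  # 0.1269
# ...large loss when the ranking is flipped.
print(round(bradley_terry_loss(0.0, 2.0), 4))  # 2.1269
```

Minimizing this loss over many (chosen, rejected) pairs is what turns the base instruct model into a scalar reward model.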
Evaluation on RewardBench (scores in %):

| Model | Chat | Chat-Hard | Safety | Reasoning | Avg. |
|---|---|---|---|---|---|
| iqwiki-kor/Llama3.2-3B-MP-RM | 92.5 | 81.8 | 90.2 | 95.5 | 90.0 |
| RLHFlow/ArmoRM-Llama3-8B-v0.1 | 96.9 | 76.8 | 90.5 | 97.3 | 90.4 |
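The Avg. column is the unweighted mean of the four category scores; a quick check against the table:

```python
# Category scores (Chat, Chat-Hard, Safety, Reasoning) from the table above.
rows = {
    "iqwiki-kor/Llama3.2-3B-MP-RM": [92.5, 81.8, 90.2, 95.5],
    "RLHFlow/ArmoRM-Llama3-8B-v0.1": [96.9, 76.8, 90.5, 97.3],
}
for name, scores in rows.items():
    avg = sum(scores) / len(scores)
    print(f"{name}: {avg:.1f}")  # 90.0 and 90.4, matching the Avg. column
```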
The following hyperparameters were used during training: