Aligning Large Language Models with Human Preferences through Representation Engineering
Paper: arXiv:2312.15997
This model was obtained by alignment-training mistralai/Mistral-7B-Instruct-v0.2 with the algorithm introduced in "Aligning Large Language Models with Human Preferences through Representation Engineering" (RAHF), using the UltraFeedback dataset.
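Since the result is an ordinary Mistral-based checkpoint, it should load like any other instruct model with transformers. A minimal sketch follows; the repository id is a placeholder, not this model's actual Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- replace with this model's actual Hub path.
model_id = "your-org/RAHF-Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mistral-Instruct checkpoints ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "Explain representation engineering in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```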
You can obtain the training code for RAHF at this link.
A small detail worth noting is that we superpose the extracted representations onto Mistral-7B.
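Concretely, superposing a representation can be implemented as adding a precomputed direction to a layer's hidden states at inference time. The sketch below illustrates that general idea with a PyTorch forward hook; it is not the authors' exact procedure, and `steering_vector`, the layer index, and the scaling coefficient `alpha` are all hypothetical.

```python
import torch

def make_superposition_hook(steering_vector: torch.Tensor, alpha: float = 1.0):
    """Return a forward hook that adds alpha * steering_vector to a layer's hidden states."""
    def hook(module, inputs, output):
        # Decoder layers return a tuple whose first element is the hidden states.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steering_vector.to(hidden.device, hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage: superpose a direction extracted elsewhere onto layer 15.
# steering_vector = torch.load("extracted_direction.pt")  # shape: (hidden_size,)
# handle = model.model.layers[15].register_forward_hook(
#     make_superposition_hook(steering_vector, alpha=0.5)
# )
# ... run generation ...
# handle.remove()
```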
BibTeX:
@article{liu2023aligning,
  title={Aligning large language models with human preferences through representation engineering},
  author={Liu, Wenhao and Wang, Xiaohua and Wu, Muling and Li, Tianlong and Lv, Changze and Ling, Zixuan and Zhu, Jianhao and Zhang, Cenyuan and Zheng, Xiaoqing and Huang, Xuanjing},
  journal={arXiv preprint arXiv:2312.15997},
  year={2023}
}
Base model: mistralai/Mistral-7B-Instruct-v0.2