RefAlign: RL with Similarity-based Rewards
Datasets and models in: Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data.
GitHub repository: https://github.com/mzhaoshuai/RefAlign
RefAlign training uses the data at https://huggingface.co/datasets/shuchangtao/CONQORD_dataset/tree/main/conqord_step3_data.
Base model: mistralai/Mistral-7B-v0.1