
mzhaoshuai/zephyr-7b-alpha-conf-refalign

Tags: PEFT · Safetensors · mistral · refalign
RefAlign: RL with Similarity-based Rewards

GitHub repository: https://github.com/mzhaoshuai/RefAlign

Paper: Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data.

This model was trained with RefAlign on the dataset at https://huggingface.co/datasets/shuchangtao/CONQORD_dataset/tree/main/conqord_step3_data.

Framework versions

  • PEFT 0.11.1
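
Since this repository ships a PEFT adapter, it can be loaded on top of its base model with the `peft` and `transformers` libraries. The sketch below is an assumption about typical usage, not part of the original card: `AutoPeftModelForCausalLM` reads the base model name from the adapter config and loads the adapter weights on top of it. Running it requires network access and enough memory for a 7B model; the prompt is an arbitrary example.

```python
# Minimal sketch (not from the model card): loading this PEFT adapter.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "mzhaoshuai/zephyr-7b-alpha-conf-refalign"

# The base model is resolved automatically from the adapter's config.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

To merge the adapter into the base weights for standalone deployment, `model.merge_and_unload()` returns a plain `transformers` model.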
Model tree for mzhaoshuai/zephyr-7b-alpha-conf-refalign

  • Base model: mistralai/Mistral-7B-v0.1
  • Finetuned: HuggingFaceH4/zephyr-7b-alpha
  • Quantized: mzhaoshuai/zephyr-7b-alpha-conf-sft
  • Adapter: this model

Dataset used to train mzhaoshuai/zephyr-7b-alpha-conf-refalign

  • shuchangtao/CONQORD_dataset (updated Aug 12, 2024)

Collection including mzhaoshuai/zephyr-7b-alpha-conf-refalign

  • RefAlign: RL with Similarity-based Rewards — datasets and models in: Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data (19 items, updated Oct 30, 2025)

Paper for mzhaoshuai/zephyr-7b-alpha-conf-refalign

  • Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data (arXiv:2504.09895, published Apr 14, 2025)