JW17 committed on
Commit
0b91686
·
verified ·
1 Parent(s): 2208252

Update README.md

Files changed (1)
  1. README.md +1 -25
README.md CHANGED
@@ -1,25 +1 @@
- ---
- title: README
- emoji: 🔥
- colorFrom: indigo
- colorTo: indigo
- sdk: static
- pinned: false
- ---
-
- <div align="center">
-
- # **In-Context Bayesian Reward Modeling for Test-Time Steerability**
-
- ## Authors: [**Jiwoo Hong**](https://jiwooya1000.github.io/)\*, [**Shao Tang**](https://www.linkedin.com/in/tshao/)\*, [**Zhipeng Wang**](https://www.linkedin.com/in/zhipeng-jason-wang-phd-66806816/), [**Aman Gupta**](https://www.linkedin.com/in/aman-gupta1/)
-
- ![icrm_background](https://cdn-uploads.huggingface.co/production/uploads/6415c043486c7c9a5d151583/FblAJ6J5V9HWOseNhjp6N.png)
-
- </div>
-
- <div align="center">
- <a href=https://huggingface.co/collections/ICRM/variational-in-context-learning-reward-models-icrm-68da6e5acdf4cb84972c0528 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg style="height:30px;margin-right:10px;"></a>
- <a href=https://github.com/LinkedIn-XFACT/icrm target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github style="height:30px;margin-right:10px;"></a>
- </div>
-
- This is Hugging Face organization to host the models and dataset for the paper "***In-Context Bayesian Reward Modeling for Test-Time Steerability***." We propose the variational reward modeling objective, **Variational In-Context Reward Modeling (ICRM)**, that yields test-time steerability of classifier reward models via in-context preference demonstrations. ICRM casts reward modeling as amortized variational inference over a latent preference probability conditioned on few-shot, in-context preference demonstrations, with a conjugate Beta prior on the Bradley-Terry model. ICRM employs a two-headed regressor that decouples a preference mean from a confidence factor, jointly parameterizing a Beta posterior given demonstrations and enabling **test-time steerability of RMs**.
+ TBU
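
The README text removed in this commit describes a two-headed regressor whose preference mean and confidence factor jointly parameterize a Beta posterior. As a rough illustration of how two such outputs could define a Beta distribution, the sketch below uses the common mean-concentration parameterization (alpha = mean × confidence, beta = (1 − mean) × confidence). This parameterization and the function names `beta_params` and `beta_variance` are assumptions for illustration; they are not taken from the paper or the ICRM repository.

```python
def beta_params(mean: float, confidence: float) -> tuple[float, float]:
    """Map a preference mean in (0, 1) and a positive confidence to (alpha, beta).

    Assumed mean-concentration parameterization (hypothetical, not confirmed
    by the source): alpha = mean * confidence, beta = (1 - mean) * confidence.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if confidence <= 0.0:
        raise ValueError("confidence must be positive")
    return mean * confidence, (1.0 - mean) * confidence


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Same preference mean, two different confidence levels.
low_conf = beta_params(0.75, 8.0)    # alpha = 6.0, beta = 2.0
high_conf = beta_params(0.75, 80.0)  # alpha = 60.0, beta = 20.0

# The mean stays at 0.75 in both cases; only the spread changes.
print(beta_variance(*low_conf), beta_variance(*high_conf))
```

Under this assumed parameterization the two heads are decoupled in the sense the README describes: the mean fixes where the preference probability is centered, and a larger confidence only tightens the distribution around it.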