Spaces: ICRM/README

JW17 committed on Sep 29, 2025

Commit 0b91686 (verified) · 1 Parent(s): 2208252

Update README.md

Files changed (1)

README.md +1 -25

@@ -1,25 +1 @@
----
-title: README
-emoji: 🔥
-colorFrom: indigo
-colorTo: indigo
-sdk: static
-pinned: false
----
-<div align="center">
-# **In-Context Bayesian Reward Modeling for Test-Time Steerability**
-## Authors: [**Jiwoo Hong**](https://jiwooya1000.github.io/)\*, [**Shao Tang**](https://www.linkedin.com/in/tshao/)\*, [**Zhipeng Wang**](https://www.linkedin.com/in/zhipeng-jason-wang-phd-66806816/), [**Aman Gupta**](https://www.linkedin.com/in/aman-gupta1/)
-![icrm_background](https://cdn-uploads.huggingface.co/production/uploads/6415c043486c7c9a5d151583/FblAJ6J5V9HWOseNhjp6N.png)
-</div>
-<div align="center">
-  <a href=https://huggingface.co/collections/ICRM/variational-in-context-learning-reward-models-icrm-68da6e5acdf4cb84972c0528 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg style="height:30px;margin-right:10px;"></a>
-  <a href=https://github.com/LinkedIn-XFACT/icrm target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github style="height:30px;margin-right:10px;"></a>
-</div>
-This is Hugging Face organization to host the models and dataset for the paper "***In-Context Bayesian Reward Modeling for Test-Time Steerability***." We propose the variational reward modeling objective, **Variational In-Context Reward Modeling (ICRM)**, that yields test-time steerability of classifier reward models via in-context preference demonstrations. ICRM casts reward modeling as amortized variational inference over a latent preference probability conditioned on few-shot, in-context preference demonstrations, with a conjugate Beta prior on the Bradley-Terry model. ICRM employs a two-headed regressor that decouples a preference mean from a confidence factor, jointly parameterizing a Beta posterior given demonstrations and enabling **test-time steerability of RMs**.


+TBU
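
The README text removed in this commit describes a two-headed regressor that decouples a preference mean from a confidence factor and jointly parameterizes a Beta posterior. As a rough illustration only, the sketch below assumes the common mean/concentration mapping `alpha = mean * confidence`, `beta = (1 - mean) * confidence`; the removed text does not state the exact mapping ICRM uses, and the function names here are hypothetical.

```python
def beta_from_mean_confidence(mean: float, confidence: float) -> tuple[float, float]:
    """Map a preference mean in (0, 1) and a positive confidence to Beta(alpha, beta).

    Assumed mapping (not confirmed by the README): alpha = mean * confidence,
    beta = (1 - mean) * confidence, so alpha + beta equals the confidence.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if confidence <= 0.0:
        raise ValueError("confidence must be positive")
    return mean * confidence, (1.0 - mean) * confidence


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Same preference mean, two hypothetical confidence levels.
low_alpha, low_beta = beta_from_mean_confidence(0.75, 8.0)
high_alpha, high_beta = beta_from_mean_confidence(0.75, 80.0)

# Both posteriors share the mean 0.75; the higher confidence gives a tighter posterior.
low_var = beta_variance(low_alpha, low_beta)
high_var = beta_variance(high_alpha, high_beta)
```

Under this assumed mapping, the mean head fixes where the posterior is centered and the confidence head only controls how concentrated it is, which is one way to read the "decoupling" the README mentions.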