matlok's Collections

Papers - RLHF - Iterative Contrastive Self-Improvement

A batched on-policy algorithm that performs iterative self-improvement via contrastive learning.
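
To make the one-line summary concrete, here is a minimal toy sketch of what a batched, on-policy, iterative contrastive loop could look like. Everything in it is an illustrative assumption rather than the paper's actual method: the policy is a categorical distribution over a handful of fixed candidate responses, `reward` is a hypothetical stand-in for a preference signal, `self_improve` is a made-up name, and the pairwise logistic (DPO-style) loss is swapped in as one common contrastive objective.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(logits, rng):
    """Draw one response index from the softmax distribution over logits."""
    weights = [math.exp(z) for z in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def self_improve(logits, reward, iterations=20, batch_size=32,
                 beta=1.0, lr=0.5, seed=0):
    """Toy iterative contrastive loop (assumed setup, not the paper's algorithm).

    Each iteration:
      1. freezes the current policy as the reference (on-policy snapshot),
      2. samples a batch of response pairs from that snapshot,
      3. labels each pair winner/loser with the hypothetical `reward`,
      4. takes one gradient step on a pairwise logistic loss.
    """
    rng = random.Random(seed)
    logits = list(logits)
    for _ in range(iterations):
        ref = list(logits)  # frozen snapshot used for sampling and as reference
        grads = [0.0] * len(logits)
        for _ in range(batch_size):
            a, b = sample(ref, rng), sample(ref, rng)
            if reward[a] == reward[b]:
                continue  # no preference signal from a tie
            win, lose = (a, b) if reward[a] > reward[b] else (b, a)
            # For a softmax policy the log-partition cancels, so the
            # log-probability gap equals the logit gap.
            margin = beta * ((logits[win] - ref[win]) - (logits[lose] - ref[lose]))
            # Negative gradient of log(1 + exp(-margin)) w.r.t. the two logits.
            g = beta * sigmoid(-margin)
            grads[win] += g
            grads[lose] -= g
        logits = [z + lr * g / batch_size for z, g in zip(logits, grads)]
    return logits


if __name__ == "__main__":
    # Three candidate responses with hypothetical quality scores.
    print(self_improve([0.0, 0.0, 0.0], reward=[0.1, 0.5, 0.9]))
```

In this toy setting the logit of the highest-reward response can only rise and that of the lowest-reward response can only fall, because each is always the winner or always the loser of any pair it appears in. A real system would replace the categorical policy with a language model and the `reward` list with a learned or annotated preference function.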