OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training
Paper • 2603.28858 • Published • 9
Structured Document Translation via Format Reinforcement Learning