CaRR & C-GRPO - a THU-KEG Collection

Data and models for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards".