THU-KEG's Collections
CaRR & C-GRPO
WildReward
LLaDA-8B-BGPO
DeepPrune
SIRI
VerIF
AdaptThink
LongWriter-V
OpenSAE-LLaMA-3.1-8B
Crab
ADELIE

CaRR & C-GRPO

updated 28 days ago

Data and models for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards".

  • THU-KEG/CaRR-DeepDive

    Preview • Updated 28 days ago • 367 • 1

  • THU-KEG/DeepDive-4B-SFT

    4B • Updated 28 days ago • 41

  • THU-KEG/DeepDive-4B-C-GRPO

    4B • Updated 28 days ago • 13

  • THU-KEG/DeepDive-30B-A3B-SFT

    31B • Updated 28 days ago • 14

  • THU-KEG/DeepDive-30B-A3B-C-GRPO

    31B • Updated 28 days ago • 15

  • Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

    Paper • 2601.06021 • Published Jan 9 • 48