qihoo360/Light-R1-32B-DS
Text Generation • 33B • Updated • 17 • 15
TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning