MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models Paper • 2603.28590 • Published 18 days ago • 22
LRM-Conta-Detection-Arena/sft-conta-deepseek-distill-llama3-8b Text Generation • 8B • Updated Oct 9, 2025 • 3
LRM-Conta-Detection-Arena/sft-conta-deepseek-distill-qwen2.5-7b Text Generation • 8B • Updated Oct 9, 2025 • 3
LRM-Conta-Detection-Arena/sft-conta-llama3-8b-gpro-step64 Text Generation • 8B • Updated Oct 9, 2025 • 2