MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models Paper • 2603.28590 • Published 19 days ago • 22
LRM-Conta-Detection-Arena/sft-conta-deepseek-distill-llama3-8b Text Generation • 8B • Updated Oct 9, 2025 • 3
LRM-Conta-Detection-Arena/sft-conta-deepseek-distill-qwen2.5-7b Text Generation • 8B • Updated Oct 9, 2025 • 3
LRM-Conta-Detection-Arena/sft-conta-llama3-8b-gpro-step64 Text Generation • 8B • Updated Oct 9, 2025 • 2
LRM-Conta-Detection-Arena/sft-conta-qwen2.5-7b-grpo-step64 Text Generation • 8B • Updated Oct 6, 2025 • 7