ZYao720/WebArbiter-8B-Qwen3
Text Generation • 8B
WebArbiter: process reward models (PRMs) for web agents, trained with reasoning distillation and reinforcement learning (RL). ICLR 2026.
Note: Strongest. Average Best-of-N (BoN) accuracy 76.66% (Qwen3-8B backbone).
Note: Flagship paper model. Average BoN accuracy 74.60%, outperforming GPT-5 by 9.1 points (Qwen2.5-7B-Instruct backbone).
Note: Efficient. Average BoN accuracy 72.55% with roughly half the parameters of the 7B model (Qwen3-4B backbone).
Note: Compact. Average BoN accuracy 59.06%, outperforming WebShepherd-8B (Qwen2.5-3B-Instruct backbone).
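The notes above report Best-of-N (BoN) accuracy without defining it on this page. As a rough illustration only, the sketch below assumes BoN accuracy is the fraction of instances in which the reward model's top-scored candidate action is the correct one. The function name `best_of_n_accuracy` and the toy scores are hypothetical and are not taken from the WebArbiter code or benchmark.

```python
from typing import Sequence


def best_of_n_accuracy(
    scores: Sequence[Sequence[float]],
    correct_index: Sequence[int],
) -> float:
    """Fraction of instances whose top-scored candidate is the correct one.

    Assumption: each instance has N candidate actions, exactly one of which
    is correct, and the reward model assigns one scalar score per candidate.
    """
    if len(scores) != len(correct_index):
        raise ValueError("scores and correct_index must have the same length")
    if not scores:
        raise ValueError("at least one instance is required")

    hits = 0
    for cand_scores, gold in zip(scores, correct_index):
        # Index of the highest score; ties resolve to the earliest candidate.
        best = max(range(len(cand_scores)), key=cand_scores.__getitem__)
        hits += int(best == gold)
    return hits / len(scores)


# Toy data (made up): three instances, three candidates each.
toy_scores = [
    [0.9, 0.2, 0.1],  # top candidate is index 0, correct is 0 -> hit
    [0.3, 0.8, 0.5],  # top candidate is index 1, correct is 1 -> hit
    [0.4, 0.6, 0.7],  # top candidate is index 2, correct is 1 -> miss
]
toy_gold = [0, 1, 1]
acc = best_of_n_accuracy(toy_scores, toy_gold)
print(f"BoN accuracy: {acc:.2%}")
```

An "Avg." figure such as those in the notes would then presumably be the mean of this quantity over several evaluation subsets, though the exact averaging scheme is not stated here.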