WebArbiter - Models - a ZYao720 Collection

updated 8 days ago

WebArbiter process reward models for web agents, trained with reasoning distillation and RL. ICLR 2026.

ZYao720/WebArbiter-8B-Qwen3

Text Generation • 8B • Updated 8 days ago • 157

Note: Strongest model in the collection, with Avg. BoN Acc 76.66% (Qwen3-8B backbone)

ZYao720/WebArbiter-7B

Text Generation • 8B • Updated 8 days ago • 235

Note: Flagship paper model, with Avg. BoN Acc 74.60%, outperforming GPT-5 by 9.1 pts (Qwen2.5-7B-Instruct backbone)

ZYao720/WebArbiter-4B-Qwen3

Text Generation • 4B • Updated 8 days ago • 157

Note: Efficient, with Avg. BoN Acc 72.55% at roughly half the parameters of the 7B model (Qwen3-4B backbone)

ZYao720/WebArbiter-3B

Text Generation • 3B • Updated 8 days ago • 154

Note: Compact, with Avg. BoN Acc 59.06%, outperforming WebShepherd-8B (Qwen2.5-3B-Instruct backbone)
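
For readers choosing among these checkpoints, the sketch below records the size labels and Avg. BoN Acc figures listed above and picks the highest-scoring checkpoint within a size budget. The names `Checkpoint`, `pick_checkpoint`, and `load_checkpoint` are hypothetical helpers, not part of any WebArbiter release. Loading through `AutoModelForCausalLM` is an assumption based on the "Text Generation" tag; the model cards remain the authority on the actual loading procedure and prompt format.

```python
from typing import NamedTuple, Optional


class Checkpoint(NamedTuple):
    repo_id: str
    listed_size_b: int   # size label shown on the collection page, in billions
    backbone: str
    avg_bon_acc: float   # Avg. BoN Acc (%) as quoted in the collection notes


# Values copied from the collection listing above.
CHECKPOINTS = [
    Checkpoint("ZYao720/WebArbiter-8B-Qwen3", 8, "Qwen3-8B", 76.66),
    Checkpoint("ZYao720/WebArbiter-7B", 8, "Qwen2.5-7B-Instruct", 74.60),
    Checkpoint("ZYao720/WebArbiter-4B-Qwen3", 4, "Qwen3-4B", 72.55),
    Checkpoint("ZYao720/WebArbiter-3B", 3, "Qwen2.5-3B-Instruct", 59.06),
]


def pick_checkpoint(max_size_b: float) -> Optional[Checkpoint]:
    """Return the highest Avg. BoN Acc checkpoint whose listed size fits the budget."""
    candidates = [c for c in CHECKPOINTS if c.listed_size_b <= max_size_b]
    return max(candidates, key=lambda c: c.avg_bon_acc, default=None)


def load_checkpoint(repo_id: str):
    """Load tokenizer and model (assumes a standard causal-LM checkpoint)."""
    # Imported lazily so pick_checkpoint works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model
```

Calling `pick_checkpoint(4)` selects the 4B Qwen3 checkpoint, and passing its `repo_id` to `load_checkpoint` would download the weights from the Hub.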