Benchmark, training data, and search trajectories for WebArbiter. ICLR 2026.
Yao Zhang
ZYao720
Recent Activity
updated a model 5 days ago
ZYao720/WebArbiter-8B-Qwen3
updated a model 5 days ago
ZYao720/WebArbiter-4B-Qwen3
updated a model 5 days ago
ZYao720/WebArbiter-7B
WebArbiter
Reasoning Process Reward Model for Web Agents. Models, data, and WebPRMBench. ICLR 2026.
- WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents
  Paper • 2601.21872 • Published • 1
- ZYao720/WebArbiter-8B-Qwen3
  Text Generation • 8B • Updated • 152
- ZYao720/WebArbiter-7B
  Text Generation • 8B • Updated • 150
- ZYao720/WebArbiter-4B-Qwen3
  Text Generation • 4B • Updated • 153
WebArbiter - Datasets
Benchmark, training data, and search trajectories for WebArbiter. ICLR 2026.
WebArbiter - Models
WebArbiter process reward models for web agents. Reasoning distillation + RL. ICLR 2026.