ZYao720/WEBPRMBENCH
Benchmark, training data, and search trajectories for WebArbiter. ICLR 2026.
WebPRMBench: the first comprehensive WebPRM evaluation benchmark, with 1,150 instances across 4 web environments.
Training Data: two stages, with 9,642 SFT examples (distilled from o3) and 18,921 RL preference pairs (GRPO).
Search Trajectories: 72 reward-guided search trajectories on WebArena-Lite (5 websites).