ws-wm-0224 - a Ricardo-H Collection

updated Mar 2

Exp3-FactR: FactR-Only GRPO ablation. Exponential(a=1), behavior=0, facts=1. 240 steps, Qwen2.5-7B.