Ricardo-H's Collections
BehR: Behavior-Consistent World Models
alfworld-dual-token-0416
ws-wm-0410ministral
grpo-alfworld-0410
ws-wm-crossjudge-llama-0406
rlvr-f1-llama-textworld-f1
rlvr-f1-llama-webshop-f1
rlvr-f1
ws-wm-0314
ws-wm-f1-0314
ws-wm-llama-0227
ws-wm-0224
ws-wm-0224
Updated Mar 2
Exp3-FactR: FactR-Only GRPO ablation. Exponential(a=1), behavior=0, facts=1. 240 steps, Qwen2.5-7B.
Ricardo-H/ws-wm-0224-step-100 • 8B • Updated Feb 26 • 2
Ricardo-H/ws-wm-0224-step-120 • 8B • Updated Feb 26 • 10
Ricardo-H/ws-wm-0224-step-140 • Updated Feb 26
Ricardo-H/ws-wm-0224-step-160 • 8B • Updated Feb 26 • 1
Ricardo-H/ws-wm-0224-step-180 • 8B • Updated Feb 26 • 1
Ricardo-H/ws-wm-0224-step-200 • 8B • Updated Feb 26 • 1
Ricardo-H/ws-wm-0224-step-220 • 8B • Updated Feb 26 • 1
Ricardo-H/ws-wm-0224-step-240 • 8B • Updated Feb 26 • 1
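
The checkpoints listed above follow a regular naming pattern, one repository per 20 training steps from step 100 to step 240. A minimal sketch of how those repository IDs could be enumerated for a sweep over checkpoints is shown below. The helper name `checkpoint_repo_ids` and the constants are hypothetical, introduced only for illustration; the sketch makes no network calls and assumes nothing about the contents of the repositories.

```python
# Hypothetical helper: rebuilds the repo IDs shown in the listing.
# The owner, collection name, and step range are taken from the listing itself.
OWNER = "Ricardo-H"
COLLECTION = "ws-wm-0224"
STEPS = range(100, 241, 20)  # 100, 120, ..., 240


def checkpoint_repo_ids(owner=OWNER, name=COLLECTION, steps=STEPS):
    """Return the checkpoint repo IDs in listing order."""
    return [f"{owner}/{name}-step-{step}" for step in steps]


if __name__ == "__main__":
    for repo_id in checkpoint_repo_ids():
        print(repo_id)
```

Each resulting ID could then be passed to whichever loader suits the checkpoint format; which loader applies is not stated on this page.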