YOULING HUANG
Ricardo-H
AI & ML interests
None yet
Recent Activity
updated a collection about 23 hours ago
ws-wm-0410ministral
updated a model about 23 hours ago
Ricardo-H/ws-wm-0410ministral-step-100
published a model about 23 hours ago
Ricardo-H/ws-wm-0410ministral-step-100
Organizations
None yet
ws-wm-0410ministral
grpo-alfworld-0410
ws-wm-crossjudge-llama-0406
rlvr-f1-llama-textworld-f1
rlvr-f1-llama-webshop-f1
rlvr-f1
RLVR-World style Token F1 reward ablation models for BehR-WM rebuttal experiments
ws-wm-0314
ws-wm-f1-0314
ws-wm-llama-0227
WebShop World Model - LLaMA3.1-8B BehR-Only GRPO checkpoints (2026-02-27)
ws-wm-0224
Exp3-FactR: FactR-Only GRPO ablation. Exponential(a=1), behavior=0, facts=1. 240 steps, Qwen2.5-7B.