AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/verl_agent_sokoban-grpo-coef0.9-False_qwen2.5_vl_3b-150step
ZHLiu627/verl_agent_sokoban-grpo-coef1.1-False_qwen2.5_vl_3b-150step
ZHLiu627/verl_agent_alfworld-GRPO-int-reward_False-Llama-3.1-8B-Instruct-50step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-int-reward_False-Llama-3.1-8B-Instruct-100step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-GRPO-int-reward_False-Llama-3.1-8B-Instruct-50step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-GRPO-int-reward_False-Llama-3.1-8B-Instruct-100step
8B • Updated • 1
ZHLiu627/verl_agent_sokoban-raft-False_qwen2.5_vl_3b-150step
ZHLiu627/verl_agent_sokoban-grpo-False_qwen2.5_vl_3b-50step
ZHLiu627/verl_agent_sokoban-grpo-False_qwen2.5_vl_3b-100step
ZHLiu627/verl_agent_webshop-GRPO-int-reward_False-Llama-3.1-8B-Instruct-150step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-int-reward_False-Llama-3.1-8B-Instruct-150step
8B • Updated • 1
ZHLiu627/verl_agent_sokoban-grpo-False_qwen2.5_vl_3b-150step
ZHLiu627/zephyr-7b-gemma-rpo-avg
9B • Updated
ZHLiu627/beta_ultra_rdpo_full_eta0.005_beta0.01_no_decay_new
7B • Updated • 1
0.6B • Updated
ZHLiu627/zephyr-7b-gemma-dpo
ZHLiu627/zephyr-gemma-rpo
Text Generation • 9B • Updated • 8
ZHLiu627/zephyr-7b-dpo-full
Text Generation • 7B • Updated • 1
ZHLiu627/zephyr-7b-rdpo-full-eta0.005-beta0.1
Updated