AI & ML interests
Reinforcement Learning
luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-LR-5e-8-v2_8294
luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-LR-1e-7-v2_9887
luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-LR-5e-6-v2_2924
luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-LR-1e-5-v2_5726
luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-LR-2.5e-6-v2_1641
luckeciano/Llama-3.1-8B-Instruct-GRPO-Base-LR-1e-6-v2_2983
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_6133
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_6180
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_5482
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_4026
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_4522
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_6803
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_7688
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_9852
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_7924
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_4389
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_2361
luckeciano/Qwen-2.5-7B-Simple-RL-v2-LogShifts_8556
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-4-HessianMaskToken-0.0_5932
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-5-HessianMaskToken-0.0_5682
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-6-HessianMaskToken-0.0_7954
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-7-HessianMaskToken-0.0_5825
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-0.1-HessianMaskToken-1e-4_4049
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-1e-2-HessianMaskToken-1.0_7703
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-0.1-HessianMaskToken-1e-5_1148
luckeciano/Llama-3.1-8B-Instruct-CAPO-Base-v2-FisherMaskToken-0.1-HessianMaskToken-0.1_1761
luckeciano/Qwen-2.5-7B-GRPO-Adam-FisherMaskToken-1e-4-HessianMaskToken-0.01-v2-LogShifts_9500
luckeciano/Qwen-2.5-7B-GRPO-Base-v2-LogShifts_9376
luckeciano/Qwen-2.5-7B-GRPO-Adam-FisherMaskToken-1e-4-HessianMaskToken-0.01-v2-LogShifts_8457
luckeciano/Qwen-2.5-7B-GRPO-Base-v2-LogShifts_4507