AI & ML interests
Reinforcement Learning
Organizations
luckeciano/Qwen-2.5-7B-Missing-Response-RL-Baseline
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-Len-Penalty-Baseline
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-Answer-Entropy-RL-0.4
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-Answer-Entropy-RL-0.1
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-Len-Penalty
Updated
luckeciano/Qwen-2.5-7B-Answer-Entropy-RL-1
Updated
luckeciano/Qwen-2.5-0.5B-Instruct-Answer-Entropy-RL
Updated
luckeciano/Qwen-2.5-7B-Embedding-Entropy-RL-0.25
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-Embedding-Entropy-RL-0.1
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-Embedding-Entropy-RL-Len-Penalty
Text Generation
• 8B • Updated • 4
luckeciano/Qwen-2.5-1.5B-Simple-RL
Text Generation
• 2B • Updated • 4
luckeciano/Qwen-2.5-0.5B-Instruct-Simple-RL
Updated
luckeciano/pku-alpaca3.1-8b-gt-reward-model
Updated
luckeciano/pku-alpaca3.1-8b-gt-rewards
Updated
luckeciano/merged-hermes-reward-model-reddit
Text Classification
• 7B • Updated • 2
luckeciano/merged-llama7b-reward-model-reddit
Text Classification
• Updated • 2
luckeciano/merged-gpt2-xl-sft-reddit
Text Generation
• Updated • 5
luckeciano/merged-llama-sft-reddit
Text Generation
• Updated • 5