
Chanuk Lee

tally0818

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 6 days ago

Self-Distilled RLVR

Paper • 2604.03128 • Published 11 days ago • 158

updated a model 7 days ago

tally0818/GRPO_Branch_16_eps20_3b_lr_bsz

Text Generation • 3B • Updated 7 days ago • 510

published a model 7 days ago

tally0818/GRPO_Branch_16_eps20_3b_lr_bsz

Text Generation • 3B • Updated 7 days ago • 510

updated a model 7 days ago

tally0818/GRPO_16_eps20_3b_lr_bsz

Text Generation • 3B • Updated 7 days ago • 605

published a model 7 days ago

tally0818/GRPO_16_eps20_3b_lr_bsz

Text Generation • 3B • Updated 7 days ago • 605

updated a model 7 days ago

tally0818/branch_bsz_lr

Text Generation • 3B • Updated 7 days ago • 41

published a model 7 days ago

tally0818/branch_bsz_lr

Text Generation • 3B • Updated 7 days ago • 41

updated a model 7 days ago

tally0818/grupo_bsz_lr

Text Generation • 3B • Updated 7 days ago • 6

published a model 7 days ago

tally0818/grupo_bsz_lr

Text Generation • 3B • Updated 7 days ago • 6

upvoted 2 papers 19 days ago

T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

Paper • 2603.22341 • Published 24 days ago • 37

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Paper • 2601.08763 • Published Jan 13 • 150

upvoted a paper 21 days ago

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL

Paper • 2603.12151 • Published Mar 12 • 2

upvoted 2 papers about 1 month ago

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published Mar 10 • 151

MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

Paper • 2603.09827 • Published Mar 10 • 30

upvoted 2 papers about 2 months ago

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Paper • 2602.17602 • Published Feb 19 • 56

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30, 2025 • 146

upvoted 3 papers 2 months ago

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published Nov 26, 2025 • 126

Self-Hinting Language Models Enhance Reinforcement Learning

Paper • 2602.03143 • Published Feb 3 • 31

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published Feb 4 • 37

liked a Space 2 months ago

Music Flamingo

🎵

167

Analyze music and answer questions from audio or YouTube links
