Dokyoon

leeloolee · Eruly

AI & ML interests

ai

Recent Activity

reacted to anakin87's post with ❤️ 3 days ago

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.

What you'll learn

🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
    🔸 Build the game Environment
    🔸 Use it to generate synthetic data for SFT warm-up
    🔸 Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

liked a dataset 17 days ago

InternScience/ResearchClawBench

liked a model 17 days ago

rl-research/DR-Tulu-8B-results

leeloolee's models (46)

leeloolee/oss-math

21B • Updated Sep 29, 2025 • 3

leeloolee/gkd-model

Updated Jan 21, 2025

leeloolee/intention

Sentence Similarity • 0.3B • Updated Sep 7, 2024 • 18 • 4

leeloolee/online_dpo_gemma

Updated Aug 6, 2024

leeloolee/models-moved

Updated Aug 6, 2024

leeloolee/0806

Updated Aug 6, 2024

leeloolee/online_dpo_0805

3B • Updated Aug 5, 2024 • 1

leeloolee/online_dpo_02_18_48

Updated Aug 3, 2024

leeloolee/online_dpo_02_15_23

Updated Aug 3, 2024

leeloolee/online_dpo_02_12_38

Updated Aug 3, 2024

leeloolee/online_dpo_02_08_24

Updated Aug 3, 2024

leeloolee/online_dpo_18_17_12

Updated Aug 2, 2024

leeloolee/online_dpo_18_12_29

Updated Aug 2, 2024

leeloolee/online_dpo_18_07_16

Updated Aug 2, 2024

leeloolee/online_dpo_17_54_18

Updated Aug 2, 2024

leeloolee/online_dpo_17_51_00

Updated Aug 2, 2024

leeloolee/online_dpo_17_46_38

Updated Aug 2, 2024

leeloolee/online_dpo_17_44_03

Updated Aug 2, 2024

leeloolee/online_dpo_17_40_37

Updated Aug 2, 2024

leeloolee/online_dpo_17_37_45

Updated Aug 2, 2024

leeloolee/online_dpo_17_30_28

Updated Aug 2, 2024

leeloolee/online_dpo_17_28_23

Updated Aug 2, 2024

leeloolee/online_dpo_17_25_04

Updated Aug 2, 2024

leeloolee/online_dpo_17_22_53

Updated Aug 2, 2024

leeloolee/online_dpo_17_19_30

Updated Aug 2, 2024

leeloolee/online_dpo_17_17_07

Updated Aug 2, 2024

leeloolee/online_dpo_17_15_33

Updated Aug 2, 2024

leeloolee/online_dpo_17_12_52

Updated Aug 2, 2024

leeloolee/online_dpo_17_09_40

Updated Aug 2, 2024

leeloolee/online_dpo_17_05_41

Updated Aug 2, 2024