Dokyoon
leeloolee
12 followers · 125 following
Eruly
AI & ML interests
ai
Recent Activity
reacted to anakin87's post with ❤️ 3 days ago
📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.

What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
  🔸 Build the game Environment
  🔸 Use it to generate synthetic data for SFT warm-up
  🔸 Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---
🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
liked a dataset 17 days ago: InternScience/ResearchClawBench
liked a model 17 days ago: rl-research/DR-Tulu-8B-results
Organizations
leeloolee's models (46)
leeloolee/oss-math • 21B • Updated Sep 29, 2025 • 3
leeloolee/gkd-model • Updated Jan 21, 2025
leeloolee/intention • Sentence Similarity • 0.3B • Updated Sep 7, 2024 • 18 • 4
leeloolee/online_dpo_gemma • Updated Aug 6, 2024
leeloolee/models-moved • Updated Aug 6, 2024
leeloolee/0806 • Updated Aug 6, 2024
leeloolee/online_dpo_0805 • 3B • Updated Aug 5, 2024 • 1
leeloolee/online_dpo_02_18_48 • Updated Aug 3, 2024
leeloolee/online_dpo_02_15_23 • Updated Aug 3, 2024
leeloolee/online_dpo_02_12_38 • Updated Aug 3, 2024
leeloolee/online_dpo_02_08_24 • Updated Aug 3, 2024
leeloolee/online_dpo_18_17_12 • Updated Aug 2, 2024
leeloolee/online_dpo_18_12_29 • Updated Aug 2, 2024
leeloolee/online_dpo_18_07_16 • Updated Aug 2, 2024
leeloolee/online_dpo_17_54_18 • Updated Aug 2, 2024
leeloolee/online_dpo_17_51_00 • Updated Aug 2, 2024
leeloolee/online_dpo_17_46_38 • Updated Aug 2, 2024
leeloolee/online_dpo_17_44_03 • Updated Aug 2, 2024
leeloolee/online_dpo_17_40_37 • Updated Aug 2, 2024
leeloolee/online_dpo_17_37_45 • Updated Aug 2, 2024
leeloolee/online_dpo_17_30_28 • Updated Aug 2, 2024
leeloolee/online_dpo_17_28_23 • Updated Aug 2, 2024
leeloolee/online_dpo_17_25_04 • Updated Aug 2, 2024
leeloolee/online_dpo_17_22_53 • Updated Aug 2, 2024
leeloolee/online_dpo_17_19_30 • Updated Aug 2, 2024
leeloolee/online_dpo_17_17_07 • Updated Aug 2, 2024
leeloolee/online_dpo_17_15_33 • Updated Aug 2, 2024
leeloolee/online_dpo_17_12_52 • Updated Aug 2, 2024
leeloolee/online_dpo_17_09_40 • Updated Aug 2, 2024
leeloolee/online_dpo_17_05_41 • Updated Aug 2, 2024