Dokyoon

leeloolee

Eruly

AI & ML interests

Recent Activity

reacted to anakin87's post with ❤️ 8 days ago

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.

What you'll learn

🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
  🔸 Build the game Environment
  🔸 Use it to generate synthetic data for SFT warm-up
  🔸 Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe

📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

liked a dataset 22 days ago

InternScience/ResearchClawBench

liked a model 22 days ago

rl-research/DR-Tulu-8B-results

Organizations

upvoted a paper 22 days ago

Grounding Everything in Tokens for Multimodal Large Language Models

Paper • 2512.10554 • Published Dec 11, 2025 • 1

upvoted a paper 24 days ago

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published 25 days ago • 29

upvoted 2 articles 25 days ago

Article

Scaling OpenEnv: From Free Usage to Thousands of Concurrent Environments

Jan 20

Article

Build a Domain-Specific Embedding Model in Under a Day

28 days ago

upvoted a paper 26 days ago

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Paper • 2603.17024 • Published Mar 17 • 109

upvoted a collection about 1 month ago

Activation Oracles

Collection

12 items • Updated Dec 26, 2025 • 18

upvoted a paper about 1 month ago

RubricBench: Aligning Model-Generated Rubrics with Human Standards

Paper • 2603.01562 • Published Mar 2 • 63

upvoted 2 papers about 2 months ago

Imagination Helps Visual Reasoning, But Not Yet in Latent Space

Paper • 2602.22766 • Published Feb 26 • 44

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Paper • 2602.21548 • Published Feb 25 • 50

upvoted an article 2 months ago

Article

OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments

Feb 12

upvoted 2 papers 2 months ago

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Paper • 2602.10604 • Published Feb 11 • 195

Towards Pixel-Level VLM Perception via Simple Points Prediction

Paper • 2601.19228 • Published Jan 27 • 18

upvoted an article 3 months ago

Article

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

Jan 27

upvoted 6 papers 3 months ago

VISTA-PATH: An interactive foundation model for pathology image segmentation and quantitative analysis in computational pathology

Paper • 2601.16451 • Published Jan 23 • 3

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

Paper • 2601.14243 • Published Jan 20 • 23

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Paper • 2601.10332 • Published Jan 15 • 31

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published Jan 14 • 195

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Paper • 2601.02256 • Published Jan 5 • 33

Enhancing Inflation Nowcasting with LLM: Sentiment Analysis on News

Paper • 2410.20198 • Published Oct 26, 2024 • 1

upvoted an article 3 months ago

Article

Building Deep Research: How we Achieved State of the Art

Nov 24, 2025
