In a Training Loop 🔄

Stefano Fiorucci PRO

anakin87

AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️

Recent Activity

liked a Space 3 days ago

HuggingFaceTB/trl-distillation-trainer

replied to their post 4 days ago

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.

What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: how to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
   🔸 Build the game Environment
   🔸 Use it to generate synthetic data for SFT warm-up
   🔸 Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

reacted to their post with 😎 4 days ago

🌀 Let LLMs wander - Engineering RL Environments

Reinforcement Learning Environments are little worlds where models can act, get rewards, and learn. I've been exploring how to design them, figuring out what works and what doesn't.

If you want to learn how to build them, I recorded a practical intro video. You'll also see how to turn Liquid AI LFM2-2.6B into a Tic-tac-toe master 🙂

🎥 Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q

---

🌱 LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

upvoted an article 6 days ago

Article

Multimodal Embedding & Reranker Models with Sentence Transformers

6 days ago

upvoted a collection 7 days ago

LFM2 2.6B Mr. Tic Tac Toe ❌ ⭕

Collection

Dataset and models for transforming LFM2 2.6B into a Tic Tac Toe master using RL Environments. Free course: https://t.ly/4jIFq • 8 items • Updated 7 days ago • 2

upvoted an article 13 days ago

Article

Training mRNA Language Models Across 25 Species for $165

15 days ago

upvoted a collection 13 days ago

Gemma 4

Collection

8 items • Updated 13 days ago • 603

upvoted an article 15 days ago

Article

TRL v1.0: Post-Training Library Built to Move with the Field

15 days ago

upvoted a paper 27 days ago

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

Paper • 2602.17316 • Published Feb 19 • 1

upvoted 2 collections about 1 month ago

Zagreus - Nesso fine tuned

Collection

The collection contains three bilingual English/Italian SLMs post-trained on Zagreus-0.4B-ita: instruct, agentic, and a fully open-source • 3 items • Updated Mar 4 • 3

Zagreus 0.4B

Collection

The Zagreus-0.4B collection contains four bilingual English + Romance language foundational SLMs (~400M parameters) trained from scratch • 4 items • Updated Mar 4 • 6

upvoted a paper about 1 month ago

Transformer Layers as Painters

Paper • 2407.09298 • Published Jul 12, 2024 • 16

upvoted an article about 1 month ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

Mar 10 • 124

upvoted a collection about 1 month ago

Qwen3.5-text-only

Collection

Text-only versions of Qwen-3.5 without the vision encoders for a smaller memory and storage footprint. • 4 items • Updated 7 days ago • 14

upvoted an article about 1 month ago

Article

The ML Engineer's Guide to Protein AI

Mar 3

upvoted an article about 2 months ago

Article

Bringing Autonomous Driving RL to OpenEnv and TRL

Feb 26

upvoted an article 2 months ago

Article

From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output

Feb 7

upvoted a collection 3 months ago

ScopeGuard-2601

Collection

https://principled-intelligence.com/news/introducing-scope-guard • 3 items • Updated 7 days ago • 7

upvoted an article 3 months ago

Article

The Engineering Handbook for GRPO + LoRA with Verl: Training Qwen2.5 on Multi-GPU

Jan 2

upvoted a collection 3 months ago

🧮functiongemma ft mobile-actions

Collection

A collection of functiongemma-270m-it models fine-tuned on mobile actions dataset for Spanish, French and Italian • 3 items • Updated Jan 5 • 3

upvoted 2 collections 5 months ago

INTELLECT-3

Collection

INTELLECT-3: A 100B+ MoE trained with large-scale RL • 5 items • Updated Feb 18 • 12

SYNTH

Collection

Fully generalist synthetic dataset and SOTA small reasoners • 3 items • Updated Nov 10, 2025 • 12

upvoted a paper 6 months ago

Extracting alignment data in open models

Paper • 2510.18554 • Published Oct 21, 2025 • 10
