🎥 For a video walkthrough, check out "Let LLMs Wander - Engineering RL Environments": https://www.youtube.com/watch?v=71V3fTaUp2Q
Stefano Fiorucci PRO
anakin87
AI & ML interests
Language Models: orchestration, post-training, GRPO, synthetic data...
Contributing to the Haystack LLM framework
Recent Activity
liked a Space 1 minute ago: HuggingFaceTB/trl-distillation-trainer
replied to their post 1 day ago
reacted to their post 1 day ago
posted an update 1 day ago
Let LLMs wander - Engineering RL Environments
Reinforcement Learning Environments are little worlds where models can act, get rewards, and learn.
I've been exploring how to design them, figuring out what works and what doesn't.
If you want to learn how to build them, I recorded a practical intro video.
You'll also see how to turn Liquid AI's LFM2-2.6B into a Tic-tac-toe master.
🎥 Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q
---
🌱 LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course
🤖🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe
HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
posted an update 4 days ago
📣 I just published a free course on Reinforcement Learning Environments for Language Models!
COURSE: https://github.com/anakin87/llm-rl-environments-lil-course
Over the past year, we've seen a shift in LLM post-training.
Previously, Supervised Fine-Tuning was the most important part: making models imitate curated question-answer pairs.
Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
But what actually are these environments in practice? And how do you build them effectively?
Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models.
I've packaged everything I learned into this short course.
What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (an open-source library by Prime Intellect) to build RL environments as software artifacts (a minimal sketch follows this post)
🔹 Common patterns: how to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by Liquid AI) into a Tic-tac-toe master
🔸 Build the game environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning
If you're interested in building "little worlds" where LLMs can learn, this course is for you.
---
🤖🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe
HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
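To make the "software artifact" idea concrete, here is a minimal single-turn environment sketch in the spirit of the course. It assumes the Verifiers API (vf.Parser, vf.Rubric, vf.SingleTurnEnv); the dataset and reward function are illustrative, and exact signatures may differ across library versions.

```python
# Hedged sketch of a single-turn RL environment with Verifiers.
import verifiers as vf
from datasets import Dataset

dataset = Dataset.from_list([
    {"question": "Sort alphabetically: pear, apple, fig", "answer": "apple, fig, pear"},
])

parser = vf.Parser()  # extracts the final answer text from the completion

def exact_match(completion, answer, **kwargs) -> float:
    # outcome-based, verifiable reward: 1.0 only if the parsed answer matches
    parsed = parser.parse_answer(completion) or ""
    return 1.0 if parsed.strip().lower() == answer.lower() else 0.0

rubric = vf.Rubric(funcs=[exact_match], weights=[1.0])
env = vf.SingleTurnEnv(dataset=dataset, parser=parser, rubric=rubric)
```

A multi-turn game like Tic-tac-toe would subclass a multi-turn environment instead, with the board as state and the win/loss/draw reward applied at the end of the episode.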
reacted to RakshitAralimatti's post 4 days ago
🔥 GLM-5.1 (zai-org/GLM-5.1): quietly one of the best flagship models for agentic engineering and coding tasks right now.
I threw some LangGraph agent code at it, a messy RAG pipeline, some async Python stuff, and it just handled it. No drama, no hallucinated methods, actually usable output on the first try.
Open source closing the gap this fast is genuinely exciting. Go check zai-org/GLM-5.1 on HF if you haven't already.
Good work @zai-org-3
posted an update 4 months ago
💭 Do thinking traces make Language Models learn better? Curious what others think.
Scenario
You take an instruction-following LM.
You want to train it with a GRPO-style RL algorithm on a task like Tic-tac-toe.
Rewards are outcome-based, applied only at the end of each episode: win/loss/draw, format adherence...
During training, the model could just output answers, but a common choice is to make it also output thinking traces.
The question
Does forcing the model to produce thinking traces during training actually improve learning?
💬 I'd like to hear your thoughts. Share ideas and links to relevant papers and resources.
From what I've understood so far, the answer seems to be yes.
1️⃣ If you force the model to think during training, it becomes a model that thinks at inference time. It naturally allocates more budget (tokens) to a problem, which tends to improve performance.
2️⃣ While the model's "reasoning" already exists in its activation space, using explicit thinking traces as a scratchpad allows training to steer and shape that reasoning.
3️⃣ As the model produces more traces during training, the RL algorithm can progressively give higher rewards to the reasoning patterns that lead to better outcomes.
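To make the "format adherence" part concrete, here is a hedged sketch of a reward that pays the model only when it emits a thinking trace before the answer. The signature follows the reward-function convention of TRL's GRPOTrainer (a list of completions in, a list of floats out); the tag format is an assumption.

```python
import re

# completion must contain a non-empty <think>...</think> block followed by an answer
THINK_RE = re.compile(r"<think>.+?</think>\s*\S", re.DOTALL)

def thinking_format_reward(completions, **kwargs):
    """Return 1.0 for each completion with a thinking trace plus an answer, else 0.0."""
    return [1.0 if THINK_RE.search(c) else 0.0 for c in completions]
```

In a GRPO-style setup this would be combined with the outcome reward (win/loss/draw), so the trace format is enforced while the game result still dominates learning.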
posted an update 5 months ago
I made a visualization based on the Prime Intellect INTELLECT-3 technical report.
Wild to see how far they pushed GLM-4.5-Air-Base with SFT + RL.
SOTA for its size and competitive with models 3x larger.
All open.
Congrats on the release!
Model: PrimeIntellect/INTELLECT-3
Technical report: https://storage.googleapis.com/intellect-3-paper/INTELLECT_3_Technical_Report.pdf
Chat: https://chat.primeintellect.ai/
posted an update 5 months ago
LLMs can leak their post-training data (RL included) 🧐
New interesting paper on this topic from Google DeepMind: Extracting alignment data in open models (2510.18554)
It's known that Language Models memorize data that can be extracted via prompting.
In this paper, the authors investigate this aspect:
- using open models, where prompting can be fully customized by the user, including special tokens;
- focusing on open-source models like Olmo, where the full training data is available.
🤔 How do they extract data?
During post-training (like SFT), new tokens such as <|user|> are introduced.
The authors hypothesize that prompting the model with these tokens can make it output its alignment data (remember Magpie?).
For example, for SFT, their extraction prompt is <|endoftext|><|user|>.
Evaluating memorization
The authors compare each sampled example with the original data using vector search with embedding similarity.
They find that many outputs are semantically very similar to the original data, even if the exact words differ.
Traditional string-matching algorithms underestimate memorization by 10x.
What about RL?
Surprisingly, the same technique works to extract data from Reinforcement Learning (PPO/GRPO) phases.
This is counter-intuitive because the RL objective is not designed to increase sequence likelihoods (unlike SFT).
Practical limitation: in this case, extraction relies on using the initial part of the training prompt, which is not generally public.
Are the extracted data effective for post-training?
Both in SFT and RL, the extracted data can be used to fine-tune models to performance similar to the originals.
The authors suggest that model distillation, where a stronger model is used to drive the training of a weaker one, may be a form of indirect training on the original dataset.
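A hedged sketch of the two steps described above, using standard libraries: sample from an open model prompted only with post-training special tokens, then score the samples against candidate training examples by embedding similarity. The model choice and the reference example are illustrative assumptions, not the paper's exact setup.

```python
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

# 1) sample with the SFT extraction prompt (special tokens only, no user text)
generator = pipeline("text-generation", model="allenai/OLMo-2-1124-7B-Instruct")
samples = generator("<|endoftext|><|user|>", max_new_tokens=128,
                    do_sample=True, num_return_sequences=4)

# 2) compare to training data semantically instead of by string matching
embedder = SentenceTransformer("all-MiniLM-L6-v2")
generated = embedder.encode([s["generated_text"] for s in samples])
reference = embedder.encode(["a candidate example from the alignment dataset"])  # hypothetical
print(util.cos_sim(generated, reference))  # high similarity -> likely memorized
```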
posted an update 7 months ago
Your Language Model needs better (open) environments to learn
https://huggingface.co/blog/anakin87/environments-hub
RL environments help LLMs practice, reason, and improve.
I explored the Environments Hub and wrote a walkthrough showing how to train and evaluate models using these open environments.
1️⃣ Why RL matters for LLMs
DeepSeek-R1 made clear that Reinforcement Learning can be used to incentivize reasoning in LLMs.
In GRPO, the model generates multiple answers and learns to prefer the better ones based on rewards (see the sketch after this post).
2️⃣ What environments are
In classic RL, the environment is the world where the Agent lives, interacts, and gets rewards to learn.
We can also think of them as software packages, containing data, a harness, and scoring rules, for the model to learn from and be evaluated on.
Nowadays, the Agent is not just the LLM. It can use tools, from a weather API to a terminal.
This makes environments for training and evaluation more complex and critical.
3️⃣ The open challenge
Big labs are advancing, but open models and the community still face a fragmented ecosystem.
We risk becoming users of systems built with tools we can't access or fully understand.
4️⃣ Environments Hub
That's why I was excited when Prime Intellect released the Environments Hub.
It's a place where people share RL environments: tasks you can use to train LLMs with RL (GRPO-style) or evaluate Agents.
Plus, the Verifiers library (@willcb) standardizes the creation of RL environments and evaluations.
They can help keep science and experimentation open. 🔬
I explored the Hub and wrote a hands-on walkthrough:
- RL + LLMs basics
- Environments Hub navigation
- Evaluating models/Agents
- GRPO-training a tiny model on an alphabetical sort task
Take a look!
https://huggingface.co/blog/anakin87/environments-hub
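The GRPO idea in point 1️⃣ boils down to group-relative advantages: sample a group of answers per prompt, score each with the environment's reward, and standardize rewards within the group so better-than-average answers get reinforced. A minimal sketch of that computation (illustrative, not any library's exact implementation):

```python
import statistics

def group_advantages(rewards: list[float]) -> list[float]:
    """Standardize rewards within one group of sampled answers (GRPO-style)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against identical rewards
    return [(r - mean) / std for r in rewards]

# four sampled answers to the same prompt, scored by the environment
print(group_advantages([1.0, 0.0, 0.5, 0.0]))  # the best answer gets the largest positive advantage
```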
reacted to sergiopaniego's post with 🔥 7 months ago
You can now supercharge your TRL training pipelines with kernels.
kernels is a new library to load optimized compute kernels directly from the Hub.
Combined with TRL, it makes your developer experience smoother and faster.
Check out the new guide to learn more!
Learn ➡️ https://huggingface.co/docs/trl/main/en/kernels_hub
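For a feel of the library, here is a short sketch adapted from the kernels README: load a community kernel from the Hub and call one of its ops. It assumes a CUDA GPU and the kernels-community/activation repo; the op name and its output-first signature come from that example and may change.

```python
import torch
from kernels import get_kernel

# downloads the optimized kernel from the Hub and loads it
activation = get_kernel("kernels-community/activation")

x = torch.randn(8, 16, device="cuda", dtype=torch.float16)
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # writes the activated values into y
print(y)
```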
posted an update 8 months ago
Want to quickly try Gemma 3 270m?
I made a simple Space to do that: anakin87/gemma-3-270m-it
⚡ Fast: Flash Attention, Zero GPU
⚙️ Configurable
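If you'd rather try it locally than in the Space, a minimal transformers sketch (assuming a recent transformers version with Gemma 3 support):

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-3-270m-it", device_map="auto")
messages = [{"role": "user", "content": "Write a haiku about small language models."}]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```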
posted an update 8 months ago
🕵️ Building Browser Agents - notebook
No API? No problem.
Browser Agents can use websites like you do: click, type, wait, read.
Step-by-step notebook: https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/browser_agents.ipynb
🎥 In the video, the Agent:
- Goes to Hugging Face Spaces
- Finds black-forest-labs/FLUX.1-schnell
- Expands a short prompt ("my holiday on Lake Como") into a detailed image generation prompt
- Waits for the image
- Returns the image URL
## What else can it do?
Great for information gathering and summarization:
- Compare news websites and create a table of shared stories with links
- Find content creators' social profiles from YouTube videos
- Find a product's price range on Amazon
- Gather public transportation travel options
## How is it built?
- Haystack → Agent execution logic
- Google Gemini 2.5 Flash → good, fast LLM with a generous free tier
- Playwright MCP server → browser automation tools: navigate, click, type, wait...
Even without vision capabilities, this setup can get quite far (a code sketch of the stack follows at the end of this post).
## Next steps
- Try a local open model
- Move from notebook to real deployment
- Incorporate vision
And you? Have you built something similar? What's in your stack?
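Here is the promised sketch of the stack: Haystack's Agent, Gemini as the chat generator, and the Playwright MCP server exposed as tools. The import paths for the Google GenAI and MCP integrations are assumptions based on the Haystack integrations naming scheme; check the notebook for the exact ones.

```python
from haystack.components.agents import Agent
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.google_genai import GoogleGenAIChatGenerator
from haystack_integrations.tools.mcp import MCPToolset, StdioServerInfo

# expose the Playwright MCP server's browser tools (navigate, click, type, ...)
tools = MCPToolset(server_info=StdioServerInfo(command="npx", args=["@playwright/mcp@latest"]))

agent = Agent(
    chat_generator=GoogleGenAIChatGenerator(model="gemini-2.5-flash"),
    tools=tools,
    system_prompt="You are a browser agent. Use the browser tools to complete the user's task.",
)

result = agent.run(messages=[ChatMessage.from_user("Find the top trending Space on Hugging Face")])
print(result["messages"][-1].text)
```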
reacted to mlabonne's post with 🔥 8 months ago
Liquid just released two VLMs: 450M and 1.6B params!
They're super fast and leverage SigLIP2 NaFlex encoders to handle native resolutions without distortion. They're ideal for on-device deployment in constrained environments like phones.
Both are available today on Hugging Face, with inference and fine-tuning Colab notebooks.
LiquidAI/LFM2-VL-450M
LiquidAI/LFM2-VL-1.6B
posted an update 8 months ago
Haystack can now see 👀
The latest release of the Haystack OSS LLM framework adds a long-requested feature: image support!
Notebooks below.
This isn't just about passing images to an LLM. We built several features to enable practical multimodal use cases.
What's new?
- Support for multiple LLM providers: OpenAI, Amazon Bedrock, Google Gemini, Mistral, NVIDIA, OpenRouter, Ollama and more (support for the Hugging Face API coming soon)
- A prompt template language to handle structured inputs, including images
- PDF and image converters
- Image embedders using CLIP-like models
- An LLM-based extractor to pull text from images
- Components to build multimodal RAG pipelines and Agents
I had the chance to lead this effort with @sjrhuschlee (great collab).
Below you can find two notebooks to explore the new features:
- Introduction to Multimodal Text Generation: https://haystack.deepset.ai/cookbook/multimodal_intro
- Creating Vision+Text RAG Pipelines: https://haystack.deepset.ai/tutorials/46_multimodal_rag
(🖼️ image by @bilgeyucel)
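A minimal sketch of the new primitives, sending text plus an image to a chat generator. ImageContent and content_parts follow the Haystack 2.x docs; the file path and model are placeholders.

```python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent

image = ImageContent.from_file_path("invoice.png")  # hypothetical local file
message = ChatMessage.from_user(content_parts=["Summarize this document.", image])

generator = OpenAIChatGenerator(model="gpt-4o-mini")
print(generator.run(messages=[message])["replies"][0].text)
```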
posted an update 9 months ago
🛡️ AI Guardrails with Open Language Models - Tutorial
https://haystack.deepset.ai/cookbook/safety_moderation_open_lms
How do you ensure your AI application is safe from harmful or inappropriate user inputs?
This is a core requirement for real-world AI deployments. Luckily, several open Language Models are built specifically for safety moderation.
I've been exploring them and put together a hands-on tutorial using the Haystack framework to build your own AI guardrails.
In the notebook, you'll learn how to use and customize:
🔹 Meta Llama Guard (via the Hugging Face API)
🔹 IBM Granite Guardian (via Ollama), which can also evaluate RAG-specific risk dimensions
🔹 Google ShieldGemma (via Ollama)
🔹 the NVIDIA NemoGuard model family, including a model for topic control
You'll also see how to integrate content moderation into a RAG pipeline.
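As a taste of the approach, a hedged sketch of an input guardrail with Llama Guard via transformers: the moderation model classifies the user message before it ever reaches your main LLM. The model id and the "safe"/"unsafe" plus hazard-category output format come from Meta's model cards; verify against the tutorial notebook.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

chat = [{"role": "user", "content": "Tell me how to pick a lock."}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=16)

verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # e.g. "unsafe\nS2" -> block or reroute the request
```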
reacted to andito's post 9 months ago
🧠 Can AI visualize solutions?
Humans often solve visual problems by sketching ideas in our minds. What if Vision-Language Models (VLMs) could do something similar, not by generating full images, but by using internal "mental sketches"?
That's the idea behind Mirage, a new framework that empowers VLMs to reason using latent visual tokens. Instead of just thinking in words, Mirage mixes in abstract visual representations that help the model solve complex tasks.
These aren't photorealistic images. They're compact, internal representations optimized purely to support reasoning.
Mirage is trained in two phases:
1) Grounding: it learns to produce latent tokens anchored in real images.
2) Refinement: the model drops the images and learns to generate visual tokens on its own.
And yes, it works!
On challenging benchmarks like Visual Spatial Planning, Jigsaw puzzles, and Spatial Attention Tasks, Mirage clearly outperforms GPT-4o and other strong baselines.
Smart sketches > empty words.
By mimicking the way humans visualize solutions, Mirage gives AI a new kind of imagination, one that's faster, more efficient, and more human-like.
Kudos to the teams at UMass Amherst and MIT behind this exciting work.
Check the paper: Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (2506.17218)
posted an update 10 months ago
🧰 Free up space on the Hub with super_squash_history 🧹
As you may know, the Hugging Face Hub has storage limits on private repos (100 GB for free users, 1 TB for PROs).
This weekend I did some cleanup on my private repos.
I went from 1.58 TB down to 1 GB.
Besides deleting old, unused models, the main tool I used was a lesser-known command: super_squash_history.
When you train a model, you often push multiple checkpoints to the Hub.
Each checkpoint = a commit.
A 2.6B model in BF16 is ~5 GB.
So 10 checkpoints = 50 GB. That adds up fast.
While full commit history can be useful for rollbacks, it's often unnecessary for older experiments where only the final model matters.
In these cases, you can use super_squash_history: it reduces your entire repo history to a single commit.
https://huggingface.co/docs/huggingface_hub/main/en/package_reference/hf_api#huggingface_hub.HfApi.super_squash_history
⚠️ super_squash_history is a non-revertible operation. Once squashed, the commit history cannot be retrieved.
Hope this is useful to others.
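For reference, the call is a one-liner (sketch with a placeholder repo_id; double-check it before running, since the operation cannot be undone):

```python
from huggingface_hub import HfApi

api = HfApi()
# collapses the full commit history of the repo into a single commit
api.super_squash_history(repo_id="your-username/old-experiment", repo_type="model")
```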
reacted to as-cle-bert's post with ❤️ 12 months ago
One of the biggest challenges I've been facing since I started developing [PdfItDown](https://github.com/AstraBert/PdfItDown) was handling the conversion of files like Excel sheets and CSVs correctly: table conversion was bad and messy, almost unusable for downstream tasks.
That's why today I'm excited to introduce readers, the new feature of PdfItDown v1.4.0!
With readers, you can choose among three (for now) flavors of text extraction and conversion to PDF:
- Docling, which does a fantastic job with presentations, spreadsheets and Word documents
- LlamaParse by LlamaIndex, suitable for more complex and articulated documents with a mixture of text, images and tables
- MarkItDown by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!)
You can use this new feature in your Python scripts (check the attached code snippet!) and in the command-line interface as well!
Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown