Andrii Petrykovskyi
Jupeppa
2 followers · 7 following
AI & ML interests
None yet
Recent Activity
reacted to anakin87's post with ❤️ 4 days ago
📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.

What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
  🔸 Build the game Environment
  🔸 Use it to generate synthetic data for SFT warm-up
  🔸 Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
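To make the post's question about what an environment looks like in practice more concrete, here is a minimal sketch in plain Python. It is not taken from the course and does not use the Verifiers API; the class name `TicTacToeEnv`, the reward values, and the scripted "first free cell" opponent are all assumptions chosen for illustration. The point is only that the reward is computed by code that checks the game state, which is what makes it verifiable.

```python
from typing import List, Optional

# All eight winning lines on a 3x3 board, as index triples.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
VALID_MOVES = {str(i) for i in range(9)}


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


class TicTacToeEnv:
    """Hypothetical multi-turn environment: the model plays X,
    a scripted opponent plays O (assumption: first free cell)."""

    def __init__(self) -> None:
        self.board = [" "] * 9
        self.done = False

    def observation(self) -> str:
        """Text rendering of the board, suitable for a prompt."""
        return "\n".join("|".join(self.board[i:i + 3]) for i in (0, 3, 6))

    def step(self, model_output: str) -> float:
        """Apply the model's move (a digit 0-8) and return a reward.

        Assumed reward scheme: +1 win, -1 loss or illegal move, 0 otherwise.
        """
        text = model_output.strip()
        if text not in VALID_MOVES or self.board[int(text)] != " ":
            self.done = True
            return -1.0  # unparseable or illegal move
        self.board[int(text)] = "X"
        if winner(self.board) == "X":
            self.done = True
            return 1.0
        free = [i for i, cell in enumerate(self.board) if cell == " "]
        if not free:
            self.done = True
            return 0.0  # draw
        self.board[free[0]] = "O"
        if winner(self.board) == "O":
            self.done = True
            return -1.0
        return 0.0  # game continues
```

In a training loop, `observation()` would be placed in the prompt and the model's text reply passed to `step()`; the returned number is the signal a group-based method such as GRPO would compare across sampled rollouts.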
liked a Space 5 months ago
INSAIT-Institute/mamaylm-v1-blog
reacted to CultriX's post with ❤️ over 1 year ago
# Space for Multi-Agent Workflows using AutoGen

Hi all, I created this "AutoGen Multi-Agent Workflow" space that allows you to experiment with multi-agent workflows. By default, it allows code generation with built-in quality control and automatic documentation generation. It achieves this by leveraging multiple AI agents working together to produce high-quality code snippets, ensuring they meet the specified requirements. In addition to the default, the space allows users to set custom system messages for each assistant, potentially completely changing the workflow.

# Workflow Steps

1. User Input:
   - The user defines a prompt, such as "Write a random password generator using python."
   - Outcome: A clear task for the primary assistant to accomplish.
2. Primary Assistant Work:
   - The primary assistant begins working on the provided prompt. It generates an initial code snippet based on the user's request.
   - Outcome: An initial proposal for the requested code.
3. Critic Feedback:
   - The critic reviews the generated code and either provides feedback or, if the output meets the criteria, broadcasts the APPROVED message. (This process repeats until the output is APPROVED or 10 messages have been exchanged.)
   - Outcome: A revised Python function that incorporates the critic's feedback.
4. Documentation Generation:
   - Once the code is approved, it is passed to a documentation assistant. The documentation assistant generates concise documentation for the final code.
   - Outcome: Short documentation including function description, parameters, and return values.

Enjoy! https://huggingface.co/spaces/CultriX/AutoGen-MultiAgent-Example
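The control flow of the workflow steps above can be sketched in plain Python. This is not the Space's code and does not use the AutoGen API: the agents are stand-in callables, and `run_workflow`, `toy_primary`, `toy_critic`, and `toy_documenter` are hypothetical names. Two further assumptions are labeled in the comments: each draft and each critic reply counts as one message, and the latest draft is documented even if the message limit is reached without approval.

```python
from typing import Callable, Optional, Tuple

MAX_MESSAGES = 10  # from the post: stop after 10 exchanged messages


def run_workflow(
    prompt: str,
    primary: Callable[[str, Optional[str]], str],
    critic: Callable[[str], str],
    documenter: Callable[[str], str],
    max_messages: int = MAX_MESSAGES,
) -> Tuple[str, str, bool]:
    """Return (final_code, documentation, approved)."""
    draft = primary(prompt, None)  # step 2: initial proposal
    exchanged = 1                  # assumption: every draft/reply is one message
    approved = False
    while exchanged < max_messages:
        feedback = critic(draft)   # step 3: critic review
        exchanged += 1
        if feedback.strip() == "APPROVED":
            approved = True
            break
        if exchanged >= max_messages:
            break
        draft = primary(prompt, feedback)  # revise using the feedback
        exchanged += 1
    # Step 4. Assumption: document the latest draft even without approval.
    return draft, documenter(draft), approved


# Toy stand-ins for the three assistants (no LLM calls).
def toy_primary(prompt: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "def answer():\n    pass"
    return "def answer():\n    return 42"


def toy_critic(code: str) -> str:
    return "APPROVED" if "return" in code else "Add a return value."


def toy_documenter(code: str) -> str:
    return "answer(): takes no parameters and returns an int."


code, docs, approved = run_workflow(
    "Write a function", toy_primary, toy_critic, toy_documenter
)
```

Replacing the toy callables with real model calls would leave the loop itself unchanged; the `approved` flag lets the caller tell an approved result from one that merely ran out of messages.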
Organizations
None yet
models
0
None public yet
datasets
0
None public yet