Collections

Collections including paper arxiv:2306.13649

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes

Paper • 2306.13649 • Published Jun 23, 2023 • 33

LLM trainning Finnetuning

On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes

Paper • 2306.13649 • Published Jun 23, 2023 • 33

Gradio Llamma Cpp 😻

Space • Agents • Runtime error • 8
meta-llama/Llama-3.2-1B-Instruct

Text Generation • 1B • Updated Oct 24, 2024 • 4.61M • 1.37k
ruslanmv/ai-medical-chatbot

Viewer • Updated Mar 23, 2024 • 257k • 1.14k • 247
deepseek-ai/DeepSeek-V3

Text Generation • 685B • Updated Mar 27, 2025 • 893k • 4.06k

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published Mar 4 • 210
SII-Enigma/Llama3.2-8B-Ins-AMPO

Text Generation • 8B • Updated 30 days ago • 48
Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26, 2025 • 59
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs

Paper • 2509.25779 • Published Sep 30, 2025 • 19

Knowledge Distillation

shayekh/aya8b-distillkit-hidden

Updated Aug 11, 2024 • 1
shayekh/aya8b-distillkit-logits

Updated Aug 11, 2024
AhmadMustafa/distAyaQwen

0.6B • Updated Aug 11, 2024 • 3 • 1
Less is More: Task-aware Layer-wise Distillation for Language Model Compression

Paper • 2210.01351 • Published Oct 4, 2022 • 3

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 25
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 102
DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 55
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 134
